What is Text Mining?

Text mining refers to the process of deriving high-quality information from text. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). More on Wikipedia.
I like to see text mining as machine learning over text. This is why you should know something about ML before tapping into the text mining field.
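To make the "machine learning over text" view a bit more concrete, here is a minimal sketch of one of the tasks listed above, text categorization, done with a bag-of-words Naive Bayes classifier in plain Python. Everything in it is an assumption made for illustration: the function names (tokenize, train, classify), the whitespace tokenizer, the add-one smoothing, and the four toy documents are all made up for this example. Real systems use better tokenization and far more data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for this illustration.
TOY_DOCS = [
    ("great movie loved the acting", "positive"),
    ("wonderful film great story", "positive"),
    ("terrible movie hated the plot", "negative"),
    ("boring film awful acting", "negative"),
]

model = train(TOY_DOCS)
print(classify("great story", model))
```

The point of the sketch is only that the text is first turned into numbers (word counts) and then handed to an ordinary learning algorithm. Most of the other tasks mentioned above follow the same pattern with different features and models.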

Basic Mathematics for Machine Learning

If you want to understand text mining, you should know some mathematics first, and you should not be afraid of mathematical notation and proofs. At a minimum you should know something about linear algebra (great notes) and probability (notes). If you want most of the basic mathematics you will need in the form of one series of video lectures, you should take a look here.
You should also know some statistics. There are tons of easily accessible videos on Khan Academy that can teach you the basics.

Machine Learning Tutorials

Machine learning is taught at a lot of universities and you can find a lot of materials on the course websites.
The most basic methods are covered by Andrew Moore here; another good reference is the course by Raymond Mooney.
If you need something more advanced, you should check these: chapters from the Machine Learning book, machine learning theory, and the handout notes from the Stanford course.
There is also a whole term of the Stanford Machine Learning course available as video on YouTube.

Text Mining Lectures

I found great video lectures on text mining. The two most basic, must-see ones are about text classification and text information extraction. You can also find videos from the whole Autumn school of machine learning over text and images. It goes from basic concepts to some advanced stuff.

Books on Text Mining and Machine Learning

A really great reading list is compiled on the Measuring Measures blog, but it is very theory-oriented and not particularly centered on text mining. Another good list is on Hacker News.

Speaking of books, there are numerous great ones available free online. I think that Introduction to Information Retrieval is great, and you can download it for free or buy it at Amazon or elsewhere.

About this site

My name is David Filip (Twitter) and I made this site as a compilation of my bookmarks. If you want to contact me, you can write to davidfilip@gmail.com.