What is Text Mining?
Text mining refers to the process of deriving high-quality information from text. Typical text mining tasks include text categorization,
text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity
relation modeling (i.e., learning relations between named entities). More on
Wikipedia.
I like to see text mining as machine learning over text. This is why you should know something about ML before tapping into the text mining field.
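To make "machine learning over text" concrete, here is a minimal sketch of text categorization, one of the tasks listed above. It assumes a multinomial Naive Bayes classifier with add-one smoothing over whitespace-separated words. The function names (`tokenize`, `train`, `predict`) and the tiny training set are made up for illustration and do not come from any of the linked resources.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior of the label.
        score = math.log(n_docs / total_docs)
        # Add-one (Laplace) smoothing denominator for this label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example.
TRAIN = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot awful acting", "neg"),
]
model = train(TRAIN)
print(predict(model, "great wonderful plot"))
print(predict(model, "awful boring movie"))
```

Real systems use better tokenization, far more data, and usually an existing library, but the idea is the same: turn text into counts, then apply a standard learning method.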
Basic Mathematics for Machine Learning
If you want to understand text mining, you should know some mathematics first. You should not be afraid of mathematical notation and proofs. At a minimum, you should know something about linear algebra (great notes) and probability (notes). If you want most of the basic mathematics you will need in the form of one series of video lectures, take a look here.
You should also know some statistics. There are tons of easily accessible videos on Khan Academy that can teach you the basics.
Machine Learning Tutorials
Machine learning is taught at many universities, and you can find plenty of material on the course websites.
The most basic methods are covered by Andrew Moore here; another good reference is the course by Raymond Mooney.
If you need something more advanced, check these: chapters from the Machine Learning book, machine learning theory, and the handout notes from the Stanford course.
A whole term of the Stanford Machine Learning course is also available as video on YouTube.
Text Mining Lectures
I found great video lectures on text mining. The two most basic, must-see ones are about text classification and text information extraction. You can also find videos from the whole Autumn school of machine learning over text and images. It goes from basic concepts to some advanced material.
Books on Text Mining and Machine Learning
A really great reading list is compiled on the Measuring Measures blog, but it is very theoretically oriented and not so centered on text mining. Another good list is at Hacker News.
Speaking of books, you should take a look at:
There are numerous great books available free online. I think that Introduction to Information Retrieval is great, and you can download it for free or buy it at Amazon or elsewhere.
About this site
My name is David Filip (Twitter) and I made this site as a compilation of my bookmarks. If you want to contact me, you can write to davidfilip@gmail.com.