Advanced Topics of Text Mining (IATM)
The lecture introduces the fundamentals as well as selected advanced topics from the domain of text mining.

fundamentals of data modeling and preprocessing, in particular for textual data

statistical and algorithmic foundations of the analysis methods

basics of computational linguistics and natural language processing for textual data (e.g. morphological analysis, part-of-speech tagging)

selected and current focus topics such as classification, cluster analysis, sequential pattern mining, association rule mining, and topic modeling, with emphasis on the application to textual data
Summer Term 2017 Topics
The focus for this term will be on algorithms for text clustering and topic modeling.
Note: We will only touch on deep learning, machine translation, and NLP as side topics this year. To discuss "advanced topics" in the given amount of time, we need to specialize. Deep learning approaches are a candidate for the focus of the next iteration of this class.
While the lecture will briefly introduce the fundamentals of text modeling for the algorithms, students who intend to attend the class are encouraged to refresh their knowledge of:
 Linear algebra, in particular vector spaces and matrix operations
 Data structures for retrieval, in particular search trees
 Data mining fundamentals such as k-means clustering and optimization techniques such as expectation-maximization (EM)
Location:
The lecture is over.
Recommended prerequisites:
The following prerequisites are strongly recommended, but not formally required:
 Algorithmen und Datenstrukturen (IAD)
 Knowledge Discovery in Databases (IKDD)
Literature:
General introductory textbooks (preliminary list):
 T. Hastie, R. Tibshirani, and J. H. Friedman: The Elements of Statistical Learning, 2001.
 C. D. Manning, P. Raghavan, and H. Schütze: Introduction to Information Retrieval, 2008.
 J. Leskovec, A. Rajaraman, and J. D. Ullman: Mining of Massive Datasets, 2014.
 B. Liu: Web Data Mining. Springer, 2011.
 C. C. Aggarwal and C. Zhai: Mining Text Data. Springer, 2012.
Note that these textbooks only briefly touch on the advanced topics that we want to discuss in this lecture.