Dr. Jannik Strötgen
After more than six years, I left the DBS group (September 30, 2015). This page will not be updated anymore.
I am now part of the Max Planck Institute for Informatics in the group of Prof. Gerhard Weikum. You can find my new page here.
- Office hours during semester: Monday, 1-2 pm, or by appointment
- Office: INF 348, room 12d
- Email: stroetgen(at)informatik.uni-heidelberg(dot)de
News
2015-09-30: HeidelTime 2.0 with automatically created resources for more than 200 languages; see HeidelTime's GitHub page for details.
2015-08-03: Our "Tiwoli - today in world literature" app is now bilingual with quotes from German and English literature. It's available for iOS (thanks Thomas) and Android. Check out what's happening in world literature today, tomorrow, or on any other day, or visit the project website for further details: tiwoli.wannauchimmer.de
2015-07-29: EMNLP short paper accepted (with Michael Gertz) - stay tuned for further information.
2015-03-02: PhD thesis successfully defended (summa cum laude).
2014-04-03: We are editing a special issue of Information Processing and Management on "Time and Information Retrieval" (with Leon Derczynski, Ricardo Campos, and Omar Alonso).
About
I studied Computational Linguistics and Economics at the Ruprecht-Karls-University Heidelberg and received my Magister Artium in June 2009. Currently, I am working as a researcher at the Institute of Computer Science in the Database Systems Research group of Prof. Dr. Michael Gertz. In March 2015, I finished my PhD studies by successfully defending my PhD thesis on "Domain-sensitive Temporal Tagging for Event-centric Information Retrieval" (summa cum laude).
Research Interests
- Information Extraction (IE)
- Temporal Tagging (HeidelTime, @ GitHub, online demo)
- Information Retrieval (IR)
- Temporal and Spatial IR and IE
- Natural Language Processing
- Text Mining
- Digital Humanities (heureCLÉA project, Jahrestage project with Tiwoli / formerly Literjahrtur)
Short Curriculum Vitae
- Since 06/2009 Research assistant at the Institute of Computer Science at Heidelberg University
- 04/2010 - 03/2015 PhD student at the Institute of Computer Science at Heidelberg University
- 09/2007 - 04/2009 & 05/2004 - 09/2006 Student worker at Fraunhofer Institute for Algorithms and Scientific Computing SCAI
- 04/2007 - 03/2008 Student tutor at the Department of Computational Linguistics, Heidelberg University
- 10/2006 - 04/2007 Internship at GlaxoSmithKline (Upper Merion, PA, USA) in the IT Research and Development department, in the Knowledge and Discovery Systems group
- 10/2001 - 06/2009 Study of Computational Linguistics and Economics at Heidelberg University
Theses
- Jannik Strötgen.
Domain-sensitive Temporal Tagging for Event-centric Information Retrieval.
PhD thesis, Institute of Computer Science, Heidelberg University, 2015.
(submitted: December 18, 2014; defended: March 2, 2015; published: March 10, 2015)
[pdf] [bibtex] [introduction (picture)]
- Jannik Strötgen.
UTEMPL - Aufbau und Evaluierung einer UIMA-basierten Textmining Pipeline für biomedizinische Literatur.
Magisterarbeit, Department of Computational Linguistics, Heidelberg University, 2009.
[pdf] [bibtex]
Publications
(see acm, dblp, google scholar, and acl anthology; extended abstracts are counted separately)
2015
- Jannik Strötgen and Michael Gertz.
A Baseline Temporal Tagger for all Languages.
Accepted at: Conference on Empirical Methods in Natural Language Processing (EMNLP'15), Lisbon, Portugal, September 17-21, 2015. (short paper)
[pdf] [bibtex]
- Thomas Bögel, Jannik Strötgen, and Michael Gertz.
A Hybrid Approach to Extract Temporal Signals from Narratives.
Accepted at: International Conference of the German Society for Computational Linguistics and Language Technology (GSCL'15), Duisburg-Essen, Germany, 2015. (short paper)
[pdf] [bibtex]
- Leon Derczynski, Jannik Strötgen, Ricardo Campos, and Omar Alonso.
Time and Information Retrieval: Introduction to the Special Issue.
To appear in: Information Processing and Management, 2015.
[pdf] [bibtex] [in press online version]
- Evelyn Gius, Janina Jacke, Jan-Christoph Meister, Thomas Bögel, and Jannik Strötgen.
Beyond Pragmatics: Disciplinary Profits of Interdisciplinary Approaches.
(extended abstract for oral presentation; short paper)
DH 2015: Annual Conference of the Alliance of Digital Humanities Organizations. Sydney, Australia, June 29 - July 3, 2015.
[pdf] [bibtex]
- Frank Fischer and Jannik Strötgen.
When Does German Literature Take Place? – On the Analysis of Temporal Expressions in Large Corpora.
(extended abstract for oral presentation; short paper)
DH 2015: Annual Conference of the Alliance of Digital Humanities Organizations. Sydney, Australia, June 29 - July 3, 2015.
[pdf] [bibtex] [Android App] [iOS App]
- Bilel Moulahi, Jannik Strötgen, Michael Gertz, and Lynda Tamine-Lechani.
HeidelToul: A Baseline Approach for Cross-document Event Ordering.
In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval'15) (together with NAACL-HLT'15). Pages 325-329, Denver, Colorado, June 4-5, 2015.
[pdf] [bibtex]
- Andreas Spitz, Jannik Strötgen, Thomas Bögel, and Michael Gertz.
Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model.
In: Proceedings of the Temporal Web Analytics Workshop (TempWeb'15) (together with WWW'15). Pages 1375-1380, Florence, Italy, May 18, 2015.
[pdf] [bibtex]
- Frank Fischer and Jannik Strötgen.
Wann findet die deutsche Literatur statt? Zur Untersuchung von Zeitausdrücken in großen Korpora.
(extended abstract for oral presentation)
DHd 2015: Digital Humanities im deutschsprachigen Raum. Graz, Austria, February 23-27, 2015.
[pdf] [bibtex] [slides] [App]
- Thomas Bögel, Michael Gertz, Evelyn Gius, Janina Jacke, Jan Christoph Meister, Marco Petris, and Jannik Strötgen.
Gleiche Textdaten, unterschiedliche Erkenntnisziele? Zum Potential vermeintlich widersprüchlicher Zugänge zu Textanalyse.
(extended abstract for oral presentation)
DHd 2015: Digital Humanities im deutschsprachigen Raum. Graz, Austria, February 23-27, 2015.
[pdf] [bibtex]
- Thomas Bögel, Marco Petris, Jannik Strötgen, and Michael Gertz.
An End-to-End Integration of Automatic Annotations into CATMA.
(extended abstract for poster presentation)
DHd 2015: Digital Humanities im deutschsprachigen Raum. Graz, Austria, February 23-27, 2015.
[pdf] [bibtex]
2014
- Giulio Manfredi, Jannik Strötgen, Julian Zell, and Michael Gertz.
HeidelTime at EVENTI: Tuning Italian Resources and Addressing TimeML's Empty Tags.
In: Proceedings of the 4th International Workshop EVALITA-2014. Pages 39-43, Pisa, Italy, December 11, 2014.
(winner of the temporal tagging subtask)
[pdf] [bibtex]
- Thomas Bögel, Jannik Strötgen, and Michael Gertz.
Computational Narratology: Extracting Tense Clusters from Narrative Texts.
In: Proceedings of the 9th Edition of the Language Resources and Evaluation Conference (LREC'14). Pages 950-955, Reykjavik, Iceland, May 26-31, 2014.
[pdf] [bibtex]
- Jannik Strötgen, Thomas Bögel, Julian Zell, Ayser Armiti, Tran Van Canh, and Michael Gertz.
Extending HeidelTime for Temporal Expressions Referring to Historic Dates.
In: Proceedings of the 9th Edition of the Language Resources and Evaluation Conference (LREC'14). Pages 2390-2397, Reykjavik, Iceland, May 26-31, 2014.
[pdf] [bibtex]
- Hui Li, Jannik Strötgen, Julian Zell, and Michael Gertz.
Chinese Temporal Tagging with HeidelTime.
In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL'14). Pages 133-137, Gothenburg, Sweden, April 26-30, 2014. (short paper)
[pdf] [bibtex]
- Jannik Strötgen, Ayser Armiti, Tran Van Canh, Julian Zell, and Michael Gertz.
Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese.
In: ACM Transactions on Asian Language Information Processing (TALIP), 13(1), pages 1-21, 2014, ACM.
- Thomas Bögel, Jannik Strötgen, and Michael Gertz.
A Flexible NLP Pipeline for Computational Narratology.
(extended abstract for poster presentation)
DHd 2014: Digital Humanities im deutschsprachigen Raum. Passau, Germany, March 25-28, 2014.
[pdf] [bibtex]
2013
- Jannik Strötgen and Michael Gertz.
Proximity²-aware Ranking for Textual, Temporal, and Geographic Queries.
In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM'13). Pages 739-744, San Francisco, CA, October 27 - November 1, 2013. (short paper)
[pdf] [bibtex] [extended version]
- Jannik Strötgen and Michael Gertz.
Multilingual and Cross-domain Temporal Tagging.
In: Language Resources and Evaluation, 47(2), pages 269-298, 2013, Springer. (DOI: 10.1007/s10579-012-9179-y; published online May 8, 2012)
[pdf] [bibtex] [local access]
- Jannik Strötgen, Julian Zell, and Michael Gertz.
HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3.
In: Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013) (together with *SEM 2013 and NAACL 2013). Pages 15-19, Atlanta, GA, USA, June 13-15, 2013.
(winner of temporal tagging subtask for English and Spanish)
[pdf] [bibtex]
- Christian Kapp, Jannik Strötgen, and Michael Gertz.
EvenPers: Event-based Person Exploration and Correlation.
In: BTW 2013: 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web. Pages 519-522, Magdeburg, Germany, March 11-15, 2013. (demo paper)
[pdf] [bibtex]
2012
- Brita Keller, Jannik Strötgen, and Michael Gertz.
Event-centric Document Similarity for Biomedical Literature.
In: SMBM 2012: 5th International Symposium on Semantic Mining in Biomedicine. Pages 72-79, Zurich, Switzerland, September 3-4, 2012. (short paper)
[pdf] [bibtex]
- Jannik Strötgen and Michael Gertz.
Event-centric Search and Exploration in Document Collections.
In: JCDL 2012: ACM/IEEE Joint Conference on Digital Libraries. Pages 223-232, Washington, DC, June 10-14, 2012.
(nominated for Best Student Paper)
- Jannik Strötgen and Michael Gertz.
Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards.
In: LREC 2012: The 8th International Conference on Language Resources and Evaluation. Pages 3746-3753, Istanbul, Turkey, May 21-27, 2012. ELRA.
[pdf] [bibtex]
- Jannik Strötgen, Omar Alonso, and Michael Gertz.
Identification of Top Relevant Temporal Expressions in Documents.
In: TempWeb 2012: The 2nd Temporal Web Analytics Workshop (together with WWW 2012). Pages 33-40, Lyon, France, April 17, 2012. ACM.
- Jannik Strötgen, Omar Alonso, and Michael Gertz.
Retro: Time-based Exploration of Product Reviews.
In: ECIR 2012: 34th European Conference on Information Retrieval. Pages 581-582, Barcelona, Spain, April 1-5, 2012. Springer. (demo paper)
[pdf] [bibtex]
2011
- Jannik Strötgen and Michael Gertz.
WikiWarsDE: A German Corpus of Narratives Annotated with Temporal Expressions.
In: GSCL 2011: Conference of the German Society for Computational Linguistics and Language Technology. Pages 129-134, Hamburg, Germany, September 28-30, 2011.
[pdf] [bibtex]
- Jannik Strötgen, Michael Gertz, and Conny Junghans.
An Event-centric Model for Multilingual Document Similarity.
In: SIGIR '11: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. Pages 953-962, Beijing, China, July 24-28, 2011. ACM.
- Omar Alonso, Jannik Strötgen, Ricardo Baeza-Yates, and Michael Gertz.
Temporal Information Retrieval: Challenges and Opportunities.
In: TWAW 2011: Proceedings of 1st International Temporal Web Analytics Workshop (together with WWW 2011). Pages 1-8, Hyderabad, India, March 28, 2011.
[pdf] [bibtex]
2010
- Jannik Strötgen and Michael Gertz.
TimeTrails: A System for Exploring Spatio-Temporal Information in Documents.
In: VLDB 2010: Proceedings of the 36th International Conference on Very Large Data Bases. Pages 1569-1572, Singapore, September 13-18, 2010. VLDB Endowment. (demo paper)
[pdf] [bibtex]
- Jannik Strötgen and Michael Gertz.
HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions.
In: SemEval-2010: Proceedings of the 5th International Workshop on Semantic Evaluation (together with ACL 2010). Pages 321-324, Uppsala, Sweden, July 15-16, 2010. ACL.
(winner of temporal tagging subtask for English)
[pdf] [bibtex] [acl anthology]
- Jannik Strötgen, Michael Gertz, and Pavel Popov.
Extraction and Exploration of Spatio-Temporal Information in Documents.
In: GIR '10: Proceedings of the 6th Workshop On Geographic Information Retrieval, Zurich, Switzerland, February 18-19, 2010. ACM.
[pdf] [bibtex] [doi]
Before
- Jannik Strötgen, Juliane Fluck, and Anke Holler.
Dependenz-basierte Relationsextraktion mit der UIMA-basierten Text-Mining Pipeline UTEMPL.
In: Christian Chiarcos, Richard Eckart de Castilho, and Manfred Stede (eds.), From Form to Meaning: Processing Texts Automatically. Proceedings of the Biennial GSCL Conference 2009. Pages: 125-136, Narr Verlag, Tübingen, Germany, 2009.
[pdf] [bibtex]
- Joachim Wermter, Juliane Fluck, Jannik Strötgen, Stefan Geißler, and Udo Hahn.
Recognizing Noun Phrases in Biomedical Text: An Evaluation of Lab Prototypes and Commercial Chunkers.
In: SMBM 2005: Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine. Pages: 25-33, Hinxton, England, 2005.
[pdf] [bibtex]
Posters
- Jannik Strötgen and Michael Gertz.
A Baseline Temporal Tagger for All Languages.
EMNLP 2015, Lisbon, Portugal. [pdf]
- Thomas Bögel, Marco Petris, Jannik Strötgen, and Michael Gertz.
An End-to-End Integration of Automatic Annotations into CATMA.
DHd 2015, Graz, Austria. [pdf]
- Giulio Manfredi, Jannik Strötgen, Julian Zell, and Michael Gertz.
HeidelTime at EVENTI: Tuning Italian Resources and Addressing TimeML's Empty Tags.
EVALITA 2014, Pisa, Italy. [pdf]
- Thomas Bögel, Jannik Strötgen, and Michael Gertz.
Computational Narratology: Extracting Tense Clusters from Narrative Texts.
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference, Reykjavik, Iceland. [pdf]
- Jannik Strötgen, Thomas Bögel, Julian Zell, Ayser Armiti, Tran Van Canh, and Michael Gertz.
Extending HeidelTime for Temporal Expressions Referring to Historic Dates.
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference, Reykjavik, Iceland. [pdf]
- Hui Li, Jannik Strötgen, Julian Zell, and Michael Gertz.
Chinese Temporal Tagging with HeidelTime.
EACL 2014: 14th Conference of the European Chapter of the ACL, Gothenburg, Sweden. [pdf]
- Thomas Bögel, Jannik Strötgen, and Michael Gertz.
A Flexible NLP Pipeline for Computational Narratology.
DHd 2014: 1. Jahrestagung der Digital Humanities im deutschsprachigen Raum, Passau, Germany. [pdf]
- Jannik Strötgen, Thomas Bögel, and Michael Gertz.
Annotating Temporal Phenomena in Literary Text in the Context of the heureCLÉA Project.
Herrenhausen Conference on the Humanities in the Digital Age, Hannover, Germany. [pdf]
- Jannik Strötgen, Julian Zell, and Michael Gertz.
HeidelTime at TempEval-3: Tuning English and Developing Spanish Resources.
SemEval 2013: 7th International Workshop on Semantic Evaluation (with NAACL and *SEM), Atlanta, GA, USA. [pdf]
- Christian Kapp, Jannik Strötgen, and Michael Gertz.
EvenPers: Event-based Person Exploration and Correlation.
BTW 2013: Datenbanksysteme für Business, Technologie und Web, Magdeburg, Germany. [pdf]
- Jannik Strötgen and Michael Gertz.
HeidelTime: Temporal Tagging on Different Domains.
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey. [pdf]
- Jannik Strötgen, Omar Alonso, and Michael Gertz.
Retro: Time-based Exploration of Product Reviews.
ECIR 2012: 34th European Conference on Information Retrieval, Barcelona, Spain. [pdf]
- Jannik Strötgen and Michael Gertz.
TimeTrails: A System for Exploring Spatio-Temporal Information in Documents.
VLDB 2010: 36th International Conference on Very Large Databases, Singapore. [pdf]
- Jannik Strötgen and Michael Gertz.
HeidelTime: High Quality Extraction and Normalization of Temporal Expressions.
SemEval 2010: 5th International Workshop on Semantic Evaluation (with ACL), Uppsala, Sweden. [pdf]
- Jannik Strötgen, Verena Meyer, Roman Klinger, and Juliane Fluck.
UTEMPL - The UIMA based Text Mining Pipeline.
Text Mining Symposium 2008, Bonn, Germany. [pdf]
Talks
Temporal Tagging for Text Mining Applications.
Bosch Corporate Research, Renningen, Germany, July 28, 2015.
(invited talk, Dr. Michael Hanselmann)
Event-centric Information Retrieval.
MPI, Saarbrücken, Germany, June 23, 2015.
(invited talk, Prof. Dr. Gerhard Weikum)
"Literjahrtur" – eine Kalender-App für Android mit Daten aus einem weltliterarischen Korpus.
DARIAH-DE Workshop "Warum nicht mal mobil? Apps in den Digital Humanities", Darmstadt, Germany, April 24, 2015.
(invited talk, together with Frank Fischer)
Multilingual and Domain-sensitive Temporal Tagging for Text Analytics.
SAP Innovation@SSI&SolMan Presentation Series, April 29, 2015.
(invited talk, Michael Laux, SAP)
Domain-sensitive Temporal Tagging for Event-centric Information Retrieval.
UKP Colloquial, Darmstadt, Germany, April 16, 2015. [pdf]
(invited talk, Prof. Dr. Iryna Gurevych)
Domain-sensitive Temporal Tagging for Event-centric Information Retrieval.
PhD defense, Heidelberg University, Heidelberg, Germany, March 2, 2015.
Wann findet die deutsche Literatur statt? (together with Frank Fischer)
DHd 2015, Graz, Austria, February 27, 2015. [pdf]
Domain-sensitive Temporal Tagging for Event-centric Information Retrieval.
NLP Lab at DFKI 2015, Berlin, Germany, February 2, 2015.
(invited talk, Prof. Dr. Hans Uszkoreit)
Proximity²-aware Ranking for Textual, Temporal, and Geographic Queries.
CIKM 2013, San Francisco, CA, USA, October 28, 2013.
Domain-sensitive Temporal Tagging for Event-centric Information Retrieval.
WBI Forschungsseminar, HU Berlin, Germany, June 24, 2013.
(invited talk, Prof. Dr. Ulf Leser)
HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3.
SemEval 2013, Atlanta, GA, USA, June 14, 2013.
Event-centric Information Extraction and Retrieval to Explore Document Collections.
Guest Lecture: Web Search and Information Retrieval, Mannheim, Germany, May 7, 2013.
(invited talk, Prof. Dr. Simone Paolo Ponzetto)
Event-centric Document Similarity for Biomedical Literature.
SMBM 2012, Zurich, Switzerland, September 4, 2012.
Event-centric Information Extraction and Information Retrieval to Explore Document Collections.
DIMA Colloquium, Berlin, Germany, July 9, 2012.
(invited talk, Prof. Dr. Volker Markl)
Multilingual and Cross-domain Temporal Tagging with HeidelTime.
LOEWE Colloquium, Frankfurt, Germany, June 28, 2012.
(invited talk, Prof. Dr. Alexander Mehler)
Event-centric Search and Exploration in Document Collections.
JCDL 2012, Washington, D.C., USA, June 12, 2012.
Identification of Top Relevant Temporal Expressions in Documents.
TempWeb 2012, Lyon, France, April 17, 2012.
WikiWarsDE: A German Corpus of Narratives Annotated with Temporal Expressions.
GSCL 2011, Hamburg, Germany, September 30, 2011.
An Event-centric Model for Multilingual Document Similarity.
SIGIR 2011, Beijing, China, July 27, 2011.
Annotating Spatio-Temporal Information in Documents.
Workshop: Name Classification and Grounding in Multilingual Corpora, Zurich, Switzerland, June 8, 2010. [pdf]
(invited talk, Prof. Dr. Martin Volk)
Extraction and Exploration of Spatio-Temporal Information in Documents.
GIR 2010, Zurich, Switzerland, February 19, 2010.
Dependency-based Relation Extraction Using the UIMA-based Text Mining Pipeline UTEMPL.
GSCL 2009, Potsdam, Germany, October 2, 2009.
Combining Linguistic and Symbolic Clues to Improve Multi-word Term Extraction from Free Text. (together with Stefan Geißler and Sebastian Kreß).
Tekom 2005, Wiesbaden, Germany, November 10, 2005.
Other Stuff
- Jahrestage project with Frank Fischer on studying when German literature takes place. We also published the Android app Tiwoli to explore what's happening in German literature today, tomorrow, or on any other day.
- The accompanying material to our DH 2014 tutorial on "A Collaborative, Indeterministic and partly Automatized Approach to Text Annotation" (with Thomas Bögel, Evelyn Gius, and Marco Petris) can be found here.
- Heidelberg theme for LaTeX Beamer presentations [beamerthemeHeidelberg.sty] (see the invited talk "Annotating Spatio-Temporal Information in Documents" to get an impression of the Heidelberg beamer theme)
- Gnuplot Tutorial - Slides of my presentation "Einführung in gnuplot" during winter term 2010/11 (pdf - pdf handout)