Dr. Erich Schubert

now at Technical University of Dortmund


About

I did my PhD in the database systems group at Ludwig-Maximilians-Universität München, and then joined the Database Systems Research group of Prof. Dr. Michael Gertz as a postdoc. My thesis was on generalized outlier detection, and I have also done research on change detection in large-scale textual data streams.

I am a lead author of the ELKI data mining toolkit.

Research Interests

  • Data Mining & Text Mining
  • Event Detection and Analysis
  • Clustering and Outlier Detection
  • Information Retrieval & Information Extraction
  • Network Analysis & Graph Algorithms
  • Machine Learning
See also: Google Scholar, DBLP, ORCID, ACM Digital Library, Semantic Scholar, AMiner, Scopus

Publications

2018

  • Erich Schubert, Andreas Spitz, and Michael Gertz.
    Exploring Significant Interactions in Live News.
    In: Proceedings of the 2nd International Workshop on Recent Trends in News Information Retrieval (NewsIR'18) co-located with 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France. 2018, 39–44
    [open-access (CEUR-WS)] [bibtex]
  • Erich Schubert, and Michael Gertz.
    Improving the Cluster Structure Extracted from OPTICS Plots.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA), Mannheim, Germany. 2018, 318–329
    [open-access (CEUR-WS)] [code] [bibtex]
  • Erich Schubert, Sibylle Hess, and Katharina Morik.
    The Relationship of DBSCAN to Matrix Factorization and Spectral Clustering.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA), Mannheim, Germany. 2018, 330–334
    [open-access (CEUR-WS)] [bibtex]
  • Michael E. Houle, Erich Schubert, and Arthur Zimek.
    On the Correlation Between Local Intrinsic Dimensionality and Outlierness.
    In: Proceedings of the 11th International Conference on Similarity Search and Applications (SISAP), Lima, Peru. 2018, to appear
    [bibtex]
  • Erich Schubert, and Michael Gertz.
    Numerically Stable Parallel Computation of (Co-)Variance.
    In: Proceedings of the 30th International Conference on Scientific and Statistical Database Management (SSDBM), Bolzano-Bozen, Italy. 2018, 10:1–10:12, SSDBM 2018 best paper award
    [slides (pdf)] [manuscript (pdf)] [DOI:10.1145/3221269.3223036] [bibtex]

2017

  • Evelyn Kirner, Erich Schubert, and Arthur Zimek.
    Good and Bad Neighborhood Approximations for Outlier Detection Ensembles.
    In: Proceedings of the 10th International Conference on Similarity Search and Applications (SISAP), Munich, Germany. 2017, 173–187
    [slides (pdf)] [manuscript (pdf)] [code] [DOI:10.1007/978-3-319-68474-1_12] [bibtex]
  • Erich Schubert, and Michael Gertz.
    Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection - A Remedy Against the Curse of Dimensionality?
    In: Proceedings of the 10th International Conference on Similarity Search and Applications (SISAP), Munich, Germany. 2017, 188–203
    [slides (pdf)] [manuscript (pdf)] [code] [DOI:10.1007/978-3-319-68474-1_13] [bibtex]
  • Erich Schubert, Andreas Spitz, Michael Weiler, Johanna Geiß, and Michael Gertz.
    Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding.
    In: CoRR abs/1708.03569. 2017
    [open-access (arXiv)] [online] [bibtex]
  • Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    The (black) art of runtime evaluation: Are we comparing algorithms or implementations?
    In: Knowledge and Information Systems (KAIS) 52 (2). 2017, 341–378, Online first 2016, paginated 2017
    [authorized access (Springer)] [DOI:10.1007/s10115-016-1004-2] [bibtex]
  • Guillaume Casanova, Elias Englmeier, Michael E. Houle, Peer Kröger, Michael Nett, Erich Schubert, and Arthur Zimek.
    Dimensional Testing for Reverse k-Nearest Neighbor Search.
    In: Proceedings of the VLDB Endowment 10 (7). 2017, 769–780
    [pdf] [DOI:10.14778/3067421.3067426] [bibtex]
  • Erich Schubert, Jörg Sander, Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu.
    DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN.
    In: ACM Transactions on Database Systems (TODS) 42 (3). 2017, 19:1–19:21
    [authorized access (ACM)] [DOI:10.1145/3068335] [bibtex]
  • Arthur Zimek, and Erich Schubert.
    Outlier Detection.
    In: Ling Liu, and M. Tamer Özsu (eds.), Encyclopedia of Database Systems. 2017, 5, online first, to appear 2018
    [DOI:10.1007/978-1-4899-7993-3_80719-1]

2016

  • Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle.
    On the Evaluation of Outlier Detection: Measures, Datasets, and an Empirical Study Continued.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA), Potsdam, Germany. 2016
    [abstract (pdf)] [slides (pdf)] [poster (pdf)] [data and results]
  • Laurent Amsaleg, Michael E. Houle, and Erich Schubert (eds.).
    Similarity Search and Applications - 9th International Conference, SISAP 2016, Tokyo, Japan, October 24–26, 2016. Proceedings.
    Lecture Notes in Computer Science 9939. 2016
    [conference homepage] [DOI:10.1007/978-3-319-46759-7] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    SPOTHOT: Scalable Detection of Geo-spatial Events in Large Textual Streams.
    In: Proceedings of the 28th International Conference on Scientific and Statistical Database Management (SSDBM), Budapest, Hungary. 2016, 8:1–8:12
    [authorized access (ACM)] [preprint (pdf)] [DOI:10.1145/2949689.2949699] [bibtex]
  • Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle.
    On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study.
    In: Data Mining and Knowledge Discovery 30 (4). 2016, 891–927, Awarded “ACM Computing Reviews Notable Books and Articles 2016”
    [authorized access (Springer)] [data and results] [DOI:10.1007/s10618-015-0444-8] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    Scalable Detection of Emerging Topics and Geo-spatial Events in Large Textual Streams.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA), Potsdam, Germany. 2016
    [abstract (pdf)] [slides (pdf)] [poster (pdf)]

2015

  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles.
    In: Proceedings of the 20th International Conference on Database Systems for Advanced Applications (DASFAA), Hanoi, Vietnam. 2015, 19–36
    [preprint (pdf)] [slides (pdf)] [code] [DOI:10.1007/978-3-319-18123-3_2] [bibtex]
  • Erich Schubert, Michael Weiler, and Arthur Zimek.
    Outlier Detection and Trend Detection: Two Sides of the Same Coin.
    In: 1st International Workshop on Event Analytics using Social Media Data at the 15th IEEE International Conference on Data Mining (ICDM), Atlantic City, NJ. 2015, 40–46
    [preprint (pdf)] [DOI:10.1109/ICDMW.2015.79] [bibtex]
  • Erich Schubert, Alexander Koos, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, and Arthur Zimek.
    A Framework for Clustering Uncertain Data.
    In: Proceedings of the VLDB Endowment 8 (12). 2015, 1976–1979
    [open-access (VLDB)] [code] [DOI:10.14778/2824032.2824115] [bibtex]
  • Erich Schubert, and OpenStreetMap Contributors.
    Fast Reverse Geocoder using OpenStreetMap data.
    Open Data LMU. 2015
    [code] [data]

2014

  • Xuan Hong Dang, Ira Assent, Raymond T. Ng, Arthur Zimek, and Erich Schubert.
    Discriminative Features for Identifying and Interpreting Outliers.
    In: Proceedings of the 30th International Conference on Data Engineering (ICDE), Chicago, IL. 2014, 88–99
    [preprint (pdf)] [DOI:10.1109/ICDE.2014.6816642] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds.
    In: Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), New York, NY. 2014, 871–880, Included in Wang, Wei. “Data Science for Social Good - 2014 KDD Highlights.” AAAI. 2015.
    [authorized access (ACM)] [preprint (pdf)] [slides (pdf)] [online demo (static)] [DOI:10.1145/2623330.2623740] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Generalized Outlier Detection with Flexible Kernel Density Estimates.
    In: Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA. 2014, 542–550
    [preprint (pdf)] [code] [DOI:10.1137/1.9781611973440.63] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Local Outlier Detection Reconsidered: a Generalized View on Locality with Applications to Spatial, Video, and Network Outlier Detection.
    In: Data Mining and Knowledge Discovery 28 (1). 2014, 190–237, Online 2012, paginated 2014
    [authorized access (Springer)] [code] [DOI:10.1007/s10618-012-0300-z] [bibtex]

2013

  • Elke Achtert, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Interactive Data Mining with 3D-Parallel-Coordinate-Trees.
    In: Proceedings of the ACM International Conference on Management of Data (SIGMOD), New York City, NY. 2013, 1009–1012
    [ELKI] [authorized access (ACM)] [DOI:10.1145/2463676.2463696] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Geodetic Distance Queries on R-Trees for Indexing Geographic Data.
    In: Proceedings of the 13th International Symposium on Spatial and Temporal Databases (SSTD), Munich, Germany. 2013, 146–164
    [code] [DOI:10.1007/978-3-642-40235-7_9] [bibtex]
  • Erich Schubert.
    Generalized and Efficient Outlier Detection for Spatial, Temporal, and High-Dimensional Data Mining.
    PhD thesis, Ludwig-Maximilians-Universität München, Munich, Germany. 2013
    [Universitätsbibliothek] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    Outlier Detection in High-Dimensional Data.
    Tutorial at the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Gold Coast, Australia. 2013
    [slides (pdf)]

2012

  • Elke Achtert, Sascha Goldhofer, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Evaluation of Clusterings – Metrics and Visual Support.
    In: Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC. 2012, 1285–1288
    [ELKI] [DOI:10.1109/ICDE.2012.128] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Outlier Detection in Arbitrarily Oriented Subspaces.
    In: Proceedings of the 12th IEEE International Conference on Data Mining (ICDM), Brussels, Belgium. 2012, 379–388
    [code] [DOI:10.1109/ICDM.2012.21] [bibtex]
  • Erich Schubert, Remigius Wojdanowski, Arthur Zimek, and Hans-Peter Kriegel.
    On Evaluation of Outlier Rankings and Outlier Scores.
    In: Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA. 2012, 1047–1058
    [code] [DOI:10.1137/1.9781611972825.90] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    A Survey on Unsupervised Outlier Detection in High-Dimensional Numerical Data.
    In: Statistical Analysis and Data Mining 5 (5). 2012, 363–387, Included in the “most accessed papers from Statistical Analysis and Data Mining” 2014–2016
    [more information] [DOI:10.1002/sam.11161] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    Outlier Detection in High-Dimensional Data.
    Tutorial at the 12th International Conference on Data Mining (ICDM), Brussels, Belgium. 2012
    [slides (pdf)] [DOI:10.1109/ICDM.2012.9]

2011

  • Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Evaluation of Multiple Clustering Solutions.
    In: 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece. 2011, 55–66
    [open-access (CEUR-WS)] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Interpreting and Unifying Outlier Scores.
    In: Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ. 2011, 13–24
    [preprint (pdf)] [code] [DOI:10.1137/1.9781611972818.2] [bibtex]
  • Elke Achtert, Ahmed Hettab, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Spatial Outlier Detection: Data, Algorithms, Visualizations.
    In: Proceedings of the 12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN. 2011, 512–516, Best Demonstration Paper Award
    [ELKI] [DOI:10.1007/978-3-642-22922-0_41] [bibtex]
  • Thomas Bernecker, Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Quality of Similarity Rankings in Time Series.
    In: Proceedings of the 12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN. 2011, 422–440
    [DOI:10.1007/978-3-642-22922-0_25] [bibtex]

2010

  • Elke Achtert, Hans-Peter Kriegel, Lisa Reichert, Erich Schubert, Remigius Wojdanowski, and Arthur Zimek.
    Visual Evaluation of Outlier Detection Models.
    In: Proceedings of the 15th International Conference on Database Systems for Advanced Applications (DASFAA), Tsukuba, Japan. 2010, 396–399
    [ELKI] [poster] [DOI:10.1007/978-3-642-12098-5_34] [bibtex]
  • Thomas Bernecker, Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Subspace Similarity Search Using the Ideas of Ranking and Top-k Retrieval.
    In: Proceedings of the 26th International Conference on Data Engineering (ICDE) Workshop on Ranking in Databases (DBRank), Long Beach, CA. 2010, 4–9
    [more information] [DOI:10.1109/ICDEW.2010.5452771] [bibtex]
  • Thomas Bernecker, Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Subspace Similarity Search: Efficient k-NN Queries in Arbitrary Subspaces.
    In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany. 2010, 555–564
    [preprint (pdf)] [more information] [DOI:10.1007/978-3-642-13818-8_38] [bibtex]
  • Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
    In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany. 2010, 482–500
    [preprint (pdf)] [supplementary material] [DOI:10.1007/978-3-642-13818-8_34] [bibtex]
  • Ines Färber, Stephan Günnemann, Hans-Peter Kriegel, Peer Kröger, Emmanuel Müller, Erich Schubert, Thomas Seidl, and Arthur Zimek.
    On Using Class-Labels in Evaluation of Clusterings.
    In: MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC. 2010
    [pdf]

2009

  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    LoOP: Local Outlier Probabilities.
    In: Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China. 2009, 1649–1652
    [pdf] [authorized access (ACM)] [code] [DOI:10.1145/1645953.1646195] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data.
    In: Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Bangkok, Thailand. 2009, 831–838
    [pdf] [slides] [code] [DOI:10.1007/978-3-642-01307-2_86] [bibtex]
  • Elke Achtert, Thomas Bernecker, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series.
    In: Proceedings of the 11th International Symposium on Spatial and Temporal Databases (SSTD), Aalborg, Denmark. 2009, 436–440
    [ELKI] [pdf] [poster] [DOI:10.1007/978-3-642-02982-0_35] [bibtex]

2008

  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms.
    In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China. 2008, 418–435
    [preprint (pdf)] [code] [DOI:10.1007/978-3-540-69497-7_27] [bibtex]
  • Erich Schubert.
    Statistical Approaches for Robustifying Correlation Clustering Algorithms.
    Diploma thesis, Ludwig-Maximilians-Universität München, Munich, Germany. 2008

2005

  • Erich Schubert, Sebastian Schaffert, and François Bry.
    Structure-Preserving Difference Search for XML Documents.
    In: Proceedings of the Extreme Markup Languages 2005 Conference, Montreal, Quebec, Canada. 2005
    [open-access] [code] [EE] [bibtex]
  • Patrick F. Riley, and Erich Schubert.
    mReplay: Mobile Sports Replay and Fan Democracy.
    In: AXMEDIS 2005: Proceedings of the 1st International Conference on Automated Production of Cross Media Content for Multi-channel Distribution. 2005
    [DOI:10.1400/41109]
  • Erich Schubert.
    Structure Preserving Difference Search in Semistructured Data.
    Project thesis (undergraduate), Ludwig-Maximilians-Universität München, Munich, Germany. 2005