Ashish Chouhan

Office:

Institute of Computer Science

Im Neuenheimer Feld 205
69120 Heidelberg, Germany
Room 1.333 (first floor)

Email: chouhan(at)informatik.uni-heidelberg(dot)de

Office hours: By appointment


News

2024-04-30: Our work LexDrafter: Terminology Drafting for Legislative Documents using Retrieval Augmented Generation has been accepted with an oral presentation at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) to be held in Torino (Italy) on 20-25 May, 2024! See here for the paper, or check out the code on GitHub.


2022-10-06: Our work EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain has been accepted to the main track of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP'22) to be held in Abu Dhabi! See here for the paper, or check out the code on GitHub and the public version of our dataset.


About

I completed my bachelor's degree at Rashtrasant Tukadoji Maharaj Nagpur University, India, in 2014 and joined HSBC Technology, Pune, India, as a Software Engineer. During my time at HSBC Technology, I built applications in the banking domain, specifically in the sub-domains of demand deposits, overdrafts, term deposits, and statements and advice. After working in IT for 3.5 years, I left HSBC Technology as a Senior Software Engineer and started my Master's in Applied Computer Science at SRH Hochschule Heidelberg in 2018. During my Master's, I was a working student at SAP SE, where I was responsible for creating a process automation tool using GitHub webhooks for the Open Source Program Office. I finished my Master's in April 2020. For my thesis project, I created the FIF framework, which analyzes the user stories created during a project kickoff and applies topic modelling, feature engineering, and natural language inference to provide a matrix of potential machine learning algorithms that can be trained on a set of features from the data warehouse to answer the user stories. From April 2020 until January 2023, I worked as an Academic Researcher at SRH Hochschule Heidelberg, where I taught Master's students sequence data, text mining, natural language processing, machine learning operations, and data management concepts comprising data profiling, data cleaning, and ETL job creation. In March 2021, I joined Prof. Dr. Michael Gertz's Data Science Research Group at the Faculty of Mathematics and Computer Science as an External Doctoral Researcher. In February 2023, I joined the Data Science Research Group as a Full-time Doctoral Researcher.


Reviewing Activities

Conferences:

Journals:


Teaching & Supervision

Courses:

  • Lecture Assistant for "Natural Language Processing with Transformers" (Winter 2023)

Co-supervised Seminars:

  • "Retrieval Augmented Generation" (Summer 2024)
  • "Modern Information Retrieval" (Summer 2023)

Supervised (under-)graduate Student Practicals:

  • "Unlocking Legal Insights: Analyzing LinkedIn Posts and Documents" (Summer 2024)
  • "Corpus Exploration via Conversation with Clustered arXiv Abstracts" (Summer 2024)
  • "PubMedXplorer: Concepts Exploration in PubMed Abstracts" (Winter 2023)
  • "Multi-Aspect PubMed Corpus Exploration" (Summer 2023)

Supervised Master Theses (co-supervision with Michael Gertz):

  • Dennis Geiselmann: "Context-Aware Dense Retrieval" (Winter 2023)
  • Jason Pyanowski: "Document Expansion by Query Prediction" (Winter 2023)

Supervised Undergraduate Theses (co-supervision with Michael Gertz):

  • Vivian Kazakova: "A Topic Modeling Framework for PubMed Data Analysis" (Summer 2023) 


Research Interests

  • Natural Language Processing
  • Retrieval Augmented Generation (RAG)
  • Question-Answering Systems
  • Corpus Management and Exploration


Publications

2024

  • Chouhan, Ashish, and Gertz, Michael.
    LexDrafter: Terminology Drafting for Legislative Documents Using Retrieval Augmented Generation.
    In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024, 10448–10458
    [arXiv] [code] [online]

2022

  • Aumiller, Dennis, Chouhan, Ashish, and Gertz, Michael.
    EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain.
    In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 7626–7639
    [arXiv] [code] [online]
  • Dix, Marcel, Chouhan, Ashish, Sinha, Madhushree, Singh, Akhil, Bhattarai, Suraj, Narkhede, Shweta, and Prabhune, Ajinkya.
    An AI-based Alarm Prediction in Industrial Process Control Systems.
    In: Proceedings of the 2022 IEEE International Conference on Big Data and Smart Computing (BigComp). 2022, 242–245
    [online]
  • Tripathy, Sarthak Manas, Chouhan, Ashish, Dix, Marcel, Kotriwala, Arzam, Klöpper, Benjamin, and Prabhune, Ajinkya.
    Explaining Anomalies in Industrial Multivariate Time-series Data with the help of eXplainable AI.
    In: Proceedings of the 2022 IEEE International Conference on Big Data and Smart Computing (BigComp). 2022, 226–233
    [online]

2021

  • Chouhan, Ashish, Sangireddy, Venkata Siva Rami Reddy, Sridharababu, Sarath Kumar, Fatmi, Syed Sameer Iqbal, Selvaraj, Dinesh Kumar, Kanasani, Anil Kumar, Schneider, Christoph, Knupp, Florian, Tumanova, Iuliia, and Prabhune, Ajinkya.
    HTIE: A Hierarchical Task Identification Framework for E-mails.
    In: Proceedings of the 2021 IEEE Seventh International Conference on Big Data Computing Service and Applications (BigDataService). 2021, 78–86
    [online]
  • Chouhan, Ashish, Prabhune, Ajinkya, Raj, Ankit, Chandra, Darshan, Subramanya, Sindhu, Asangi, Mahaveer, and Thottempudi, Sree Ganesh.
    Shotifier: A Binary Shot Conversion Classifier Pipeline for Football Forwards.
    In: Proceedings of the 2021 IEEE International Conference on Big Data and Smart Computing (BigComp). 2021, 156–163
    [online]
  • Dix, Marcel, Chouhan, Ashish, Ganguly, Srishti, Pradhan, Sripada, Saraswat, Devika, Agrawal, Surbhi, and Prabhune, Ajinkya.
    Anomaly detection in the time-series data of industrial plants using neural network architectures.
    In: Proceedings of the 2021 IEEE Seventh International Conference on Big Data Computing Service and Applications (BigDataService). 2021, 222–228
    [online]
  • Lalwani, Riya, Chouhan, Ashish, John, Varun, Sonar, Prashant, Mahajan, Aakash, Pendyala, Naresh, Streicher, Alexander, and Prabhune, Ajinkya.
    I-Mouse: A Framework for Player Assistance in Adaptive Serious Games.
    In: Proceedings of 2021 Artificial Intelligence in Education (AIED). 2021, 234–238
    [online]

2020

  • Chouhan, Ashish, Prabhune, Ajinkya, Prabhuraj, Paneesh, and Chaudhari, Hitesh.
    DWreck: A Data Wrecker Framework for Generating Unclean Datasets.
    In: Proceedings of the 2020 IEEE Sixth International Conference on Big Data Computing Service and Applications (BigDataService). 2020, 78–87
    [online]
  • Kurian, John Joy, Menezes, Deborah Zenobia Rachael, Ronanki, Avinash, Sharma, Gaurang, Prasad, Sandeep Krishna, Chouhan, Ashish, and Prabhune, Ajinkya.
    EnFVe: An Ensemble Fact Verification Pipeline.
    In: Proceedings of the 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). 2020, 80–89
    [online]

2019

  • Prabhune, Ajinkya, and Chouhan, Ashish.
    FIF: A NLP-Based Feature Identification Framework for Data Warehouses.
    In: Proceedings of the 2019 IEEE/WIC/ACM International Conference on Web Intelligence. 2019, 276–281
    [online]