Publications
My research has resulted in the following papers, as listed on Google Scholar.
- Open Information Extraction (Long Conference [Paper])
May ’24
Venue: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING ’24, Italy) (presented remotely on May 22)
Fauzan F & Thanmay J (co-first authors), Pulkit M, Mansi R “Leveraging Linguistically Enhanced Embeddings for Open Information Extraction”, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10365–10379
- Proposes novel embedding methods that improve Precision, Recall and F1 scores by 24.9%, 27.3% and 14.9% respectively, and is the first work to integrate linguistic features with a Seq2Seq PLM (here, T5) for the task.
- Contributes a synthetic dataset that boosts Recall and F1 scores by 73.7% and 37.9% respectively over the Seq2Seq version of the current largest annotated dataset, which we show to be very flawed.
- First to use a feature that significantly reduces compute requirements by using 72% fewer tags than its counterpart, while maintaining the same performance boost!
- Extends Amazon’s work on structured prediction (SP) tasks to include OIE, thus being the first to study how SP pre-training affects OIE performance.
- This work was supervised by Prof. Mansi Radke.
- Visual QA Benchmark for MLLMs [Paper]
September ’24
Venue: NeurIPS (D&B) ’24
A global team of 75 co-authors led by MBZUAI. “CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark” (Camera-ready will be published soon)
- Creates a question-answering benchmark (CVQA) that stringently tests the cultural understanding of multilingual multimodal LLMs.
- Evaluating LLMs in the Legal Domain [Paper]
December ’23
Venue: Natural Legal Language Processing (NLLP) Workshop, co-located with EMNLP 2023 (orally presented on December 7 in Singapore)
Thanmay J, Fauzan F, Luqman F “Large Language Models are legal but they are not: Making the case for a powerful LegalLLM”, Proceedings of the Natural Legal Language Processing Workshop 2023, pages 223–229
- Benchmarks the performance of various LLMs on the LEDGAR dataset using zero-shot prompting.