Publications

Yoo Yeon Sung, Eve Fleisig, Yu Hou, Ishan Upadhyay, and Jordan Boyd-Graber (2025). Granular Benchmark for Evaluating Model Calibration against Human Calibration. Association for Computational Linguistics (ACL 2025). Main.

Yoo Yeon Sung, Hannah Kim, and Dan Zhang (2025). VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures. ACM CHI Human-centered Evaluation and Auditing of Language Models (HEAL@CHI Workshop 2025).

Yoo Yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, and Jordan Boyd-Graber (2025). Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025). Main.

Tasnim Kabir, Yoo Yeon Sung, Saptarashmi Bandyopadhyay, Hao Zou, Abhranil Chandra, and Jordan Lee Boyd-Graber (2024). You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions. Empirical Methods in Natural Language Processing (EMNLP 2024). Main.

Yoo Yeon Sung, Jordan Boyd-Graber, and Naeemul Hassan (2023). Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines. Empirical Methods in Natural Language Processing (EMNLP 2023). Main.

Yoo Yeon Sung, Ishani Mondal, and Jordan Boyd-Graber (2023). How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation. arXiv.

Yoo Yeon Sung and Seoung Bum Kim (2020). Topical Keyphrase Extraction with Hierarchical Semantic Networks. Decision Support Systems (DSS 2020).