Yoo Yeon Sung (성유연)

Ph.D. Candidate in the College of Information

University of Maryland

Hi, I’m Yoo Yeon (“You-yawn”)!

I am a fifth-year Ph.D. candidate in the College of Information at the University of Maryland, College Park, where I am fortunate to be advised by Jordan Boyd-Graber and Naeemul Hassan. I received an M.S. in Industrial Management Engineering from Korea University and a B.A. in English Literature and Language from Chung-Ang University. I was also a visiting researcher at the KAIST AI Graduate School, advised by Jaegul Choo. My current research is in Human-Centered NLP and Responsible AI.


Recent News

  • March 2025: VeriLA paper is accepted to HEAL@CHI! This work introduces a human-centered evaluation framework for verifying LLM agent failures, systematically assessing them to reduce human effort and make them interpretable to humans.
  • February 2025: GRACE paper preprint is out! This work compares LLM calibration with human calibration to support trustworthy AI assistance.
  • January 2025: AdvScore paper accepted to NAACL 2025 Main (Meta review score: 5/5)! This work was awarded the MetaAI Dynabench Grant: “A Leaderboard and Competition for Human–computer Adversarial Question Answering”
  • November 2024: Hosted online QANTA data collection (Round 2). Official webpage
  • November 2024: QB2NQ paper accepted to EMNLP 2024 Main!
  • October 2024: Hosted in-person QANTA data collection at MIT and Berkeley for Human-grounded LLM calibration evaluation! Authoring interface
  • May 2024: Started summer internship at Megagon Labs!
  • November 2023: Not All Fake News is Written paper accepted to EMNLP 2023 Main!

Research

My research focuses on the human-centered evaluation of LLMs. Specifically, I create benchmarks, develop evaluation metrics, and fine-tune language models to better distinguish LLMs from humans or to enhance human-AI complementarity. I enjoy conducting large-scale user studies in which humans interact with LLMs, allowing me to observe how well LLMs perform tasks jointly with humans and as humans, and to assess how well they align with human reasoning and decision-making. By uncovering LLM vulnerabilities in comparison to humans, I aim to contribute to safe language systems that align with human behavior and values.

My recent work includes:

  • A human-centered evaluation framework for accountable human-agent interactive systems
  • An adversarial benchmark for evaluating language model calibration against human calibration
  • Human-grounded metrics for assessing a benchmark’s adversarial robustness

Currently, I am working on:

  • Designing human-driven adversarial benchmarks to expose VLM vulnerabilities
  • Personalizing LLMs for users in misinformation tasks
  • Investigating how humans and LLMs cooperate in competitive environments

The keywords that excite me the most are: Human-AI alignment, human-centered LLM evaluation, and AI robustness and reliability. Since human behavior and data are inherent to my research questions, I highly value a human-grounded approach to building, measuring, and interacting with language systems.

Publications

(2025). VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures. ACM CHI Human-centered Evaluation and Auditing of Language Models (HEAL@CHI Workshop 2025).

arXiv

(2025). Granular Benchmark for Evaluating Model Calibration against Human Calibration.

arXiv

(2025). Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025). Main.

arXiv

(2024). You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions. Empirical Methods in Natural Language Processing (EMNLP 2024). Main.

PDF Code

(2023). Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines. Empirical Methods in Natural Language Processing (EMNLP 2023). Main.

PDF Dataset Code Video