Yoo Yeon Sung (성유연)

Ph.D. Candidate, College of Information

University of Maryland

Hi, I’m Yoo Yeon (“You-yawn”)!

I am a fifth-year Ph.D. candidate at the College of Information at the University of Maryland, College Park. I am fortunate to be advised by Jordan Boyd-Graber and Naeemul Hassan. I received an M.S. degree in Industrial Management Engineering from Korea University and a B.A. degree in English Literature and Language from Chung-Ang University. I was also a visiting researcher at the KAIST AI Graduate School, advised by Jaegul Choo. My current research is in Human-Centered NLP and Responsible AI.


Recent News

  • January 2025: AdvScore paper accepted to NAACL Main (Meta review score: 5/5)!
  • November 2024: Hosted online QANTA data collection (Round 2) for Human-grounded LLM calibration evaluation. Official webpage
  • November 2024: QB2NQ paper accepted to EMNLP Main!
  • October 2024: Hosted in-person QANTA data collection at MIT and Berkeley for Human-grounded LLM calibration evaluation. Authoring interface
  • May 2024: Started summer internship at Megagon Labs!
  • November 2023: Not All Fake News is Written paper accepted to EMNLP Main!

Areas of Interest

  1. Human-AI Alignment: Developing and testing robust human-AI interactive systems.

  2. Human-Centered LLM Evaluation: Creating evaluation frameworks to assess AI reliability and general capability.

  3. Robust Benchmark Datasets: Building adversarial datasets with human incentives and validation.

  4. Combating Misinformation: Developing interactive systems to detect and mitigate misinformation.


Current Research Focus

I strive to develop AI systems that align closely with human needs. Ultimately, I want to create AI models that not only achieve high accuracy but also bring positive social impact by enhancing reliability and fostering supportive, human-complementary interactions. My specific interests are:

  1. Benchmark Dataset Creation: Develop datasets that evaluate language models based on human-centered standards.
  2. Robustness Evaluation Metrics: Design metrics that assess LLM robustness through the lens of human capabilities.
  3. Human-AI Interaction Testing: Evaluate human-AI interactive systems that support and complement human users effectively.

Publications

(2025). Granular Benchmark for Evaluating Model Calibration against Human Calibration.

arXiv

(2025). Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL). Main.

arXiv

(2024). You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions. Empirical Methods in Natural Language Processing (EMNLP). Main.

PDF Code

(2023). Not All Fake News is Written: A Dataset and Analysis of Misleading Video Headlines. Empirical Methods in Natural Language Processing (EMNLP). Main.

PDF Dataset Code Video