About Me
I’m a Master’s student in Computational Linguistics at the University of Washington in Seattle. I’m broadly interested in computational linguistics and NLP, especially in how linguistic structure and meaning can be modeled and used to build better language technologies.
More recently, I’ve become particularly interested in trustworthy LLMs: how we can evaluate and improve models’ reliability in real-world settings, with a focus on hallucination, fairness, and socio-cultural bias.
I previously worked at Korea Electronics Technology Institute (KETI), NCSOFT, and SK Telecom. Those experiences made me care about how language models behave in the real world: not just how accurate they are, but how reliable and fair they can be.
I hold a B.A. in Linguistics & Cognitive Science, with a double major in Language Science in Artificial Intelligence, from Hankuk University of Foreign Studies, where I was advised by Prof. Jeesun Nam.
News
- Sep 2025 Started M.S. in Computational Linguistics at the University of Washington.
Education
- Master of Science in Computational Linguistics
- Bachelor of Arts in Linguistics & Cognitive Science
- Bachelor of Language Science in Artificial Intelligence (Double Major)
- Leave of Absence - Military Service (Feb 2019 - Sep 2020)
Work Experience
I worked on building evaluation resources for Korean LLMs with an emphasis on linguistically grounded reliability: how models maintain grounded claims, interpret indirect meaning, and respond appropriately under pragmatic pressure. Rather than scoring only “correct/incorrect,” I focused on identifying discourse-level failure patterns and the linguistic triggers behind them.
- Designed prompt suites and rubric-based evaluations for groundedness/hallucination, with failure-mode categories tied to discourse phenomena, such as attribution, consistency, and hedging.
- Developed annotation guidelines to assess socio-cultural bias and pragmatic harm, emphasizing appropriateness in Korean register and stance.
- Validated a Korean multimodal dialogue dataset by aligning utterance-level pragmatic functions, such as agreement, refusal, and hesitation, with nonverbal cue labels.
Key insight: many reliability issues are not isolated “wrong facts,” but emerge from how meaning is constructed across turns, including implicature, underspecification, and discourse coherence.
I supported safety-focused evaluation for conversational AI by building test cases that treat risk as a linguistic and interactional phenomenon. In addition to single-turn risk detection, I evaluated context-aware harmfulness in multi-turn settings where intent becomes clearer through discourse trajectories.
- Created semantics-informed risk probes for single-turn inputs, including cases where meaning depends on polysemy or indirect phrasing.
- Developed multi-turn scenarios to capture escalation patterns, such as justification, concretization, and actionable requests, and measured how risk accumulates over dialogue context.
- Analyzed evasion behaviors after refusals, such as repair, paraphrase, and euphemism, and incorporated these patterns into testing protocols.
Key insight: risk is often discourse-driven. What looks harmless in isolation can become unsafe when reference resolution, implicature, and turn-by-turn intent shifts are taken into account.
I contributed to a persona-based chatbot by focusing on two linguistically central requirements for Korean dialogue: (1) robustness to ambiguity in colloquial, non-canonical user inputs, and (2) sociolinguistic coverage across honorific/register variation.
- Built a linguistically grounded ambiguity taxonomy, including ellipsis, anaphora, spacing/noise, and informal variants, to standardize how ambiguous inputs are handled.
- Applied text normalization and preprocessing guided by the taxonomy to reduce interpretation noise and improve NLU robustness.
- Led annotation efforts to ensure sociolinguistic coverage, including honorific levels, informal slang, and code-mixing, so the system behaves consistently across styles.
Key insight: in Korean conversational products, performance depends not only on intent classification accuracy but also on pragmatic consistency under register shifts.
Research Experience
At DICORA Lab, I worked on corpus-driven projects where the main goal was to turn raw text into usable linguistic resources. I was involved in data construction, guideline iteration, and quality control across domains, including finance and dialogue NLU.
- Supported corpus collection and preprocessing pipelines, emphasizing consistent definitions of units such as sentences and utterances, as well as annotation-ready formatting.
- Assisted in annotation design and QA routines to improve label clarity and reduce ambiguity in training data.
- Helped construct NLU datasets for chatbots with attention to semantic coverage and linguistic variation.
Projects
I worked on a team project that evaluated English-to-Chinese dialogue summarization on XSAMSum. We compared fine-tuned baselines with zero-shot agentic pipelines using local open-weight LLMs, and analyzed whether semantic intermediate representations could improve interpretability.
- Compared Direct, Translate-then-Summarize, Summarize-then-Translate, and Semantic pipeline configurations.
- Evaluated generated Chinese summaries using ROUGE, BERTScore, and OmniScore.
- Analyzed semantic representation errors to identify information bottlenecks, schema instability, and model-specific failure patterns.
- Found that multi-agent decomposition does not automatically improve performance, but can make model failures more inspectable.
Key insight: more complex agentic pipelines are not necessarily better, but structured intermediate representations can reveal where models lose or distort information.
My undergraduate thesis explored why sentiment in financial news cannot be captured by word lists alone. I modeled sentiment compositionally, focusing on how attribute nouns, such as interest rates and costs, interact with directional predicates, such as rise and fall, so that the polarity of the combination depends on the semantics of the attribute.
- Designed pattern-based rules grounded in lexical semantics and compositionality, such as “costs rise” (negative) versus “stock prices rise” (positive).
- Built a small, interpretable pipeline to extract attribute-predicate relations from financial text and analyzed common error patterns.
- Discussed limitations and possible extensions, such as incorporating industry metadata and context windows, for more robust interpretation.
- Recognized with the Outstanding Undergraduate Thesis Award.
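To illustrate the compositional idea, here is a minimal Python sketch. It is not the thesis pipeline itself. It assumes two hypothetical toy lexicons (`ATTRIBUTE_ORIENTATION`, `PREDICATE_DIRECTION`) and simple English surface patterns, and it scores each attribute-predicate pair as the product of the attribute's orientation and the predicate's direction.

```python
import re

# Hypothetical toy lexicons for illustration only (not the thesis's actual resources).
# Attribute orientation: +1 if an increase is good news, -1 if it is bad news.
# Simplified: real orientation can depend on context and industry.
ATTRIBUTE_ORIENTATION = {
    "stock prices": +1,
    "revenue": +1,
    "costs": -1,
    "interest rates": -1,
}

# Predicate direction: +1 for upward movement, -1 for downward movement.
PREDICATE_DIRECTION = {
    "rise": +1, "rises": +1, "rose": +1,
    "fall": -1, "falls": -1, "fell": -1,
}

# Match "<attribute> <predicate>"; word boundaries keep "rise" from matching inside "rises".
_ATTRS = sorted(ATTRIBUTE_ORIENTATION, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, _ATTRS)) + r")\s+("
    + "|".join(map(re.escape, PREDICATE_DIRECTION)) + r")\b",
    re.IGNORECASE,
)


def compose_polarity(attribute: str, predicate: str) -> int:
    """Polarity = attribute orientation x predicate direction (+1 positive, -1 negative)."""
    return ATTRIBUTE_ORIENTATION[attribute.lower()] * PREDICATE_DIRECTION[predicate.lower()]


def extract_polarities(text: str) -> list[tuple[str, str, int]]:
    """Find attribute-predicate pairs in the text and score each pair compositionally."""
    return [
        (m.group(1).lower(), m.group(2).lower(), compose_polarity(m.group(1), m.group(2)))
        for m in _PATTERN.finditer(text)
    ]


print(extract_polarities("Costs rise, but stock prices rise too."))
```

Under this scheme, “costs rise” scores negative and “stock prices rise” scores positive even though the predicate is the same, which a flat word list that always treats “rise” as positive would miss.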
Relevant Coursework
- LING 473: Basics for Computational Linguistics
- LING 566: Introduction to Syntax for Computational Linguistics
- LING 570: Shallow Processing Techniques for NLP
- LING 571: Deep Processing Techniques for NLP
- LING 572: Advanced Statistical Methods in NLP
- LING 573: NLP Systems and Applications
- LING 575: Data Matters (Topics in NLP)
- Machine Learning for Language Analysis
- Big Data and Sentiment Analysis
- Language Information Processing
- Natural Language Data
- Corpus Analysis and Dictionary
- Intro to Linguistics and Language Technology
- Computer and Linguistics
- Syntactic Analysis
- Phonetics
- Pragmatics
- Language Typology
- Language and Logic
- Introduction to Linguistics
- Forensic Linguistics
- Grammar in Korean as a Foreign Language
- Introduction to Cognitive Science
- Cognitive Psychology
- Neurolinguistics
- Linguistics and Psychological Experiments
- Language and Human Beings
- CSE 373: Data Structures and Algorithms
- Language and Database
- Programming Languages and Laboratory
- Introduction to Programming Languages
- Essential Programming for Linguistics
- Programming for Language Analysis
- Probability and Statistics
- Computer Mathematics
- Statistics for Language Analysis