About

Hi, I’m an incoming PhD student at the University of Michigan, co-advised by Prof. Lu Wang and Prof. Honglak Lee. I earned my MS from KAIST, where I was advised by Prof. Minjoon Seo. My research focuses on making large language models (LLMs) more efficient and reliable, with a particular interest in how they acquire and utilize knowledge under real-world constraints.

Previously, I spent eight years in industry as a research scientist at Samsung Research and ESTsoft, where I worked on applied LLM systems including Galaxy AI, with a focus on LLM pre-/post-training, information retrieval, and semantic parsing. I also interned at Microsoft Research, where I worked with Yeyun Gong, Lei Ji, and Qi Chen, and was honored with the Stars of Tomorrow award at Microsoft Research Asia.

My research interests lie in making large language models more efficient and practical.

Work Experience

Microsoft Research
Research Intern
2024.09 – 2025.06 Beijing, China / Vancouver, BC
Samsung Research
Research Scientist
2020.01 – 2024.09 Seoul, Korea
ESTsoft
Research Scientist
2016.05 – 2019.12 Seoul, Korea

Publications

DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections
Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Hyunwoo Yoo, Qi Chen, Yeyun Gong
ACL 2026 Findings
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin, Lei Ji, Xiao Liu, Yeyun Gong
ICML 2025
Exploring Adversarial Robustness in Classification tasks using DNA Language Models
Hyunwoo Yoo, Haebin Shin, Kaidi Xu, Gail Rosen
ICML 2025 GenBio workshop
Can Large Language Models Classify and Generate Antimicrobial Resistance Genes?
Hyunwoo Yoo, Haebin Shin, Gail Rosen
ACL 2025 BioNLP workshop
Generative Prompt Internalization
Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo
NAACL 2025 Oral
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim, Juyoung Suk, ..., Haebin Shin, ..., Bill Yuchen Lin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo
NAACL 2025 Best Paper
InstructIR: A Benchmark for Instruction Following of Information Retrieval Models
Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo
ACL 2024 KnowledgeNLP workshop
KTRL+F: Knowledge-Augmented In-Document Search
Hanseok Oh*, Haebin Shin*, Miyoung Ko, Hyunji Lee, Minjoon Seo
NAACL 2024
Intuitive access to smartphone settings using relevance model trained by contrastive learning
Joonyoung Kim, Kangwook Lee, Haebin Shin, Hurnjoo Lee, Sechun Kang, Byunguk Choi, Dong Shin, Joohyung Lee
AAAI 2023 — Innovative Applications of Artificial Intelligence (IAAI-23)
Learning to embed multi-modal contexts for situated conversational agents
Haeju Lee, Oh Joon Kwon, Yunseon Choi, Minho Park, Ran Han, Yoonhyung Kim, Jinhyeon Kim, Youngjune Lee, Haebin Shin, Kangwook Lee, Kee-Eung Kim
NAACL 2022 Findings
Tackling situated multi-modal task-oriented dialogs with a single transformer model
Haeju Lee, Oh Joon Kwon, Yunseon Choi, Jinhyeon Kim, Youngjune Lee, Ran Han, Yoonhyung Kim, Minho Park, Kangwook Lee, Haebin Shin, Kee-Eung Kim
AAAI 2022 DSTC10 workshop

Selected Honors & Awards

Stars of Tomorrow Microsoft Research Asia
Internship Award of Excellence
2025.04
1st Place — AAAI 2022 DSTC10 Challenge (SIMMC 2.0) AAAI 2022
Competition on Situated Interactive Multimodal Conversational AI
2022.01
1st Place — NeurIPS 2020 NLC2CMD Challenge NeurIPS 2020
Competition on Natural Language to Bash Command translation; 1st (Efficiency), 4th (Accuracy)
2020.12
1st Place — Award by Ministry of Science and ICT The Government of Korea
National competition on fake news detection, part of the nation’s most prestigious AI R&D Challenge
2017.12
1st Place — Award by Ministry of Culture, Sports and Tourism The Government of Korea
National competition on developing a QA system for Korean language resources
2015.08

Patents

Apparatus for joining data and method for controlling thereof
US Patent Application No. 18/299,413 Filed 2023
Apparatus for detecting contextually-anomalous sentence in document, method therefor, and computer-readable recording medium having program for performing same method recorded thereon
US Patent No. 11727703 Granted 2023
Apparatus for classifying category of a text based on neural network, method thereof and computer recordable medium storing program to perform the method
KR Patent No. 10-1939209 Granted 2019
Motion control method for station type terminal
KR Patent No. 10-1601763 Granted 2016