About

Hi, I’m a graduate student at KAIST, specializing in natural language processing (NLP) and large language models (LLMs), advised by Prof. Minjoon Seo. I am also a research intern at Microsoft Research, where I work with Yeyun Gong, Lei Ji, and Qi Chen. I was honored with the Stars of Tomorrow award at Microsoft Research Asia.

Previously, I worked as a research scientist at Samsung Research and ESTsoft, contributing to applied LLM systems for Galaxy AI. I conducted research on foundation models, primarily on LLM pre-/post-training, information retrieval, and semantic parsing.

My research interests lie in making large language models more efficient and practical:

Efficient training and inference methods for LLMs, including cost-efficient inference (NAACL 2025), distillation (ICML 2025, NAACL 2025), and data mixture optimization (DynamixSFT).
Real-world impact, including agent adaptability (NAACL 2025), scientific discovery (GenBio 2025, BioNLP 2025), and real-time information access (NAACL 2024).

Work Experience

Microsoft Research

Research Intern

2024.09 – Present Beijing, China / Vancouver, BC

Samsung Research

Researcher

2020.01 – 2024.09 Seoul, Korea

ESTsoft

Researcher

2016.05 – 2019.12 Seoul, Korea

Selected Publications

DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections

Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Qi Chen, Yeyun Gong

Preprint

Paper

Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling

Haebin Shin, Lei Ji, Xiao Liu, Yeyun Gong

ICML 2025

Paper

Generative Prompt Internalization

Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo

NAACL 2025 Oral

Paper Code

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Seungone Kim, Juyoung Suk, ..., Haebin Shin, ..., Bill Yuchen Lin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo

NAACL 2025 Best Paper

Paper Code Dataset

InstructIR: A Benchmark for Instruction Following of Information Retrieval Models

Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo

ACL 2024 KnowledgeNLP workshop

Paper Code Dataset

KTRL+F: Knowledge-Augmented In-Document Search

Hanseok Oh*, Haebin Shin*, Miyoung Ko, Hyunji Lee, Minjoon Seo

NAACL 2024

Paper Code

Education

Korea Advanced Institute of Science & Technology (KAIST)

M.S., Graduate School of AI

2022 – Present

Sogang University

B.S. in Computer Science

2013 – 2020

Selected Honors & Awards

Stars of Tomorrow Microsoft Research Asia

Internship Award of Excellence

2025.04

1st Place — NeurIPS 2020 NLC2CMD Challenge NeurIPS 2020

Competition on Natural Language to Bash Command translation; 1st (Efficiency), 4th (Accuracy)

2020.12

1st Place — Award by Ministry of Science and ICT The Government of Korea

National competition on fake news detection, part of the nation’s most prestigious AI R&D Challenge

2017.12

1st Place — Award by Ministry of Culture, Sports and Tourism The Government of Korea

National competition on developing a QA system for Korean language resources

2015.08

University Presidential Award Sogang University

University’s highest honor for outstanding student research achievement

2015.01

Patents

Apparatus for joining data and method for controlling thereof

US Patent Application No. 18/299,413 Filed 2023

Link

Apparatus for detecting contextually-anomalous sentence in document, method therefor, and computer-readable recording medium having program for performing same method recorded thereon

US Patent No. 11727703 Granted 2023

Link

Apparatus for classifying category of a text based on neural network, method thereof and computer recordable medium storing program to perform the method

KR Patent No. 10-1939209 Granted 2019

Link

Motion control method for station type terminal

KR Patent No. 10-1601763 Granted 2016

Link