Welcome to the Data Systems Lab (Big Data Lab)

Welcome to the Data Systems Lab (Big Data Lab) at the POSTECH. Our data systems lab focuses on STAR (namely, Systems, Theory, and ARtificial intelligence) supported by major grants such as Star Lab. We have been endeavoring to solve challenging and real problems in computer/data science.

Building Architects Through Systems. We train our students to become system architects—not just coders—by having them design and build large-scale data infrastructure from the ground up. In an age where AI can generate code and pass test suites, the ability to make principled architectural decisions that generalize beyond any test remains an irreplaceably human skill, and it is the core of what we cultivate.

[Big News] Our lab has been awarded over $6 million USD for the Global AI Frontier Lab between Korea and NYU. I am the director of this international program, which includes 8 additional professors from KAIST and Sungkyunkwan University. Many of our students will be dispatched to NYU each year to collaborate with world-leading researchers.

We work at the intersection of data systems and natural language processing (NLP), with a team spanning systems researchers, AI scientists and AI engineers.

● Enterprise Super Agents & Multi-Modal RAG
We build Enterprise Super Agents—autonomous, data-centric systems powered by Knowledge Graphs and advanced Multi-Modal RAG. Our architecture seamlessly integrates text, tabular, and visual intelligence to move beyond simple search into complex reasoning.
• Short Term: Deployment of high-velocity, high-trust RAG stacks and advanced analytic engines that deliver verifiable, multi-modal insights in real-time.
• Long Term: A cost-efficient, data-centric platform designed for scale, evolving toward AGI-level reasoning and agentic autonomy across the enterprise ecosystem.

● AI Database & Semantic Predicates
We design and implement AI-native databases that treat semantic predicates, vector search, and knowledge-graph reasoning as first-class query operators across text, tabular, time-series, image, and log data, while addressing ambiguity, reliability, and cost-modeling challenges.
Short term: robust semantic filtering and multi-modal query processing with clear performance guarantees and explainable behavior.
Long term: a self-optimizing, AI-native data platform where learned operators, RAG, and agents are deeply integrated into storage, indexing, and query processing on the path toward AGI-level data systems.

● Next-Gen LLM Inference & Reasoning Optimization
We are launching a new research thrust dedicated to LLM inference optimization in the Reasoning Era. As AI shifts from simple text generation to complex, multi-step reasoning, we focus on breaking the computational bottlenecks of high-latency and resource-heavy agentic workflows.

● Embodied AI & Robot Manipulation
We are advancing Robot Foundation Models toward adaptive, human-inspired intelligence. In the short term, we focus on personalizing mobile manipulation, developing predictive world models for closed-loop planning, and solving critical VLA gaps like long-horizon drift and contact precision. In the long term, we aim to build a General Robot Brain—a system combining VLAs, agentic planning, and semantic memory to enable reliable, long-horizon reasoning and failure recovery in any unstructured real-world environment.

● Natural Language Interfaces to Data
We’re building conversational database interfaces, letting users query and reason over data using natural language—marrying robust data-system backends with cutting-edge NLP.

● Self-Optimizing Data Systems
Our systems auto-adapt to workload and data distribution, delivering top performance without manual tuning, via AI-driven optimization and continuous feedback.

News

May 1, 2026

Two papers—one focusing on robotics and the other on agentic Retrieval-Augmented Generation (RAG)—have been accepted to ICML 2026.

April 29, 2026

Prof. Han delivered a keynote speech titled 'The Long Journey from Subgraphs to Autonomous Agents: Evolution of Graph Intelligence' at both DASFAA 2026 and the 2026 Data Intelligence Workshop.

April 27, 2026

Meet TurboLynx: The world's fastest embedded graph DBMS. Explore our features and get started today at https://turbolynx.io.

February 25, 2026

Five students have secured overseas internships and joint research positions at Google (Systems Research Group), Microsoft, Snowflake, UIUC, and NYU for this semester.

February 24, 2026

One paper is accepted to ICDE.

February 23, 2026

We heartily welcome Sunho Cha to our lab. He graduated with the highest distinction from POSTECH.

January 26, 2026

One paper is accepted to ICLR 2026. When it comes to benchmarking complex QA for text and tables, SPARTA is unrivaled.

January 16, 2026

A paper about our new graph analytics engine TurboLynx is accepted to VLDB 2026. This is a ultra-fast analytics engine for schemaless graphs and tables!

December 17, 2025

Professor Han was selected as a general member of the National Academy of Engineering (NAEK) in 2025.

... see all News

Join us

연구참여 · Research Internship

We are currently hiring interns (연구 참여학생 모집). An amazing opportunity for undergrads to experience world-class Big Data research and prepare for graduate school.

You could be a rising-STAR of the future.

자세히 보기
Collaborators
Oracle SAP Samsung