- KAIST | AI College
- Yonsei University | College of Computing
[Lectures]
- Python Basics
- if conditionals, while loops
- Functions, data types, variables, libraries
- Local variables, global variables, lists, tuples
- Lists, tuples, strings, dictionaries, sets
- Strings, sets, dictionaries, modules, graphics objects
- Dictionaries, sets, text, images, graphics
- Text, images, graphics, classes
- Full review (4/24 ~ 5/7)
- Practice problem solutions (4/24 ~ 5/7)
[Practices]
Data Processing, Data Analysis, Data Visualization
[Session 1] w/Professor Hwang
- W3 - 9/16
- [CVPR 2025] RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models (note) (link)
- [CVPR 2024] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (note) (link)
- [CVPR 2025] Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution (note) (link)
- [ICASSP 2023] Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation (note) (link)
- W4 - 9/23
- [CVPR 2025] LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning (note) (link)
- [CVPR 2025] The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models? (note) (link)
- [CVPR 2025] Cross-modal Information Flow in Multimodal Large Language Models (link)
- [CVPR 2025] Tackling View-Dependent Semantics in 3D Language Gaussian Splatting (note) (link)
- W5 - 9/30
- W7 - 10/14
- [ICLR 2025] Reducing Hallucinations in Large Vision Language Models via Latent Space Steering (link)
- [CVPR 2025] MBQ: Modality-Balanced Quantization for Large Vision-Language Models (link)
- [IEEE TPAMI] Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts (link)
- W7 - 10/15
- W9 - 10/28
- [CVPR 2025] LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models (link)
- [NSDI 2024] DistMM: Accelerating Distributed Multimodal Model Training (link)
- [Microsoft 2025] ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving (link)
- [SIGCOMM 2024] NetLLM: Adapting Large Language Models for Networking (note) (link)
[Session 2] w/Professor Park
- W10 - 11/4
- [ICLR 2024] Uni3D: Exploring Unified 3D Representation at Scale (link)
- [ACL 2025] Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues (link)
- [NeurIPS 2025] The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense (link)
- [CVPR 2024] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation (link)
- W11 - 11/11
- [ICCV 2025] TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers (link)
- [ICML 2025] Diving into Self-Evolving Training for Multimodal Reasoning (link)
- [NeurIPS 2025] GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents (link)
- [NeurIPS 2024] HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data (link)
- W12 - 11/18
- [arXiv 2024] Can AI Perceive Physical Danger and Intervene? (link)
- [ECCV 2024] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (link)
- [ECCV 2024] On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models (link)
- [ICLR 2025] LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (link)
- W13 - 11/25
- [CHI 2024] OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs (link)
- [EMNLP 2025] ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering (link)
- [arXiv 2025] Multimodal Safety Evaluation in Generative Agent Social Simulations (link)
- [CVPR 2025] Multi-subject Open-set Personalization in Video Generation (link)
- W14 - 12/2
- [RSS 2025] π_0: A Vision-Language-Action Flow Model for General Robot Control (link)
- [NeurIPS 2025] PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding (link)
- [ICLR 2023] Uni-Mol: A Universal 3D Molecular Representation Learning Framework (link)
- [arXiv 2024] MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (link)
Candidate Award-Winning Papers
[Useful References]
- Lectures from other universities, e.g., Language Models (https://stanford-cs324.github.io)
- Blogs by other researchers, e.g., at OpenAI (https://lilianweng.github.io)
- E-mail subscriptions for recently published papers (https://nlp.elvissaravia.com)
[Session 1] w/Professor Yeo
- W1 - 9/4
- W2 - 9/11
- [ACL 2025] LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts (note) (link)
- [ACL 2025] Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models (note) (link)
- ✅[ACL 2025] Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral (note) (link)
- W3 - 9/18
- ✅[ACL 2025] From REAL to SYNTHETIC: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding (note) (link)
- ✅[EMNLP 2024] Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting (note) (link)
- [ACL 2025] Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users (note) (link)
- ✅[ACL 2024] Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration (note) (link)
- ✅[ICLR 2025] Safety Alignment Should be Made More Than Just a Few Tokens Deep (note) (link)
- ✅[ACL 2024] Having Beer after Prayer? Measuring Cultural Bias in Large Language Models (note) (link)
- ✅[ACL 2025] Mixtures of In-Context Learners (note) (link)
- ✅[ACL 2025] Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively (note) (link)
- W4 - 9/25
- [ACL 2025] A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive (note) (link)
- ✅[ACL 2024] Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models (note) (link)
- ✅[ACL 2025] Do LLMs Understand Dialogues? A Case Study on Dialogue Act (note) (link)
- ✅[ACL 2024] Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (note) (link)
- ✅[ACL 2025] Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs (note) (link)
- ✅[ACL 2024] Mission: Impossible Language Models (note) (link)
- ✅[ACL 2025] HALoGEN: Fantastic LLM Hallucinations and Where to Find Them (note) (link)
- ✅[ACL 2025] FloorPlan-LLaMa: Aligning Architects’ Feedback and Domain Knowledge in Architectural Floor Plan Generation (note) (link)
- W5 - 10/2 (Online)
- [ACL 2025] Byte Latent Transformer: Patches Scale Better Than Tokens (note) (link)
- ✅[ACL 2025] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling (note) (link)
- [ACL 2025] Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability (note) (link)
- [ACL 2025] Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs (note) (link)
- [EMNLP 2024] Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge (note) (link)
- [ACL 2025] Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention (note) (link)
- [ACL 2025] Language Models Resist Alignment: Evidence From Data Compression (note) (link)
- W6 - 10/9 (Online)
- [ACL 2025] A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens (note) (link)
- [EMNLP 2024] A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models (note) (link)
- ✅[ACL 2025] BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages (note) (link)
- [ACL 2025] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (note) (link)
- ✅[ACL 2025] Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems (note) (link)
- [ACL 2025] Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details? (note) (link)
- [EMNLP 2024] Towards Robust Speech Representation Learning for Thousands of Languages (note) (link)
- [ACL 2025] Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models (note) (link)
- ✅[ACL 2024] L-Eval: Instituting Standardized Evaluation for Long Context Language Models (note) (link)
- [ACL 2025] Steering off Course: Reliability Challenges in Steering Language Models (note) (link)
- [ACL 2024] Steering Llama 2 via Contrastive Activation Addition (note) (link)
[Session 2] w/Professor Lee
- W7 - 10/16 - Benchmarking & Evaluation
- ✅[ACL 2025] YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering (link)
- ✅[ACL 2025] CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges (link)
- ✅[ICLR 2025] Beyond Scalar Reward Model: Learning Generative Judge from Preference Data (link)
- ✅[ACL 2025] M-RewardBench: Evaluating Reward Models in Multilingual Settings (link)
- ✅[COLM 2025] Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers (link)
- W8 - 10/23 - Reasoning Enhancement
- ✅[ICLR 2025] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
- ✅[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction (link)
- ✅[NeurIPS 2025] Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning (link)
- ✅[arXiv 2025] Less is More: Recursive Reasoning with Tiny Networks (link)
- ✅[NeurIPS 2025] ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs (link)
- ✅[arXiv 2025] Learning to Reason without External Rewards (link)
- ✅[arXiv 2025] Deep Think with Confidence (link)
- W9 - 10/30 - Information Retrieval & Text Mining
- ✅[arXiv 2025] BMX: Entropy-weighted Similarity and Semantic-enhanced Lexical Search (link)
- ✅[COLM 2025] DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning (link)
- ✅[arXiv 2025] On the Theoretical Limitations of Embedding-Based Retrieval (link)
- ✅[COLM 2025] EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline (link)
- ✅[arXiv 2025] FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS (note) (link)
- ✅MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
- ✅[arXiv 2025] Towards Better Instruction Following Retrieval Models (link)
- W10 - 11/6 (Online) - Decision Making & Insight Generation
- ✅[NeurIPS 2025] Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop (link)
- [NeurIPS 2024] From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection
- ✅[ACL 2025] INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent (link)
- ✅[ACL 2025] LLM-Enhanced Self-Evolving Reinforcement Learning for Multi-Step E-Commerce Payment Fraud Risk Detection (link)
- ✅[NeurIPS 2024] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making (link)
- ✅[ICDM 2025] MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents (link)
- ✅[NeurIPS 2025] s3: You Don't Need That Much Data to Train a Search Agent via RL (link)
- ✅[ICLR 2025] DeLLMa: Decision Making Under Uncertainty with Large Language Models (link)
- W11 - 11/13 - Dialogue & Interactive System
- ✅[ACL 2025] In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents (link)
- ✅[NAACL 2025] Hello Again! LLM-powered Personalized Agent for Long-term Dialogue (link)
- ✅[COLM 2025] Don’t Lie to Your Friends: Learning What You Know from Collaborative Self-Play (link)
- ✅[NAACL 2025] Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In (link)
- ✅[NAACL 2025] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents (link)
- ✅[ACL 2025] Caution for the Environment: LLM Agents are Susceptible to Environmental Distractions (link)
- ✅[ICLR 2025] Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence (link)
[Session 3] w/Professor Kim
- W12 - 11/20
- [NeurIPS 2025] CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding (link)
- [NeurIPS 2024] RHO-1: Not All Tokens Are What You Need (link)
- [ICML 2024] Debating with More Persuasive LLMs Leads to More Truthful Answers (link)
- [ICLR 2025] A Probabilistic Perspective on Unlearning and Alignment for Large Language Models (link)
- [ICML 2025] Medical Large Language Model Benchmarks Should Prioritize Construct Validity (link)
- W13 - 11/27
- [ICML 2025] COLLABLLM: From Passive Responders to Active Collaborators (link)
- [ICLR 2025] Your Mixture-Of-Experts LLM is Secretly An Embedding Model For Free (link)
- [ICLR 2025] Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering (link)
- [NeurIPS 2025] Large Language Diffusion Models (link)
- [ICLR 2025] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models (link)
- [ICLR 2025] Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents (link)
- W14 - 12/4
- [ICML 2025] SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? (link)
- [ICLR 2025] HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models (link)
- [NeurIPS 2024] LLM Evaluators Recognize and Favor Their Own Generations (link)
- [ICLR 2025] Self-Play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models (link)
- [NeurIPS 2023] Jailbroken: How Does LLM Safety Training Fail? (link)
- [NeurIPS 2025] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) (link)
- [ICML 2023] A Watermark for Large Language Models (link)
- W15 - 12/11
- [NeurIPS 2025] Reverse Engineering Human Preferences with Reinforcement Learning (link)
- [ICLR 2025] MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions (note) (link)
- [NeurIPS 2024] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (link)
- [ICML 2025] AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models (link)
- [ICML 2025] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (link)
- [NeurIPS 2024] Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts (link)
- [ICML 2024] SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code (link)
- [ICLR 2025] PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding (link)
- [ICLR 2025] Do as We Do, Not as You Think: the Conformity of Large Language Models (link)
- W16 - 12/18
- [ICLR 2025] On the Role of Attention Heads in Large Language Model Safety (link)
- [ICLR 2025] From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions (link)
- [NeurIPS 2025] Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (link)
- [ICML 2025] Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition (link)
- [ICML 2025] Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems (link)
- [ICML 2025] Inference Scaling for Long-Context Retrieval Augmented Generation (link)
- [ICLR 2024] Frozen Transformers in Language Models Are Effective Visual Encoder Layers (link)
- [NeurIPS 2025] OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts (link)
[Session 1] w/Professor Jeong
- W2 - 3/10
- W3 - 3/17
- W4 - 3/24
- W5 - 3/31
- [MLSys 2025] FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving (link)
- [FAST 2026] SolidAttention: Low-Latency SSD-based Serving on Memory-Constrained PCs (link)
- [ASPLOS 2025] MetaSapiens: Real-Time Neural Rendering with Efficiency-Aware Pruning and Accelerated Foveated Rendering (link)
[Session 2] w/Professor Lee
- W6 - 4/7
- [ICLR 2026] ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models (link)
- [SIGCOMM 2025] DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models (link)
- [NDSS 2026] DNN Latency Sequencing: Extracting DNN Architectures from Intel SGX Enclaves with Single-Stepping Attacks (link)
- W7 - 4/14
- [ASPLOS 2025] OS2G: A High-Performance DPU Offloading Architecture for GPU-based Deep Learning with Object Storage (link)
- [MLSys 2025] ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments (note) (link)
- [EuroSys 2025] CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion (note) (link)
[Session 3] w/Professor Song