
🎓 AI Lecture Notes

  • KAIST | AI College
  • Yonsei University | College of Computing

1. [Basic] Python Programming

[Lectures]

[Practices]

2. [Basic] Python DP / DA / DV

Data Processing, Data Analysis, Data Visualization

3. [Advanced] Recent Advances in Multimodal Deep Learning

[Session 1] w/Professor Hwang

  • W3 - 9/16
    • [CVPR 2025] RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models (note) (link)
    • [CVPR 2024] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (note) (link)
    • [CVPR 2025] Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution (note) (link)
    • [ICASSP 2023] Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation (note) (link)
  • W4 - 9/23
    • [CVPR 2025] LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning (note) (link)
    • [CVPR 2025] The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models? (note) (link)
    • [CVPR 2025] Cross-modal Information Flow in Multimodal Large Language Models (link)
    • [CVPR 2025] Tackling View-Dependent Semantics in 3D Language Gaussian Splatting (note) (link)
  • W5 - 9/30
    • [ICLR 2023] Contrastive Audio-Visual Masked Autoencoder (note) (link)
    • [CVPR 2025] Towards Zero-shot Anomaly Detection and Reasoning with Multimodal Large Language Models (link)
    • [CVPR 2025] Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration (link)
  • W7 - 10/14
    • [ICLR 2025] Reducing Hallucinations in Large Vision Language Models via Latent Space Steering (link)
    • [CVPR 2025] MBQ: Modality-Balanced Quantization for Large Vision-Language Models (link)
    • [TPAMI] Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts (link)
  • W7 - 10/15
    • [arXiv 2023] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (link)
    • [ICCV 2025] V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models (link)
  • W9 - 10/28
    • [CVPR 2025] LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models (link)
    • [NSDI 2024] DistMM: Accelerating Distributed Multimodal Model Training (link)
    • [Microsoft 2025] ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving (link)
    • [SIGCOMM 2024] NetLLM: Adapting Large Language Models for Networking (note) (link)

[Session 2] w/Professor Park

  • W10 - 11/4
    • [ICLR 2024] Uni3D: Exploring Unified 3D Representation at Scale (link)
    • [ACL 2025] Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues (link)
    • [NeurIPS 2025] The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense (link)
    • [CVPR 2024] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation (link)
  • W11 - 11/11
    • [ICCV 2025] TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers (link)
    • [ICML 2025] Diving into Self-Evolving Training for Multimodal Reasoning (link)
    • [NeurIPS 2025] GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents (link)
    • [NeurIPS 2024] HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data (link)
  • W12 - 11/18
    • [arXiv 2024] Can AI Perceive Physical Danger and Intervene? (link)
    • [ECCV 2024] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (link)
    • [ECCV 2024] On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models (link)
    • [ICLR 2025] LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (link)
  • W13 - 11/25
    • [CHI 2024] OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs (link)
    • [EMNLP 2025] ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering (link)
    • [arXiv 2025] Multimodal Safety Evaluation in Generative Agent Social Simulations (link)
    • [CVPR 2025] Multi-subject Open-set Personalization in Video Generation (link)
  • W14 - 12/2
    • [RSS 2025] π_0: A Vision-Language-Action Flow Model for General Robot Control (link)
    • [NeurIPS 2025] PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding (link)
    • [ICLR 2023] Uni-Mol: A Universal 3D Molecular Representation Learning Framework (link)
    • [arXiv 2024] MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (link)

4. [Advanced] Recent Advances in LLMs

Candidate Award-Winning Papers

[Useful References]

[Session 1] w/Professor Yeo

  • W1 - 9/4
  • W2 - 9/11
    • [ACL 2025] LLMs Know Their Vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts (note) (link)
    • [ACL 2025] Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models (note) (link)
    • ✅[ACL 2025] Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral (note) (link)
  • W3 - 9/18
    • ✅[ACL 2025] From REAL to SYNTHETIC: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding (note) (link)
    • ✅[EMNLP 2024] Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting (note) (link)
    • [ACL 2025] Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users (note) (link)
    • ✅[ACL 2024] Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration (note) (link)
    • ✅[ICLR 2025] Safety Alignment Should be Made More Than Just a Few Tokens Deep (note) (link)
    • ✅[ACL 2024] Having Beer after Prayer? Measuring Cultural Bias in Large Language Models (note) (link)
    • ✅[ACL 2025] Mixtures of In-Context Learners (note) (link)
    • ✅[ACL 2025] Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively (note) (link)
  • W4 - 9/25
    • [ACL 2025] A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive (note) (link)
    • ✅[ACL 2024] Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models (note) (link)
    • ✅[ACL 2025] Do LLMs Understand Dialogues? A Case Study on Dialogue Act (note) (link)
    • ✅[ACL 2024] Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (note) (link)
    • ✅[ACL 2025] Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs (note) (link)
    • ✅[ACL 2024] Mission: Impossible Language Models (note) (link)
    • ✅[ACL 2025] HALoGEN: Fantastic LLM Hallucinations and Where to Find Them (note) (link)
    • ✅[ACL 2025] FloorPlan-LLaMa: Aligning Architects’ Feedback and Domain Knowledge in Architectural Floor Plan Generation (note) (link)
  • W5 - 10/2 (Online)
    • [ACL 2025] Byte Latent Transformer: Patches Scale Better Than Tokens (note) (link)
    • ✅[ACL 2025] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling (note) (link)
    • [ACL 2025] Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability (note) (link)
    • [ACL 2025] Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs (note) (link)
    • [EMNLP 2024] Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge (note) (link)
    • [ACL 2025] Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention (note) (link)
    • [ACL 2025] Language Models Resist Alignment: Evidence From Data Compression (note) (link)
  • W6 - 10/9 (Online)
    • [ACL 2025] A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens (note) (link)
    • [EMNLP 2024] A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models (note) (link)
    • ✅[ACL 2025] BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages (note) (link)
    • [ACL 2025] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (note) (link)
    • ✅[ACL 2025] Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems (note) (link)
    • [ACL 2025] Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details? (note) (link)
    • [EMNLP 2024] Towards Robust Speech Representation Learning for Thousands of Languages (note) (link)
    • [ACL 2025] Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models (note) (link)
    • ✅[ACL 2024] L-Eval: Instituting Standardized Evaluation for Long Context Language Models (note) (link)
    • [ACL 2025] Steering off Course: Reliability Challenges in Steering Language Models (note) (link)
    • [ACL 2024] Steering Llama 2 via Contrastive Activation Addition (note) (link)

[Session 2] w/Professor Lee

  • W7 - 10/16 - Benchmarking & Evaluation
    • ✅[ACL 2025] YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering (link)
    • ✅[ACL 2025] CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges (link)
    • ✅[ICLR 2025] Beyond Scalar Reward Model: Learning Generative Judge from Preference Data (link)
    • ✅[ACL 2025] M-RewardBench: Evaluating Reward Models in Multilingual Settings (link)
    • ✅[COLM 2025] Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers (link)
  • W8 - 10/23 - Reasoning Enhancement
    • ✅[ICLR 2025] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
    • ✅[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction (link)
    • ✅[NeurIPS 2025] Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning (link)
    • ✅[arXiv 2025] Less is More: Recursive Reasoning with Tiny Networks (link)
    • ✅[NeurIPS 2025] ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs (link)
    • ✅[arXiv 2025] Learning to Reason without External Rewards (link)
    • ✅[arXiv 2025] Deep Think with Confidence (link)
  • W9 - 10/30 - Information Retrieval & Text Mining
    • ✅[arXiv 2025] BMX: Entropy-weighted Similarity and Semantic-enhanced Lexical Search (link)
    • ✅[COLM 2025] DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning (link)
    • ✅[arXiv 2025] On the Theoretical Limitations of Embedding-Based Retrieval (link)
    • ✅[COLM 2025] EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline (link)
    • ✅[arXiv 2025] FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS (note) (link)
    • ✅MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
    • ✅[arXiv 2025] Towards Better Instruction Following Retrieval Models (link)
  • W10 - 11/6 (Online) - Decision Making & Insight Generation
    • ✅[NeurIPS 2025] Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop (link)
    • [NeurIPS 2024] From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection
    • ✅[ACL 2025] INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent (link)
    • ✅[ACL 2025] LLM-Enhanced Self-Evolving Reinforcement Learning for Multi-Step E-Commerce Payment Fraud Risk Detection (link)
    • ✅[NeurIPS 2024] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making (link)
    • ✅[ICDM 2025] MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents (link)
    • ✅[NeurIPS 2025] s3: You Don't Need That Much Data to Train a Search Agent via RL (link)
    • ✅[ICLR 2025] DeLLMa: Decision Making Under Uncertainty with Large Language Models (link)
  • W11 - 11/13 - Dialogue & Interactive System
    • ✅[ACL 2025] In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents (link)
    • ✅[NAACL 2025] Hello Again! LLM-powered Personalized Agent for Long-term Dialogue (link)
    • ✅[COLM 2025] Don’t Lie to Your Friends: Learning What You Know from Collaborative Self-Play (link)
    • ✅[NAACL 2025] Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In (link)
    • ✅[NAACL 2025] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents (link)
    • ✅[ACL 2025] Caution for the Environment: LLM Agents are Susceptible to Environmental Distractions (link)
    • ✅[ICLR 2025] Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence (link)

[Session 3] w/Professor Kim

  • W12 - 11/20
    • [NeurIPS 2025] CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding (link)
    • [NeurIPS 2024] RHO-1: Not All Tokens Are What You Need (link)
    • [ICML 2024] Debating with More Persuasive LLMs Leads to More Truthful Answers (link)
    • [ICLR 2025] A Probabilistic Perspective on Unlearning and Alignment for Large Language Models (link)
    • [ICML 2025] Medical Large Language Model Benchmarks Should Prioritize Construct Validity (link)
  • W13 - 11/27
    • [ICML 2025] COLLABLLM: From Passive Responders to Active Collaborators (link)
    • [ICLR 2025] Your Mixture-Of-Experts LLM is Secretly An Embedding Model For Free (link)
    • [ICLR 2025] Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering (link)
    • [NeurIPS 2025] Large Language Diffusion Models (link)
    • [ICLR 2025] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models (link)
    • [ICLR 2025] Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents (link)
  • W14 - 12/4
    • [ICML 2025] SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? (link)
    • [ICLR 2025] HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models (link)
    • [NeurIPS 2024] LLM Evaluators Recognize and Favor Their Own Generations (link)
    • [ICLR 2025] Self-Play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models (link)
    • [NeurIPS 2023] Jailbroken: How Does LLM Safety Training Fail? (link)
    • [NeurIPS 2025] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) (link)
    • [ICML 2023] A Watermark for Large Language Models (link)
  • W15 - 12/11
    • [NeurIPS 2025] Reverse Engineering Human Preferences with Reinforcement Learning (link)
    • [ICLR 2025] MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions (note) (link)
    • [NeurIPS 2024] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (link)
    • [ICML 2025] AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models (link)
    • [ICML 2025] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (link)
    • [NeurIPS 2024] Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts (link)
    • [ICML 2024] SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code (link)
    • [ICLR 2025] PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding (link)
    • [ICLR 2025] Do as We Do, Not as You Think: the Conformity of Large Language Models (link)
  • W16 - 12/18
    • [ICLR 2025] On the Role of Attention Heads in Large Language Model Safety (link)
    • [ICLR 2025] From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions (link)
    • [NeurIPS 2025] Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (link)
    • [ICML 2025] Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition (link)
    • [ICML 2025] Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems (link)
    • [ICML 2025] Inference Scaling for Long-Context Retrieval Augmented Generation (link)
    • [ICLR 2024] Frozen Transformers in Language Models Are Effective Visual Encoder Layers (link)
    • [NeurIPS 2025] OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts (link)

5. [Advanced] Recent Advances in AI Systems

[Session 1] w/Professor Jeong

  • W2 - 3/10
    • [SOSP 2025] KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models (link)
    • [OSDI 2025] NEUTRINO: Fine-grained GPU Kernel Profiling via Programmable Probing (link)
  • W3 - 3/17
    • [NeurIPS 2025] AttentionPredictor: Temporal Patterns Matter for KV Cache Compression (link)
    • [MLSys 2025] ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation (link)
  • W4 - 3/24
    • [ICLR 2025] SCoRe: Training Language Models to Self-Correct via Reinforcement Learning (link)
    • [ASPLOS 2025] POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference (link)
    • [SOSP 2025] SAND: A New Programming Abstraction for Video-based Deep Learning (link)
  • W5 - 3/31
    • [MLSys 2025] FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving (link)
    • [FAST 2026] SolidAttention: Low-Latency SSD-based Serving on Memory-Constrained PCs (link)
    • [ASPLOS 2025] MetaSapiens: Real-Time Neural Rendering with Efficiency-Aware Pruning and Accelerated Foveated Rendering (link)

[Session 2] w/Professor Lee

  • W6 - 4/7
    • [ICLR 2026] ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models (link)
    • [SIGCOMM 2025] DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models (link)
    • [NDSS 2026] DNN Latency Sequencing: Extracting DNN Architectures from Intel SGX Enclaves with Single-Stepping Attacks (link)
  • W7 - 4/14
    • [ASPLOS 2025] OS2G: A High-Performance DPU Offloading Architecture for GPU-based Deep Learning with Object Storage (link)
    • [MLSys 2025] ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments (note) (link)
    • [EuroSys 2025] CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion (note) (link)

[Session 3] w/Professor Song

  • W8 - 4/21
    • [ICCV 2025] Beyond Text-Visual Attention Exploiting Visual Cues for Effective Token Pruning in VLMs (link)
    • [ICML 2025] Targeted Unlearning with Single Layer Unlearning Gradient (link)
    • [NeurIPS 2025] InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation (link)