💡 In the world of data science, tables are one of the most fundamental data structures. Understanding and extracting meaningful insights from tabular data is crucial across various domains such as finance, healthcare, and marketing. This repository aims to be a comprehensive collection of resources, research papers, tools, and tutorials focused on Table Understanding.
✨ Awesome-Table-Understanding is a curated list of resources dedicated to the field of Table Understanding.
🔥 This project is currently under development. Feel free to ⭐ (STAR) and 🔭 (WATCH) it to stay updated on the latest developments.
If you notice any papers missing from the list, please feel free to email me or submit a pull request, and I will gladly add them. If you find any miscategorized items, please let me know as well.
- [DEB] Large Language Models for Data Discovery and Integration: Challenges and Opportunities. [paper] ⭐ [Must Read]
- [SIGMOD'23] Table Discovery in Data Lakes: State-of-the-art and Future Directions.
- [ACL'23] Transformers for Tabular Data Representation: A Survey of Models and Applications.
- [Arxiv'26] LakeHopper: Cross Data Lakes Column Type Annotation through Model Adaptation. [paper]
- [Arxiv'26] CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning. [paper]
- [Arxiv'26] MMFCTUB: Multi-Modal Financial Credit Table Understanding Benchmark. [paper]
- [Arxiv'26] HyperJoin: LLM-augmented Hypergraph Link Prediction for Joinable Table Discovery. [paper]
- [VLDB'25] Cents: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding. [paper]
- [SIGMOD'25] Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System. [paper]
- [SIGMOD'26] Retrieve-and-Verify: A Table Context Selection Framework for Accurate Column Annotations. [paper]
- [SIGMOD'25] A Quantum-Leap into Schema Matching: Beyond 1-to-1 Matchings. [paper]
- [SIGMOD'25] Progressive Entity Matching: A Design Space Exploration. [paper]
- [SIGMOD'26] 3dSAGER: Geospatial Entity Resolution over 3D Objects. [paper]
- [VLDB'25] Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index. [paper]
- [VLDB'25] TableCopilot: A Table Assistant Empowered by Natural Language Conditional Table Discovery. [paper]
- [VLDB'25] Magneto: Combining Small and Large Language Models for Schema Matching. [paper]
- [VLDB'25] OmniMatch: Joinability Discovery in Data Products. [paper]
- [VLDB'25] LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes. [paper]
- [VLDB'25] Data Discovery in Data Lakes: Operations, Indexes, Systems. [paper]
- [VLDB'25] Sort it Like You Mean It: Discovering Semantically Interesting Attribute Augmentations to Sort Tables. [paper]
- [VLDB'25] TabulaX: Leveraging Large Language Models for Multi-Class Table Transformations. [paper]
- [ICDE'25] TabSketchFM: Sketch-Based Tabular Representation Learning for Data Discovery Over Data Lakes. [paper]
- [ICDE'25] LineageX: A Column Lineage Extraction System for SQL. [paper]
- [ICDE'25] MISS: An Incomplete Tabular Data Representation System with Missing Mechanism Learning. [paper]
- [ICDE'25] Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey (Extended Abstract). [paper]
- [CVPR'25] SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis. [paper]
- [AAAI'25] HeGTa: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding. [paper]
- [WWW'25] 2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models. [paper]
- [CIKM'25] TableTime: Reformulating Time Series Classification as Training-Free Table Understanding with Large Language Models. [paper]
- [SIGIR'25] NLCTables: A Dataset for Marrying Natural Language Conditions with Table Discovery. [paper] [arxiv]
- [ICDE'25] LIFTus: An Adaptive Multi-Aspect Column Representation Learning for Table Union Search. [paper]
- [ICDE'25] Joinable Search Over Multi-Source Spatial Datasets: Overlap, Coverage, and Efficiency. [paper]
- [TKDE'25] Snoopy: Effective and Efficient Semantic Join Discovery via Proxy Columns. [paper]
- [WWW'25 Companion] Evaluating Column Type Annotation Models and Benchmarks. [paper]
- [Arxiv'25] An Efficient Proximity Graph-based Approach to Table Union Search. [paper]
- [Arxiv'25] EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes. [paper]
- [Arxiv'25] Robust LLM-based Column Type Annotation via Prompt Augmentation with LoRA Tuning. [paper]
- [Arxiv'25] Something's Fishy In The Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks. [paper]
- [Arxiv'25] MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark. [paper]
- [Arxiv'25] Table-R1: Region-based Reinforcement Learning for Table Understanding. [paper]
- [Arxiv'25] TableMaster: A Recipe to Advance Table Understanding with Language Models. [paper]
- [TWEB'24] DaCo: Matching Tabular Data to Knowledge Graph with Effective Core Column Set Discovery.
- [Arxiv'24] HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding.
- [VLDB'24] ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models. [code]
- [VLDB'24] Observatory: Characterizing Embeddings of Relational Tables. [code] ⭐ [Must Read]
- [VLDB'24] Chorus: Foundation Models for Unified Data Discovery and Exploration.
- [VLDB'24 TaDA] ALT-GEN: Benchmarking Table Union Search using Large Language Models.
- [NAACL'24] TableLlama: Towards Open Large Generalist Models for Tables.
- [ICDE'24] KGLink: A Column Type Annotation Method that Combines Knowledge Graph and Pre-trained Language Model. [code]
- [SIGMOD'24] Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation. [code]
- [SIGMOD'24] Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks. [code]
- [Arxiv'23] AdaTyper: Adaptive Semantic Column Type Detection.
- [NeurIPS'23] HYTREL: Hypergraph-enhanced Tabular Data Representation Learning. [code]
- [VLDB'23] RECA: Related Tables Enhanced Column Semantic Type Annotation Framework. [code]
- [VLDB'23] DeepJoin: Joinable Table Discovery with Pre-trained Language Models. [code]
- [VLDB'23] Starmie: Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning. [code]
- [SIGMOD'23] Steered Training Data Generation for Learned Semantic Type Detection.
- [SIGMOD'23] SANTOS: Relationship-based Semantic Table Union Search. [code]
- [ICDE'23] Towards Explainable Table Interpretation Using Multi-view Explanations. [code]
- [ACL'23 Findings] Automatic Table Union Search with Tabular Representation Learning.
- [SIGMOD'22] DODUO: Annotating Columns with Pre-trained Language Models. [code] ⭐ [Must Read]
- [NAACL'21] TABBIE: Pretrained Representations of Tabular Data. [code]
- [WWW'21] TCN: Table Convolutional Network for Web Table Interpretation.
- [VLDB'20] Sato: Contextual Semantic Type Detection in Tables. [code] ⭐ [Must Read]
- [VLDB'20] TURL: Table Understanding through Representation Learning. [code] ⭐ [Must Read]
- [KDD'19] Sherlock: A Deep Learning Approach to Semantic Data Type Detection.
- WikiTables-TURL: Table Understanding through Representation Learning. [website] [🤗 Hugging Face]
- GitTables: A Large-Scale Corpus of Relational Tables. [paper] [website] [🤗 Hugging Face]
- SOTAB: Web Data Commons - Schema.org Table Annotation Benchmark. [website] [🤗 Hugging Face]
- 2nd International Workshop on Tabular Data Analysis (TaDA) [website]