An exploration of structurally and semantically aware chunking strategies for Retrieval-Augmented Generation (RAG) systems. Strategies covered:
- Naive sentence chunking
- Semantic valley chunking
- Spectral graph segmentation
- Heat kernel segmentation
- Information bottleneck chunking
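
The list above names the strategies but not their mechanics. Below is a minimal sketch of the two simplest ones. It assumes sentence embeddings have already been computed and are passed in as a NumPy array. Naive sentence chunking groups a fixed number of consecutive sentences. The semantic valley version is one plausible reading of the name: it cuts wherever the cosine similarity between adjacent sentences reaches a local minimum below a threshold. All of the following are hypothetical choices, not the project's code: the helper names (`split_sentences`, `naive_chunks`, `valley_boundaries`, `valley_chunks`), the regex sentence splitter, the default group size of 3, and the mean-minus-one-standard-deviation threshold.

```python
from __future__ import annotations

import re

import numpy as np


def split_sentences(text: str) -> list[str]:
    # Hypothetical naive splitter: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def naive_chunks(sentences: list[str], size: int = 3) -> list[list[str]]:
    # Naive sentence chunking: fixed-size groups of consecutive sentences.
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def valley_boundaries(embeddings: np.ndarray, threshold: float | None = None) -> list[int]:
    # Hypothetical helper: returns the indices where a new chunk starts.
    if len(embeddings) < 2:
        return []
    # Cosine similarity between each sentence and the next one.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = np.sum(unit[:-1] * unit[1:], axis=1)
    if threshold is None:
        # Assumed default: a valley must dip below mean minus one std.
        threshold = sims.mean() - sims.std()
    cuts = []
    for i, s in enumerate(sims):
        left = sims[i - 1] if i > 0 else np.inf
        right = sims[i + 1] if i + 1 < len(sims) else np.inf
        # Valley = local minimum of adjacent similarity that is also below the threshold.
        if s <= left and s <= right and s < threshold:
            cuts.append(i + 1)  # sims[i] compares sentences i and i + 1
    return cuts


def valley_chunks(sentences: list[str], embeddings: np.ndarray,
                  threshold: float | None = None) -> list[list[str]]:
    # Slice the sentence list at every detected valley.
    cuts = [0] + valley_boundaries(embeddings, threshold) + [len(sentences)]
    return [sentences[a:b] for a, b in zip(cuts[:-1], cuts[1:])]
```

On real text, the embeddings could come from Sentence Transformers, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)`. That model name is an assumption, not something this project specifies. The fixed-size baseline ignores meaning entirely. That is exactly why it is useful as a reference point for the semantic strategies.
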

Goal: evaluate how the choice of chunking strategy affects retrieval quality in RAG pipelines.
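
One way to measure retrieval quality across strategies is sketched here under stated assumptions. Each query and each chunk is embedded, the chunks are ranked by cosine similarity, and a query counts as answered if a top-ranked chunk contains the gold answer string. Hit rate at k and mean reciprocal rank (MRR) are standard retrieval metrics. However, the function names, the substring-match relevance rule, and the choice of these two metrics are hypothetical and are not the project's documented protocol.

```python
import numpy as np


def top_k(query_emb: np.ndarray, chunk_embs: np.ndarray, k: int) -> np.ndarray:
    # Rank chunks by cosine similarity to one query; return the top-k indices.
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores, kind="stable")[:k]


def _is_relevant(answer: str, chunk_text: str) -> bool:
    # Assumed relevance rule: the chunk contains the gold answer (case-insensitive).
    return answer.lower() in chunk_text.lower()


def hit_rate_at_k(query_embs, chunk_embs, chunk_texts, answers, k=3) -> float:
    # Fraction of queries with at least one relevant chunk in the top k.
    hits = 0
    for q, ans in zip(query_embs, answers):
        if any(_is_relevant(ans, chunk_texts[i]) for i in top_k(q, chunk_embs, k)):
            hits += 1
    return hits / len(answers)


def mean_reciprocal_rank(query_embs, chunk_embs, chunk_texts, answers) -> float:
    # Average of 1 / rank of the first relevant chunk (0 if none is relevant).
    total = 0.0
    for q, ans in zip(query_embs, answers):
        ranking = top_k(q, chunk_embs, len(chunk_texts))
        for rank, i in enumerate(ranking, start=1):
            if _is_relevant(ans, chunk_texts[i]):
                total += 1.0 / rank
                break
    return total / len(answers)
```

Each strategy produces different chunk boundaries. Judging relevance by answer containment rather than by a fixed chunk ID therefore keeps the comparison meaningful across strategies. Question/answer pairs could be loaded with `datasets.load_dataset` from Hugging Face Datasets. Which dataset the project actually uses is not stated here.
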

Built with:
- Python
- Sentence Transformers
- NumPy
- Hugging Face Datasets