PDF RAG

A Python package for building RAG (Retrieval-Augmented Generation) applications using PDFs, ChromaDB, and Ollama.

Project Structure

```
.
├── pdf_rag
│   ├── document_processor.py
│   ├── __init__.py
│   ├── llm_interface.py
│   ├── main.py
│   └── vector_store.py
├── README.md
├── requirements.txt
├── setup.py
├── test_package.py
└── test.py

2 directories, 10 files
```

Installation

Create and activate a virtual environment:

```
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows
venv\Scripts\activate
# On Unix or macOS
source venv/bin/activate
```

Install the package:
```
pip install -e .
```

Install Ollama on Linux:

```
# Download and run the Ollama install script
curl -fsSL https://ollama.com/install.sh | sh

# Verify the installation
ollama --version
```
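
As a complement to the version check above, a script can confirm that the `ollama` executable is on the `PATH` before trying to use it. This is a minimal sketch using only the Python standard library; the helper name `find_executable` is hypothetical and is not part of `pdf_rag`.

```python
import shutil


def find_executable(name: str = "ollama"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


# Report whether the Ollama CLI can be found from this environment.
ollama_path = find_executable()
if ollama_path is None:
    print("ollama was not found on PATH; re-run the install step.")
else:
    print(f"ollama found at {ollama_path}")
```

If the executable is not found right after installation, opening a new shell so that `PATH` is reloaded is usually worth trying first.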

Download models in Ollama:

To download specific models such as llama3 and DeepSeek-R1, use the following commands:

```
# Download the llama3 model
ollama pull llama3

# Download the DeepSeek-R1 model
ollama pull deepseek-r1
```

Basic Usage

```
from pdf_rag import PDFRAGApplication

# Initialize the application
rag = PDFRAGApplication(model_name="deepseek-r1")

# Load a PDF
rag.load_pdf("your_document.pdf")

# Query the system
response = rag.query("What is this document about?")
print(response)
```
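
Conceptually, a RAG query retrieves the chunks most relevant to the question and passes them to the model as context. The sketch below illustrates only that prompt-assembly step in plain Python. `build_prompt` and its template are hypothetical and may differ from what `PDFRAGApplication.query` does internally.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved text chunks and a question into one prompt string."""
    # Number each chunk so the passages stay distinguishable in the prompt.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative chunks; in a real run these would come from the vector store.
prompt = build_prompt(
    "What is this document about?",
    ["First retrieved passage.", "Second retrieved passage."],
)
print(prompt)
```

Keeping the context and the question in clearly separated sections makes it easier to inspect what the model was given when an answer looks wrong.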

Testing

Run the test.py script to see how the module works with ChromaDB:

```
import chromadb

# Create an in-memory Chroma client
chroma_client = chromadb.Client()

# `get_or_create_collection` reuses the collection if it already exists
# instead of creating a new one on every run
collection = chroma_client.get_or_create_collection(name="my_collection")

# `upsert` avoids adding the same documents again on every run
collection.upsert(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges"
    ],
    ids=["id1", "id2"]
)

results = collection.query(
    query_texts=["This is a query document about hawaii"],  # Chroma will embed this for you
    n_results=2  # how many results to return
)

print(results)
```
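
The printed result is a dictionary whose values hold one inner list per query text, which can be hard to read at a glance. The following sketch flattens the first query's results into rows, assuming the default query output that includes `ids`, `documents`, and `distances`. The helper `flatten_results` is hypothetical, and the sample dictionary is hand-written with made-up distances so the snippet runs without ChromaDB.

```python
def flatten_results(results: dict) -> list[tuple[str, str, float]]:
    """Turn the first query's results into (id, document, distance) tuples."""
    # Each value holds one inner list per query text; take the first query.
    ids = results["ids"][0]
    documents = results["documents"][0]
    distances = results["distances"][0]
    return list(zip(ids, documents, distances))


# Hand-written sample in the same shape as a query result; distances are made up.
sample = {
    "ids": [["id1", "id2"]],
    "documents": [[
        "This is a document about pineapple",
        "This is a document about oranges",
    ]],
    "distances": [[1.0, 1.2]],
}

for doc_id, document, distance in flatten_results(sample):
    print(f"{doc_id}  {distance:.2f}  {document}")
```

With the real `results` object from the script, the same call would be `flatten_results(results)`.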
