abbasi0abolfazl/Rag

 
 

PDF RAG

A Python package for building RAG (Retrieval-Augmented Generation) applications using PDFs, ChromaDB, and Ollama.

Project Structure

.
├── pdf_rag
│   ├── document_processor.py
│   ├── __init__.py
│   ├── llm_interface.py
│   ├── main.py
│   └── vector_store.py
├── README.md
├── requirements.txt
├── setup.py
├── test_package.py
└── test.py

2 directories, 10 files

Installation

  1. Create and activate a virtual environment:

    # Create a virtual environment
    python -m venv venv
    
    # Activate the virtual environment
    # On Windows
    venv\Scripts\activate
    # On Unix or MacOS
    source venv/bin/activate
  2. Install the package:

    pip install -e .
  3. Install Ollama on Linux:

    Follow the steps below to install Ollama on a Linux system.

    # Download and run the official Ollama install script
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Verify the installation
    ollama --version
  4. Download models in Ollama:

    To download the models used in this README (llama3 and deepseek-r1), use the following commands:

    # Download the llama3 model
    ollama pull llama3
    
    # Download the deepseek-r1 model (same name as passed to model_name below)
    ollama pull deepseek-r1
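
To confirm that the models were pulled, one option is to ask the local Ollama server which models it has installed via its `/api/tags` endpoint. The sketch below assumes Ollama is serving on its default address, `http://localhost:11434`. The helper names `installed_model_names` and `missing_models` are illustrative and are not part of this package.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address


def installed_model_names(tags_response: dict) -> set[str]:
    """Return base model names (tag suffix removed) from an /api/tags payload."""
    names = set()
    for model in tags_response.get("models", []):
        # Entries look like "llama3:latest"; keep the part before the colon.
        names.add(model["name"].split(":", 1)[0].lower())
    return names


def missing_models(required: list[str], base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the running Ollama server which of the required models are absent."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    installed = installed_model_names(payload)
    return [name for name in required if name.lower() not in installed]
```

With the server running, `missing_models(["llama3", "deepseek-r1"])` should return an empty list once both pulls have finished.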

Basic Usage

from pdf_rag import PDFRAGApplication

# Initialize the application
rag = PDFRAGApplication(model_name="deepseek-r1")

# Load a PDF
rag.load_pdf("your_document.pdf")

# Query the system
response = rag.query("What is this document about?")
print(response)
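
The internals of `load_pdf` are not shown in this README, but a RAG pipeline typically splits the extracted PDF text into overlapping chunks before embedding them. The following is a minimal sketch of that step, assuming simple character-based chunks. `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and are not taken from `pdf_rag`.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only chunks
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current chunk already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. Splitting on sentences or tokens instead of characters is a common alternative.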

Testing

Run the test.py script to see how ChromaDB stores documents and answers a similarity query:

import chromadb
chroma_client = chromadb.Client()

# `get_or_create_collection` reuses the collection if it already exists instead of creating a new one every time
collection = chroma_client.get_or_create_collection(name="my_collection")

# `upsert` (rather than `add`) avoids adding the same documents again on every run
collection.upsert(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges"
    ],
    ids=["id1", "id2"]
)

results = collection.query(
    query_texts=["This is a query document about hawaii"], # Chroma will embed this for you
    n_results=2 # how many results to return
)

print(results)
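
The printed `results` object is a dictionary whose `ids`, `documents`, and `distances` entries each hold one list per query text. The helper below is a small sketch for reading it, assuming the default result fields are present. `top_matches` is an illustrative name, not a ChromaDB or `pdf_rag` function.

```python
def top_matches(results: dict, query_index: int = 0) -> list[tuple[str, str, float]]:
    """Return (id, document, distance) tuples for one query, closest first."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    rows = list(zip(ids, documents, distances))
    # Smaller distance means a closer match to the query text.
    return sorted(rows, key=lambda row: row[2])
```

For example, `for doc_id, doc, dist in top_matches(results): print(doc_id, dist, doc)` prints one line per retrieved document.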

About

This project is a RAG system that reads PDF files and answers questions using Ollama models. Its goal is to extract information and provide accurate responses to users.
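
How the package combines retrieved text with the user's question is not described here. One common approach is to place the retrieved chunks in the prompt and instruct the model to answer from them only. The sketch below illustrates that idea under that assumption. `build_prompt` and its wording are hypothetical and are not taken from `pdf_rag`.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the given context."""
    # Number each chunk so the model (and the reader) can refer back to it.
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

The resulting string would then be sent to the chosen Ollama model. Telling the model to admit when the context lacks the answer is one way to reduce unsupported responses.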
