This is a simple app that uses a Retrieval-Augmented Generation (RAG) approach to answer questions based on the context provided in a CSV file. It uses Pinecone for vector storage, LangChain for document processing and retrieval, and OpenAI embeddings for contextual understanding.
```
streamlit_app/
├── app.py              # Main Streamlit application
├── utils.py            # Helper functions (e.g., vectorstore creation)
├── context.csv         # Your data file
├── .streamlit/         # Config folder for Streamlit
│   └── secrets.toml    # Stores API keys
└── requirements.txt    # Project dependencies
```
To run the app, first install the necessary libraries using the requirements.txt file.
```
pip install -r requirements.txt
```

You'll need to securely store your API keys in the `.streamlit/secrets.toml` file. Create the file and add the following:
```toml
OPENAI_API_KEY = "your_openai_key"
ATHINA_API_KEY = "your_athina_key"
PINECONE_API_KEY = "your_pinecone_key"
```

These keys will be used to authenticate the API calls to OpenAI, Athina, and Pinecone.
Once the dependencies are installed and the API keys are set, you can run the app:
```
streamlit run app.py
```

- Upload a CSV File: The app accepts a CSV file, which will be used as the source of context for answering questions.
- Ask Questions: Enter your question in the provided input box to get a response based on the uploaded context.
- Evaluate: Use the "Evaluate" button to run an evaluation on the question-answer pairs and view the results in a DataFrame.
`app.py` is the main Streamlit application that handles the UI and user interactions. It includes:
- File upload functionality for CSV context.
- Text input for asking questions.
- Evaluation trigger to assess the model's performance.
`utils.py` contains helper functions for:
- Setting up the environment.
- Loading and splitting documents.
- Creating Pinecone vector store.
- Creating the retriever and RAG chain.
- Preparing and running the evaluation.
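The loading-and-splitting step can be pictured with a minimal, dependency-free sketch of fixed-size chunking with overlap, which is the idea behind the text splitter used in `utils.py` (the `chunk_size` and `chunk_overlap` values here are illustrative assumptions, not the app's actual settings):

```python
# Illustrative chunking with overlap -- the idea behind LangChain-style
# text splitters. Sizes below are assumptions for demonstration only.
def split_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        # Step forward, keeping an overlap so context isn't cut mid-thought.
        start += chunk_size - chunk_overlap
    return chunks

chunks = split_text("word " * 60)  # ~300 characters of toy text
```

Overlapping chunks help the retriever match a query even when the relevant sentence straddles a chunk boundary.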
`context.csv` is the CSV file that contains your context data. The app expects this file to have columns of text that will be used as the context for answering queries.
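As a hypothetical example of the expected shape, a file with a single text column might be read like this (the column name `context` is an assumption; use whatever text column your data actually has):

```python
# Hypothetical context.csv shape; the "context" column name is an assumption.
import csv
import io

sample = (
    "context\n"
    "Pinecone is a managed vector database.\n"
    "LangChain helps build LLM pipelines.\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
texts = [row["context"] for row in rows]
print(len(texts))  # 2
```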
`.streamlit/secrets.toml` is a configuration file that securely stores your API keys, ensuring that sensitive data is not exposed in the code.
`requirements.txt` lists all the Python libraries required to run the app, including Streamlit, LangChain, Pinecone, etc.
- Environment Setup: The `setup_environment()` function loads the necessary API keys and sets up the OpenAI embeddings for document processing.
- Document Loading: The `load_and_split_documents()` function loads the CSV file and splits the text into smaller chunks for embedding.
- Vector Database: The app supports Pinecone for vector storage. If you prefer to use FAISS, you can uncomment the relevant code in `utils.py`.
- Retriever and RAG Chain: The `create_retriever()` and `create_rag_chain()` functions create the retrieval system and the RAG pipeline, respectively, to answer user queries.
- Evaluation: The evaluation process takes the question, context, and response and runs an evaluation to assess the quality of the answer.
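To make the retrieve-then-answer flow concrete, here is a minimal, dependency-free sketch of the idea behind the retriever: in the real app, OpenAI embeddings and a Pinecone index do this work; the word-count vectors and linear scan below are purely illustrative stand-ins.

```python
# Toy retriever: ranks documents by cosine similarity of word-count vectors.
# The app uses OpenAI embeddings + Pinecone instead; this is illustration only.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "pinecone stores dense vectors for similarity search",
    "streamlit builds interactive data apps in python",
]
print(retrieve("which database stores vectors", docs))
```

The RAG chain then passes the retrieved chunks, together with the user's question, to the LLM as context for the final answer.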
- API Keys: Never commit your API keys to version control. Use `.streamlit/secrets.toml` to store them securely.
- FAISS: The FAISS implementation is optional and commented out. To use FAISS, uncomment the relevant lines in `utils.py` and ensure you have `faiss-gpu` installed.
- Data Handling: The app expects the uploaded CSV file to have columns that are interpretable as context. Ensure your data is in the correct format.
- Add more RAG techniques and customizable evaluation metrics.
- Improve error handling and logging.
- Extend the UI to allow for more interactive features.