Biodiversity and Conservation Knowledge RAG System

This is a Proof of Concept (POC), built during Google GenAI & Agents for Builders - Berlin Edition with the Agenda.

This project aims to build a Retrieval-Augmented Generation (RAG) system for Darukaa to provide accurate and relevant information on biodiversity and conservation topics, leveraging Google Cloud and Vertex AI Search.

Overall Goal

To create a robust and user-friendly system that can answer complex questions about biodiversity and conservation by retrieving information from a curated knowledge base (PDF documents) and generating informative responses.

Demo

Darukaa_Assistant.mov

demo.mov

Core Components

Knowledge Base: A collection of PDF documents (e.g., articles and research papers) on biodiversity and conservation.
Indexing and Retrieval: Using Google Vertex AI Search to index the PDF documents and retrieve relevant passages based on user queries.
User Interface: A web application (built with Next.js) for users to interact with the system, input queries, and view responses.
Language Model (Partially Implemented): A large language model (e.g., Gemini) that generates coherent answers grounded in the retrieved passages.

Google Tools Used

Google Vertex AI Search: For indexing and retrieving information from the PDF knowledge base.
Google Cloud Storage: To store the PDF documents.
Vertex AI: To access hosted language models such as Gemini.
Google Cloud Functions or Cloud Run (Not Implemented): For deploying the backend logic that handles user queries and interacts with Vertex AI Search and the language model.

Approach Taken

Phase 1: Setting up the Knowledge Base and Indexing (Focus for Hacking Session)

Curate Knowledge Base: Gather relevant PDF documents on biodiversity and conservation topics.
Upload to Cloud Storage: Store the PDF documents in a Google Cloud Storage bucket.
Configure Vertex AI Search:
- Create a Vertex AI Search data store.
- Connect the data store to the Cloud Storage bucket containing the PDFs.
- Initiate the indexing process.
Test Retrieval: Use the Vertex AI Search console or API to test if relevant document snippets are retrieved for sample queries.
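
A retrieval test through the API could start from a request like the one below. This is a minimal sketch, not code from this repository: `buildSearchRequest` is a hypothetical helper, and the project ID, location, and data store ID are placeholders. The URL pattern and body fields are assumed to follow the Discovery Engine v1 REST `search` method and should be checked against the current API reference.

```typescript
// Hypothetical helper: builds the REST request for a Vertex AI Search query.
// All identifiers passed in are placeholders, not values from this project.
interface SearchTarget {
  projectId: string;
  location: string; // for example "global"
  dataStoreId: string;
}

interface SearchRequest {
  url: string;
  body: {
    query: string;
    pageSize: number;
    contentSearchSpec: { snippetSpec: { returnSnippet: boolean } };
  };
}

function buildSearchRequest(target: SearchTarget, query: string, pageSize = 5): SearchRequest {
  const trimmed = query.trim();
  if (trimmed.length === 0) {
    throw new Error("query must not be empty");
  }
  // Resource path of the data store that indexes the PDFs.
  const parent =
    `projects/${encodeURIComponent(target.projectId)}` +
    `/locations/${encodeURIComponent(target.location)}` +
    `/collections/default_collection` +
    `/dataStores/${encodeURIComponent(target.dataStoreId)}`;
  return {
    url: `https://discoveryengine.googleapis.com/v1/${parent}/servingConfigs/default_search:search`,
    // Ask for snippets so the response shows which text matched the query.
    body: { query: trimmed, pageSize, contentSearchSpec: { snippetSpec: { returnSnippet: true } } },
  };
}
```

The returned `url` and `body` would be sent as a POST with an OAuth bearer token, for example one obtained with `gcloud auth print-access-token`.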

Phase 2: Building the Application and Integration

Develop Backend:
- Create an API endpoint (using Cloud Functions or Cloud Run) to receive user queries.
- Implement logic to call the Vertex AI Search API with the user query.
- Process the search results (retrieved passages).
- Call a language model (via Vertex AI) with the user query and the retrieved passages to generate an answer.
- Return the generated answer to the frontend.
Develop Frontend (Next.js):
- Create a user interface with an input field for queries and a display area for responses.
- Implement logic to send user queries to the backend API.
- Display the responses received from the backend.
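
The backend endpoint described above could validate the incoming query and delegate to the retrieval and generation logic roughly as follows. This is a framework-agnostic sketch under stated assumptions: `handleQuery`, `AnswerFn`, and the `{ query }` / `{ answer }` JSON shapes are hypothetical, and the actual Cloud Functions or Cloud Run wiring is not shown because the source marks it as not implemented.

```typescript
// `AnswerFn` is a hypothetical stand-in for the search + language model logic.
type AnswerFn = (query: string) => Promise<string>;

interface HandlerResult {
  status: number;
  body: { answer?: string; error?: string };
}

// Takes the parsed JSON payload sent by the frontend and returns an HTTP-like result.
async function handleQuery(payload: unknown, answer: AnswerFn): Promise<HandlerResult> {
  const query = (payload as { query?: unknown } | null)?.query;
  if (typeof query !== "string" || query.trim().length === 0) {
    return { status: 400, body: { error: "Field 'query' must be a non-empty string." } };
  }
  try {
    return { status: 200, body: { answer: await answer(query.trim()) } };
  } catch {
    // Do not leak upstream error details to the client.
    return { status: 502, body: { error: "Upstream search or model call failed." } };
  }
}
```

Passing the answer logic in as a function keeps the handler testable with a stub and independent of the hosting choice.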

Improvement

Customer Query Improvement: Use the Gemini ai.model to rephrase the customer query before sending it to the RAG step (Vertex AI Search).
RAG Response Improvement: Combine the user query and the RAG answers, then use the Gemini ai.model to formulate a better answer.
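
The query-rephrasing improvement could look like the sketch below. The prompt wording, `buildRewritePrompt`, `rewriteQuery`, and the injected `GenerateFn` are all assumptions for illustration; the real Gemini client call is deliberately left out and replaced by a function parameter.

```typescript
// Hypothetical stand-in for a call to the Gemini model: prompt in, text out.
type GenerateFn = (prompt: string) => Promise<string>;

function buildRewritePrompt(query: string): string {
  return [
    "Rewrite the following question as a concise search query about biodiversity and conservation.",
    "Return only the rewritten query.",
    `Question: ${query}`,
  ].join("\n");
}

async function rewriteQuery(query: string, generate: GenerateFn): Promise<string> {
  try {
    const rewritten = (await generate(buildRewritePrompt(query))).trim();
    // Fall back to the user's own wording if the model returns nothing usable.
    return rewritten.length > 0 ? rewritten : query;
  } catch {
    // A failed rewrite should not block retrieval.
    return query;
  }
}
```

Falling back to the original query means the rewrite step can only help retrieval, never prevent it.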

How to Run

Create a GCP bucket and add the research files.
Create a data store in the Vertex AI Search project.
Update the configs in discoveryEngine.ts.
Get a Gemini API key and configure it in gemini-clints.ts.
Install dependencies and start the app: run npm i, then npm run dev.
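
One way to keep these settings and the API key out of source control is to read them from environment variables. The sketch below is an assumption, not the contents of discoveryEngine.ts or gemini-clints.ts: `AppConfig`, `loadConfig`, and every variable name are hypothetical.

```typescript
interface AppConfig {
  projectId: string;
  location: string;
  dataStoreId: string;
  geminiApiKey: string;
}

// Variable names below are hypothetical; the repository's files may use different ones.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      throw new Error(`Missing required setting: ${name}`);
    }
    return value.trim();
  };
  return {
    projectId: required("GCP_PROJECT_ID"),
    // Location is optional in this sketch and defaults to "global".
    location: env["VERTEX_SEARCH_LOCATION"]?.trim() || "global",
    dataStoreId: required("VERTEX_SEARCH_DATA_STORE_ID"),
    geminiApiKey: required("GEMINI_API_KEY"),
  };
}
```

In server-side code this would be called as `loadConfig(process.env)`, so a missing setting fails at startup rather than on the first query.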

How the Application Works

The application works by taking a user's query and sending it to a backend service. This service interacts with Google Vertex AI Search, which has pre-indexed a collection of PDF documents related to biodiversity and conservation. Vertex AI Search retrieves the most relevant passages from these documents based on the user's query. These retrieved passages, along with the original query, are then passed to a large language model (like Gemini) hosted on Vertex AI. The language model uses this information to generate a comprehensive and informed answer, which is then sent back to the user interface for display. This process of combining retrieval with generation is known as Retrieval-Augmented Generation (RAG), and it allows the system to provide answers grounded in specific, up-to-date information from the provided PDF files.
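
The flow described above (retrieve, combine passages with the query, generate) can be sketched as follows. Both `SearchFn` and `GenerateFn` are hypothetical stand-ins for the Vertex AI Search and language model clients, and the prompt wording is an assumption rather than the project's actual prompt.

```typescript
// Hypothetical stand-ins for the real clients, injected so the flow runs without network access.
type SearchFn = (query: string) => Promise<string[]>; // returns retrieved passages
type GenerateFn = (prompt: string) => Promise<string>; // returns model text

// Numbers each passage so the model (and a reader) can refer back to its source.
function buildGroundedPrompt(query: string, passages: string[]): string {
  const context = passages.map((p, i) => `[${i + 1}] ${p.trim()}`).join("\n");
  return [
    "Answer the question using only the numbered passages below.",
    "If the passages do not contain the answer, say so.",
    "",
    "Passages:",
    context,
    "",
    `Question: ${query}`,
  ].join("\n");
}

async function answerWithRag(
  query: string,
  search: SearchFn,
  generate: GenerateFn,
): Promise<{ answer: string; passages: string[] }> {
  const passages = await search(query);
  if (passages.length === 0) {
    // Skip the model call when nothing was retrieved, so no ungrounded answer is produced.
    return { answer: "No relevant passages were found in the knowledge base.", passages };
  }
  const answer = await generate(buildGroundedPrompt(query, passages));
  return { answer, passages };
}
```

Returning the passages alongside the answer lets the user interface show which parts of the PDFs the answer was based on.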
