Jhana AI is a voice assistant meditation coach designed to guide users through meditation sessions using advanced AI techniques. Jhana listens to user queries, provides meditative guidance, and supports users in their meditation journey.
This repository is a development sandbox for Jhana AI, where we curate the dataset used to train the model, experiment with the pipeline, and try out different language models to build the best possible meditation coach.
Please note that the datasets and models used in this repository are for research and development purposes only. Because neither the datasets nor the models are included in the repository, the notebooks and other code may not run without them.
- Voice Recognition: Records audio input from the user.
- Speech Transcription: Utilizes Whisper to transcribe the recorded audio.
- AI-Powered Responses: Engages with users through Mixtral, powered by Ollama, to provide insightful meditation guidance.
- Text-to-Speech: Converts AI responses into audible speech using XTTS-v2.
- Audio Playback: Plays back the generated guidance for a seamless meditative experience.
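The five features above form a single chain. The sketch below is illustrative only, not the repository's actual `src/main.py` API: the stage functions (`record`, `transcribe`, `respond`, `speak`, `play`) are assumed names, passed in as parameters so that any recorder, transcriber, language model, or synthesizer can be plugged in.

```python
# Illustrative sketch of the Jhana AI pipeline; function names and
# signatures are assumptions, not the repository's actual API.

def run_pipeline(record, transcribe, respond, speak, play):
    """Chain the stages: record -> STT -> LLM -> TTS -> playback."""
    audio = record()            # 1. capture microphone input
    text = transcribe(audio)    # 2. Whisper speech-to-text
    guidance = respond(text)    # 3. Mixtral via Ollama
    speech = speak(guidance)    # 4. XTTS-v2 text-to-speech
    play(speech)                # 5. play the guidance back
    return guidance

# Example with stand-in stages (real ones would wrap Whisper, Ollama, XTTS-v2):
if __name__ == "__main__":
    result = run_pipeline(
        record=lambda: b"raw-audio-bytes",
        transcribe=lambda audio: "How do I begin a breath meditation?",
        respond=lambda text: f"Guidance for: {text}",
        speak=lambda reply: reply.encode(),
        play=lambda speech: None,
    )
    print(result)
```

Keeping the stages as plain callables like this also makes each step easy to swap out or test in isolation.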
Ensure you have the following installed:
```
ffmpeg
portaudio19-dev
```
On Ubuntu, you can install these dependencies using apt:

```
sudo apt install ffmpeg portaudio19-dev
```

- Clone the repository:

```
git clone https://github.com/carecodeconnect/jhana-sandbox.git
cd jhana-sandbox
```

- Install Python dependencies:

```
pip install -r requirements.txt
```

- Install Ollama:

```
curl -fsSL https://ollama.com/install.sh | sh
```

Download the necessary models:
- Whisper: Follow the instructions from OpenAI's Whisper GitHub repository to download `whisper-small`.
- TTS: XTTS-v2 is installed as part of the Python dependencies listed in `requirements.txt`. Test TTS in the terminal using the following command:

```
tts --text "Hello, world!" --out_path "hello_world.wav" --model_name "tts_models/multilingual/multi-dataset/xtts_v2"
```

- Ollama: After installing Ollama, run the following to set up Mixtral:

```
ollama run mixtral:8x7b-instruct-v0.1-q4_0
```

```
├── data/
│   └── input/
│       └── audio/
│           └── voices_to_clone/
├── img/
├── notebooks/
├── src/
├── .gitattributes
├── .gitignore
├── README.md
└── requirements.txt
```
- Record Audio: Captures audio from the microphone and saves it as a file in `data/input/audio/speech_to_transcribe/`.
- Transcribe Audio: Uses Whisper to convert the saved audio file into text.
- AI Interaction: Feeds the transcribed text to Mixtral via Ollama and receives meditation guidance.
- Generate Response: The response from Mixtral is printed to the console and saved in `data/output/text/`.
- Text-to-Speech: Converts the Mixtral response into speech using XTTS-v2.
- Save and Play Speech: The generated speech is saved in `data/output/audio/` and then played back to the user.
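As a hedged sketch of the AI Interaction step: Ollama exposes a local REST API on port 11434, so the transcribed text could be sent to Mixtral using nothing beyond the Python standard library. The helper names below are illustrative (not the repository's code) and `get_guidance` assumes `ollama serve` is running with the Mixtral model already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt: str,
                  model: str = "mixtral:8x7b-instruct-v0.1-q4_0") -> bytes:
    """Encode the JSON body expected by Ollama's /api/generate endpoint."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")

def get_guidance(prompt: str) -> str:
    """Send the transcribed text to Mixtral and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    # Requires a running Ollama server with the Mixtral model available.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Setting `"stream": False` asks Ollama for a single JSON object instead of a stream of partial responses, which keeps the client trivial.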
To run the basic STT -> LLM -> TTS pipeline, run the `02_pipeline.py` Jupyter notebook in `notebooks/`.
Please note that `src/main.py` runs too slowly at the moment.