AsukaAI is your personal offline AI companion, ensuring privacy while offering powerful features. This local AI waifu seamlessly integrates speech-to-text, text generation, and text-to-speech functionalities:
- RealtimeSTT with faster_whisper under the hood for speech-to-text
- Ollama with Llama3 under the hood for text generation
- RealtimeTTS with CoquiTTS under the hood for text-to-speech
- Privacy: All processing is done locally on your machine, ensuring your data never leaves your device.
- Customizability: Create and integrate your own models according to your needs.
- Modular Design: Each component (STT, text generation, TTS) can be independently configured and customized.
- User-friendly: Easy to set up and use with straightforward installation and configuration.
- Real-time processing: Speech-to-text and text-to-speech run in real time, enabling near-seamless interaction with the AI.
- Long-term memory: AI remembers and recalls information across sessions, enabling personalized and context-aware interactions, all while keeping data stored locally.
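To make the pipeline concrete, here is a minimal sketch of how the three components above could be wired together in Python. It is illustrative only, not AsukaAI's actual source: it assumes the `RealtimeSTT`, `RealtimeTTS`, and `ollama` Python packages are installed, and the model names are placeholders.

```python
# Minimal sketch of the STT -> LLM -> TTS loop (illustrative, not AsukaAI's code).
from RealtimeSTT import AudioToTextRecorder             # speech-to-text (faster_whisper backend)
from RealtimeTTS import TextToAudioStream, CoquiEngine  # text-to-speech (CoquiTTS backend)
import ollama                                           # local text generation

def reply_chunks(model, prompt):
    """Stream the LLM reply chunk by chunk so TTS can start before generation ends."""
    for part in ollama.chat(model=model,
                            messages=[{"role": "user", "content": prompt}],
                            stream=True):
        yield part["message"]["content"]

def main():
    recorder = AudioToTextRecorder(model="distil-small.en")  # mirrors stt.model in config.json
    tts = TextToAudioStream(CoquiEngine())
    while True:
        text = recorder.text()                   # blocks until an utterance is transcribed
        tts.feed(reply_chunks("llama3", text))   # feed the streaming reply into TTS
        tts.play()                               # speak while tokens keep arriving

if __name__ == "__main__":
    main()
```

Streaming the reply into TTS is what makes the interaction feel near-seamless: speech can begin before the model has finished generating.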
- Minimum (CPU-only):
  - VRAM: 0 GB (GPU not required)
  - RAM: 16 GB
  - Free disk space: 10 GB
- Recommended:
  - VRAM: 12 GB
  - RAM: 16 GB
  - Free disk space: 10 GB
- GPU: an Nvidia GPU is recommended; the CPU is used as a fallback if no Nvidia GPU is available. AMD GPUs are not supported yet.
- OS: Windows is preferred; Linux and macOS have not been tested yet.
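If you are unsure whether your machine will run in GPU or CPU mode, a quick check like the one below can help. It is a generic check, not an AsukaAI utility, and it assumes PyTorch is installed in the environment (the CoquiTTS stack depends on it).

```python
# Generic CUDA availability check (assumes PyTorch is installed).
import torch

if torch.cuda.is_available():
    print("CUDA GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU detected; expect CPU-only (minimum) performance.")
```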
- Python 3.11: Make sure Python is installed on your system. You can download it from python.org.
- CUDA Toolkit (if using a GPU): Ensure the CUDA Toolkit is installed to leverage GPU acceleration. Download it from NVIDIA's website.
- Ollama: Ensure Ollama is installed on your system. Download it from the Ollama website.
- Miniconda: Ensure Miniconda is installed, and don't forget to check "Add to PATH" during installation. Download it from the Miniconda website.
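Before installing, you can optionally verify that the tools above are reachable. This snippet is a convenience sketch based on the list in this README, not part of the project:

```python
# Optional sanity check: are the prerequisites on PATH? (illustrative only)
import shutil
import sys

print("Python:", sys.version.split()[0])  # this project targets 3.11
for tool in ("ollama", "conda", "nvcc"):  # nvcc is only needed for GPU setups
    path = shutil.which(tool)
    print(f"{tool}: {'found at ' + path if path else 'NOT found on PATH'}")
```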
- Clone the repository:

  ```
  git clone https://github.com/vancoder1/AsukaAI.git
  cd AsukaAI
  ```

- Install everything using `install_windows.bat` and wait for the installation to complete:

  ```
  .\install_windows.bat
  ```

- Start the application:

  ```
  .\start_windows.bat
  ```
- Speech-to-Text:
  - Just start speaking; transcription begins automatically.
  - Wait for faster_whisper to process your recording and extract the text.
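If you want to see what the transcription step looks like in code, here is a small sketch using RealtimeSTT directly. The constructor parameters come from the RealtimeSTT project and the model name mirrors `stt.model` from `config.json`; treat it as illustrative rather than AsukaAI's own code:

```python
# Sketch: print partial transcripts while speaking, then the final text.
from RealtimeSTT import AudioToTextRecorder

def on_update(partial_text):
    # Called repeatedly with the transcript-so-far while you are still talking.
    print("\rHeard so far:", partial_text, end="", flush=True)

recorder = AudioToTextRecorder(
    model="distil-small.en",              # same model as stt.model in config.json
    enable_realtime_transcription=True,   # emit partial results in real time
    on_realtime_transcription_update=on_update,
)
print("\nFinal:", recorder.text())        # blocks until the utterance ends
```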
- Text Generation:
  - Llama3 (running through Ollama) provides human-like interaction. If you prefer a different model, you can change it in `models/modelfile.md`.
  - Check the Ollama library for available models.
  - To customize model behavior, refer to the Ollama documentation on their GitHub page.
  - Important: after modifying `models/modelfile.md`, run the following command to update the model:

    ```
    .\update_model.bat
    ```
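For orientation, an Ollama model definition is a short text file. The sketch below shows the general shape only; the base model, parameter value, and persona prompt are made-up examples, so check the actual `models/modelfile.md` and Ollama's Modelfile documentation for the real contents and full syntax:

```
# Illustrative Modelfile -- NOT the contents of models/modelfile.md.
FROM llama3

# Example tuning knob; the value here is arbitrary.
PARAMETER temperature 0.8

# Hypothetical persona prompt, purely for illustration.
SYSTEM """You are Asuka, a friendly companion. Keep replies short and conversational."""
```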
- Text-to-Speech:
  - Generated text is converted to speech in real time, while the text is still being generated.
  - To change the voice, place your desired WAV audio file (approximately 10 seconds, a clear voice, no background noise, 22050 Hz, mono) into the `data/reference_voices` directory and name it `reference.wav`. Make sure to remove the previous `reference.wav` and `reference.json` files first.
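If your source clip is not already in that format, a couple of lines of Python can convert it. This sketch assumes the `librosa` and `soundfile` packages are available (they are not installed by this README's setup scripts), and the input filename is a placeholder:

```python
# Convert an arbitrary clip to the required format: 22050 Hz, mono, WAV.
import librosa
import soundfile as sf

audio, sr = librosa.load("my_voice_clip.mp3", sr=22050, mono=True)  # resample + downmix
sf.write("data/reference_voices/reference.wav", audio, sr)
print(f"Wrote {len(audio) / sr:.1f} s of audio at {sr} Hz")
```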
- Configuration:
  - You can customize various settings in the `config.json` file:

    ```json
    {
      "version": "1.0.0",
      "description": "Configuration file for AI app",
      "stt": {
        "model": "distil-small.en"
      },
      "tts": {
        "reference_file": "data/reference_voices/reference.wav"
      }
    }
    ```

  - `stt.model`: Defines the model used for speech-to-text processing. You can change this to another available model, such as `"base.en"` or `"distil-large-v3"`.
  - `tts.reference_file`: Sets the path to the reference audio file for text-to-speech processing. Ensure the specified file exists and is correctly formatted.
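A quick way to catch configuration mistakes (for example, a missing reference file) is to validate `config.json` before launching. This is an optional helper sketch, not shipped code:

```python
# Optional pre-launch check of config.json (illustrative helper).
import json
from pathlib import Path

config = json.loads(Path("config.json").read_text())
ref = Path(config["tts"]["reference_file"])
print("STT model:", config["stt"]["model"])
print("Reference voice:", ref, "exists" if ref.exists() else "MISSING")
```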
Contributions are welcome! If you have any ideas, suggestions, or bug reports, feel free to open an issue or submit a pull request.
This project is licensed under the Apache License 2.0.
For any questions or feedback, please open an issue on this repository or reach out to [email protected].