RAG-powered API and ChatBot for serving Llama 2, Llama 3, and Llama 3.1 models over your own documents.
- Create the conda environment from the environment.yml file:
conda env create -f environment.yml
- Activate the newly created environment:
conda activate rag
Define your Hugging Face token as an environment variable:
export HUGGING_FACE_HUB_TOKEN="YOUR_TOKEN"
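Recent versions of huggingface_hub pick this variable up automatically. If you prefer to authenticate explicitly from Python instead, a minimal sketch (it reads the same variable exported above):

```python
import os

from huggingface_hub import login

# Authenticate with the token exported above; raises KeyError if it is unset.
login(token=os.environ["HUGGING_FACE_HUB_TOKEN"])
```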
Run main.py to serve the FastAPI app on port 8000:
uvicorn main:app --reload --host 0.0.0.0 --port 8000
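main.py itself is not reproduced here, but the GET endpoints below imply routes of roughly the following shape. This is only a sketch: `answer_with_rag` is a hypothetical stand-in for whatever retrieval-plus-generation function the repo actually uses.

```python
from fastapi import FastAPI

app = FastAPI()

def answer_with_rag(query: str, model: str) -> str:
    """Hypothetical placeholder for the repo's retrieval + generation step."""
    raise NotImplementedError

@app.get("/llama3/")
def llama3(query: str):
    # `query` arrives as a URL parameter, e.g. /llama3/?query=...
    return {"answer": answer_with_rag(query, model="llama3")}
```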
Open http://localhost:8000/llama3/?query=YOUR_QUESTION in a browser, or use the following command:
curl -X 'GET' \
'http://localhost:8000/llama3/?query=YOUR_QUESTION' \
-H 'accept: application/json'
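The same request from Python, using the requests library (the response is assumed to be JSON, matching the accept header above):

```python
import requests

resp = requests.get(
    "http://localhost:8000/llama3/",
    params={"query": "YOUR_QUESTION"},  # URL-encodes the query for you
    headers={"accept": "application/json"},
    timeout=120,  # generation can be slow on large models
)
resp.raise_for_status()
print(resp.json())
```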
Available endpoints (a sketch that queries all three follows the list):
- http://localhost:8000/llama2/?query=... (Llama 2)
- http://localhost:8000/llama3/?query=... (Llama 3)
- http://localhost:8000/llama31/?query=... (Llama 3.1)
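To compare the three models on the same question, you can loop over the endpoints; a small sketch building on the request above:

```python
import requests

for endpoint in ("llama2", "llama3", "llama31"):
    resp = requests.get(
        f"http://localhost:8000/{endpoint}/",
        params={"query": "YOUR_QUESTION"},
        timeout=120,
    )
    print(endpoint, "->", resp.json())
```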
This repo also includes a simple Streamlit chat UI powered by the streamlit-chat package.
Complete the steps in the "Use with GET request" section before starting the UI.
Run the Streamlit UI:
cd ui
streamlit run app.py
Open http://localhost:8501/ to use the ChatBot app.
Optional: set a different backend URL if the API is not running on localhost:8000:
export RAG_API_BASE_URL="http://your-api-host:8000"
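app.py is not reproduced here; the sketch below shows the general shape of a streamlit-chat UI that forwards each question to the API, honoring RAG_API_BASE_URL when it is set. The /llama3/ route and the "answer" response key are assumptions carried over from the sketches above.

```python
import os

import requests
import streamlit as st
from streamlit_chat import message

# Fall back to the local backend when RAG_API_BASE_URL is not exported.
API_BASE = os.environ.get("RAG_API_BASE_URL", "http://localhost:8000")

st.title("RAG ChatBot")

if "history" not in st.session_state:
    st.session_state.history = []  # list of (role, text) tuples

query = st.text_input("Ask a question about your documents:")

if query:
    resp = requests.get(f"{API_BASE}/llama3/", params={"query": query}, timeout=120)
    answer = resp.json().get("answer", resp.text)  # "answer" key is an assumption
    st.session_state.history.append(("user", query))
    st.session_state.history.append(("bot", answer))

# Render the conversation, newest at the bottom, with unique widget keys.
for i, (role, text) in enumerate(st.session_state.history):
    message(text, is_user=(role == "user"), key=f"msg_{i}")
```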