A Retrieval-Augmented Generation (RAG) system using LLaMA and FAISS, with a modern React frontend.
The project is organized as follows:

- `src/`: Backend components
  - `retriever.py`: FAISS-based document retriever
  - `generator.py`: LLaMA-based text generator using llama-cpp-python
  - `rag_pipeline.py`: RAG pipeline implementation
  - `api.py`: FastAPI backend server
- `project/`: Frontend React application
  - `src/`: React components and logic
  - `public/`: Static assets
- `data/`: Document storage
- `models/`: Model storage
  - `llama-2-7b/`: LLaMA model files
To set up the backend:

- Create a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Prepare the LLaMA model (a quick load check is sketched after this list):
  - Convert your LLaMA model to GGML/GGUF format using llama.cpp
  - Place the converted model file (e.g., `ggml-model.bin`) in `models/llama-2-7b/`
- Run the setup script:

  ```bash
  python setup.py
  ```
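Before moving on, it can help to confirm the converted model actually loads. Below is a minimal sketch using llama-cpp-python, assuming the example filename above; adjust `model_path` and `n_ctx` to your setup:

```python
from llama_cpp import Llama

# Load the converted model; the path matches the example filename above.
llm = Llama(model_path="models/llama-2-7b/ggml-model.bin", n_ctx=2048, verbose=False)

# A single short completion is enough to confirm the model runs.
out = llm("Q: What is retrieval-augmented generation? A:", max_tokens=48)
print(out["choices"][0]["text"])
```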
To set up the frontend:

- Navigate to the project directory:

  ```bash
  cd project
  ```

- Install dependencies:

  ```bash
  npm install
  ```
To set up with Docker instead:

- Ensure you have Docker and Docker Compose installed on your system.
- Prepare the LLaMA model:
  - Convert your LLaMA model to GGML/GGUF format using llama.cpp
  - Place the converted model file (e.g., `ggml-model.bin`) in `models/llama-2-7b/`
- Create the necessary directories:

  ```bash
  mkdir -p models/llama-2-7b uploads
  touch models/llama-2-7b/.gitkeep uploads/.gitkeep
  ```
- Build and start the containers:

  ```bash
  docker-compose up --build
  ```
To run the application locally:

- Start the FastAPI server:

  ```bash
  python api_main.py
  ```

  The API will be available at http://localhost:8000

- In a new terminal, start the Vite development server:

  ```bash
  cd project
  npm run dev
  ```

  The UI will be available at http://localhost:5173
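Once both servers are running, you can quickly confirm the backend is up. A minimal sketch using `requests`; the `/health` endpoint is listed with the other endpoints below:

```python
import requests

# Expect HTTP 200 once the FastAPI server is ready.
resp = requests.get("http://localhost:8000/health")
print(resp.status_code, resp.text)
```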
To run with Docker:

- Start the application:

  ```bash
  docker-compose up
  ```

- Access the application:
  - Frontend: http://localhost:5173
  - Backend API: http://localhost:8000
To stop the application:

```bash
docker-compose down
```
To use the application:

- Open your browser and navigate to http://localhost:5173
- Upload a PDF document using the file upload interface
- Once the document is processed, you can start asking questions about its content
- The AI will respond with answers based on the document's content
The backend exposes the following endpoints:

- `POST /upload`: Upload a PDF document for processing
- `POST /query`: Query the processed document
- `GET /health`: Check API health status
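As a sketch of how a client might exercise these endpoints with `requests`: note that the multipart field name `file` and the `{"question": ...}` request body are assumptions, not this repo's confirmed schema; check `src/api.py` for the actual contract.

```python
import requests

BASE = "http://localhost:8000"

# Upload a PDF for indexing. The multipart field name "file" is an assumption.
with open("document.pdf", "rb") as f:
    r = requests.post(f"{BASE}/upload", files={"file": f})
print(r.json())

# Query the processed document. The JSON body shape is an assumption.
r = requests.post(f"{BASE}/query", json={"question": "What is the document about?"})
print(r.json())
```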
To customize the backend (the sketch after this list illustrates these knobs):

- Modify the retriever's model in `src/retriever.py`
- Adjust generation parameters in `src/generator.py`
- Customize the prompt template in `src/rag_pipeline.py`
- Use a different model by updating the `model_path` in `src/generator.py`
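To make those knobs concrete, here is a minimal retrieve-then-generate sketch in the spirit of `retriever.py`, `generator.py`, and `rag_pipeline.py`. It is not this repo's code: the embedding model name, sample chunks, prompt template, and generation parameters are all illustrative assumptions.

```python
import faiss
from llama_cpp import Llama
from sentence_transformers import SentenceTransformer

# Retriever: the embedding model below is the "retriever's model" knob.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model name
chunks = [
    "RAG pairs a document retriever with a text generator.",
    "FAISS performs fast vector similarity search.",
]
index = faiss.IndexFlatL2(embedder.get_sentence_embedding_dimension())
index.add(embedder.encode(chunks, convert_to_numpy=True))

# Generator: max_tokens and temperature are the generation-parameter knobs.
llm = Llama(model_path="models/llama-2-7b/ggml-model.bin", n_ctx=2048, verbose=False)

def answer(question: str, k: int = 2) -> str:
    # Pipeline: retrieve the top-k chunks, fill the prompt template, generate.
    _, ids = index.search(embedder.encode([question], convert_to_numpy=True), k)
    context = "\n".join(chunks[i] for i in ids[0])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # prompt-template knob
    out = llm(prompt, max_tokens=128, temperature=0.7)
    return out["choices"][0]["text"].strip()

print(answer("What does FAISS do?"))
```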
To customize the frontend:

- Customize the UI components in `project/src/components/`
- Modify the API integration in `project/src/App.tsx`
- Update styles in `project/src/index.css`
To convert your LLaMA model to GGML/GGUF format:
- Clone llama.cpp:

  ```bash
  git clone https://github.com/ggerganov/llama.cpp.git
  cd llama.cpp
  ```

- Convert your model:

  ```bash
  python convert.py --outfile models/llama-2-7b/ggml-model.bin --outtype f16 /path/to/your/llama/model
  ```
For more details, refer to the llama.cpp documentation.