The UNICEF Technical Documentation RAG (Retrieval-Augmented Generation) MCP Server provides intelligent access to technical documentation through semantic search capabilities. This Model Context Protocol (MCP) server specializes in processing and retrieving information from the Global Child Hazard Database - Technical Documentation and related climate risk assessment materials.
The server is the technical documentation backend for the UNICEF Geosphere project.
- Document Processing: Automatic parsing and indexing of technical documentation
- Vector Search: Semantic similarity-based document retrieval
- Context Extraction: Relevant passages for answering specific questions
- Climate Risk Methodologies: CCRI calculation approaches and algorithms
- Dataset Specifications: Detailed descriptions of hazard and exposure datasets
- Indicator Definitions: Technical definitions of risk indicators
- Data Sources: Source documentation
- FastMCP: Model Context Protocol server framework
- Vector Database: Document embeddings and similarity search
- LlamaIndex: Document processing and RAG pipelines
- Sentence Transformers: Text embedding generation
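To make "semantic similarity-based retrieval" concrete, here is a minimal sketch of ranking chunks by cosine similarity. It is not this server's implementation, which delegates embedding and search to the libraries listed above. The three-dimensional vectors, the chunk titles, and the helper names `cosine_similarity` and `top_k` are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings"; real vectors come from an embedding model and have
# hundreds of dimensions. The chunk titles are invented examples.
chunks = [
    ("Heatwave exposure methodology", [0.9, 0.1, 0.0]),
    ("River flood dataset specification", [0.1, 0.9, 0.1]),
    ("Licensing information", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]
results = top_k(query, chunks, k=2)
```

A production vector database does the same ranking with an index structure, so it does not have to score every chunk on every query.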
rag/
├── server.py # MCP server and tool definitions
├── handlers.py # RAG implementation and document processing
├── config.py # Configuration and settings management
├── schemas.py # Pydantic models and validation
├── constants.py # Application constants
├── config.yaml # Server configuration
├── logging_config.py # Logging setup
└── data/vector_index/ # Document storage and vector indices
process_ccri_doc.py # Document processing script
Global_Child_Hazard_Database_2025_Technical_Documentation.md # Global Child Hazard Database - Technical Documentation
- Source Documents: Global Child Hazard Database - Technical Documentation (Markdown format)
- Vector Storage: Persistent vector database for document embeddings
- Processing Power: Sufficient resources for document embedding generation
Note: The technical documentation must be provided as a single Markdown file named Global_Child_Hazard_Database_2025_Technical_Documentation.md at the repository root. If your source document is a Word or Google Doc, convert it to Markdown with an online tool such as word2md.com or a command-line tool such as Pandoc. With Pandoc, a .docx file can be converted using the following command:
pandoc --from docx --to markdown --extract-media=. <input_file.docx> -o Global_Child_Hazard_Database_2025_Technical_Documentation.md
The MCP server exposes specialized tools for technical documentation access:
Performs semantic search against the Global Child Hazard Database - Technical Documentation to find relevant information.
Parameters:
- question (required): Natural language question about climate risk methodologies, datasets, or technical specifications
Returns: Dictionary containing:
- data: List of relevant document sections
- input_arguments: Input arguments for the tool
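As an illustration of the return value described above, the sketch below builds a dictionary with the two documented keys. The function name `build_response` is hypothetical, and representing `input_arguments` as a mapping from parameter name to value is an assumption; the actual schema is defined in `rag/schemas.py`.

```python
from typing import Any


def build_response(question: str, sections: list[str]) -> dict[str, Any]:
    """Assemble a response with the documented `data` and `input_arguments` keys."""
    return {
        "data": sections,  # relevant document sections, best match first (assumed order)
        "input_arguments": {"question": question},  # echo of the tool input (assumed shape)
    }


response = build_response(
    "How is heatwave exposure calculated?",
    ["(text of a matching section)"],
)
```

Echoing the input arguments back lets a caller pair each response with the question that produced it.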
# Install dependencies using uv
uv sync
Before running the server, you must process the Global Child Hazard Database - Technical Documentation:
# Process and index the Global Child Hazard Database documentation
uv run python process_ccri_doc.py
This step:
- Parses the Global Child Hazard Database - Technical Documentation Markdown
- Splits content into searchable chunks
- Generates vector embeddings for each chunk
- Creates a persistent vector index
- Stores metadata for each document section
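The parse-and-split steps above can be sketched as follows. This is a simplified stand-in, not the logic in `process_ccri_doc.py`: it assumes chunks are cut at Markdown headings and that the section title is the only metadata kept. The function name `split_by_headings` is hypothetical.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading, tagged with the section title."""
    chunks: list[dict] = []
    title = "(preamble)"  # label for any text that precedes the first heading
    lines: list[str] = []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:  # skip headings that have no text of their own
            chunks.append({"section": title, "text": body})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            lines = []
        else:
            lines.append(line)
    flush()
    return chunks


sample = "# Hazards\nHeatwaves and floods.\n\n## Exposure\nChild population grids.\n"
chunks = split_by_headings(sample)
```

This sketch does not special-case fenced code blocks, so a `#` comment inside one would be treated as a heading. Each resulting chunk would then be embedded and written to the vector index.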
rag/config.yaml:
server:
  host: "0.0.0.0"   # Server bind address
  port: 6001        # Internal MCP port
  transport: "sse"  # MCP transport protocol
The server is reachable only on the internal Docker network. The agent connects via rag_mcp:6001/sse.
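Note that `0.0.0.0` is the address the server binds to, not one a client connects to; the agent uses the Docker service name instead. The sketch below derives the agent-side URL from the settings above. The helper `sse_endpoint` is hypothetical, and the `http` scheme is an assumption, since the source gives only `rag_mcp:6001/sse`.

```python
def sse_endpoint(host: str, port: int, transport: str = "sse", scheme: str = "http") -> str:
    """Build the URL a client on the same Docker network would connect to."""
    return f"{scheme}://{host}:{port}/{transport}"


# Values mirror the `server` block of rag/config.yaml.
server_config = {"host": "0.0.0.0", "port": 6001, "transport": "sse"}

# The client targets the service name `rag_mcp`, not the bind address.
agent_url = sse_endpoint("rag_mcp", server_config["port"], server_config["transport"])
```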
# Development mode
mcp dev rag/server.py
# Production mode
uv run rag/server.py
# Run all tests
uv run pytest
# Run specific tests
uv run pytest tests/test_handlers.py -v
This service requires an AWS Bedrock bearer token when using the default configuration:
- AWS_BEARER_TOKEN_BEDROCK: Bearer token for Bedrock (read from the Docker secret aws_bearer_token_bedrock or an environment variable; .env supported)
Notes:
- Environment variables are loaded with .env support (see rag/initialize.py).
- Embeddings are configured in rag/config.yaml (embeddings.model_id, embeddings.region_name).
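One way the token lookup described above could work is sketched below. It is not the code in `rag/initialize.py`. The function name `resolve_bedrock_token` is hypothetical, and checking the Docker secret before the environment variable is an assumed precedence. Docker mounts secrets as files under `/run/secrets/` by default.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_bedrock_token(
    secrets_dir: str = "/run/secrets",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the Bedrock bearer token, or None if it is not configured."""
    env = os.environ if env is None else env

    # 1. Docker secret file (assumed to take precedence when present).
    secret_file = Path(secrets_dir) / "aws_bearer_token_bedrock"
    if secret_file.is_file():
        token = secret_file.read_text(encoding="utf-8").strip()
        if token:
            return token

    # 2. Environment variable, which a .env loader may have populated.
    return env.get("AWS_BEARER_TOKEN_BEDROCK") or None
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.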
- Clone the repository
- Install dependencies: uv sync
- Process documentation: uv run python process_ccri_doc.py
- Run tests: uv run pytest
- Start server: mcp dev rag/server.py
- Code Style: Follow PEP 8 and use type hints
- Testing: Add tests for new RAG functionality
- Documentation: Update tool descriptions and examples
- Document Preparation: Ensure documents are in Markdown format
- Processing Script: Update process_ccri_doc.py for new documents
- Metadata Schema: Extend metadata structure if needed
- Testing: Verify search functionality with new content
- Index Update: Regenerate vector index with new documents
This project is licensed under the MIT License. See the LICENSE file for details.
- Issues: Submit issues on the GitHub repository
- RAG Documentation: LlamaIndex RAG Guide
- Technical Support: Repository maintainers