
UNICEF Technical Documentation RAG MCP Server

The UNICEF Technical Documentation RAG (Retrieval-Augmented Generation) MCP Server provides intelligent access to technical documentation through semantic search capabilities. This Model Context Protocol (MCP) server specializes in processing and retrieving information from the Global Child Hazard Database - Technical Documentation and related climate risk assessment materials.

Overview

This MCP server is the technical documentation backend for the UNICEF Geosphere project, providing access to the Global Child Hazard Database - Technical Documentation.

Features

Core Capabilities

  • Document Processing: Automatic parsing and indexing of technical documentation
  • Vector Search: Semantic similarity-based document retrieval
  • Context Extraction: Relevant passages for answering specific questions

Technical Documentation Coverage

  • Climate Risk Methodologies: CCRI calculation approaches and algorithms
  • Dataset Specifications: Detailed descriptions of hazard and exposure datasets
  • Indicator Definitions: Technical definitions of risk indicators
  • Data Sources: Source documentation for the underlying datasets

Technology Stack

  • FastMCP: Model Context Protocol server framework
  • Vector Database: Document embeddings and similarity search
  • LlamaIndex: Document processing and RAG pipelines
  • Sentence Transformers: Text embedding generation
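
For orientation, here is a minimal sketch of the embedding step with Sentence Transformers. The model name and sample chunks are illustrative assumptions, not the project's actual configuration (the embedding model is set in rag/config.yaml):

from sentence_transformers import SentenceTransformer

# Illustrative model choice; not necessarily the one configured in rag/config.yaml.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Example strings standing in for chunks of the technical documentation.
chunks = [
    "Example passage about flood hazard datasets.",
    "Example passage about CCRI indicator definitions.",
]

# Each chunk becomes a dense vector; similarity between vectors drives retrieval.
embeddings = model.encode(chunks)
print(embeddings.shape)  # (number of chunks, embedding dimension)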

Project Structure

rag/
├── server.py              # MCP server and tool definitions
├── handlers.py            # RAG implementation and document processing
├── config.py              # Configuration and settings management
├── schemas.py             # Pydantic models and validation
├── constants.py           # Application constants
├── config.yaml            # Server configuration
├── logging_config.py      # Logging setup
└── data/vector_index/     # Document storage and vector indices
process_ccri_doc.py        # Document processing script
Global_Child_Hazard_Database_2025_Technical_Documentation.md # Global Child Hazard Database - Technical Documentation

Prerequisites

Document Processing Requirements

  • Source Documents: Global Child Hazard Database - Technical Documentation (Markdown format)
  • Vector Storage: Persistent vector database for document embeddings
  • Processing Power: Sufficient resources for document embedding generation

Note: The technical documentation must be provided as a single Markdown file named Global_Child_Hazard_Database_2025_Technical_Documentation.md at the repository root. If your source document is a Word or Google Doc, convert it to Markdown first, either with an online tool such as word2md.com or with a command-line tool such as Pandoc. With Pandoc, the conversion looks like this:

pandoc --from docx --to markdown --extract-media=. <input_file.docx> -o Global_Child_Hazard_Database_2025_Technical_Documentation.md

Available Tools

The MCP server exposes specialized tools for technical documentation access:

1. Technical Documentation Search

get_ccri_relevant_information(question: str)

Performs semantic search against the Global Child Hazard Database - Technical Documentation to find relevant information.

Parameters:

  • question (required): Natural language question about climate risk methodologies, datasets, or technical specifications

Returns: Dictionary containing:

  • data: List of relevant document sections
  • input_arguments: The input arguments the tool was called with
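
For illustration, below is a hedged sketch of how such a tool could be registered with FastMCP. The actual definitions live in rag/server.py and rag/handlers.py; the retrieval step here is a placeholder, and the import path assumes the official MCP Python SDK:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("unicef-rag")

@mcp.tool()
def get_ccri_relevant_information(question: str) -> dict:
    """Semantic search over the Global Child Hazard Database documentation."""
    # Placeholder: the real handler queries the persisted vector index.
    sections: list[dict] = []
    return {
        "data": sections,                           # relevant document sections
        "input_arguments": {"question": question},  # echo of the tool input
    }

if __name__ == "__main__":
    # Transport and port in the real server come from rag/config.yaml.
    mcp.run(transport="sse")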

Installation

Dependencies

# Install dependencies using uv
uv sync

Document Processing Setup

Before running the server, you must process the Global Child Hazard Database - Technical Documentation:

# Process and index the Global Child Hazard Database documentation
uv run python process_ccri_doc.py

This step (see the sketch after this list):

  1. Parses the Global Child Hazard Database - Technical Documentation Markdown
  2. Splits content into searchable chunks
  3. Generates vector embeddings for each chunk
  4. Creates a persistent vector index
  5. Stores metadata for each document section
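
Conceptually, the processing script can be thought of as a LlamaIndex pipeline like the sketch below. The embedding model and chunking parameters are illustrative assumptions; the real script and rag/config.yaml control the actual settings:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Illustrative embedding model; the project configures its own in rag/config.yaml.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# 1. Parse the Markdown source document.
documents = SimpleDirectoryReader(
    input_files=["Global_Child_Hazard_Database_2025_Technical_Documentation.md"]
).load_data()

# 2.-3. Split the content into chunks and generate embeddings (sizes are illustrative).
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])

# 4.-5. Persist the index (vectors plus per-chunk metadata) for the server to load.
index.storage_context.persist(persist_dir="rag/data/vector_index")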

Configuration

Server Configuration

rag/config.yaml:

server:
  host: "0.0.0.0" # Server bind address
  port: 6001 # Internal MCP port
  transport: "sse" # MCP transport protocol

The server is reachable only on the internal Docker network. The agent connects via rag_mcp:6001/sse.
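
As an example of what a client on that network could do, the sketch below uses the MCP Python SDK to connect over SSE and call the documentation tool. The question string is illustrative, and the endpoint assumes the Docker service name and port shown above:

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # Endpoint as exposed on the internal Docker network.
    async with sse_client("http://rag_mcp:6001/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_ccri_relevant_information",
                arguments={"question": "How is drought hazard defined?"},
            )
            print(result.content)

asyncio.run(main())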

Development

Running the Server

# Development mode
mcp dev rag/server.py

# Production mode
uv run rag/server.py

Testing

# Run all tests
uv run pytest

# Run specific tests
uv run pytest tests/test_handlers.py -v

Secrets and Environment

This service requires an AWS Bedrock bearer token when using the default configuration:

  • AWS_BEARER_TOKEN_BEDROCK: Bearer token for Bedrock (read from Docker secret aws_bearer_token_bedrock or environment variable; .env supported)

Notes:

  • Environment variables are loaded with .env support (see rag/initialize.py).
  • Embeddings are configured in rag/config.yaml (embeddings.model_id, embeddings.region_name).
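
A minimal sketch of how the token could be resolved is shown below, assuming the Docker secret is mounted at the conventional /run/secrets path and python-dotenv provides the .env support; the actual logic lives in rag/initialize.py and may differ:

import os
from pathlib import Path

from dotenv import load_dotenv

def get_bedrock_token() -> str:
    load_dotenv()  # pick up a local .env file if present

    # Prefer the Docker secret when running in the compose stack (assumed mount path).
    secret_file = Path("/run/secrets/aws_bearer_token_bedrock")
    if secret_file.exists():
        return secret_file.read_text().strip()

    # Fall back to the environment variable.
    token = os.environ.get("AWS_BEARER_TOKEN_BEDROCK")
    if not token:
        raise RuntimeError("AWS_BEARER_TOKEN_BEDROCK is not configured")
    return token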

Development Setup

  1. Clone repository
  2. Install dependencies: uv sync
  3. Process documentation: uv run python process_ccri_doc.py
  4. Run tests: uv run pytest
  5. Start server: mcp dev rag/server.py

Contributing

Development Guidelines

  1. Code Style: Follow PEP 8 and use type hints
  2. Testing: Add tests for new RAG functionality
  3. Documentation: Update tool descriptions and examples

Adding New Documents

  1. Document Preparation: Ensure documents are in Markdown format
  2. Processing Script: Update process_ccri_doc.py for the new documents (see the sketch after this list)
  3. Metadata Schema: Extend metadata structure if needed
  4. Testing: Verify search functionality with new content
  5. Index Update: Regenerate vector index with new documents
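
As a sketch of what an extended processing script might look like, the snippet below indexes every Markdown file in a hypothetical docs/ folder and attaches simple source metadata before regenerating the index; the folder name and metadata field are assumptions:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Assumes the embedding model is configured as in process_ccri_doc.py / rag/config.yaml.
# Load every Markdown document from a hypothetical docs/ folder.
documents = SimpleDirectoryReader(input_dir="docs/", required_exts=[".md"]).load_data()

# Attach simple metadata so search results can be traced back to their source file.
for doc in documents:
    doc.metadata["source_file"] = doc.metadata.get("file_name", "unknown")

# Rebuild and persist the vector index with the expanded document set.
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="rag/data/vector_index")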

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

  • Issues: Submit issues on GitHub repository
  • RAG Documentation: LlamaIndex RAG Guide
  • Technical Support: Repository maintainers
