llmariner/inference-manager

inference-manager

The inference-manager manages inference runtimes (e.g., vLLM and Ollama) in containers, loads models, and processes inference requests.
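To make "processes inference requests" concrete, the sketch below builds the kind of OpenAI-style chat-completion request that runtimes such as vLLM serve. The endpoint URL and model name are illustrative assumptions, not values taken from this repository; the actual path and model depend on your deployment.

```python
import json
import urllib.request

# Illustrative values -- the real endpoint and model name depend on your deployment.
ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed OpenAI-compatible path
MODEL = "llama-3"  # hypothetical model name

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion HTTP request (not yet sent)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Hello!")
print(req.get_full_url())
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) would return a chat-completion response from whichever runtime is serving the model.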

Inference request flow

Please see inference_request_flow.md.