llmariner/inference-manager

inference-manager

The inference-manager manages inference runtimes (e.g., vLLM and Ollama) in containers, loads models, and processes inference requests.
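To make "processes inference requests" concrete, the sketch below builds the kind of OpenAI-style chat-completion request that runtimes such as vLLM serve. The endpoint URL and model name are illustrative assumptions, not values taken from this repository; the actual path and model depend on your deployment.

```python
import json
import urllib.request

# Illustrative values -- the real endpoint and model name depend on your deployment.
ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed OpenAI-compatible path
MODEL = "llama-3"  # hypothetical model name

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion HTTP request (not yet sent)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Hello!")
print(req.get_full_url())
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) would return a chat-completion response from whichever runtime is serving the model.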

Inference request flow

Please see inference_request_flow.md.