Model swapping for llama.cpp (or any local OpenAI API-compatible server)
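The core trick behind a swapping proxy like this is small: inspect the `model` field of each OpenAI-style request, start the matching llama.cpp instance (stopping the previous one to free VRAM), then forward the request. Below is a minimal Go sketch of that idea; the model names, file paths, single upstream port, and sleep-based readiness wait are illustrative assumptions, not llama-swap's actual configuration or code.

```go
// Minimal sketch of the swapping idea: read the "model" field from an
// OpenAI-style request, lazily (re)start the matching llama.cpp server,
// then reverse-proxy the request. Model names, paths, the single upstream
// port, and the sleep-based readiness wait are illustrative assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os/exec"
	"sync"
	"time"
)

// models maps a model name to the command that serves it (assumed paths).
var models = map[string][]string{
	"llama3":  {"llama-server", "-m", "/models/llama3.gguf", "--port", "9001"},
	"qwen2.5": {"llama-server", "-m", "/models/qwen2.5.gguf", "--port", "9001"},
}

var (
	mu      sync.Mutex
	current string    // model currently loaded
	proc    *exec.Cmd // running llama.cpp process
)

// ensure starts the requested model's server, swapping out the old one.
func ensure(model string) error {
	mu.Lock()
	defer mu.Unlock()
	if model == current {
		return nil
	}
	if proc != nil {
		proc.Process.Kill() // stop the previous model to free VRAM
		proc.Wait()
	}
	cmd := exec.Command(models[model][0], models[model][1:]...)
	if err := cmd.Start(); err != nil {
		return err
	}
	proc, current = cmd, model
	time.Sleep(3 * time.Second) // crude; real proxies poll /health instead
	return nil
}

func main() {
	backend, _ := url.Parse("http://127.0.0.1:9001")
	rp := httputil.NewSingleHostReverseProxy(backend)

	http.HandleFunc("/v1/", func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		var req struct {
			Model string `json:"model"`
		}
		json.Unmarshal(body, &req)
		if _, ok := models[req.Model]; !ok {
			http.Error(w, "unknown model", http.StatusBadRequest)
			return
		}
		if err := ensure(req.Model); err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(body)) // restore body for proxying
		rp.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A production proxy would poll the backend's health endpoint instead of sleeping, and could keep several models resident when VRAM allows.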
Intelligent Mixture-of-Models Router for Efficient LLM Inference
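A mixture-of-models router of this kind sits in front of several backends and picks a model per request, trading answer quality against cost. The gate below is a deliberately crude heuristic based on prompt length and a keyword; the model names and URLs are invented for illustration and are not any particular router's policy.

```go
// Minimal sketch of mixture-of-models routing: a gate picks a backend per
// request, sending easy prompts to a small, cheap model and harder ones to
// a large one. The heuristic, model names, and URLs are illustrative
// assumptions, not any particular router's policy.
package main

import (
	"fmt"
	"strings"
)

type route struct{ model, baseURL string }

// gate is a stand-in for a learned or heuristic router.
func gate(prompt string) route {
	words := len(strings.Fields(prompt))
	hard := words > 40 || strings.Contains(strings.ToLower(prompt), "refactor")
	if hard {
		return route{"large-70b", "http://large:8000/v1"}
	}
	return route{"small-7b", "http://small:8000/v1"}
}

func main() {
	for _, p := range []string{
		"What is the capital of France?",
		"Refactor this service to use connection pooling.",
	} {
		r := gate(p)
		fmt.Printf("%q -> %s via %s\n", p, r.model, r.baseURL)
	}
}
```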
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
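The per-key rate limiting such a gateway provides can be sketched as ordinary HTTP middleware. The Go sketch below keeps one token-bucket limiter per API key using golang.org/x/time/rate; the limits, the Authorization-header keying, and the in-memory store are assumptions for illustration, and a real gateway would add cost accounting, persistence, and per-user/application/environment policies.

```go
// Minimal sketch of per-API-key rate limiting, the core mechanism such a
// gateway is built around. Limits, header keying, and the in-memory store
// are illustrative assumptions.
package main

import (
	"log"
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{}
)

// limiterFor returns the token-bucket limiter for a key, creating it on first use.
func limiterFor(key string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[key]
	if !ok {
		l = rate.NewLimiter(rate.Limit(5), 10) // 5 req/s, burst 10, per key
		limiters[key] = l
	}
	return l
}

// limitByKey rejects requests whose API key has exhausted its budget.
func limitByKey(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := r.Header.Get("Authorization")
		if key == "" {
			http.Error(w, "missing API key", http.StatusUnauthorized)
			return
		}
		if !limiterFor(key).Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n")) // stand-in for the proxied LLM backend
	})
	log.Fatal(http.ListenAndServe(":8080", limitByKey(upstream)))
}
```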
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
☸️ Easy-to-use, advanced inference platform for large language models on Kubernetes.
Lightweight & fast AI inference proxy for self-hosted LLM backends like Ollama, LM Studio, and others. Designed for speed, simplicity, and local-first deployments.
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
Arks is a cloud-native inference framework running on Kubernetes.
Carbon Limiting Auto Tuning for Kubernetes
Unified management and routing for llama.cpp, MLX, and vLLM models, with a web dashboard.
🚀🛸 Easily boost the speed of pulling your models and datasets from various inference runtimes (e.g., 🤗 HuggingFace, 🐫 Ollama, vLLM, and more).
Call many AIs from a single API.
Production-ready AI for Kubernetes. Run cutting-edge LLMs on NVIDIA GPUs with vLLM. Use Ollama for embeddings and vision. Access securely through OpenWebUI. Scalable, high-performance, and fully self-hosted.
A sample architecture that mimics MoE (Mixture of Experts) using Go.
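The MoE-mimicking pattern generally has three parts: a gate that scores candidate experts for a prompt, concurrent fan-out to the top-scoring ones, and a merge of their answers. Here is a small self-contained Go sketch of that shape; the experts are plain functions standing in for model backends, and the scoring and merging rules are invented for illustration rather than taken from the repo.

```go
// Minimal sketch of the MoE-mimicking pattern: a gate scores "experts" for
// a prompt, the top-K run concurrently, and their answers are merged.
// Experts here are plain functions standing in for model backends; the
// scoring and merging rules are invented for illustration.
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

type expert struct {
	name  string
	score func(prompt string) float64 // gating score
	run   func(prompt string) string  // the "model" call
}

// answer gates, fans out to the top-K experts concurrently, and merges.
func answer(prompt string, experts []expert, topK int) []string {
	sort.Slice(experts, func(i, j int) bool {
		return experts[i].score(prompt) > experts[j].score(prompt)
	})
	if topK > len(experts) {
		topK = len(experts)
	}
	out := make([]string, topK)
	var wg sync.WaitGroup
	for i, e := range experts[:topK] {
		wg.Add(1)
		go func(i int, e expert) {
			defer wg.Done()
			out[i] = fmt.Sprintf("[%s] %s", e.name, e.run(prompt))
		}(i, e)
	}
	wg.Wait()
	return out
}

func main() {
	experts := []expert{
		{
			"code",
			func(p string) float64 {
				if strings.Contains(p, "func") {
					return 1.0
				}
				return 0.1
			},
			func(p string) string { return "looks like Go code" },
		},
		{
			"chat",
			func(p string) float64 { return 0.5 },
			func(p string) string { return "general answer" },
		},
	}
	for _, part := range answer("explain this func", experts, 2) {
		fmt.Println(part)
	}
}
```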
nfrx is an inference exchange gateway.