omm v0.2.1
Open Model Manager
Run any model locally. Serve Ollama, OpenAI, and Anthropic APIs from a single process. Native GPU acceleration. Drop-in compatible.
$ curl -fsSL https://poly.inc/omm.sh | bash

3 protocols: Ollama · OpenAI · Anthropic
4 GPU backends: CUDA · Metal · ROCm · Vulkan
68 SDK providers via @omm-sdk
Native inference: llama.cpp with a persistent KV cache and zero-reload workers
Hybrid routing: Local GGUF models use the native engine; everything else proxies to Ollama
Multi-protocol: One server, three APIs with transparent format conversion (example below)
Tool calling: Cross-protocol function calling with automatic format translation
Thinking: Extended reasoning via Anthropic SSE with <think> tag parsing
Multimodal: Vision support across all three API protocols
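To illustrate the multi-protocol surface: once a model is pulled and omm serve is running (see the quickstart below), the same local model answers both OpenAI-format and Anthropic-format requests. A minimal sketch, assuming the server listens on the Ollama-compatible default port 11434 (substitute the address your instance reports) and that no API key is enforced locally:

# OpenAI-style chat completion against the local server
$ curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "llama3", "messages": [{"role": "user", "content": "Why is the sky blue?"}]}'

# Anthropic-style request for the same model on the same server
$ curl http://localhost:11434/v1/messages \
    -H "Content-Type: application/json" \
    -H "anthropic-version: 2023-06-01" \
    -d '{"model": "llama3", "max_tokens": 256, "messages": [{"role": "user", "content": "Why is the sky blue?"}]}'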
$ omm pull llama3
$ omm serve
$ omm launch claude --model llama3

Endpoint                Protocol
/api/*                  Ollama
/v1/chat/completions    OpenAI
/v1/messages            Anthropic
/metrics                Prometheus
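For the Ollama-native and metrics endpoints in the table, a quick smoke test; the same port assumption as above applies, and the request body follows the standard Ollama /api/chat format:

# Ollama-native chat endpoint
$ curl http://localhost:11434/api/chat \
    -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}], "stream": false}'

# Prometheus-format metrics for the running server
$ curl http://localhost:11434/metrics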