omm
Models
Run local LLMs with OMM. A desktop app for installing, configuring, and chatting with AI models — powered by a custom inference engine with GPU acceleration.
Install a model
terminal
$ omm pull llama3.2
Features
Local & Private
Run LLMs on your machine. No data leaves your device.
One-Click Install
Pull from HuggingFace or Ollama registry. GGUF and SafeTensors.
Multi-Engine
llama.cpp for broad hardware support. rvllm for multi-GPU and batched inference.
3 API Protocols
Ollama, OpenAI, and Anthropic compatible. Drop-in for any tool.
GPU Acceleration
CUDA, Metal, Vulkan, and ROCm backends with auto-detection.
Model Catalog
Browse and install from 50,000+ models on HuggingFace.
Popular Models
ModelParamsQuantSize
Llama 3.370BQ4_K_M40 GB
Qwen 2.532BQ4_K_M19 GB
Gemma 327BQ4_K_M16 GB
Mistral Small24BQ4_K_M14 GB
DeepSeek R114BQ5_K_M10 GB
Phi-414BQ4_K_M8 GB
Llama 3.23BQ8_03.2 GB
Qwen 2.5 Coder1.5BQ8_01.6 GB
50,000+ models available via HuggingFace and Ollama registry
Documentation