omm

Models

Run local LLMs with OMM. A desktop app for installing, configuring, and chatting with AI models — powered by a custom inference engine with GPU acceleration.

Install a model

terminal

$ omm pull llama3.2

Features

Local & Private

Run LLMs on your machine. No data leaves your device.

One-Click Install

Pull from HuggingFace or Ollama registry. GGUF and SafeTensors.

Multi-Engine

llama.cpp for broad hardware support. rvllm for multi-GPU and batched inference.

3 API Protocols

Ollama, OpenAI, and Anthropic compatible. Drop-in for any tool.

GPU Acceleration

CUDA, Metal, Vulkan, and ROCm backends with auto-detection.

Model Catalog

Browse and install from 50,000+ models on HuggingFace.

Popular Models

ModelParamsQuantSize

Llama 3.370BQ4_K_M40 GB

Qwen 2.532BQ4_K_M19 GB

Gemma 327BQ4_K_M16 GB

Mistral Small24BQ4_K_M14 GB

DeepSeek R114BQ5_K_M10 GB

Phi-414BQ4_K_M8 GB

Llama 3.23BQ8_03.2 GB

Qwen 2.5 Coder1.5BQ8_01.6 GB

50,000+ models available via HuggingFace and Ollama registry

Documentation

About Installation Quick Start Model Catalog Inference Engines GPU Setup API Reference Configuration