Model Catalog
OMM can pull models from the Ollama registry and Hugging Face, giving you access to more than 50,000 models. Models are downloaded either in GGUF format (quantized, ready to run) or in SafeTensors format (full precision, for rvllm).
Popular Models
| Model | Parameters | Best For | Minimum RAM |
|---|---|---|---|
| Llama 3.3 | 70B | General purpose, reasoning | 40 GB |
| Llama 3.2 | 3B | Lightweight tasks, edge devices | 4 GB |
| Qwen 2.5 | 32B | Code, multilingual | 20 GB |
| Qwen 2.5 Coder | 7B | Code generation | 6 GB |
| DeepSeek R1 | 14B | Reasoning, math | 10 GB |
| Gemma 3 | 27B | Multimodal (text + vision) | 16 GB |
| Mistral Small | 24B | Fast general purpose | 14 GB |
| Phi-4 | 14B | Reasoning, compact | 8 GB |
| LLaVA | 7B | Image understanding | 6 GB |
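
To check quickly which of these models a machine can hold, the table can be turned into a small lookup. The following is only a sketch: the rows are copied from the table above, while `CatalogEntry`, `CATALOG`, and `models_that_fit` are hypothetical names for illustration and not part of OMM.

```python
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    name: str
    params_b: float   # parameter count, in billions
    min_ram_gb: int   # "Minimum RAM" column


# Rows copied from the Popular Models table (hypothetical in-memory form).
CATALOG = [
    CatalogEntry("Llama 3.3", 70, 40),
    CatalogEntry("Llama 3.2", 3, 4),
    CatalogEntry("Qwen 2.5", 32, 20),
    CatalogEntry("Qwen 2.5 Coder", 7, 6),
    CatalogEntry("DeepSeek R1", 14, 10),
    CatalogEntry("Gemma 3", 27, 16),
    CatalogEntry("Mistral Small", 24, 14),
    CatalogEntry("Phi-4", 14, 8),
    CatalogEntry("LLaVA", 7, 6),
]


def models_that_fit(available_ram_gb: float) -> list[str]:
    """Return the names of models whose minimum RAM fits, largest model first."""
    fitting = [m for m in CATALOG if m.min_ram_gb <= available_ram_gb]
    # Python's sort is stable, so models of equal size keep their table order.
    fitting.sort(key=lambda m: m.params_b, reverse=True)
    return [m.name for m in fitting]


if __name__ == "__main__":
    print(models_that_fit(16))
```

On a 16 GB machine the first entry returned is Gemma 3, the largest model in the table whose minimum fits. Treat the result as a starting point, since the figures are minimums and other running applications also need memory.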
Installing Models
Quantization Levels
GGUF models come in different quantization levels. Lower precision means a smaller file and faster inference, at some cost in output quality:
| Quant | Bits | Size vs FP16 | Quality |
|---|---|---|---|
| Q2_K | 2-bit | ~20% | Lowest: extreme compression |
| Q4_K_M | 4-bit | ~30% | Good: recommended for most use cases |
| Q5_K_M | 5-bit | ~35% | Very good: minimal quality loss |
| Q6_K | 6-bit | ~40% | Excellent: near FP16 quality |
| Q8_0 | 8-bit | ~53% | Near-lossless |
| F16 | 16-bit | 100% | Full precision: no quality loss |
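
The "Size vs FP16" column can be turned into a rough download size. FP16 stores each weight in two bytes, so a model with N billion parameters is about 2N GB at F16; multiplying by the ratio for a given quantization level gives an estimate for that file. This is a minimal sketch: the function names are hypothetical (not part of OMM), and it ignores file metadata and the extra memory needed at run time.

```python
BYTES_PER_FP16_WEIGHT = 2  # FP16 uses 16 bits (2 bytes) per weight


def fp16_size_gb(params_billion: float) -> float:
    """Approximate F16 size in GB: billions of weights times 2 bytes each."""
    return float(params_billion) * BYTES_PER_FP16_WEIGHT


def quantized_size_gb(params_billion: float, size_vs_fp16: float) -> float:
    """Scale the F16 size by a ratio read from the "Size vs FP16" column.

    size_vs_fp16 is a fraction, e.g. 1.0 for the F16 row.
    """
    if not 0 < size_vs_fp16 <= 1:
        raise ValueError("size_vs_fp16 must be in the range (0, 1]")
    return fp16_size_gb(params_billion) * size_vs_fp16


if __name__ == "__main__":
    print(fp16_size_gb(7))            # 7B model at F16
    print(quantized_size_gb(7, 0.5))  # same model at an illustrative ratio of 50%
```

For a 7B model this gives 14.0 GB at F16 and 7.0 GB at the illustrative 50% ratio. Substitute the ratio from the row you plan to download; actual files vary by a few percent depending on the model architecture.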