Model Catalog

OMM can pull models from the Ollama registry and HuggingFace, giving you access to 50,000+ models. Models are downloaded in GGUF format (quantized, ready to run) or SafeTensors format (full precision, for rvllm).

| Model | Parameters | Best For | Minimum RAM |
|---|---|---|---|
| Llama 3.3 | 70B | General purpose, reasoning | 40 GB |
| Llama 3.2 | 3B | Lightweight tasks, edge devices | 4 GB |
| Qwen 2.5 | 32B | Code, multilingual | 20 GB |
| Qwen 2.5 Coder | 7B | Code generation | 6 GB |
| DeepSeek R1 | 14B | Reasoning, math | 10 GB |
| Gemma 3 | 27B | Multimodal (text + vision) | 16 GB |
| Mistral Small | 24B | Fast general purpose | 14 GB |
| Phi-4 | 14B | Reasoning, compact | 8 GB |
| LLaVA | 7B | Image understanding | 6 GB |
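
To narrow the catalog to what a given machine can hold, the Minimum RAM column can be filtered directly. The sketch below hard-codes the rows listed above; `models_for_ram` is a hypothetical helper written for illustration, not an OMM command.

```shell
# models_for_ram GB: print the catalog models whose listed minimum RAM
# is less than or equal to GB. The rows are copied from the catalog table;
# the helper itself is hypothetical and not part of OMM.
models_for_ram() {
  awk -F'|' -v ram="$1" '$3 + 0 <= ram + 0 { print $1 " (" $2 ")" }' <<'EOF'
Llama 3.3|70B|40
Llama 3.2|3B|4
Qwen 2.5|32B|20
Qwen 2.5 Coder|7B|6
DeepSeek R1|14B|10
Gemma 3|27B|16
Mistral Small|24B|14
Phi-4|14B|8
LLaVA|7B|6
EOF
}
```

Calling `models_for_ram 6` lists Llama 3.2, Qwen 2.5 Coder, and LLaVA, in catalog order.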

Installing Models

```bash
# From Ollama registry
omm pull llama3.2
omm pull qwen2.5-coder:7b

# From HuggingFace (GGUF)
omm pull --hf bartowski/Llama-3.3-70B-Instruct-GGUF

# From HuggingFace (SafeTensors, rvllm only)
omm pull --hf meta-llama/Llama-3.1-8B

# List installed models
omm list
```
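
When setting up a new machine it can be convenient to pull several registry models in one go. The following is a minimal sketch built only on the `omm pull` command shown above. It assumes `omm pull` exits with a non-zero status on failure, which this page does not confirm. `pull_all` and the `OMM_CMD` override are hypothetical names introduced here so the loop can be dry-run without downloading anything.

```shell
# pull_all MODEL...: pull each named model in turn and keep going on failure.
# Assumption: `omm pull` returns a non-zero exit status when a pull fails.
# OMM_CMD is a hypothetical override (not an OMM setting) used for dry runs.
pull_all() {
  local cmd="${OMM_CMD:-omm}"
  local rc=0
  local model
  for model in "$@"; do
    # $cmd is left unquoted on purpose so "echo omm" splits into two words.
    if ! $cmd pull "$model"; then
      echo "failed to pull: $model" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A dry run such as `OMM_CMD="echo omm" pull_all llama3.2 qwen2.5-coder:7b` prints the commands that would be executed instead of running them.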

Quantization Levels

GGUF models come in different quantization levels. Lower precision means smaller files and faster inference, at some cost in output quality:

| Quant | Bits | Size vs FP16 | Quality |
|---|---|---|---|
| Q2_K | 2-bit | ~25% | Lowest: extreme compression |
| Q4_K_M | 4-bit | ~40% | Good: recommended for most use cases |
| Q5_K_M | 5-bit | ~50% | Very good: minimal quality loss |
| Q6_K | 6-bit | ~60% | Excellent: near FP16 quality |
| Q8_0 | 8-bit | ~75% | Near-lossless |
| F16 | 16-bit | 100% | Full precision: no quality loss |
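
The Bits column gives a quick lower bound on how much space the weights alone occupy: parameters × bits ÷ 8 bytes. The sketch below applies that arithmetic, taking the parameter count in billions and reporting decimal gigabytes. `nominal_weight_gb` is a hypothetical helper, and the result ignores quantization metadata as well as runtime memory such as the context cache.

```shell
# nominal_weight_gb PARAMS_BILLIONS BITS: nominal size of the weights in GB.
# Computes params * bits / 8; with params in billions the result is in
# billions of bytes, i.e. decimal gigabytes. Hypothetical helper, not part
# of OMM. Real GGUF files are somewhat larger than this figure.
nominal_weight_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

For instance, `nominal_weight_gb 7 4` prints `3.5`, comfortably under the 6 GB minimum listed for a 7B model. Treat the figure as a floor rather than a file size: K-quants store per-block scales and keep some tensors at higher precision, so actual downloads come out larger.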