Model Catalog

OMM can pull models from the Ollama registry and HuggingFace, giving you access to 50,000+ models. Models are downloaded in GGUF format (quantized, ready to run) or SafeTensors format (full precision, for rvllm).

Popular Models

Model	Parameters	Best For	Minimum RAM
Llama 3.3	70B	General purpose, reasoning	40 GB
Llama 3.2	3B	Lightweight tasks, edge devices	4 GB
Qwen 2.5	32B	Code, multilingual	20 GB
Qwen 2.5 Coder	7B	Code generation	6 GB
DeepSeek R1	14B	Reasoning, math	10 GB
Gemma 3	27B	Multimodal (text + vision)	16 GB
Mistral Small	24B	Fast general purpose	14 GB
Phi-4	14B	Reasoning, compact	8 GB
LLaVA	7B	Image understanding	6 GB
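
To make the table easier to act on, here is a minimal Python sketch that filters it by available memory. The rows are copied from the table above. The names `CatalogEntry`, `POPULAR_MODELS`, and `models_that_fit` are illustrative only and are not part of OMM.

```python
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    name: str
    parameters: str
    best_for: str
    min_ram_gb: int


# Rows copied from the Popular Models table above.
POPULAR_MODELS = [
    CatalogEntry("Llama 3.3", "70B", "General purpose, reasoning", 40),
    CatalogEntry("Llama 3.2", "3B", "Lightweight tasks, edge devices", 4),
    CatalogEntry("Qwen 2.5", "32B", "Code, multilingual", 20),
    CatalogEntry("Qwen 2.5 Coder", "7B", "Code generation", 6),
    CatalogEntry("DeepSeek R1", "14B", "Reasoning, math", 10),
    CatalogEntry("Gemma 3", "27B", "Multimodal (text + vision)", 16),
    CatalogEntry("Mistral Small", "24B", "Fast general purpose", 14),
    CatalogEntry("Phi-4", "14B", "Reasoning, compact", 8),
    CatalogEntry("LLaVA", "7B", "Image understanding", 6),
]


def models_that_fit(available_ram_gb: float) -> list[CatalogEntry]:
    """Return entries whose minimum RAM fits, largest requirement first."""
    fitting = [m for m in POPULAR_MODELS if m.min_ram_gb <= available_ram_gb]
    return sorted(fitting, key=lambda m: m.min_ram_gb, reverse=True)


# Example: list the models that fit on a machine with 8 GB of RAM.
for entry in models_that_fit(8):
    print(f"{entry.name} ({entry.parameters}): needs {entry.min_ram_gb} GB")
```

The filter treats the Minimum RAM column as a hard floor, so leave some headroom for the operating system and other processes when choosing a model.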

Installing Models

Terminal

bash

# From Ollama registry
omm pull llama3.2
omm pull qwen2.5-coder:7b

# From HuggingFace (GGUF)
omm pull --hf bartowski/Llama-3.3-70B-Instruct-GGUF

# From HuggingFace (SafeTensors, rvllm only)
omm pull --hf meta-llama/Llama-3.1-8B

# List installed models
omm list
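
If you install the same set of models on several machines, the commands above can be scripted. The sketch below only assembles the `omm pull` invocations already shown and defaults to a dry run that prints them. It assumes `omm` is on your `PATH`, and the helper names `build_pull_command` and `pull_all` are hypothetical, not part of OMM.

```python
import subprocess


def build_pull_command(model: str, from_huggingface: bool = False) -> list[str]:
    """Build the argument list for one `omm pull` call, as shown above."""
    command = ["omm", "pull"]
    if from_huggingface:
        command.append("--hf")  # flag taken from the HuggingFace examples above
    command.append(model)
    return command


def pull_all(models: list[tuple[str, bool]], dry_run: bool = True) -> list[list[str]]:
    """Handle each (model, from_huggingface) pair; only print when dry_run is True."""
    commands = [build_pull_command(name, hf) for name, hf in models]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(command, check=True)
    return commands


# Model names copied from the examples above.
WANTED = [
    ("llama3.2", False),
    ("qwen2.5-coder:7b", False),
    ("bartowski/Llama-3.3-70B-Instruct-GGUF", True),
]

pull_all(WANTED)  # dry run: prints the three commands without executing them
```

Call `pull_all(WANTED, dry_run=False)` to actually run the downloads. Passing arguments as a list avoids shell quoting problems with model names that contain `:` or `/`.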

Quantization Levels

GGUF models come in different quantization levels. Lower precision means a smaller file and faster inference, at some cost in output quality:

Quant	Bits	Size vs FP16	Quality
Q2_K	2-bit	~25%	Lowest — extreme compression
Q4_K_M	4-bit	~40%	Good — recommended for most use cases
Q5_K_M	5-bit	~50%	Very good — minimal quality loss
Q6_K	6-bit	~60%	Excellent — near FP16 quality
Q8_0	8-bit	~75%	Near-lossless
F16	16-bit	100%	Full precision — no quality loss
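
As a rough guide to what these ratios mean in practice, the sketch below estimates weight size from a parameter count. It assumes FP16 stores 2 bytes per parameter and uses the approximate ratios from the table above. The names `SIZE_VS_FP16` and `estimate_size_gb` are illustrative, not part of OMM.

```python
# Approximate size ratios relative to FP16, copied from the table above.
SIZE_VS_FP16 = {
    "Q2_K": 0.25,
    "Q4_K_M": 0.40,
    "Q5_K_M": 0.50,
    "Q6_K": 0.60,
    "Q8_0": 0.75,
    "F16": 1.00,
}

BYTES_PER_FP16_WEIGHT = 2  # FP16 uses 16 bits per parameter


def estimate_size_gb(parameters_billions: float, quant: str) -> float:
    """Estimate weight size in GB (10^9 bytes) for a quantization level."""
    fp16_size_gb = parameters_billions * BYTES_PER_FP16_WEIGHT
    return fp16_size_gb * SIZE_VS_FP16[quant]


# Example: a 7B model at every level in the table.
for quant in SIZE_VS_FP16:
    print(f"7B at {quant}: ~{estimate_size_gb(7, quant):.1f} GB")
```

Treat the result as an estimate for the weights alone. Real file sizes vary by model architecture, and running a model also needs memory for the context cache and runtime overhead.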
