Configuration

OMM is configured via ~/.omm/config.toml. The desktop app provides a settings UI; CLI users edit the file directly.

Full Configuration Reference

config.toml

```toml
# Server settings
host = "127.0.0.1"
port = 11435
max_loaded_models = 3
num_parallel = 4
gpu_layers = 35
keep_alive = "5m"
active_profile = "default"

# Engine configuration
[engine]
default_engine = "auto"   # auto, llama, rvllm
autotune = false

[engine.llama]
gpu_layers = 35           # Layers to offload to GPU
context_length = 8192     # Default context window
rope_scaling = "none"     # none, linear, yarn
flash_attention = true

[engine.rvllm]
tensor_parallel = 1       # Number of GPUs
max_batch_size = 256
kv_cache_dtype = "auto"   # auto, fp8, fp16
speculative_model = ""    # Draft model for speculative decoding

# Logging
[logging]
level = "info"            # debug, info, warn, error
file = ""                 # Log file path (empty = stderr)

# Model defaults
[defaults]
temperature = 0.7
top_p = 0.9
top_k = 40
repeat_penalty = 1.1
system_prompt = ""
```

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| OMM_HOST | 127.0.0.1 | Server bind address |
| OMM_PORT | 11435 | Server port |
| OMM_HOME | ~/.omm | Data directory |
| OMM_ENGINE | auto | Inference engine selection |
| OMM_API_KEY | (none) | Optional API authentication |
| OMM_GPU_LAYERS | 35 | GPU layer count override |
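
The table describes OMM_GPU_LAYERS as an override, which suggests that a set variable replaces the corresponding file value. The sketch below illustrates that idea only. The precedence rule, the variable-to-key mapping, and the helper name apply_env_overrides are all assumptions made for illustration, not documented OMM behavior.

```python
import os

# Hypothetical mapping from environment variables to config keys.
# Which config.toml key each variable affects is an assumption here.
ENV_TO_KEY = {
    "OMM_HOST": ("host", str),
    "OMM_PORT": ("port", int),
    "OMM_ENGINE": ("engine.default_engine", str),
    "OMM_GPU_LAYERS": ("gpu_layers", int),
}


def apply_env_overrides(settings, env=None):
    """Return a copy of the flat settings dict with set OMM_* variables applied."""
    env = os.environ if env is None else env
    merged = dict(settings)  # leave the caller's dict untouched
    for var, (key, cast) in ENV_TO_KEY.items():
        if var in env:
            # Environment values are strings, so cast to the expected type.
            merged[key] = cast(env[var])
    return merged


file_settings = {
    "host": "127.0.0.1",
    "port": 11435,
    "engine.default_engine": "auto",
    "gpu_layers": 35,
}
effective = apply_env_overrides(
    file_settings, {"OMM_PORT": "8080", "OMM_ENGINE": "llama"}
)
print(effective["port"], effective["engine.default_engine"])  # 8080 llama
```

Passing the environment as an argument keeps the function easy to test. Calling it without the second argument falls back to the real process environment.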

Profile System

Profiles let you switch between different configurations (tools, MCPs, skills, hooks, system prompts) without editing the main config. Profiles are stored in ~/.omm/profiles/.

```bash
# List profiles
omm profile list

# Switch profile
omm profile set coding

# Create a new profile
omm profile create research --model qwen2.5-72b
```