Quick Setup: Qwen3.5-27B Local with llama.cpp + OpenCode

1. Install llama.cpp

brew install llama.cpp

(Run brew upgrade llama.cpp instead if you already have it.)
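Quick sanity check that the binary landed on your PATH (the Homebrew formula ships llama-server alongside llama-cli; this should print the build info):

llama-server --version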

2. Run model (downloads Q6_K on first run)

llama-server -hf unsloth/Qwen3.5-27B-GGUF:Q6_K --port 9090

More options & quants: https://unsloth.ai/docs/models/qwen3.5#qwen3.5-27b
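Once the weights finish downloading and the server reports it is listening, you can verify the OpenAI-compatible API is up (llama-server exposes the standard /v1/models route; 9090 matches the port above):

curl http://127.0.0.1:9090/v1/models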

3. Add to OpenCode config (~/.opencode/config.json)

{
  "provider": {
    "llama.cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp local",
      "options": {
        "baseURL": "http://127.0.0.1:9090/v1"
      },
      "models": {
        "qwen35-27b-local": {
          "name": "Qwen3.5-27B (Local)",
          "limit": {
            "context": 262144,
            "output": 32000
          }
        }
      }
    }
  }
}

Merge the "provider" block into your existing config rather than replacing the whole file (plain JSON does not allow // comments, so keep notes out of the file itself).
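Optional smoke test before pointing OpenCode at the server, a minimal sketch against the standard /v1/chat/completions route (llama-server generally ignores the model field, but OpenCode will send the id configured above):

curl http://127.0.0.1:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen35-27b-local",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16
  }'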

Done.
Q6_K needs roughly 23 GB of RAM/VRAM and is near-lossless at the full 256K context.
Use a lower quant (Q5_K_M, Q4_K_M) if you need less memory or more speed; see the example below.
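For example, swapping the quant tag in the same command (assuming the repo publishes a Q4_K_M file, which is typical for Unsloth GGUF uploads):

llama-server -hf unsloth/Qwen3.5-27B-GGUF:Q4_K_M --port 9090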