Open MLX Model Mirror — Apple Silicon LLMs

Tier-1 model families

Six families rapid-mlx supports as first-class citizens. Each R2-mirrored alias here is verified against its shard sum at render time and tagged with the same RAM tier that the hardware-tiers installer recommender uses. Hunyuan 3 (HY3) is the one exception: it is Ultra-only (M3 Ultra, 256 GB), is not on the R2 mirror, and is pulled straight from HuggingFace by alias (rapid-mlx serve hy3-preview-4bit). Skip to the full catalog below for the long tail.
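
The RAM tiers that appear on the cards below (8-23 GB, 24-47 GB, 48-95 GB, 96+ GB) can be read as a simple lookup on installed unified memory. A minimal sketch, assuming the tier boundaries are exactly the labels shown on this page; ram_tier is a hypothetical helper and not part of rapid-mlx, and the installer's real recommender may weigh other factors.

```shell
# Map installed unified memory (whole GB) to the RAM-tier labels used on this page.
# ram_tier is a hypothetical helper, not a rapid-mlx command.
ram_tier() {
  ram_gb="$1"
  if [ "$ram_gb" -ge 96 ]; then
    echo "96+ GB"
  elif [ "$ram_gb" -ge 48 ]; then
    echo "48-95 GB"
  elif [ "$ram_gb" -ge 24 ]; then
    echo "24-47 GB"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "8-23 GB"
  else
    # No tier is listed on this page for machines under 8 GB.
    echo "below 8 GB"
  fi
}

ram_tier 16    # prints "8-23 GB"
ram_tier 64    # prints "48-95 GB"
```

On macOS, sysctl -n hw.memsize reports installed memory in bytes, so ram_tier "$(( $(sysctl -n hw.memsize) / 1073741824 ))" would feed the helper the machine's own figure.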

Qwen 3.6

MoE workhorse

The 35B-A3B MoE (35B total, 3B active) anchors the 48-95 GB tier. Every checkpoint has a native MTP head baked in; pair it with a matching MTP drafter (below) for speculative decode.

qwen3.6-35b-8bit

35.2 GB · 48-95 GB

35B-A3B MoE at 8-bit MLX. Anchor pick of the 48-95 GB tier.

Browse mlx-community/Qwen3.6-35B-A3B-8bit

$ rapid-mlx pull qwen3.6-35b-8bit

qwen3.6-35b-4bit

19.0 GB · 24-47 GB

35B-A3B MoE at 4-bit MLX. Fits at 32-48 GB with headroom.

Browse mlx-community/Qwen3.6-35B-A3B-4bit

$ rapid-mlx pull qwen3.6-35b-4bit

qwen3.6-27b-4bit

15.0 GB · 24-47 GB

Dense 27B at 4-bit MLX. Hybrid attention layout.

Browse mlx-community/Qwen3.6-27B-4bit

$ rapid-mlx pull qwen3.6-27b-4bit

qwen3.5-4b-4bit

2.9 GB · 8-23 GB

Qwen 3.5 4B 4-bit — the small-tier stand-in until Qwen 3.6 ships a <8B SKU.

Browse mlx-community/Qwen3.5-4B-MLX-4bit

$ rapid-mlx pull qwen3.5-4b-4bit

Gemma 4

dense

Google's dense Gemma 4 line. QAT variants ship for sharper low-bit quants; the 12B / 26B / 31B checkpoints land cleanly on Mac Studio.

gemma-4-31b-4bit

17.2 GB · 48-95 GB

Dense 31B at 4-bit MLX. Strong chat/tools baseline.

Browse mlx-community/gemma-4-31b-it-4bit

$ rapid-mlx pull gemma-4-31b-4bit

gemma-4-26b-4bit

14.6 GB · 24-47 GB

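26B at 4-bit MLX. The upstream repo name carries an a4b suffix, which suggests an MoE layout with about 4B active parameters rather than a dense model.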

Browse mlx-community/gemma-4-26b-a4b-it-4bit

$ rapid-mlx pull gemma-4-26b-4bit

gemma-4-12b-8bit

11.9 GB · 24-47 GB

Dense 12B at 8-bit MLX. Higher fidelity than the 4-bit twin.

Browse mlx-community/gemma-4-12B-it-8bit

$ rapid-mlx pull gemma-4-12b-8bit

gemma-4-12b-4bit

6.3 GB · 8-23 GB

Dense 12B at 4-bit MLX. Fits a 16 GB MacBook.

Browse mlx-community/gemma-4-12B-it-4bit

$ rapid-mlx pull gemma-4-12b-4bit

DeepSeek R1

reasoning

R1-Distill line — chain-of-thought reasoning baked into the base weights via distillation from the DeepSeek R1 teacher.

deepseek-r1-32b-4bit

17.2 GB · 48-95 GB

R1-Distill Qwen 32B at 4-bit MLX. Reasoning-first flagship distill.

Browse mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit

$ rapid-mlx pull deepseek-r1-32b-4bit

deepseek-r1-8b-4bit

4.3 GB · 8-23 GB

R1-Distill Qwen3 8B at 4-bit MLX. Reasoning on a 16 GB Mac.

Browse mlx-community/DeepSeek-R1-0528-Qwen3-8B-4bit

$ rapid-mlx pull deepseek-r1-8b-4bit

gpt-oss

Harmony-native

OpenAI's harmony-native MoE. Ships MXFP4 weights with 8-bit accumulators; parser pair auto-selects harmony/harmony.

gpt-oss-120b-mxfp4-q8

59.1 GB · 96+ GB

OpenAI 120B MoE at MXFP4-Q8. Frontier reasoning + tool calling on-device.

Browse mlx-community/gpt-oss-120b-MXFP4-Q8

$ rapid-mlx pull gpt-oss-120b-mxfp4-q8

gpt-oss-20b-mxfp4-q8

11.3 GB · 24-47 GB

OpenAI 20B MoE at MXFP4-Q8. Anchor pick of the 24-47 GB tier.

Browse mlx-community/gpt-oss-20b-MXFP4-Q8

$ rapid-mlx pull gpt-oss-20b-mxfp4-q8

Hunyuan 3 (HY3)

Ultra-only MoE

Tencent's Hunyuan 3 — a 295B-total / 21B-active MoE with a 3.8B MTP head. Ultra-only: peak resident is ~156 GB, so it needs an M3 Ultra with 256 GB unified memory. Will not fit smaller Macs. Weights pull from HuggingFace (not R2-mirrored).

hy3-preview-4bit

156.0 GB · 256 GB Ultra

295B-A21B MoE at 4-bit MLX. Ultra-only — requires an M3 Ultra 256 GB Mac (~156 GB peak). Pulls from HuggingFace, not the R2 mirror.

HuggingFace mlx-community/Hy3-preview-4bit

$ rapid-mlx serve hy3-preview-4bit

Ternary Bonsai

2-bit ternary

A ternary-quantised Qwen 3.5-class 27B that packs into 7.9 GB and runs on a 16 GB Mac — the quant is stock MLX 2-bit affine, no custom kernel. Strong on the mainstream: code, math, reasoning, EN/ZH general writing, and agent/framework tool-calling (verified end-to-end with the OpenAI SDK, LangChain, pydantic-ai, smolagents, and Aider). Known limit: strict Chinese classical regulated verse (五言绝句 with fixed rhyme) can trip a repetition loop — a niche edge case.

bonsai-27b-2bit

7.9 GB · 8-23 GB

27B ternary (stock MLX 2-bit affine) in 7.9 GB — fits a 16 GB Mac. ~46 tok/s on M3 Ultra. Strong at code/math/reasoning + EN/ZH writing + tool-calling.

Browse prism-ml/Ternary-Bonsai-27B-mlx-2bit

$ rapid-mlx pull bonsai-27b-2bit
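
The sizes quoted on these cards can be sanity-checked by converting them into effective bits per weight. A rough sketch, assuming 1 GB means 10^9 bytes and that the listed size covers every tensor in the checkpoint; bits_per_weight is a hypothetical helper, and the parameter counts are the rounded figures from the model names.

```shell
# Effective bits per weight implied by a checkpoint's listed size.
# Assumes 1 GB = 10^9 bytes; bits_per_weight is a hypothetical helper.
bits_per_weight() {
  size_gb="$1"     # listed size in GB
  params_b="$2"    # total parameters, in billions
  awk -v s="$size_gb" -v p="$params_b" 'BEGIN { printf "%.2f\n", s * 8 / p }'
}

bits_per_weight 7.9 27     # bonsai-27b-2bit   -> 2.34
bits_per_weight 19.0 35    # qwen3.6-35b-4bit  -> 4.34
bits_per_weight 35.2 35    # qwen3.6-35b-8bit  -> 8.05
```

Each result sits a little above the nominal bit width. That gap is plausibly the per-group scale and bias metadata that affine quantisation stores, plus any layers kept at higher precision, though the page does not break the sizes down.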

MTP drafters

speculative decoding

Small drafter heads that pair with their parent Qwen 3.6 checkpoint. Pass --enable-mtp on rapid-mlx serve to activate speculative decode.

qwen3.6-35b-mtp-4bit

461 MB · drafter

Speculative-decode sidecar for qwen3.6-35b-8bit. Not standalone weights.

Browse mlx-community/Qwen3.6-35B-A3B-MTP-4bit

$ rapid-mlx pull qwen3.6-35b-mtp-4bit

qwen3.6-27b-mtp-4bit

241 MB · drafter

Speculative-decode sidecar for qwen3.6-27b-4bit. Not standalone weights.

Browse mlx-community/Qwen3.6-27B-MTP-4bit

$ rapid-mlx pull qwen3.6-27b-mtp-4bit
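
The parent and drafter pairings listed in this section can be wired into a small wrapper. This is a sketch under stated assumptions: drafter_for and serve_with_mtp are hypothetical helpers, the placement of --enable-mtp after the alias is a guess, and so is the idea that the server locates an already-pulled drafter on its own. The page confirms only the pull and serve commands and the flag name.

```shell
# Look up the MTP drafter alias that this page pairs with a parent checkpoint.
# drafter_for is a hypothetical helper; the pairings come from the cards above.
drafter_for() {
  case "$1" in
    qwen3.6-35b-8bit) echo "qwen3.6-35b-mtp-4bit" ;;
    qwen3.6-27b-4bit) echo "qwen3.6-27b-mtp-4bit" ;;
    *) return 1 ;;   # no drafter listed for this alias
  esac
}

# Pull the parent and its drafter, then serve with speculative decode enabled.
# Assumption: the flag follows the alias and the drafter is found automatically.
serve_with_mtp() {
  parent="$1"
  drafter="$(drafter_for "$parent")" || {
    echo "no MTP drafter listed for $parent" >&2
    return 1
  }
  rapid-mlx pull "$parent" &&
    rapid-mlx pull "$drafter" &&
    rapid-mlx serve "$parent" --enable-mtp
}
```

Calling serve_with_mtp qwen3.6-27b-4bit would then run the three commands in order and stop at the first failure.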

The rapid-mlx model mirror. One command away.
