Open-Source LLMs Compared 2026 – 25+ Models You Should Know


    Malte Lensch · March 7, 2026 · Updated: April 13, 2026 · 9 min read · Deep Dive

    TL;DR: "25+ open-source LLMs compared side by side: Gemma 4 (26B, 85 t/s on consumer hardware), Hunter Alpha (1T), Llama 4, Qwen3.5, DeepSeek-R1, Nemotron Cascade 2, Mistral, and more. With GitHub stats, hardware requirements, and a decision guide."

    — Till Freitag

    Last updated: April 2026 – GitHub stars and model versions are updated regularly. New: Gemma 4, Nemotron Cascade 2, and Kimi K2.5 added.

    Why Open-Source LLMs Matter Now

    2025 was the year open-source LLMs closed the gap with proprietary models. In 2026, they're on par in many areas – or better. For businesses, that means more control, less vendor lock-in, and better GDPR compliance.

    This article gives you a comprehensive overview of the most important open-source LLMs – with real GitHub data, hardware requirements, and clear recommendations.

    The Big Comparison Table

    | Model | Provider | Parameters | GitHub ⭐ | License | Standout Feature |
    |---|---|---|---|---|---|
    | Gemma 4 🆕 | Google | 26B (MoE) | 8,500+ | Gemma License | 85 t/s on consumer hardware, 256K context |
    | Nemotron Cascade 2 🆕 | NVIDIA | 30B | 3,500+ | NVIDIA Open | ~54 t/s locally, optimized for inference |
    | Hunter Alpha | Anonymous (via OpenRouter) | 1T (~42B active) | – | Unknown | Largest available model, 1M context |
    | Kimi K2.5 | Moonshot AI | 1T (32B active) | 5,000+ | Modified MIT | Agent Swarm (100 sub-agents), multimodal |
    | Llama 4 Scout | Meta | 109B (17B active) | 7,500+ | Llama License | 10M token context |
    | Llama 4 Maverick | Meta | 400B (17B active) | 7,500+ | Llama License | Meta's best MoE model |
    | Qwen3.5-122B | Alibaba | 122B (10B active) | 27,000+ | Apache 2.0 | Beats GPT-5-mini |
    | Qwen3-235B | Alibaba | 235B | 27,000+ | Apache 2.0 | Thinking mode |
    | DeepSeek-R1 | DeepSeek | 671B (37B active) | 102,000+ | MIT | Chain-of-thought reasoning |
    | DeepSeek-V3 | DeepSeek | 671B (37B active) | 102,000+ | MIT | Multi-token prediction |
    | Mistral Large 2 | Mistral | 123B | 10,700+ | Apache 2.0 | 128k context, 80+ languages |
    | Mixtral 8x22B | Mistral | 141B (39B active) | 10,700+ | Apache 2.0 | Sparse MoE pioneer |
    | Gemma 3 | Google | 1B–27B | 6,800+ | Gemma License | Multimodal, on-device |
    | Phi-4 | Microsoft | 14B | 12,000+ | MIT | Reasoning on small hardware |
    | Phi-4-Mini | Microsoft | 3.8B | 12,000+ | MIT | Runs on smartphones |
    | Command R+ | Cohere | 104B | 3,200+ | CC-BY-NC | RAG-optimized, 10 languages |
    | Yi-1.5 | 01.AI | 6B–34B | 7,800+ | Apache 2.0 | Strong multilingual support |
    | DBRX | Databricks | 132B (36B active) | 3,200+ | Databricks Open | Enterprise MoE |
    | Falcon 3 | TII | 1B–10B | 2,000+ | Apache 2.0 | UAE research project |
    | StableLM 2 | Stability AI | 1.6B–12B | 8,500+ | Stability License | Compact & efficient |
    | InternLM 3 | Shanghai AI Lab | 8B | 7,200+ | Apache 2.0 | Long context up to 1M |
    | OLMo 2 | AI2 | 7B–13B | 6,400+ | Apache 2.0 | Fully open (data + code) |
    | Jamba 1.5 | AI21 Labs | 52B (12B active) | 900+ | Apache 2.0 | Mamba-Transformer hybrid |
    | StarCoder 2 | BigCode | 3B–15B | 2,000+ | BigCode OpenRAIL-M | Code specialist |
    | CodeLlama | Meta | 7B–70B | 16,400+ | Llama License | Code generation & infilling |
    | DeepSeek-Coder-V2 | DeepSeek | 236B (21B active) | 12,000+ | MIT | Code + math specialist |
    | Qwen2.5-Coder | Alibaba | 0.5B–32B | 27,000+ | Apache 2.0 | Code completion, multi-lang |

    Top Models in Detail

    🔥 Gemma 4 (Google) 🆕

    Google's new MoE flagship: 26B parameters, just 14 GB, 85 tokens per second on consumer hardware. The model that definitively blurs the line between cloud and local intelligence. → Our Gemma 4 deep dive

    Strengths:

    • 85 t/s on an AMD Ryzen AI MAX+ with 128 GB RAM
    • 256K context window for long document analysis
    • Function calling that actually works
    • GPT-4-quality complex reasoning – locally, no cloud required

    Weaknesses:

    • Gemma License (not pure Apache 2.0)
    • MoE architecture – not all frameworks support it natively
    • No video input yet

    GitHub: github.com/google/gemma.cpp · 8,500+ ⭐


    🐉 Hunter Alpha → Xiaomi MiMo-V2-Pro (formerly "likely DeepSeek V4")

    Update April 2026: Hunter Alpha was confirmed on March 18, 2026 as Xiaomi's MiMo-V2-Pro – it was never DeepSeek V4. The team is led by Luo Fuli, a former DeepSeek engineer. → The full story · → China's AI Offensive: The Analysis

    The largest AI model available on OpenRouter: >1 trillion parameters, with ~42B active parameters per token. Originally launched anonymously on March 11, 2026, now commercially available under Xiaomi's MiMo brand.

    Strengths:

    • 1T parameters with ~42B active (MoE) – largest available model
    • 1M token context window
    • ClawEval 61.5 – strong agentic performance
    • Known provider (Xiaomi, publicly listed)
    • Open source planned after stabilization

    Weaknesses:

    • No longer free ($1–2 / MTok input, $3–6 / MTok output)
    • Not locally runnable (OpenRouter API only for now)
    • Privacy: OpenRouter logging policies still apply

    Access: openrouter.ai/xiaomi/mimo-v2-pro


    🌙 Kimi K2.5 (Moonshot AI)

    Beijing-based Moonshot AI's flagship: 1 trillion parameters with MoE (32B active), 384 experts, and a unique Agent Swarm architecture. → The Cursor controversy: Why Composer 2 runs on Kimi K2.5

    Strengths:

    • Agent Swarm: coordinates up to 100 sub-agents for complex tasks
    • Multimodal (text + image + video)
    • AIME 2025: 96.1% – beats all frontier models on math reasoning
    • Modified MIT license – commercial use free below 100M MAU

    Weaknesses:

    • Very large – local deployment requires high-end hardware (128 GB+ RAM)
    • Chinese provider – compliance considerations
    • Modified MIT adds attribution requirements above thresholds

    GitHub: github.com/MoonshotAI/Kimi-K2.5 · 5,000+ ⭐


    ⚡ Nemotron Cascade 2 (NVIDIA) 🆕

    NVIDIA's new inference-optimized model: 30B parameters, runs at ~54 t/s on Project KNUT (RTX 4060 Ti + RTX 3060). Specifically designed for fast local inference. → Project KNUT: Local AI Infrastructure

    Strengths:

    • 54 t/s on consumer GPUs – 15x faster than human speech
    • Quality comparable to GPT-4o mini
    • Optimized for NVIDIA hardware (CUDA)

    Weaknesses:

    • NVIDIA license (not Apache 2.0)
    • Primarily designed for NVIDIA GPUs
    • Still relatively small community

    🦙 Llama 4 (Meta)

    Meta's latest generation comes in two flavors: Scout (109B, 10M context) and Maverick (400B, for quality). Both use Mixture-of-Experts – only 17B parameters are active per query.

    Strengths:

    • Largest context window of any open-source model (10M tokens with Scout)
    • Strong community and ecosystem
    • Multimodal (text + image)

    Weaknesses:

    • Llama License isn't "true" open source (commercial restrictions above 700M MAU)
    • Large models require significant hardware

    GitHub: github.com/meta-llama/llama-models · 7,500+ ⭐
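The "X B total, Y B active" figures quoted throughout matter because per-token compute scales with the active parameters, while memory footprint scales with the total. A rough back-of-envelope sketch (it ignores shared layers and attention overhead, so treat the numbers as approximate):

```python
def moe_active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of weights touched per token in a Mixture-of-Experts model.

    Per-token FLOPs scale with *active* parameters; RAM/VRAM needs scale
    with *total* parameters. Rough model: ignores shared layers.
    """
    return active_params_b / total_params_b

# Llama 4 Maverick: 400B total, 17B active per token --
# it decodes roughly like a ~17B dense model, but must be stored like a 400B one.
print(f"{moe_active_fraction(400, 17):.1%}")
```

This is why a 400B MoE model can feel fast once it fits in memory: fitting it is the hard part, running it is not.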


    🌐 Qwen3.5 (Alibaba)

    Currently the strongest open-source MoE model. 122B parameters, only 10B active – runs on a MacBook with 64 GB RAM. → Our Qwen3.5 deep dive

    Strengths:

    • Beats GPT-5-mini on most benchmarks
    • Apache 2.0 – true open source
    • 262k context window (expandable to 1M)

    Weaknesses:

    • No multimodal (text only)
    • Chinese provider – compliance concern for some organizations

    GitHub: github.com/QwenLM/Qwen3 · 27,000+ ⭐


    🔬 DeepSeek-R1

    The model that shook the AI world in early 2025. 671B parameters with MoE (37B active), specialized in chain-of-thought reasoning.

    Strengths:

    • Reasoning quality on GPT-o1 level
    • MIT license – maximum freedom
    • "Thinking" mode shows the reasoning process

    Weaknesses:

    • Very large – local use only with high-end hardware
    • Chinese provider

    GitHub: github.com/deepseek-ai/DeepSeek-R1 · 102,000+ ⭐


    🌊 Mistral Large 2

    Mistral's flagship: 123B parameters, 128k context, 80+ languages. Europe's counterweight to the US and Chinese models.

    Strengths:

    • European provider (Paris) – easier GDPR narrative
    • Strong multilingual capabilities
    • Apache 2.0

    Weaknesses:

    • Smaller community than Llama or Qwen
    • Fewer specialized variants

    GitHub: github.com/mistralai/mistral-inference · 10,700+ ⭐


    💎 Gemma 3 (Google)

    Google's open model family from 1B to 27B – optimized for on-device use. Multimodal from 4B. Now the predecessor to Gemma 4, but still relevant for edge deployments.

    Strengths:

    • Multimodal (text + image) even in small variants
    • Runs on smartphones and Raspberry Pi
    • ShieldGemma for safety

    Weaknesses:

    • Gemma License has usage policies (not pure Apache 2.0)
    • Maximum size only 27B

    GitHub: github.com/google/gemma.cpp · 6,800+ ⭐


    🧠 Phi-4 (Microsoft)

    Microsoft's "Small Language Model" with 14B parameters that beats larger models at reasoning tasks.

    Strengths:

    • Outstanding quality per parameter
    • MIT license
    • Runs on consumer hardware

    Weaknesses:

    • No multimodal in the base variant
    • Small context window (16k)

    GitHub: github.com/microsoft/phi-4 · 12,000+ ⭐


    Coding LLMs Compared

    For developers, there are specialized code models:

    | Model | Parameters | Languages | Standout Feature |
    |---|---|---|---|
    | StarCoder 2 | 3B–15B | 600+ | Trained on The Stack v2 |
    | CodeLlama | 7B–70B | ~20 | Infilling & long contexts |
    | DeepSeek-Coder-V2 | 236B (21B active) | 300+ | Code + math combined |
    | Qwen2.5-Coder | 0.5B–32B | 90+ | Best open-source code model per size |

    Our recommendation: Qwen2.5-Coder-32B for maximum quality, StarCoder 2-3B if it needs to run locally on a laptop.

    Decision Matrix: Which Model for Which Use Case?

    | Your Use Case | Recommended Model | Why |
    |---|---|---|
    | Frontier quality locally | 🆕 Gemma 4 (26B) | GPT-4 level, 85 t/s, 14 GB |
    | Agentic tasks & multi-step workflows | Hunter Alpha or Kimi K2.5 | 1T parameters, Agent Swarm |
    | Analyze GDPR-sensitive documents | Qwen3.5-122B locally | Best quality/resource ratio |
    | Code generation & refactoring | Qwen2.5-Coder-32B | Beats larger models at code |
    | Complex reasoning | DeepSeek-R1 | Chain-of-thought at GPT-o1 level |
    | Fast local inference | 🆕 Nemotron Cascade 2 | 54 t/s on consumer GPUs |
    | Run on smartphone/edge | Gemma 3 (4B) or Phi-4-Mini | Optimized for minimal hardware |
    | RAG with company data | Command R+ | Built for Retrieval-Augmented Generation |
    | Maximum context (long documents) | Llama 4 Scout | 10M token context window |
    | European provider required | Mistral Large 2 | French company, Apache 2.0 |
    | Fully open training data | OLMo 2 | Only model with completely open data |
    | Multi-agent workflows | Kimi K2.5 or DeepSeek-V3 | Agent Swarm with 100 sub-agents (Kimi) |
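If you route tasks programmatically, the matrix above reduces to a plain lookup. A minimal sketch (the use-case keys and the fallback are our own illustrative naming, not an established API):

```python
# Keys and fallback are illustrative -- adapt them to your own routing logic.
RECOMMENDATIONS = {
    "frontier-local": "Gemma 4 (26B)",
    "agentic":        "Hunter Alpha or Kimi K2.5",
    "gdpr-documents": "Qwen3.5-122B (local)",
    "code":           "Qwen2.5-Coder-32B",
    "reasoning":      "DeepSeek-R1",
    "fast-local":     "Nemotron Cascade 2",
    "edge":           "Gemma 3 (4B) or Phi-4-Mini",
    "rag":            "Command R+",
    "long-context":   "Llama 4 Scout",
    "eu-provider":    "Mistral Large 2",
    "open-data":      "OLMo 2",
}

def recommend(use_case: str) -> str:
    # Unknown use cases fall back to the article's "new default for most tasks".
    return RECOMMENDATIONS.get(use_case, "Gemma 4 (26B)")
```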

    Hardware Guide: What Do You Actually Need?

    | RAM / VRAM | Models (quantized, Q4) | Example Hardware |
    |---|---|---|
    | 8 GB | Phi-4-Mini, Gemma 3 (1B–4B) | MacBook Air M3, RTX 3060 |
    | 16 GB | Phi-4, Gemma 3 (12B), Gemma 4 (26B, Q4), Yi-1.5-9B | MacBook Pro M3, RTX 4070 |
    | 32 GB | Mistral 7B, Llama 3.1-8B, Qwen2.5-14B, Nemotron Cascade 2 | MacBook Pro M4, RTX 4090 |
    | 64 GB | Qwen3.5-122B, Mixtral 8x22B | MacBook Pro M4 Max |
    | 128 GB+ | DeepSeek-R1, Llama 4 Maverick, Kimi K2.5, Gemma 4 (FP16) | Multi-GPU server, Mac Studio Ultra |
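These tiers follow from a simple rule of thumb: quantized weight size ≈ parameters × bytes per parameter, plus runtime overhead. A rough sketch (the ~0.55 bytes/param figure for 4-bit GGUF quantization and the flat overhead term are approximations; real usage grows with context length):

```python
def est_memory_gb(params_billion: float, bytes_per_param: float = 0.55,
                  overhead_gb: float = 1.5) -> float:
    """Rough RAM/VRAM estimate for a quantized model.

    bytes_per_param: ~0.55 for Q4_K_M-style 4-bit GGUF, 2.0 for FP16.
    overhead_gb approximates runtime buffers and a small KV cache.
    """
    return params_billion * bytes_per_param + overhead_gb

# Gemma 4 (26B) at Q4: ~14.3 GB of weights plus overhead -> the 16 GB tier
print(round(est_memory_gb(26), 1))
# The same model at FP16 needs 50+ GB -> only the 128 GB+ tier has headroom
print(round(est_memory_gb(26, bytes_per_param=2.0), 1))
```

For MoE models, remember the estimate must cover *total* parameters, not just the active subset.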

    Licenses: The Devil in the Details

    Not every "open-source" model is equally open:

    | License | Models | Commercial Use | Restrictions |
    |---|---|---|---|
    | Apache 2.0 | Qwen, Mistral, Yi, Falcon, OLMo | ✅ Unrestricted | None |
    | MIT | DeepSeek, Phi | ✅ Unrestricted | None |
    | Llama License | Llama 4, CodeLlama | ✅ Up to 700M MAU | Above 700M MAU: Meta license needed |
    | Gemma License | Gemma 3, Gemma 4 | ✅ With conditions | Usage policies apply |
    | CC-BY-NC | Command R+ | ❌ Non-commercial | Research & personal only |
    | Modified MIT | Kimi K2.5 | ✅ Below 100M MAU | Attribution required above 100M MAU / $20M revenue |
    | NVIDIA Open | Nemotron Cascade 2 | ✅ With conditions | NVIDIA usage terms |

    Tip: For commercial projects, prefer Apache 2.0 or MIT. With Llama, check carefully whether the usage terms fit your case.
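As a sanity check before a deployment, the table's tiers can be encoded as a small helper. A sketch based only on the thresholds above (illustrative, not legal advice; always read the actual license text):

```python
# Thresholds taken from the table above -- illustrative only, not legal advice.
LICENSE_TERMS = {
    "Apache 2.0":    {"commercial": True,  "mau_cap": None},
    "MIT":           {"commercial": True,  "mau_cap": None},
    "Llama License": {"commercial": True,  "mau_cap": 700_000_000},
    "Modified MIT":  {"commercial": True,  "mau_cap": 100_000_000},
    "CC-BY-NC":      {"commercial": False, "mau_cap": None},
}

def commercial_status(license_name: str, monthly_active_users: int) -> str:
    """Classify commercial usability under the table's simplified tiers."""
    terms = LICENSE_TERMS[license_name]
    if not terms["commercial"]:
        return "non-commercial only"
    cap = terms["mau_cap"]
    if cap is not None and monthly_active_users > cap:
        return "extra terms apply"  # e.g. Meta approval, attribution duties
    return "unrestricted"
```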

    How to Run Open-Source LLMs Locally

    The easiest ways to start an open-source model on your machine:

    1. Ollama – One command: ollama run gemma4 – done
    2. LM Studio – GUI for non-developers, drag & drop GGUF models
    3. vLLM – For production deployments with high throughput
    4. llama.cpp – C++ runtime, maximum CPU performance
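Option 1 can also be driven programmatically: Ollama exposes a local REST API on port 11434. A minimal sketch in Python (the gemma4 model tag follows the command above; check ollama list for the exact name on your machine):

```python
import json
import urllib.request

def build_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "gemma4",
             host: str = "http://localhost:11434") -> str:
    """Send a prompt to a locally running Ollama server and return the reply."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("Summarize the GDPR in one sentence.")  # requires a running Ollama
```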

    → More about GGUF, GGML, and Safetensors

    Our Take

    The question is no longer "cloud or local?" – it's "which model for which task?". With Gemma 4, the answer has shifted again: frontier intelligence is now laptop-sized. Our recommendation:

    • Gemma 4 locally as the new default for most tasks
    • Cloud APIs for customer chatbots and creative tasks (Claude, GPT-5)
    • Open source locally for sensitive data, bulk processing, and prototyping
    • Hybrid architecture as the goal: the best model for every job, regardless of provider

    The future doesn't belong to one model – it belongs to the architecture that's flexible enough to use any model.


    → Our AI services
    → Gemma 4: Frontier intelligence goes laptop-sized
    → Project KNUT: Local AI infrastructure with 52 GB VRAM
    → Hunter Alpha: The world's largest free AI model
    → Kimi K2.5: The model behind Cursor's Composer 2
    → Qwen3.5 deep dive: 122B parameters on your laptop
    → AI agents compared

