Open-Source LLMs Compared 2026 – 25+ Models You Should Know


    Malte Lensch · March 7, 2026 · Updated: April 13, 2026 · 9 min read · Deep Dive

    TL;DR: "25+ open-source LLMs compared side by side: Gemma 4 (26B, 85 t/s on consumer hardware), Hunter Alpha (1T), Llama 4, Qwen3.5, DeepSeek-R1, Nemotron Cascade 2, Mistral, and more. With GitHub stats, hardware requirements, and a decision guide."

    — Till Freitag

    Last updated: April 2026 – GitHub stars and model versions are updated regularly. New: Gemma 4, Nemotron Cascade 2, and Kimi K2.5 added.

    Why Open-Source LLMs Matter Now

    2025 was the year open-source LLMs closed the gap with proprietary models. In 2026, they're on par in many areas – or better. For businesses, that means more control, less vendor lock-in, and better GDPR compliance.

    This article gives you a comprehensive overview of the most important open-source LLMs – with real GitHub data, hardware requirements, and clear recommendations.

    The Big Comparison Table

    | Model | Provider | Parameters | GitHub ⭐ | License | Standout Feature |
    |---|---|---|---|---|---|
    | Gemma 4 🆕 | Google | 26B (MoE) | 8,500+ | Gemma License | 85 t/s on consumer hardware, 256K context |
    | Nemotron Cascade 2 🆕 | NVIDIA | 30B | 3,500+ | NVIDIA Open | ~54 t/s locally, optimized for inference |
    | Hunter Alpha | Anonymous (via OpenRouter) | 1T (~42B active) | – | Unknown | Largest available model, 1M context |
    | Kimi K2.5 | Moonshot AI | 1T (32B active) | 5,000+ | Modified MIT | Agent Swarm (100 sub-agents), multimodal |
    | Llama 4 Scout | Meta | 109B (17B active) | 7,500+ | Llama License | 10M token context |
    | Llama 4 Maverick | Meta | 400B (17B active) | 7,500+ | Llama License | Meta's best MoE model |
    | Qwen3.5-122B | Alibaba | 122B (10B active) | 27,000+ | Apache 2.0 | Beats GPT-5-mini |
    | Qwen3-235B | Alibaba | 235B | 27,000+ | Apache 2.0 | Thinking mode |
    | DeepSeek-R1 | DeepSeek | 671B (37B active) | 102,000+ | MIT | Chain-of-thought reasoning |
    | DeepSeek-V3 | DeepSeek | 671B (37B active) | 102,000+ | MIT | Multi-token prediction |
    | Mistral Large 2 | Mistral | 123B | 10,700+ | Apache 2.0 | 128k context, 80+ languages |
    | Mixtral 8x22B | Mistral | 141B (39B active) | 10,700+ | Apache 2.0 | Sparse MoE pioneer |
    | Gemma 3 | Google | 1B–27B | 6,800+ | Gemma License | Multimodal, on-device |
    | Phi-4 | Microsoft | 14B | 12,000+ | MIT | Reasoning on small hardware |
    | Phi-4-Mini | Microsoft | 3.8B | 12,000+ | MIT | Runs on smartphones |
    | Command R+ | Cohere | 104B | 3,200+ | CC-BY-NC | RAG-optimized, 10 languages |
    | Yi-1.5 | 01.AI | 6B–34B | 7,800+ | Apache 2.0 | Strong multilingual support |
    | DBRX | Databricks | 132B (36B active) | 3,200+ | Databricks Open | Enterprise MoE |
    | Falcon 3 | TII | 1B–10B | 2,000+ | Apache 2.0 | UAE research project |
    | StableLM 2 | Stability AI | 1.6B–12B | 8,500+ | Stability License | Compact & efficient |
    | InternLM 3 | Shanghai AI Lab | 8B | 7,200+ | Apache 2.0 | Long context up to 1M |
    | OLMo 2 | AI2 | 7B–13B | 6,400+ | Apache 2.0 | Fully open (data + code) |
    | Jamba 1.5 | AI21 Labs | 52B (12B active) | 900+ | Apache 2.0 | Mamba-Transformer hybrid |
    | StarCoder 2 | BigCode | 3B–15B | 2,000+ | BigCode OpenRAIL-M | Code specialist |
    | CodeLlama | Meta | 7B–70B | 16,400+ | Llama License | Code generation & infilling |
    | DeepSeek-Coder-V2 | DeepSeek | 236B (21B active) | 12,000+ | MIT | Code + math specialist |
    | Qwen2.5-Coder | Alibaba | 0.5B–32B | 27,000+ | Apache 2.0 | Code completion, multi-lang |

    Top Models in Detail

    🔥 Gemma 4 (Google) 🆕

    Google's new MoE flagship: 26B parameters, just 14 GB, 85 tokens per second on consumer hardware. The model that definitively blurs the line between cloud and local intelligence. → Our Gemma 4 deep dive

    Strengths:

    • 85 t/s on an AMD Ryzen AI MAX+ with 128 GB RAM
    • 256K context window for long document analysis
    • Function calling that actually works
    • GPT-4-quality complex reasoning – locally, no cloud required

    Weaknesses:

    • Gemma License (not pure Apache 2.0)
    • MoE architecture – not all frameworks support it natively
    • No video input yet

    GitHub: github.com/google/gemma.cpp · 8,500+ ⭐


    🐉 Hunter Alpha → Xiaomi MiMo-V2-Pro (formerly "likely DeepSeek V4")

    Update April 2026: Hunter Alpha was confirmed on March 18, 2026 as Xiaomi's MiMo-V2-Pro – it was never DeepSeek V4. The team is led by Luo Fuli, a former DeepSeek engineer. → The full story · → China's AI Offensive: The Analysis

    The largest AI model available on OpenRouter: >1 trillion parameters, with ~42B active parameters per token. Originally launched anonymously on March 11, 2026, now commercially available under Xiaomi's MiMo brand.

    Strengths:

    • 1T parameters with ~42B active (MoE) – largest available model
    • 1M token context window
    • ClawEval 61.5 – strong agentic performance
    • Known provider (Xiaomi, publicly listed)
    • Open source planned after stabilization

    Weaknesses:

    • No longer free ($1–2 / MTok input, $3–6 / MTok output)
    • Not locally runnable (OpenRouter API only for now)
    • Privacy: OpenRouter logging policies still apply

    Access: openrouter.ai/xiaomi/mimo-v2-pro


    🌙 Kimi K2.5 (Moonshot AI)

    Beijing-based Moonshot AI's flagship: 1 trillion parameters with MoE (32B active), 384 experts, and a unique Agent Swarm architecture. → The Cursor controversy: Why Composer 2 runs on Kimi K2.5

    Strengths:

    • Agent Swarm: coordinates up to 100 sub-agents for complex tasks
    • Multimodal (text + image + video)
    • AIME 2025: 96.1% – beats all frontier models on math reasoning
    • Modified MIT license – commercial use free below 100M MAU

    Weaknesses:

    • Very large – local deployment requires high-end hardware (128 GB+ RAM)
    • Chinese provider – compliance considerations
    • Modified MIT adds attribution requirements above thresholds

    GitHub: github.com/MoonshotAI/Kimi-K2.5 · 5,000+ ⭐


    ⚡ Nemotron Cascade 2 (NVIDIA) 🆕

    NVIDIA's new inference-optimized model: 30B parameters, runs at ~54 t/s on Project KNUT (RTX 4060 Ti + RTX 3060). Specifically designed for fast local inference. → Project KNUT: Local AI Infrastructure

    Strengths:

    • 54 t/s on consumer GPUs – 15x faster than human speech
    • Quality comparable to GPT-4o mini
    • Optimized for NVIDIA hardware (CUDA)

    Weaknesses:

    • NVIDIA license (not Apache 2.0)
    • Primarily designed for NVIDIA GPUs
    • Still relatively small community

    🦙 Llama 4 (Meta)

    Meta's latest generation comes in two flavors: Scout (109B, 10M context) and Maverick (400B, for quality). Both use Mixture-of-Experts – only 17B parameters are active per query.

    Strengths:

    • Largest context window of any open-source model (10M tokens with Scout)
    • Strong community and ecosystem
    • Multimodal (text + image)

    Weaknesses:

    • Llama License isn't "true" open source (commercial restrictions above 700M MAU)
    • Large models require significant hardware

    GitHub: github.com/meta-llama/llama-models · 7,500+ ⭐
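The "X B total, Y B active" figures quoted throughout matter because per-token compute scales with the active parameters, while memory footprint scales with the total. A rough back-of-envelope sketch (it ignores shared layers and attention overhead, so treat the numbers as approximate):

```python
def moe_active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of weights touched per token in a Mixture-of-Experts model.

    Per-token FLOPs scale with *active* parameters; RAM/VRAM needs scale
    with *total* parameters. Rough model: ignores shared layers.
    """
    return active_params_b / total_params_b

# Llama 4 Maverick: 400B total, 17B active per token --
# it decodes roughly like a ~17B dense model, but must be stored like a 400B one.
print(f"{moe_active_fraction(400, 17):.1%}")
```

This is why a 400B MoE model can feel fast once it fits in memory: fitting it is the hard part, running it is not.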


    🌐 Qwen3.5 (Alibaba)

    Currently the strongest open-source MoE model. 122B parameters, only 10B active – runs on a MacBook with 64 GB RAM. → Our Qwen3.5 deep dive

    Strengths:

    • Beats GPT-5-mini on most benchmarks
    • Apache 2.0 – true open source
    • 262k context window (expandable to 1M)

    Weaknesses:

    • No multimodal (text only)
    • Chinese provider – compliance concern for some organizations

    GitHub: github.com/QwenLM/Qwen3 · 27,000+ ⭐


    🔬 DeepSeek-R1

    The model that shook the AI world in early 2025. 671B parameters with MoE (37B active), specialized in chain-of-thought reasoning.

    Strengths:

    • Reasoning quality on GPT-o1 level
    • MIT license – maximum freedom
    • "Thinking" mode shows the reasoning process

    Weaknesses:

    • Very large – local use only with high-end hardware
    • Chinese provider

    GitHub: github.com/deepseek-ai/DeepSeek-R1 · 102,000+ ⭐


    🌊 Mistral Large 2

    Mistral's flagship: 123B parameters, 128k context, 80+ languages. Europe's counterweight to the US and Chinese models.

    Strengths:

    • European provider (Paris) – easier GDPR narrative
    • Strong multilingual capabilities
    • Apache 2.0

    Weaknesses:

    • Smaller community than Llama or Qwen
    • Fewer specialized variants

    GitHub: github.com/mistralai/mistral-inference · 10,700+ ⭐


    💎 Gemma 3 (Google)

    Google's open model family from 1B to 27B – optimized for on-device use. Multimodal from 4B. Now the predecessor to Gemma 4, but still relevant for edge deployments.

    Strengths:

    • Multimodal (text + image) even in small variants
    • Runs on smartphones and Raspberry Pi
    • ShieldGemma for safety

    Weaknesses:

    • Gemma License has usage policies (not pure Apache 2.0)
    • Maximum size only 27B

    GitHub: github.com/google/gemma.cpp · 6,800+ ⭐


    🧠 Phi-4 (Microsoft)

    Microsoft's "Small Language Model" with 14B parameters that beats larger models at reasoning tasks.

    Strengths:

    • Outstanding quality per parameter
    • MIT license
    • Runs on consumer hardware

    Weaknesses:

    • No multimodal in the base variant
    • Small context window (16k)

    GitHub: github.com/microsoft/phi-4 · 12,000+ ⭐


    Coding LLMs Compared

    For developers, there are specialized code models:

    | Model | Parameters | Languages | Standout Feature |
    |---|---|---|---|
    | StarCoder 2 | 3B–15B | 600+ | Trained on The Stack v2 |
    | CodeLlama | 7B–70B | ~20 | Infilling & long contexts |
    | DeepSeek-Coder-V2 | 236B (21B active) | 300+ | Code + math combined |
    | Qwen2.5-Coder | 0.5B–32B | 90+ | Best open-source code model per size |

    Our recommendation: Qwen2.5-Coder-32B for maximum quality, StarCoder 2-3B if it needs to run locally on a laptop.

    Decision Matrix: Which Model for Which Use Case?

    | Your Use Case | Recommended Model | Why |
    |---|---|---|
    | Frontier quality locally | 🆕 Gemma 4 (26B) | GPT-4 level, 85 t/s, 14 GB |
    | Agentic tasks & multi-step workflows | Hunter Alpha or Kimi K2.5 | 1T parameters, Agent Swarm |
    | Analyze GDPR-sensitive documents | Qwen3.5-122B locally | Best quality/resource ratio |
    | Code generation & refactoring | Qwen2.5-Coder-32B | Beats larger models at code |
    | Complex reasoning | DeepSeek-R1 | Chain-of-thought at GPT-o1 level |
    | Fast local inference | 🆕 Nemotron Cascade 2 | 54 t/s on consumer GPUs |
    | Run on smartphone/edge | Gemma 3 (4B) or Phi-4-Mini | Optimized for minimal hardware |
    | RAG with company data | Command R+ | Built for Retrieval-Augmented Generation |
    | Maximum context (long documents) | Llama 4 Scout | 10M token context window |
    | European provider required | Mistral Large 2 | French company, Apache 2.0 |
    | Fully open training data | OLMo 2 | Only model with completely open data |
    | Multi-agent workflows | Kimi K2.5 or DeepSeek-V3 | Agent Swarm with 100 sub-agents (Kimi) |
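If you route tasks programmatically, the matrix above reduces to a plain lookup. A minimal sketch (the use-case keys and the fallback are our own illustrative naming, not an established API):

```python
# Keys and fallback are illustrative -- adapt them to your own routing logic.
RECOMMENDATIONS = {
    "frontier-local": "Gemma 4 (26B)",
    "agentic":        "Hunter Alpha or Kimi K2.5",
    "gdpr-documents": "Qwen3.5-122B (local)",
    "code":           "Qwen2.5-Coder-32B",
    "reasoning":      "DeepSeek-R1",
    "fast-local":     "Nemotron Cascade 2",
    "edge":           "Gemma 3 (4B) or Phi-4-Mini",
    "rag":            "Command R+",
    "long-context":   "Llama 4 Scout",
    "eu-provider":    "Mistral Large 2",
    "open-data":      "OLMo 2",
}

def recommend(use_case: str) -> str:
    # Unknown use cases fall back to the article's "new default for most tasks".
    return RECOMMENDATIONS.get(use_case, "Gemma 4 (26B)")
```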

    Hardware Guide: What Do You Actually Need?

    | RAM / VRAM | Models (quantized, Q4) | Example Hardware |
    |---|---|---|
    | 8 GB | Phi-4-Mini, Gemma 3 (1B–4B) | MacBook Air M3, RTX 3060 |
    | 16 GB | Phi-4, Gemma 3 (12B), Gemma 4 (26B, Q4), Yi-1.5-9B | MacBook Pro M3, RTX 4070 |
    | 32 GB | Mistral 7B, Llama 3.1-8B, Qwen2.5-14B, Nemotron Cascade 2 | MacBook Pro M4, RTX 4090 |
    | 64 GB | Qwen3.5-122B, Mixtral 8x22B | MacBook Pro M4 Max |
    | 128 GB+ | DeepSeek-R1, Llama 4 Maverick, Kimi K2.5, Gemma 4 (FP16) | Multi-GPU server, Mac Studio Ultra |
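These tiers follow from a simple rule of thumb: quantized weight size ≈ parameters × bytes per parameter, plus runtime overhead. A rough sketch (the ~0.55 bytes/param figure for 4-bit GGUF quantization and the flat overhead term are approximations; real usage grows with context length):

```python
def est_memory_gb(params_billion: float, bytes_per_param: float = 0.55,
                  overhead_gb: float = 1.5) -> float:
    """Rough RAM/VRAM estimate for a quantized model.

    bytes_per_param: ~0.55 for Q4_K_M-style 4-bit GGUF, 2.0 for FP16.
    overhead_gb approximates runtime buffers and a small KV cache.
    """
    return params_billion * bytes_per_param + overhead_gb

# Gemma 4 (26B) at Q4: ~14.3 GB of weights plus overhead -> the 16 GB tier
print(round(est_memory_gb(26), 1))
# The same model at FP16 needs 50+ GB -> only the 128 GB+ tier has headroom
print(round(est_memory_gb(26, bytes_per_param=2.0), 1))
```

For MoE models, remember the estimate must cover *total* parameters, not just the active subset.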

    Licenses: The Devil in the Details

    Not every "open-source" model is equally open:

    | License | Models | Commercial Use | Restrictions |
    |---|---|---|---|
    | Apache 2.0 | Qwen, Mistral, Yi, Falcon, OLMo | ✅ Unrestricted | None |
    | MIT | DeepSeek, Phi | ✅ Unrestricted | None |
    | Llama License | Llama 4, CodeLlama | ✅ Up to 700M MAU | Above 700M MAU: Meta license needed |
    | Gemma License | Gemma 3, Gemma 4 | ✅ With conditions | Usage policies apply |
    | CC-BY-NC | Command R+ | ❌ Non-commercial | Research & personal only |
    | Modified MIT | Kimi K2.5 | ✅ Below 100M MAU | Attribution required above 100M MAU / $20M revenue |
    | NVIDIA Open | Nemotron Cascade 2 | ✅ With conditions | NVIDIA usage terms |

    Tip: For commercial projects, prefer Apache 2.0 or MIT. With Llama, check carefully whether the usage terms fit your case.
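As a sanity check before a deployment, the table's tiers can be encoded as a small helper. A sketch based only on the thresholds above (illustrative, not legal advice; always read the actual license text):

```python
# Thresholds taken from the table above -- illustrative only, not legal advice.
LICENSE_TERMS = {
    "Apache 2.0":    {"commercial": True,  "mau_cap": None},
    "MIT":           {"commercial": True,  "mau_cap": None},
    "Llama License": {"commercial": True,  "mau_cap": 700_000_000},
    "Modified MIT":  {"commercial": True,  "mau_cap": 100_000_000},
    "CC-BY-NC":      {"commercial": False, "mau_cap": None},
}

def commercial_status(license_name: str, monthly_active_users: int) -> str:
    """Classify commercial usability under the table's simplified tiers."""
    terms = LICENSE_TERMS[license_name]
    if not terms["commercial"]:
        return "non-commercial only"
    cap = terms["mau_cap"]
    if cap is not None and monthly_active_users > cap:
        return "extra terms apply"  # e.g. Meta approval, attribution duties
    return "unrestricted"
```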

    How to Run Open-Source LLMs Locally

    The easiest ways to start an open-source model on your machine:

    1. Ollama – One command: ollama run gemma4 – done
    2. LM Studio – GUI for non-developers, drag & drop GGUF models
    3. vLLM – For production deployments with high throughput
    4. llama.cpp – C++ runtime, maximum CPU performance
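Option 1 can also be driven programmatically: Ollama exposes a local REST API on port 11434. A minimal sketch in Python (the gemma4 model tag follows the command above; check ollama list for the exact name on your machine):

```python
import json
import urllib.request

def build_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "gemma4",
             host: str = "http://localhost:11434") -> str:
    """Send a prompt to a locally running Ollama server and return the reply."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("Summarize the GDPR in one sentence.")  # requires a running Ollama
```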

    → More about GGUF, GGML, and Safetensors

    Our Take

    The question is no longer "cloud or local?" – it's "which model for which task?". With Gemma 4, the answer has shifted again: frontier intelligence is now laptop-sized. Our recommendation:

    • Gemma 4 locally as the new default for most tasks
    • Cloud APIs for customer chatbots and creative tasks (Claude, GPT-5)
    • Open source locally for sensitive data, bulk processing, and prototyping
    • Hybrid architecture as the goal: the best model for every job, regardless of provider

    The future doesn't belong to one model – it belongs to the architecture that's flexible enough to use any model.


    → Our AI services
    → Gemma 4: Frontier intelligence goes laptop-sized
    → Project KNUT: Local AI infrastructure with 52 GB VRAM
    → Hunter Alpha: The world's largest free AI model
    → Kimi K2.5: The model behind Cursor's Composer 2
    → Qwen3.5 deep dive: 122B parameters on your laptop
    → AI agents compared

