Google Releases Gemma 4: Four Open Models, Apache 2.0, and a 31B

Gemma 4 ships four open models under Apache 2.0 with frontier-level benchmarks

April 3, 2026
7 min read

Google's Most Significant Open Model Release Yet

On April 2, 2026, Google DeepMind released Gemma 4, the fourth generation of its open model family. The release ships four distinct model sizes, all built from the same research as Gemini 3, all natively multimodal, and all available under an Apache 2.0 license for the first time in the Gemma family's history.

The developer community has downloaded Gemma models more than 400 million times since the first generation launched, building a Gemmaverse of over 100,000 variants. Gemma 4 is designed to be the foundation that the next wave builds on. The 31B dense model currently ranks third among all open models globally on the Arena AI text leaderboard. The 26B Mixture-of-Experts model ranks sixth, with only 3.8 billion active parameters during inference.

Both numbers are significant. But for enterprise teams that have been evaluating open models for production deployment, the Apache 2.0 license is the change that unblocks adoption in a way the benchmark numbers alone could not.

The Four Model Sizes and What Each Is For

Gemma 4 ships in two deployment tiers covering fundamentally different hardware targets.

The edge tier: E2B and E4B

The E-prefix models use a technique called Per-Layer Embeddings that feeds a secondary embedding signal into every decoder layer, allowing them to deliver more capability than their raw parameter count suggests. The E2B has 2.3 billion effective parameters with 5.1 billion total. The E4B has 4.5 billion effective parameters.
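As a rough illustration of the idea, a per-layer embedding adds a layer-specific signal to the hidden state at every decoder layer, rather than embedding the token only once at the input. The function names and scalar stand-ins below are assumptions for the sketch, not Gemma 4's actual implementation:

```python
# Minimal sketch of Per-Layer Embeddings (PLE). Real models inject learned,
# token-dependent embedding vectors; plain numbers stand in for them here.

def decoder_forward(hidden, per_layer_embeds, layers):
    """Run the decoder stack, adding a layer-specific embedding signal
    to the hidden state before each layer instead of only at the input."""
    for layer, ple in zip(layers, per_layer_embeds):
        hidden = layer(hidden + ple)  # secondary signal injected per layer
    return hidden

# Toy stack: two "layers" that double and increment the hidden state.
layers = [lambda h: h * 2, lambda h: h + 1]
result = decoder_forward(1.0, [1.0, 0.5], layers)
```

Because the extra embeddings live outside the transformer weights proper, they can boost capability without growing the parameter count that matters for on-device memory, which is how a 5.1B-total model behaves like a 2.3B one at runtime.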

Both edge models support text, image, and audio input, with a 128K context window. They are designed to run completely offline on phones, Raspberry Pi boards, and the NVIDIA Jetson Orin Nano, eliminating network round-trip latency entirely. Google partnered with the Pixel team, Qualcomm, MediaTek, and Arm to optimize these models for mobile hardware. Initial Arm engineering tests on the E2B show a 5.5x speedup in prefill and up to 1.6x faster decode using Armv9 CPU capabilities.

Android developers can access both models today through the AICore Developer Preview, with forward compatibility guaranteed for Gemini Nano 4, which will ship on production devices later this year.

On benchmarks, the E4B scores 42.5% on AIME 2026 and 52.0% on LiveCodeBench. The E2B scores 37.5% and 44.0% respectively. Both significantly outperform Gemma 3 27B on most benchmarks despite being a fraction of the size, driven by the built-in reasoning capability.

The workstation tier: 26B A4B MoE and 31B Dense

The 26B A4B uses a Mixture-of-Experts architecture with 128 experts, routing each token through a small subset of them so that only approximately 3.8 billion parameters are active per forward pass. This delivers the quality of a much larger model at a fraction of the compute cost, with 256K context window support. It ranks sixth globally on Arena AI.
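The mechanism behind those numbers is standard top-k expert routing: a gate scores all experts per token, and only the top-scoring few actually run. A sketch (the gate scores and k value below are invented for illustration; the model card defines the actual routing configuration):

```python
def route_topk(gate_scores, k):
    """Select the k highest-scoring experts for one token. Only the chosen
    experts' weights run in the forward pass, which is why active parameters
    can be a small fraction of total parameters."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:k])

# A token whose gate favors experts 1 and 3 out of four:
chosen = route_topk([0.1, 0.9, 0.3, 0.5], k=2)
```

Each token can take a different path through the experts, so total capacity stays at 26B while per-token compute tracks the ~3.8B active slice.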

The 31B Dense model is the flagship. It scores 89.2% on AIME 2026, up from 20.8% for Gemma 3 27B. It scores 80.0% on LiveCodeBench v6 and reaches a Codeforces ELO of 2,150. On BigBench Extra Hard, the jump is from 19.3% for Gemma 3 to 74.4% for Gemma 4 31B. On vision benchmarks, MMMU Pro reaches 76.9% and MATH-Vision hits 85.6%. It ranks third globally on the Arena AI text leaderboard.

Both workstation models support text and image input with variable resolution and aspect ratio, video comprehension up to 60 seconds at 1 fps, and 256K context windows that allow full code repositories or long documents in a single prompt.

What Is New Architecturally

Gemma 4 introduces several confirmed architectural changes from the official model card.

Alternating attention layers interleave local sliding-window attention of 512 to 1,024 tokens with global full-context attention. The final layer is always global. This design delivers the memory efficiency of a lightweight model without sacrificing long-range understanding.
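A sketch of such a schedule (the local-to-global ratio is an assumption for illustration; the model card defines the actual interleave):

```python
def attention_schedule(n_layers, locals_per_global):
    """Build an interleaved attention schedule: runs of sliding-window
    ('local') layers punctuated by full-context ('global') layers,
    with the final layer forced to global."""
    schedule = [
        "global" if (i + 1) % (locals_per_global + 1) == 0 else "local"
        for i in range(n_layers)
    ]
    schedule[-1] = "global"  # final layer is always global
    return schedule
```

Since local layers only cache keys and values for their sliding window, KV-cache memory grows with the window size on most layers and with full context only on the sparse global ones.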

Dual RoPE uses standard rotary position embeddings for sliding-window layers and proportional RoPE for global layers. This enables the 256K context window on the larger models without the quality degradation at long distances that typically accompanies extended context.
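Both variants share the same core rotary operation: each (even, odd) feature pair is rotated by a position-dependent angle, and the two layer types differ only in how the rotation frequency is derived. A sketch of the base operation, with illustrative constants (a "proportional" global-layer variant would derive `freq` differently to stretch to long contexts):

```python
import math

def rope_rotate(pair, position, dim_index, dim, theta_base=10000.0):
    """Rotate one (even, odd) feature pair by a position-dependent angle:
    the core rotary position embedding (RoPE) operation."""
    freq = theta_base ** (-2.0 * dim_index / dim)
    angle = position * freq
    c, s = math.cos(angle), math.sin(angle)
    x0, x1 = pair
    return (x0 * c - x1 * s, x0 * s + x1 * c)
```

Because a rotation preserves vector norms, attention scores depend only on the relative distance between positions, which is what makes the scheme extensible to long contexts at all.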

Shared KV cache lets later layers reuse key/value tensors computed by earlier layers instead of each layer storing its own, reducing both memory and compute during inference.
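Back-of-envelope arithmetic shows why sharing helps. The head counts, dimensions, and sharing span below are invented for the sketch, not Gemma 4's published configuration:

```python
def kv_cache_bytes(n_layers, shared_last_n, n_kv_heads, head_dim, seq_len,
                   dtype_bytes=2):
    """KV-cache footprint when the last `shared_last_n` layers draw on a
    single shared cache instead of each storing their own."""
    if shared_last_n > 1:
        distinct_caches = n_layers - shared_last_n + 1
    else:
        distinct_caches = n_layers
    per_layer = 2 * n_kv_heads * head_dim * seq_len * dtype_bytes  # K and V
    return distinct_caches * per_layer

# Hypothetical 32-layer model at a 4096-token context:
baseline = kv_cache_bytes(32, 1, 8, 256, 4096)  # every layer caches
shared = kv_cache_bytes(32, 8, 8, 256, 4096)    # last 8 layers share one cache
```

At long contexts the KV cache, not the weights, often dominates memory, so trimming distinct caches directly raises the context length a given GPU can hold.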

Native function calling was trained into the model from the ground up rather than relying on instruction-following to produce structured output. Google's FunctionGemma research informed this capability. The result is native support for structured JSON output, multi-step planning, and configurable extended thinking mode across all four model sizes. On the agentic side, the models can also output bounding boxes for UI element detection, enabling browser automation and screen-parsing agents.
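A hypothetical example of consuming such output: because the model emits well-formed JSON natively, dispatch is a parse and a name lookup rather than regex extraction. The tool schema and response text below are invented for illustration; consult the model documentation for the actual format:

```python
import json

# Invented tool definition and model response, for illustration only.
tool = {"name": "get_weather", "parameters": {"city": {"type": "string"}}}
model_output = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_output)       # native structured output parses directly
assert call["tool"] == tool["name"]   # dispatch on the tool name
args = call["arguments"]              # typed arguments, ready for the handler
```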

The Apache 2.0 License Change Is the Real Business Story

For two years, Gemma models competed on benchmarks while operating under a custom license that created friction for enterprise adoption. The custom license included use restrictions, content policies with ambiguous enforcement, and terms that Google could update unilaterally. Legal review added time and cost. Compliance teams flagged edge cases. Many enterprise teams chose Qwen or Mistral instead, not because the models were better, but because the licensing was cleaner.

Gemma 4 ships under Apache 2.0. No monthly active user caps. No acceptable-use policy enforcement. No restrictions on commercial deployment, redistribution, or fine-tuning. No risk of Google pulling access.

For organizations building products on open models, deploying in air-gapped or sovereign environments, or embedding AI in systems where the license terms need to be unambiguous for legal and compliance review, this change removes the primary structural barrier that had kept Gemma off the shortlist.

As The Register notes, the timing is deliberate. Chinese labs including Alibaba, Moonshot AI, and Z.AI have been releasing increasingly capable open-weight models, some rivaling proprietary offerings. While some of those labs have recently pulled back from fully open releases, Google is moving in the opposite direction, competing directly on both capability and licensing terms.

Where to Get It and How to Deploy

Model weights are available immediately on Hugging Face, Kaggle, and via Ollama. The 31B and 26B models are accessible through Google AI Studio. The E4B and E2B are available in the AI Edge Gallery app.

Day-one framework support includes Hugging Face Transformers, vLLM, llama.cpp, MLX for Apple Silicon, LM Studio, and transformers.js for in-browser inference. NVIDIA is distributing Gemma 4 through RTX AI Garage for local inference on consumer RTX GPUs.

The hardware requirements are practical: the E2B and E4B run on smartphones and 8GB laptops. The 26B MoE runs on a 24GB GPU with Q4 quantization. The 31B Dense runs on a single 80GB H100 unquantized. On Arm CPUs with SME2 capabilities, the E2B shows a 5.5x prefill speedup, making it viable on high-end Android devices without AI accelerators.
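Those figures line up with simple weight-only arithmetic (which ignores KV-cache and activation overhead, so real headroom is somewhat tighter):

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Weight-only memory footprint in GB (1 GB = 1e9 bytes). KV cache
    and activations add on top of this."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

q4_26b = weight_memory_gb(26, 4)     # 13 GB of weights on a 24 GB GPU
bf16_31b = weight_memory_gb(31, 16)  # 62 GB of weights on an 80 GB H100
```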

Conclusion

Gemma 4 is the most significant open model release Google has made. The benchmark numbers demonstrate a generational leap from Gemma 3, particularly in reasoning, coding, and multimodal tasks. The architecture innovations, including alternating attention, dual RoPE, and natively trained function calling, are real improvements over the prior generation, not incremental parameter scaling.

The Apache 2.0 license change is the decision that will have the most durable commercial impact. It positions Gemma 4 as a serious option for enterprise teams that had been waiting for Google to compete on the same licensing terms as the rest of the open model ecosystem. That wait is over.

If you are looking for guidance on evaluating and integrating open models like Gemma 4 into your product or infrastructure stack, including deployment architecture, fine-tuning strategy, and agentic workflow design, please reach out to MonkDA. We work with development teams building AI-integrated products at every stage of adoption.
