In May 2025, a machine rewrote a chapter of math history.
For over five decades, a method called Strassen’s algorithm held the crown for multiplying 4×4 matrices faster than the traditional way. It was a big deal in 1969. Strassen had shown you could skip some steps and still get the right answer, saving time and computing power. But in all those years, no one had found a better shortcut.
Then came AlphaEvolve.
This new AI agent, built by Google DeepMind and powered by the latest Gemini models, pulled off something no human had done. It discovered a fresh way to multiply 4×4 complex-valued matrices using just 48 scalar multiplications, beating the 49 required by Strassen's recursive approach. And it didn't just guess. It built the algorithm, proved it worked, and passed all the tests.
That’s the big story: AlphaEvolve doesn’t just run code. It invents it.
By combining powerful AI models with an engine that learns and evolves over time, AlphaEvolve is starting to crack open problems that seemed out of reach. It’s not just making computers faster. It’s pointing to a future where machines help uncover new ideas in math, science, and engineering.
And this breakthrough isn’t just about numbers on a whiteboard. It’s already making a dent in real-world systems, saving power in data centers, speeding up AI training, and even redesigning computer chips.
How AlphaEvolve Thinks

To understand how AlphaEvolve pulls off such impressive discoveries, we need to look under the hood.
AlphaEvolve is a carefully designed system that blends creativity, memory, testing, and iteration into a single, fast-moving loop. Think of it as a kind of digital Darwinism where only the best code survives.
A New Approach to Old Problems
Designing better algorithms is one of the hardest jobs in computer science. For decades, humans have done it by hand, using experience, insight, and trial and error. Some of the best-known methods, like Strassen’s algorithm, have stood untouched for over 50 years. The problem is that as computers and problems get more complex, the need for better algorithms grows much faster than human discovery can keep up.
That’s where AlphaEvolve comes in. It’s built to speed up this process—not just to tweak code, but to invent entirely new ways of solving tough problems.
Built for Creativity and Control
AlphaEvolve was created by Google DeepMind in 2025. It works by combining two powerful AI brains, Gemini Flash and Gemini Pro, with an automated system that can test and refine its own work. It doesn’t just write a single function. It evolves entire systems of code that solve problems in new and efficient ways.
This setup marks a big leap forward from earlier AI tools, which were more focused on small, specific tasks. AlphaEvolve aims to be general-purpose, meaning it can be used across many fields—math, engineering, even chip design—as long as there’s a way to measure what “better” looks like.
The Building Blocks of AlphaEvolve
At the heart of AlphaEvolve is a looping system that keeps learning and improving over time. Here’s how it works:
Program Database
This is AlphaEvolve’s memory. It stores every past algorithm it has tried, along with scores for how well they performed. When it needs to try something new, it looks here for inspiration, using past wins to shape future experiments.
Prompt Sampler
This part takes information from the database and turns it into smart prompts for the AI. These prompts aren’t just “write code.” They include examples of good solutions, descriptions of the current problem, and feedback from earlier tests. All this context helps the AI make smarter changes.
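To make the idea concrete, here is a minimal, hypothetical sketch of what a prompt sampler might look like. The field names and layout are illustrative assumptions, not DeepMind's actual schema; the point is that the prompt bundles the task, high-scoring past programs, and evaluator feedback into one rich request.

```python
def build_prompt(problem, parents):
    """Toy sketch of a Prompt Sampler: fold the task description,
    top-scoring past programs, and evaluator feedback into one prompt.
    Structure and field names are illustrative assumptions."""
    sections = [f"Task: {problem}"]
    # Include the few best prior attempts as in-context examples.
    for score, code, feedback in sorted(parents, reverse=True)[:3]:
        sections.append(f"# Prior attempt (score {score:.2f}):\n{code}")
        sections.append(f"# Evaluator feedback: {feedback}")
    sections.append("Propose an improved program as a diff against the best attempt.")
    return "\n\n".join(sections)

prompt = build_prompt(
    "minimize scalar multiplications in a 4x4 matrix product",
    [(0.88, "def mult_v0(...): ...", "correct, 52 multiplications"),
     (0.93, "def mult_v1(...): ...", "correct, 49 multiplications")],
)
print(prompt)
```

The richer the context in the prompt, the less the model has to rediscover from scratch on each attempt.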
LLM Duo: Gemini Flash and Gemini Pro
These are the engines that generate new ideas. Flash is fast and good for quick experiments. It tries lots of things quickly. Pro is slower but smarter, used for deeper thinking and big improvements. Together, they balance speed and depth.
Evaluator Pool
This is the judge. Every time a new piece of code is generated, it gets tested here. The evaluators check if it works, how fast it runs, how clean it is, and more. Only the best results move forward.
Controller Loop
This piece coordinates everything. It keeps the flow going between generation and evaluation, making sure as many ideas as possible are tested in a short amount of time.
Why Two AI Brains Are Better Than One
Gemini Flash and Pro play different roles, and that’s the key to AlphaEvolve’s success. Flash moves fast, trying lots of ideas quickly. It’s great for covering ground. Pro steps in when things get stuck or need a smarter fix. It adds depth and polish to promising directions.
This approach helps avoid common AI traps. If you only use a fast model, you might get stuck in a shallow idea. If you only use a deep model, you won’t explore enough ideas. Together, they let AlphaEvolve stay creative and effective.
And as Gemini models improve, so does AlphaEvolve. It’s a system that gets smarter as its tools evolve.
The Evolution Loop: Survival of the Fittest Code
AlphaEvolve follows a method inspired by evolution. It starts with a working piece of code. Then it changes it, tests the result, and keeps the versions that do best. Here’s the cycle:
- The Prompt Sampler picks a few parent programs from the database.
- It builds a detailed prompt with goals, past ideas, and hints.
- Gemini generates new code—often just small changes or “diffs.”
- These new versions (called child programs) are tested by the Evaluator Pool.
- The best performers are added back into the database for future use.
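The cycle above can be sketched in a few lines of Python. Everything here is a toy stand-in: `mutate` plays the role of a Gemini-generated diff, `evaluate` plays the role of the Evaluator Pool, and a "program" is just a number so the example stays runnable.

```python
import random

def evolve(seed, evaluate, mutate, generations=2000, pool_size=10):
    """Toy skeleton of an evolutionary code-search loop: sample a parent
    from the database, produce a mutated child (standing in for an
    LLM-generated diff), score it, and keep only the fittest programs."""
    database = [(evaluate(seed), seed)]
    for _ in range(generations):
        _, parent = random.choice(database)        # sample a parent program
        child = mutate(parent)                     # "LLM" proposes a change
        database.append((evaluate(child), child))  # evaluator scores it
        database.sort(key=lambda t: t[0], reverse=True)
        database = database[:pool_size]            # only the best survive
    return database[0]

# Toy domain: a "program" is a number, and fitness rewards closeness to 42.
random.seed(0)
score, best = evolve(
    seed=0.0,
    evaluate=lambda x: -abs(x - 42.0),
    mutate=lambda x: x + random.gauss(0, 1.0),
)
print(best)  # should land close to 42 after enough generations
```

The real system replaces the random mutation with context-rich LLM proposals, which is what lets it make large, creative jumps instead of blind tweaks.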
Over time, this process uncovers smarter and faster solutions. Some end up looking very different from the original. That’s part of the magic. AlphaEvolve doesn’t just improve old ideas. It invents entirely new ones.
Catching Mistakes Before They Spread
One of the biggest problems with large language models is “hallucination.” They sometimes write code that looks fine but doesn’t actually work. AlphaEvolve avoids this by never trusting a single answer. Every suggestion is tested.
The Evaluator Pool runs each candidate and checks if it meets the criteria—correctness, speed, size, and so on. Only verified solutions move forward. This process is crucial, especially in fields like math and hardware, where even tiny errors can cause major problems.
By focusing on hard tests and real results, AlphaEvolve avoids a lot of the guesswork that affects other AI systems.
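A minimal sketch shows why this gate matters. The evaluator below is illustrative, not DeepMind's API: it rejects any candidate that disagrees with a trusted reference, then scores survivors by speed, so plausible-looking but wrong code never enters the database.

```python
import time

def evaluate(candidate, reference, test_inputs, time_budget=1.0):
    """Toy evaluator in the spirit of an automated Evaluator Pool:
    correctness is checked first, then speed. Illustrative only."""
    for x in test_inputs:
        if candidate(x) != reference(x):   # correctness gate: hallucinated
            return None                    # code never moves forward
    start = time.perf_counter()
    for x in test_inputs:
        candidate(x)
    elapsed = time.perf_counter() - start
    if elapsed > time_budget:
        return None
    return -elapsed  # higher score means a faster program

# A wrong "optimization" is filtered out; a correct one gets a score.
ref = lambda n: sum(range(n + 1))
good = lambda n: n * (n + 1) // 2   # correct closed form
bad = lambda n: n * n // 2          # looks plausible, but wrong
inputs = list(range(100))
assert evaluate(bad, ref, inputs) is None
assert evaluate(good, ref, inputs) is not None
```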
AlphaEvolve: The Highlights So Far

AlphaEvolve isn’t just a theoretical experiment. It’s already solving real problems—both in deep math and massive data centers. From rewriting century-old geometry challenges to making Google’s AI run faster, its reach is broad and growing. Here’s what it has achieved so far.
Cracking Open Classic Math Problems
One of AlphaEvolve’s most impressive feats is in pure mathematics, a space where progress often moves slowly.
A New Milestone in Matrix Math
In 1969, Strassen's algorithm changed the way computers multiply matrices. It was a major leap at the time. For one specific case, 4×4 complex-valued matrices, applying his seven-product method recursively required 49 scalar multiplications. That record stood for 56 years.
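The 49 count is easy to verify. Strassen's trick multiplies two 2×2 matrices with seven products instead of eight; applying it to the 2×2 blocks of a 4×4 matrix nests the trick twice, giving 7 × 7 = 49 scalar multiplications. The sketch below counts them:

```python
import numpy as np

mults = 0  # global counter of scalar multiplications

def strassen(A, B):
    """Multiply square matrices (size a power of 2) with Strassen's
    seven-product recursion, counting scalar multiplications."""
    global mults
    n = A.shape[0]
    if n == 1:
        mults += 1
        return A * B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven Strassen products (each a recursive call, not 8 as naive).
    m1 = strassen(a11 + a22, b11 + b22)
    m2 = strassen(a21 + a22, b11)
    m3 = strassen(a11, b12 - b22)
    m4 = strassen(a22, b21 - b11)
    m5 = strassen(a11 + a12, b22)
    m6 = strassen(a21 - a11, b11 + b12)
    m7 = strassen(a12 - a22, b21 + b22)
    C = np.empty_like(A)
    C[:h, :h] = m1 + m4 - m5 + m7
    C[:h, h:] = m3 + m5
    C[h:, :h] = m2 + m4
    C[h:, h:] = m1 - m2 + m3 + m6
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
C = strassen(A, B)
assert np.allclose(C, A @ B)  # matches the naive product
print(mults)  # 49 scalar multiplications for one 4x4 product
```

AlphaEvolve's new algorithm does the same job in 48; its explicit construction is published by DeepMind and is not reproduced here.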
Then AlphaEvolve broke it.
It found a method that uses just 48 multiplications. Even better, it works in a broader range of math systems than past AI attempts, like those from AlphaTensor, which were limited to simpler binary math. This means the new method can be used in real-world systems like graphics, physics simulations, and machine learning models.
And it didn’t stop there. AlphaEvolve also made improvements in 14 other matrix multiplication tasks. That’s a big deal for fields that rely heavily on fast, efficient computation.
Solving Long-Standing Math Mysteries
AlphaEvolve was tested on over 50 open math problems—tough puzzles in geometry, number theory, and beyond. It rediscovered the best-known solutions in about 75% of the cases, and in roughly 20% it went further, finding answers better than any previously known.
Some examples:
- The Kissing Number Problem: In 11 dimensions, AlphaEvolve increased the known lower bound from 592 to 593 spheres.
- Erdős’s Minimum Overlap Problem: AlphaEvolve set a new upper bound.
It also advanced problems in autocorrelation, the uncertainty principle, circle packing, and more. These aren’t just programming exercises. They’re deep, abstract challenges that usually require weeks or months of expert effort. With AlphaEvolve, new ideas can emerge in hours.
Making Google’s Infrastructure Smarter
AlphaEvolve isn’t only about math. It’s also driving big improvements inside Google’s real-world systems.
Boosting Data Center Efficiency
Google’s Borg system manages tasks across millions of servers. AlphaEvolve discovered a smarter way to schedule those tasks. The result: a 0.7% improvement in compute resource usage.
That may sound small—but at Google’s scale, it’s the equivalent of tens of thousands of virtual servers freed up. It’s like getting a whole new data center, without buying a single extra machine.
And it’s not a black-box solution. The AI-generated code is readable, debuggable, and easy to deploy. Engineers can understand it, tweak it, and trust it.
Helping Design Better AI Chips
AlphaEvolve also lent a hand in hardware design. It rewrote part of a Verilog file—the language used to describe circuits—for an upcoming Google TPU (Tensor Processing Unit). By removing unneeded bits in the design, it helped shrink the chip’s size and lower its power use. All while passing strict correctness tests.
This shows how AlphaEvolve can work directly with engineers, helping them find non-obvious improvements to physical hardware.
Speeding Up Gemini’s Own Training
Here’s where it gets wild: AlphaEvolve helped make Gemini—its own AI brain—train faster.
It found better ways to break down the large matrix multiplications used in training. That sped up a critical kernel, written with Google's Pallas framework, by 23%, which in turn cut Gemini's overall training time by 1%.
Since training a model like Gemini takes a huge amount of time and energy, even a 1% gain is meaningful. Plus, what used to take weeks of manual engineering was done in days through automated experiments.
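The general idea behind such kernel work is how a large product gets decomposed into tiles that fit the hardware. The NumPy sketch below shows only that decomposition idea; it is not the actual Pallas kernel, and the tile size here is an arbitrary illustrative choice.

```python
import numpy as np

def blocked_matmul(A, B, tile=2):
    """Multiply matrices tile by tile. Kernel optimizations hinge on how
    a big product is decomposed into such blocks; this demonstrates the
    decomposition only, not any production kernel."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Each small tile-product is the unit a hardware kernel runs.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C

A = np.arange(16.0).reshape(4, 4)
B = np.arange(16.0).reshape(4, 4)
assert np.allclose(blocked_matmul(A, B), A @ B)
```

Choosing how to tile, and in what order to visit the tiles, is exactly the kind of search space where an automated system can outdo hand-tuning.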
AlphaEvolve also improved a key component of Transformer models called FlashAttention. This code is rarely hand-edited, because it is low-level and already heavily optimized by compilers.
But AlphaEvolve went deeper still. It modified the XLA intermediate representation (IR), a layer of compiler code, and achieved a 32% speedup on the kernel. It also improved the surrounding pre- and post-processing stages by about 15%.
That kind of performance gain doesn’t just speed things up. It helps engineers find bottlenecks they didn’t know existed, making future optimizations easier and faster.
One System, Many Domains
Part of what makes AlphaEvolve so powerful is its flexibility. It doesn’t just work in one programming language or type of task. It can operate across different layers of abstraction and fields.
- Python: Used to define and run many experiments.
- Verilog: For chip-level hardware tweaks.
- JAX & Pallas: For speeding up AI training code.
- XLA IR: To optimize at the compiler level.
- Math: Tackling geometry, combinatorics, and number theory problems.
If a problem can be scored automatically, AlphaEvolve can try to solve it. That’s what gives it such broad potential—from designing circuits to discovering new mathematical truths.
What Makes AlphaEvolve Different from Competitors

AlphaEvolve might seem like just another AI code tool—but it’s not. To understand why it’s such a big leap, it helps to compare it to earlier systems that also tried to automate discovery. While many of those tools made important progress, AlphaEvolve steps up the game with broader capabilities, deeper thinking, and a smarter design.
From Functions to Full Codebases
DeepMind’s earlier system, FunSearch, came out in 2023. It used an AI-guided evolution method to find small functions—usually 10 to 20 lines of Python code—that solved math puzzles. It worked well, but only for simple problems where the AI could quickly test lots of ideas.
AlphaEvolve takes this idea and expands it dramatically. Instead of single functions, it can evolve entire codebases, including multiple interacting functions. It’s not limited to Python either. It works with other languages, including low-level ones like Verilog and compiler code.
It also handles much longer evaluation times by running tests in parallel across many machines. This lets it explore bigger, more complex problems than FunSearch ever could.
Outperforming a Specialist
In 2022, DeepMind launched AlphaTensor, a system focused purely on matrix multiplication. It used reinforcement learning to find new ways to multiply matrices faster, and it was a big deal in AI research.
But AlphaEvolve, which wasn't even built just for matrix math, still beat it in some areas. It found the faster 48-multiplication solution for 4×4 complex matrices, a case AlphaTensor didn't crack. In fact, AlphaEvolve has improved results for 14 different matrix multiplication targets, showing that general-purpose AI can now compete with, and even surpass, highly specialized tools.
That’s a strong signal that LLM-based systems, if designed right, can handle serious scientific challenges without needing to be narrowly focused.
Beyond Assembly Hacks
Another comparison is AlphaDev, launched in 2023. It used reinforcement learning too, but worked at the assembly code level—optimizing tiny sequences of low-level instructions to make things like sorting faster. It found clever improvements, but only at a very small scale.
AlphaEvolve operates at a much higher level, working with complete algorithms and programming languages that developers actually use. It’s built for broad discovery, not just micro-optimizations. That opens the door to real-world impact across many fields—not just speeding up a few assembly instructions.
The Power of LLM-Driven Evolution
There’s also a deeper shift happening here. AlphaEvolve isn’t based on reinforcement learning like AlphaTensor or AlphaDev. Instead, it uses a loop driven by large language models (LLMs)—specifically Gemini Flash and Pro.
These LLMs bring a huge amount of built-in knowledge, learned from vast amounts of code and text. AlphaEvolve guides them with smart prompts, uses feedback from earlier attempts, and evolves better ideas over time. It’s not just trying random changes. It’s learning, iterating, and building on past successes.
This evolutionary framework makes the search more efficient. It also allows for bigger creative jumps—something that’s harder to achieve with traditional reinforcement learning or genetic algorithms.
| System Name | Core Methodology | Primary Generative Engine | Scope of Discovery | Key Application Examples | Key Achievements Noted |
| --- | --- | --- | --- | --- | --- |
| AlphaEvolve | Evolutionary computation with automated evaluation | LLM ensemble (Gemini 2.0 Flash & Pro) | Entire codebases and complex algorithms across multiple functions/components; general-purpose | Matrix multiplication, mathematical problem solving (e.g., kissing numbers), data center scheduling, TPU circuit design, AI model training optimization | 4×4 complex matrix multiplication in 48 scalar multiplications; 0.7% data center efficiency gain; 23% Gemini kernel speedup; new kissing number bound (593 in 11D) |
| FunSearch | LLM-guided evolutionary search | LLM (e.g., PaLM 2) | Single functions and heuristics (typically 10–20 lines of Python) | Mathematical problems (e.g., cap set problem, bin packing) | Discovered new solutions to open problems in mathematics |
| AlphaTensor | Reinforcement learning | RL-based architecture search | Matrix multiplication algorithms | Discovering faster matrix multiplication algorithms | Found faster algorithms for specific matrix sizes and fields (e.g., 4×4 in mod-2 arithmetic) |
| AlphaDev | Reinforcement learning (AlphaZero-based) | RL-based search | Short assembly-code sequences for fundamental tasks | Sorting algorithms (e.g., for 5 elements), hashing algorithms | Improved sorting library performance (e.g., LLVM libc++), faster hashing |
What Sets AlphaEvolve Apart
Here’s what really makes AlphaEvolve different:
- General-purpose intelligence: It’s not built for just one task. If you can define a way to test success, AlphaEvolve can try to solve it.
- Full-code evolution: It doesn’t just tweak a few lines. It evolves complex, multi-function systems—something earlier systems couldn’t handle.
- Smart LLM combo: Using both Gemini Flash for fast ideas and Gemini Pro for deeper improvements gives it a powerful mix of creativity and depth.
- Tight feedback loop: With automated evaluation, prompt engineering, and continuous refinement, it’s designed to get smarter with every round.
Together, these features point to something much bigger than just better coding tools. AlphaEvolve may be the start of a new kind of AI system—one that doesn’t just run the code, but invents the future of code itself.
AlphaEvolve: Limits and Open Questions
AlphaEvolve is powerful—but it’s not without limits. Experts across AI and science have praised its potential, while also pointing out where it falls short.
Expert Reactions
AI researcher Gary Marcus called AlphaEvolve a real step beyond typical large language models. Stuart Battersby, CTO at Chatterbox Labs, sees it as a game-changer for faster algorithm development but warns that strong safeguards are essential.
Pushmeet Kohli of DeepMind described AlphaEvolve as a “super coding agent” that finds solutions no one had imagined. Yet, he admitted it’s still too compute-heavy to share widely. Research lead Matej Balog noted that in 20% of the open math problems it tackled, AlphaEvolve beat the best-known solutions—each one provably correct.
Still, not everyone is sold. Oxford’s Simon Frieder cautioned that AlphaEvolve only works well in problems that can be clearly scored by code—limiting its use in fields like medicine or biology, where success often depends on human judgment or lab results.
Real Discoveries or Smart Tweaks?
Some critics question how “new” AlphaEvolve’s findings really are. Is it inventing, or just uncovering missed optimizations? That debate came up around its 48-step matrix multiplication method. While it beat Strassen’s decades-old algorithm, some saw it as a niche win. Others pointed out that it’s both novel and practical—and could be used in real systems.
DeepMind researchers argue that any case where AlphaEvolve outperforms the state-of-the-art qualifies as discovery. Even if it’s “just” a clever optimization, it’s one humans didn’t find in 50 years.
AlphaEvolve’s biggest strength—automated evaluation—is also a key limitation. If a problem can’t be scored easily or objectively, the system can’t help much. That rules out many real-world, fuzzy, or subjective domains for now.
And while it generates high-quality results, it’s also expensive to run and closed to the public. These challenges will need to be solved before it can become widely useful.
Bigger Picture — Humans + AI = New Discovery Workflow

AlphaEvolve isn’t just another coding tool. It hints at a bigger shift in how discoveries get made—one where human experts and AI agents collaborate to push science and engineering forward, together.
A New Era of AI-Driven Discovery
At its core, AlphaEvolve is a general-purpose algorithm discovery engine. If a problem can be expressed as code and scored automatically, AlphaEvolve can work on it. That opens the door to breakthroughs across a wide range of fields:
- Materials science: Designing new materials by predicting how atoms and molecules behave—if simulators can provide fast, reliable feedback.
- Drug discovery: Speeding up how we screen compounds, predict binding, and plan synthetic paths.
- Sustainability: Finding better ways to route energy, manage grids, reduce waste, or model climate systems.
- Quantum and physics: Discovering new math and algorithms for exotic computing hardware or complex natural systems.
The catch? These fields need accurate, automatable scoring systems to guide the AI’s learning loop. Without them, AlphaEvolve can’t evolve.
Smarter Business, Too
Beyond science, the tech behind AlphaEvolve could transform business and industry:
- Supply chains: Smarter routing, inventory, and forecasting.
- Finance: More efficient models for trading, fraud detection, and risk.
- Manufacturing: Better scheduling, quality control, and predictive maintenance.
One reason it’s viable in these areas? The AI generates human-readable code, making it easier for teams to test, debug, and integrate into existing workflows.
Humans Define the Problem. AI Explores the Space.
AlphaEvolve doesn’t replace scientists or engineers. It works with them.
Humans still frame the challenge, define success metrics, and guide the AI with starting code or context. AlphaEvolve takes it from there—rapidly testing ideas, refining solutions, and surfacing novel options.
This new workflow splits the work:
- Humans focus on creative direction and high-level insight.
- AI handles the heavy lifting of trial, error, and optimization.
In time, this partnership could redefine research itself. Scientists may become AI orchestrators, focused on setting up problems and steering exploration—expanding who can contribute to cutting-edge discovery.
What’s Next for AlphaEvolve
DeepMind is already planning to open AlphaEvolve to researchers:
- Early access: Select academic teams will be invited to use AlphaEvolve for their own projects.
- User interface: A simplified, more accessible tool is in development.
- LLM feedback loop: Discoveries from AlphaEvolve could feed back into improving the Gemini models that power it.
- Expansion to natural sciences: If complex simulations can score ideas accurately, AlphaEvolve could help with biology, chemistry, and physics problems.
And to tackle fuzzier problems? DeepMind is exploring hybrid evaluations, mixing automatic scoring with human input or model-based reviews.
What this all points to is a powerful feedback loop: AI discovers better algorithms → those algorithms improve AI models → the cycle repeats, accelerating discovery.
Conclusion — A Glimpse of the Next R&D Era
AlphaEvolve marks a turning point. It’s not just using algorithms—it’s inventing them. From beating decades-old math records to optimizing Google’s own infrastructure, it’s already proving that AI can help solve some of the world’s hardest technical problems.
What sets it apart is its combination of deep learning, evolutionary search, and rigorous testing. That makes it more than a coding tool—it’s a discovery engine. And as researchers gain access and scoring tools improve, its impact could spread across science, business, and beyond.
The future of innovation may not be human or AI—it may be both, working side by side to explore what neither could find alone.