Three years ago, Extropic made the bet that energy would become the limiting factor for AI scaling.
We were right.[1]
Scaling AI will require a major breakthrough in either energy production or the energy efficiency of AI hardware and algorithms.
We are proud to unveil our breakthrough AI algorithms and hardware, which can run generative AI workloads using radically less energy than deep learning algorithms running on GPUs.
- We designed the world’s first scalable probabilistic computer.
- We fabricated probabilistic circuits that perform sampling tasks using orders of magnitude less energy than the current state of the art.
- We developed a new generative AI algorithm for our hardware that can use orders of magnitude less energy than existing algorithms.[2]

To explain our work, we are releasing:
- A hardware proof of technology, the XTR-0 development platform, already beta-tested by some of our early partners
- A post on this website and an academic paper describing our novel hardware (the Thermodynamic Sampling Unit) and our new generative AI algorithm (the Denoising Thermodynamic Model)
- thrml, our Python library for simulating our hardware, which can be used to develop thermodynamic machine learning algorithms
With the fundamental science done, we are moving from breakthrough to buildout.
Once we succeed, energy constraints will no longer limit AI scaling.
Scaling The AI Energy Wall
Like other AI companies, we envision a future where AI is abundant. We believe AI is a fundamental driver of civilizational progress, so scaling AI is of paramount importance. We imagine a future where AI helps humanity discover new drugs to cure disease, predicts the weather better to mitigate the impact of natural disasters, improves the automation of manufacturing, drives our cars, and augments human cognition in a democratized fashion. We hope to bring that future to reality.
However, that future is completely out of reach with today’s technology. Already, almost every new data center is experiencing difficulties sourcing power.[3] At current levels of efficiency, serving advanced models to everyone all the time would consume vastly more energy than humanity can produce. To provide more AI per person, we will need to produce more energy per person, or get more AI per Joule.

Continuing to scale using existing AI systems will require vast amounts of energy. Many companies are working on better ways to produce that energy, but that is only half of the equation.
Extropic is working on the other half of the equation: making computing more efficient. Scaling up energy production requires the support of a nation-state, but a more efficient computer can be built by a dozen people in a garage outside Boston.
Extropic is Rethinking Computing
If we constrain ourselves to the computer architectures that are popular today, reducing energy consumption will be very hard. Most of the energy budget in a CPU or GPU goes towards communication, because moving bits of information around a chip requires charging up wires. The cost of this communication can be reduced by either reducing the capacitance of the wires or reducing the voltage level used for signalling. Neither of these quantities has gotten significantly smaller over the last decade, and we don’t think they will get smaller in the next decade either.
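To put rough numbers on that wire-charging cost, here is a back-of-the-envelope sketch in Python. The capacitance-per-length and supply voltage below are generic ballpark assumptions for a modern process node, not figures from any particular chip.

```python
# Back-of-the-envelope estimate of on-chip communication energy.
# WIRE_CAP_PER_MM and SUPPLY_VOLTAGE are illustrative assumptions,
# not measurements of any particular chip.

WIRE_CAP_PER_MM = 0.2e-12   # ~0.2 pF of wire capacitance per millimetre
SUPPLY_VOLTAGE = 0.75       # volts used for signalling

def energy_per_bit(distance_mm: float) -> float:
    """Energy (in joules) to charge a wire of the given length once,
    i.e. to move one bit across it: E ~ C * V^2."""
    capacitance = WIRE_CAP_PER_MM * distance_mm
    return capacitance * SUPPLY_VOLTAGE ** 2

# Moving one bit 1 mm across a die costs on the order of 0.1 pJ;
# multiply by the billions of weights and activations a large model
# shuttles around, and communication dominates the energy budget.
print(f"{energy_per_bit(1.0) * 1e12:.2f} pJ per bit per mm")
```

Because that energy scales with both capacitance and the square of the voltage, and neither quantity is shrinking much anymore, the per-bit cost of moving data is effectively stuck.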
Fortunately, we don’t need to limit ourselves to today’s computer architectures. Today’s AI algorithms were designed to run well on GPUs because GPUs were already popular, but GPUs only became popular because they were good at rendering graphics. GPUs can do amazing things, but today’s machine learning paradigm is the result of evolution, not design.

The current machine learning paradigm has a lot of momentum. Without a major shift in computing demand, there’s no reason to throw away decades of optimizations to start over from scratch.
But recently, computational demands have shifted from deterministic to probabilistic, and from performance-constrained to energy-constrained.
To meet those demands, Extropic has developed a new type of computing hardware for the probabilistic, energy-efficient, AI-powered future.
The Thermodynamic Sampling Unit
We have developed a new type of computing hardware, the thermodynamic sampling unit (TSU).
We call our new hardware a sampling unit, not a processing unit, because TSUs perform an entirely different type of operation than CPUs and GPUs. Instead of processing a series of programmable deterministic computations, TSUs produce samples from a programmable distribution.
Running a generative AI algorithm fundamentally comes down to sampling from some complicated probability distribution. Modern AI systems do a lot of matrix multiplication to produce a vector of probabilities, and then sample from that. Our hardware skips the matrix multiplication and directly samples from complex probability distributions.
Specifically, TSUs sample from energy-based models, which are a type of machine learning model that directly define the shape of a probability distribution via an energy function.
The inputs to a TSU are parameters that specify the energy function of an EBM, and the outputs of a TSU are samples from the defined EBM. To use a TSU for machine learning, we fit the parameters of the energy function so that the EBM running on the TSU is a good model of some real-world data.
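As a rough software analogue of that "parameters in, samples out" interface, the sketch below defines an Ising-style energy function over binary variables and draws a sample from it with Gibbs updates. This is an illustrative NumPy toy, not the TSU's sampling mechanism or the thrml API; the coupling matrix and bias vector play the role of the programmable parameters.

```python
import numpy as np

# Minimal software analogue of "parameters in, samples out":
# an Ising-style energy-based model over binary variables, sampled
# with Gibbs updates. A plain NumPy illustration only; it is not the
# TSU hardware or the thrml API.

rng = np.random.default_rng(0)

def gibbs_sample(weights, biases, n_steps=1000):
    """Draw one sample from p(s) proportional to exp(-E(s)), where
    E(s) = -0.5 * s^T W s - b^T s and each s_i is -1 or +1."""
    n = len(biases)
    state = rng.choice([-1.0, 1.0], size=n)
    for _ in range(n_steps):
        for i in range(n):
            # Local field felt by variable i given all the others.
            field = weights[i] @ state - weights[i, i] * state[i] + biases[i]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
            state[i] = 1.0 if rng.random() < p_up else -1.0
    return state

# The "program" is just the energy function's parameters.
n = 8
W = rng.normal(scale=0.3, size=(n, n))
W = (W + W.T) / 2          # symmetric couplings
np.fill_diagonal(W, 0.0)   # no self-coupling
b = rng.normal(scale=0.1, size=n)

print(gibbs_sample(W, b))  # one sample from the programmed distribution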
The Denoising Thermodynamic Model
To demonstrate how our hardware can be used for AI, we invented a new generative AI model, the Denoising Thermodynamic Model (DTM).
DTMs were inspired by diffusion models, and leverage TSUs to generate data by gradually pulling it out of noise over several steps. You can find an explanation of how DTMs run on TSUs here.
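To illustrate just the shape of that iterative-denoising loop, here is a toy Python sketch: it starts from pure noise and repeatedly samples a less-noisy state conditioned on the current one. The per-step "denoiser" is a trivial placeholder standing in for the EBM sampling a TSU would perform, so this shows the control flow only, not the actual DTM construction from our paper.

```python
import numpy as np

# Structural sketch of a denoising, diffusion-style generation loop:
# start from pure noise and repeatedly sample a less-noisy state
# conditioned on the current one. In a DTM each step would be a sample
# drawn from an energy-based model on a TSU; here a simple placeholder
# (pull toward a fixed target, add shrinking noise) stands in for that
# sampler purely to show the control flow.

rng = np.random.default_rng(0)
TARGET = np.array([1.0, -1.0, 1.0, 1.0])   # stand-in for "clean data"
N_STEPS = 8

def denoise_step(state, step):
    """Placeholder conditional sampler for one denoising step."""
    noise_scale = 1.0 - (step + 1) / N_STEPS    # noise shrinks each step
    mean = noise_scale * state + (1 - noise_scale) * TARGET
    return mean + noise_scale * rng.normal(size=state.shape)

state = rng.normal(size=TARGET.shape)           # begin from pure noise
for step in range(N_STEPS):
    state = denoise_step(state, step)

print(np.round(state, 3))                       # ends near the target
```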
Running DTMs on our TSUs could be 10,000x more energy efficient than modern algorithms running on GPUs, as shown by our simulations of TSUs running DTMs on the small benchmarks in our paper.

This work on DTMs is the first glimpse of what machine learning workloads can look like on TSUs. We believe that DTMs will start a thermodynamic machine learning revolution, inspiring AI researchers to explore novel architectures and algorithms designed for TSUs to achieve unparalleled AI performance per watt.
The library that we used to write our simulations, thrml, is now open source. Using thrml, the open-source community can start developing algorithms for TSUs before the hardware becomes commercially available.
We funded an independent researcher to replicate our paper using the latest version of thrml, and that replication is now open-source. Anyone with a GPU can replicate the results from our paper by running that code.
Looking Forward
Our mission is to remove the energy constraints that currently limit AI scaling.
To accomplish that mission, we will need to dramatically scale up the capabilities of our hardware and algorithms.
While we have accomplished a lot so far with our small team, we are going to need more help to build production-scale systems.
We are looking for experienced mixed-signal integrated circuit designers and hardware systems engineers to help us build progressively larger computers that can power more and more of the world’s AI inference.
We are also seeking experts in probabilistic ML to enhance the capabilities of our algorithms to match those of today’s foundation models, both by developing algorithms that run solely on a TSU and by developing hybrid algorithms that leverage both TSUs and GPUs.
We intend to use TSUs for tasks other than running AI models, such as simulations of biology and chemistry. We hope to partner with organizations that run probabilistic workloads to develop new algorithms that will enable them to leverage our next-generation chips. If this sounds interesting to you, fill out our algorithmic partnership form.
We invite early-career researchers and PhD students who are interested in the theory and application of TSUs to apply for a research grant.
References
1. Today, 92% of data center executives see grid constraints as a major obstacle to scaling (Schneider Electric). This is being driven by the massive scale of this new breed of data center. Nine of the top ten utilities in the US have named data centers as their main source of growth (Reuters). “A large data center used to mean 10 to 50 megawatts of power. Now, developers are pitching single campuses in the multigigawatt range—on par with the energy draw of entire cities—all to power clusters of AI chips.” (The Information). And the scaling shows no signs of stopping: OpenAI has reportedly floated the idea of 250GW of compute capacity by 2033, which is roughly ⅓ of current peak power consumption in the US (The Information); Amazon is building a data center complex next to a nuclear reactor (AP News); and xAI purchased a former Duke Energy power plant to help power Colossus (Semianalysis). ↩︎
2. Our algorithm can generate images like those in the Fashion MNIST dataset using ~10,000x less energy than a VAE running on a GPU. Read our paper for more details. ↩︎
3. From this Data Center Frontier survey. ↩︎
For more rigor, see our paper.
For more on Probabilistic Graphical Models, we recommend this textbook.