Scaling Inference Lab deploys first cluster

In a major milestone for the Scaling Inference Lab, the first cluster has officially been deployed. Set to run until April 2027, cluster 1 is designed to explore whether pipeline parallelism can unlock meaningful inference performance from commodity GPU hardware. Alongside this milestone, the UK government has announced an additional £20m+ in funding for the Lab to establish a world-class national capability for testing, validating, and scaling new AI compute systems.

As demand for the newest AI hardware soars, it is becoming increasingly important to make full use of the performance of legacy GPUs.

Delivered in partnership with Quettaflop, cluster 1 will test the hypothesis that combining older, rapidly depreciating GPUs with ongoing software optimisations can deliver significantly cheaper AI infrastructure.

If successful, the outputs of this first cluster could result in vastly lower CapEx for serving LLM inference compared to the default alternative today, which is to deploy expensive fleets of new clusters using the latest, state-of-the-art hardware.

Specifically, by distributing model layers across lower-cost GPUs in sequence (pipeline parallelism), this first cluster explores whether intelligent software architecture can substitute for raw hardware capability and offer a compelling alternative to the latest state-of-the-art GPUs, while using more intelligent networking to improve fault tolerance and scalability.
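To make the idea concrete, here is a minimal sketch, assuming each layer's per-micro-batch compute time is known in advance and every GPU is identical. The function names `partition_layers` and `pipeline_estimate`, the per-layer costs and the 0.5 ms hop time are hypothetical illustrations, not Quettaflop's or the Lab's software. The sketch splits the layers into contiguous stages (one per GPU) so that the slowest stage is as fast as possible, then estimates end-to-end latency and steady-state throughput:

```python
from itertools import accumulate


def partition_layers(layer_costs: list[float], num_stages: int) -> list[list[int]]:
    """Split layers into contiguous pipeline stages, one stage per GPU,
    minimising the cost of the slowest stage (dynamic programming)."""
    n = len(layer_costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")
    prefix = [0.0, *accumulate(layer_costs)]
    inf = float("inf")
    # best[k][i]: smallest possible bottleneck when the first i layers
    # are spread over k stages; cut[k][i]: index where stage k begins.
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0
    for k in range(1, num_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                cost = max(best[k - 1][j], prefix[i] - prefix[j])
                if cost < best[k][i]:
                    best[k][i], cut[k][i] = cost, j
    # Walk the cut table backwards to recover each stage's layer indices.
    stages, i = [], n
    for k in range(num_stages, 0, -1):
        j = cut[k][i]
        stages.append(list(range(j, i)))
        i = j
    return stages[::-1]


def pipeline_estimate(layer_costs, stages, hop_ms):
    """Rough single-request latency (ms) and steady-state throughput
    (micro-batches per second). Assumes activation transfers between
    neighbouring GPUs overlap with compute."""
    stage_ms = [sum(layer_costs[i] for i in stage) for stage in stages]
    latency_ms = sum(stage_ms) + hop_ms * (len(stages) - 1)
    # Once the pipeline is full, one micro-batch leaves per slowest step.
    interval_ms = max(max(stage_ms), hop_ms if len(stages) > 1 else 0.0)
    return latency_ms, 1000.0 / interval_ms


if __name__ == "__main__":
    costs = [2.0] * 8  # hypothetical: 8 equal layers, 2 ms each per micro-batch
    stages = partition_layers(costs, 4)
    print(stages)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
    print(pipeline_estimate(costs, stages, hop_ms=0.5))  # (17.5, 250.0)
```

The point the sketch illustrates is that pipeline throughput is set by the slowest stage rather than by the sum of all stages, which is why balancing layers across many modest GPUs can matter more than the speed of any single GPU. Real deployments must also respect per-GPU memory limits, uneven layer sizes and network contention, all of which this sketch ignores.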

The project will also examine whether OpEx can be reduced by combining further software optimisations with carefully selected energy sources, with the overall aim of proving out a new inference-serving architecture that is much more capital efficient.

The mission behind the Scaling Inference Lab

"There is an insatiable need to improve the cost efficiency of future AI infrastructure. While there is a lot of capital being deployed and a wide swath of technologies being explored to help meet this need, many are developed at the component level.

Outside of the largest hyperscalers, there is comparatively little activity focused on the systems-level benefits of each underlying technology. ARIA has launched the Scaling Inference Lab, delivered by CommonAI to address this."

Suraj Bramhavar — Programme Director, ARIA

The Scaling Inference Lab provides a testbed for new AI technologies, prioritising rapid iteration, open collaboration, and long-term sustainability.

The Lab aims to provide shared software and hardware infrastructure that lowers the barrier for innovative startups to plug into larger rack-scale systems and ensures their technologies are tested against the latest emerging workloads.

The intended outcome is strategic insight for governments looking to procure future AI infrastructure and for investors seeking to fund the development of next-generation technologies.

How the Scaling Inference Lab benefits the UK

The UK government is currently exploring many additional tools to strengthen its position in AI infrastructure.

These include supporting more UK vendors that are pushing the boundaries of AI technology, providing resources that help private investors invest intelligently in UK startups, and funding high-risk research into innovative technical concepts.

The Scaling Inference Lab is supporting this by developing a functional simulator toolkit that underpins a comprehensive roadmapping framework. By modelling current and emerging workloads, the toolkit will identify architectural bottlenecks and forecast the true systems-level benefits of new technologies. The resulting insights will help the UK government and private investors make better-informed capital allocation, investment and procurement decisions.
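As a rough illustration of the kind of first-order analysis such a simulator might start from, the sketch below applies a simple roofline model: an operation is limited either by compute or by memory bandwidth, whichever would take longer. The `Accelerator` and `Op` types, the 100 TFLOP/s and 1 TB/s figures and the matrix sizes are all hypothetical and are not drawn from the Lab's toolkit:

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    peak_tflops: float   # hypothetical dense compute ceiling, TFLOP/s
    mem_bw_tb_s: float   # hypothetical memory bandwidth ceiling, TB/s


@dataclass
class Op:
    name: str
    flops: float         # floating-point operations performed
    bytes_moved: float   # bytes read from and written to memory


def roofline(op: Op, hw: Accelerator) -> tuple[float, str]:
    """Return (estimated time in microseconds, limiting resource) for one op."""
    compute_s = op.flops / (hw.peak_tflops * 1e12)
    memory_s = op.bytes_moved / (hw.mem_bw_tb_s * 1e12)
    bound = "compute" if compute_s >= memory_s else "memory"
    return max(compute_s, memory_s) * 1e6, bound


if __name__ == "__main__":
    gpu = Accelerator("hypothetical-gpu", peak_tflops=100.0, mem_bw_tb_s=1.0)
    d = 4096  # hypothetical hidden size; fp16 weights take 2 bytes each
    weights = 2 * d * d
    ops = [
        # Batch of 1: matrix-vector product, dominated by reading weights.
        Op("gemv (batch 1)", flops=2 * d * d, bytes_moved=weights),
        # Batch of 256: same weights reused across many requests.
        Op("gemm (batch 256)", flops=2 * d * d * 256,
           bytes_moved=weights + 2 * 2 * 256 * d),
    ]
    for op in ops:
        t_us, bound = roofline(op, gpu)
        print(f"{op.name}: {t_us:.1f} us, {bound}-bound")
```

In this toy example the batch-1 matrix-vector product (similar in shape to a single-token decode step) is memory-bound, while batching 256 requests turns the same weights into a compute-bound matrix multiply. A systems-level simulator would go further and model interconnect, scheduling and the mix of real workloads.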

Cluster 2 will deploy in August 2026 in partnership with Callosum, focusing on heterogeneous orchestration for agentic workloads. More details will be announced soon!

Learn more about the Scaling Inference Lab and the hypotheses being tested on the website: scalinginference.org

About CommonAI CIC

CommonAI CIC is a non-profit membership organisation, founded on a belief in collaborative engineering for the safe and responsible development of foundational AI technologies. We are focused on building shared infrastructure that organisations can use to run and improve AI systems in real conditions with sovereignty, certainty and efficiency.
