Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

AI Hypercomputer is a fully integrated supercomputing architecture for AI workloads – and it’s easier to use than you think. In this blog, we break down four common use cases, including reference architectures and tutorials, representing just a few of the many ways you can use AI Hypercomputer today.

Short on time? Here’s a quick summary.


Affordable inference. JAX, Google Kubernetes Engine (GKE) and NVIDIA Triton Inference Server are a winning combination, especially when you pair them with Spot VMs for up to 90% cost savings. We have several tutorials, like this one on how to serve LLMs like Llama 3.1 405B on GKE.
Large and ultra-low latency training clusters. Hypercompute Cluster gives you physically co-located accelerators, targeted workload placement, advanced maintenance controls to minimize workload disruption, and topology-aware scheduling. You can get started by creating a cluster with GKE or try this pretraining NVIDIA GPU recipe.
High-reliability inference. Pair new cloud load balancing capabilities like custom metrics and service extensions with GKE Autopilot, which includes features like node auto-repair to automatically replace unhealthy nodes, and horizontal pod autoscaling to adjust resources based on application demand.
Easy cluster setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You can get started with one of our AI/ML blueprints.
If you want to see a broader set of reference implementations, benchmarks and recipes, go to the AI Hypercomputer GitHub.

Why it matters
Deploying and managing AI applications is tough. You need to choose the right infrastructure, control costs, and reduce delivery bottlenecks. AI Hypercomputer helps you deploy AI applications quickly, easily, and more efficiently than if you simply bought the raw hardware and chips.

Take Moloco, for example. Using the AI Hypercomputer architecture, they achieved 10x faster model training times and reduced costs by 2-4x.

Let’s dive deeper into each use case.

1. Reliable AI inference

According to Futurum, in 2023 Google had ~3x fewer outage hours vs. Azure, and ~3x fewer than AWS. Those numbers fluctuate over time, but maintaining high availability is a challenge for everyone. The AI Hypercomputer architecture offers fully integrated capabilities for high-reliability inference.

Many customers start with GKE Autopilot because of its 99.95% pod-level uptime SLA. Autopilot enhances reliability by automatically managing nodes (provisioning, scaling, upgrades, repairs) and applying security best practices, freeing you from manual infrastructure tasks. This automation, combined with resource optimization and integrated monitoring, minimizes downtime and helps your applications run smoothly and securely.
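The quick summary lists horizontal pod autoscaling as one of the features that adjusts resources to application demand. As a rough illustration, the sketch below reproduces the replica calculation described in the Kubernetes documentation: scale the current replica count by the ratio of the observed metric to its target, and skip the change when that ratio is within a small tolerance. The function name, the example metric values, and the min/max bounds are ours for illustration; treat it as a sketch of the idea, not the autoscaler's actual code.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,   # Kubernetes documents 0.1 as the default tolerance
    min_replicas: int = 1,    # illustrative bounds, normally set on the HPA object
    max_replicas: int = 100,
) -> int:
    """Return the replica count suggested by the documented HPA rule."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally, round up, then clamp to the configured bounds.
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Four replicas observing twice the target load.
    print(desired_replicas(4, current_metric=200, target_metric=100))
```

For example, four replicas that each see twice their target load are scaled to eight, while a 5% overshoot stays inside the tolerance and changes nothing.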

There are several configurations available, but in this reference architecture we use TPUs with the JetStream Engine to accelerate inference, plus JAX, GCS Fuse, and SSDs (like Hyperdisk ML) to speed up the loading of model weights. Two notable additions to the stack get us to high reliability: Service Extensions and custom metrics.

Custom metrics, utilizing the Open Request Cost Aggregation (ORCA) protocol, allow applications to send workload-specific performance data (like model serving latency) to Cloud Load Balancer, which then uses this information to make intelligent routing and scaling decisions.
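To make the custom metrics idea concrete, here is a minimal sketch of how a model server might attach a load report to its HTTP responses. It assumes the plain-text ORCA layout as we understand it: a response header named endpoint-load-metrics whose value starts with TEXT, followed by comma-separated named_metrics.<name>=<value> pairs. The header name, the text layout, and the metric name queue_depth_util are all assumptions for illustration; check the Cloud Load Balancing custom metrics documentation for the exact format your load balancer expects.

```python
from typing import Mapping, Tuple

# Assumed ORCA header name and text-format prefixes; verify against the docs.
ORCA_HEADER = "endpoint-load-metrics"
TEXT_PREFIX = "TEXT "
NAMED_METRIC_PREFIX = "named_metrics."


def orca_text_report(named_metrics: Mapping[str, float]) -> Tuple[str, str]:
    """Build a (header name, header value) pair carrying workload metrics."""
    if not named_metrics:
        raise ValueError("at least one metric is required")

    # Sort by name so the header value is deterministic.
    pairs = [
        f"{NAMED_METRIC_PREFIX}{name}={value:g}"
        for name, value in sorted(named_metrics.items())
    ]
    return ORCA_HEADER, TEXT_PREFIX + ",".join(pairs)


if __name__ == "__main__":
    # "queue_depth_util" is a hypothetical metric for a model server.
    name, value = orca_text_report({"queue_depth_util": 0.4})
    print(f"{name}: {value}")
```

A serving process would add this header to each response, which lets the load balancer see a per-backend signal such as queue depth rather than relying only on generic utilization.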

Service extensions allow you to customize the behavior of Cloud Load Balancer by inserting your own code (written as plugins) into the data path, enabling advanced traffic management and manipulation.

Try it yourself. Start by defining your load balancing metrics, creating a plugin using Service Extensions, or spinning up a fully managed Kubernetes cluster with Autopilot. For more ideas, check out this blog on the latest networking enhancements for generative AI applications.

2. Large-scale AI training

Training large AI models demands massive, efficiently scaled compute. Hypercompute Cluster is a supercomputing solution built on AI Hypercomputer that lets you deploy and manage a large number of accelerators as a single unit, using a single API call. Here are a few things that set Hypercompute Cluster apart:

Clusters are densely physically co-located for ultra-low-latency networking. They come with pre-configured and validated templates for reliable and repeatable deployments, and with cluster-level observability, health monitoring, and diagnostic tooling.
To simplify management, Hypercompute Clusters are designed to integrate with orchestrators like GKE and Slurm, and are deployed via the Cluster Toolkit. GKE provides support for over 50,000 TPU chips to train a single ML model.

In this reference architecture, we use GKE Autopilot and A3 Ultra VMs.

A3 Ultra VMs use NVIDIA H200 GPUs, with twice the GPU-to-GPU network bandwidth and twice the high-bandwidth memory (HBM) of A3 Mega. They are built with our new Titanium ML network adapter and incorporate NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience, perfect for large multi-node workloads on GPUs.

GKE supports up to 65,000 nodes — we believe this is more than 10X larger scale than the other two largest public cloud providers.

Try it yourself: Create a Hypercompute Cluster with GKE or try this pretraining NVIDIA GPU recipe.

3. Affordable AI inference

Serving AI, especially large language models (LLMs), can become prohibitively expensive. AI Hypercomputer combines open software, flexible consumption models and a wide range of specialized hardware to minimize costs.

Cost savings are everywhere, if you know where to look. Beyond the tutorials, there are two cost-efficient deployment models you should know. GKE Autopilot reduces the cost of running containers by up to 40% compared to standard GKE by automatically scaling resources based on actual needs, while Spot VMs can save up to 90% on batch or fault-tolerant jobs. You can combine the two to save even more — “Spot Pods” are available in GKE Autopilot to do just that.
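Because both figures above are "up to" numbers, they are most useful as a best-case bound. The short sketch below turns a maximum savings percentage into the lowest bill you could expect for a job. The hourly rate and job length are hypothetical inputs, not real prices, and the sketch deliberately does not guess how Autopilot and Spot savings combine.

```python
from decimal import Decimal


def best_case_cost(on_demand_hourly: str, hours: int, max_savings_pct: int) -> Decimal:
    """Lowest possible cost if the full 'up to' savings were realized."""
    if not 0 <= max_savings_pct <= 100:
        raise ValueError("max_savings_pct must be between 0 and 100")

    # Decimal avoids binary floating-point rounding in currency math.
    rate = Decimal(on_demand_hourly)
    remaining_fraction = Decimal(100 - max_savings_pct) / Decimal(100)
    return rate * hours * remaining_fraction


if __name__ == "__main__":
    # Hypothetical batch job: $10.00 per hour on demand, 100 hours.
    print(best_case_cost("10.00", 100, 90))  # Spot VMs, best case
    print(best_case_cost("10.00", 100, 40))  # Autopilot, best case
```

With these made-up inputs, a job that costs 1,000 on demand would cost no less than 100 on Spot VMs. Actual savings depend on the workload and on Spot availability.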

In this reference architecture, after training with JAX, we convert the model into NVIDIA's FasterTransformer format for inference. Optimized models are served via NVIDIA's Triton Inference Server on GKE Autopilot. Triton's multi-model support allows for easy adaptation to evolving model architectures, and a pre-built NeMo container simplifies setup.
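Once a model is being served by Triton, clients talk to it over the HTTP inference protocol it implements (KServe v2). The sketch below builds such a request using only the standard library. The model name demo-llm and the tensor names input_ids and output_ids are hypothetical: the real names come from the configuration of the model you deploy, so treat this as an outline rather than a working client for any particular model.

```python
import json
from typing import List, Tuple


def build_infer_request(
    model_name: str,
    token_ids: List[int],
    input_name: str = "input_ids",    # hypothetical tensor name
    output_name: str = "output_ids",  # hypothetical tensor name
) -> Tuple[str, str]:
    """Return the URL path and JSON body for a v2 inference request."""
    path = f"/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(token_ids)],  # one sequence in the batch
                "datatype": "INT32",
                "data": token_ids,             # flattened, row-major order
            }
        ],
        "outputs": [{"name": output_name}],
    }
    return path, json.dumps(body)


if __name__ == "__main__":
    path, body = build_infer_request("demo-llm", [101, 2023, 102])
    print("POST", path)
    print(body)
```

You would send the body as a POST to the Triton service's HTTP endpoint inside the cluster. Because each request names its model in the path, the same client code can target any of the models that Triton's multi-model support serves side by side.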

Try it yourself: Start by learning how to serve a model with a single NVIDIA GPU in GKE. You can also serve Gemma open models with Hugging Face TGI, or LLMs like DeepSeek-R1 671B and Llama 3.1 405B.

4. Easy cluster setup and deployment

You need tools that simplify, not complicate, your infrastructure setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You get easy integration with JAX, PyTorch, and Keras. Platform teams get simplified management with Slurm, GKE, and Google Batch, plus flexible consumption models like Dynamic Workload Scheduler and a wide range of hardware options. In this reference architecture, we set up an A3 Ultra cluster with Slurm.

Try it yourself. You can select one of our easy-to-use AI/ML blueprints, available through our GitHub repo, and use it to set up a cluster. We also offer a variety of resources to help you get started, including documentation, quickstarts, and videos.

By: Duncan Campbell (Developer Advocate, Google Cloud) and Jarrad Swain (Product Marketing, Google Cloud)
Originally published at: Google Cloud Blog

Source: zedreviews.com
