aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Engineering
  • Technology

Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

  • aster.cloud
  • March 9, 2025
  • 5 minute read

AI Hypercomputer is a fully integrated supercomputing architecture for AI workloads – and it’s easier to use than you think. In this blog, we break down four common use cases, including reference architectures and tutorials, representing just a few of the many ways you can use AI Hypercomputer today. 

Short on time? Here’s a quick summary.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

  • Affordable inference. JAX, Google Kubernetes Engine (GKE) and NVIDIA Triton Inference Server are a winning combination, especially when you pair them with Spot VMs for up to 90% cost savings. We have several tutorials, like this one on how to serve LLMs like Llama 3.1 405B on GKE.
  • Large and ultra-low latency training clusters. Hypercompute Cluster gives you physically co-located accelerators, targeted workload placement, advanced maintenance controls to minimize workload disruption, and topology-aware scheduling. You can get started by creating a cluster with GKE or try this pretraining NVIDIA GPU recipe.
  • High-reliability inference. Pair new cloud load balancing capabilities like custom metrics and service extensions with GKE Autopilot, which includes features like node auto-repair to automatically replace unhealthy nodes, and horizontal pod autoscaling to adjust resources based on application demand. 
  • Easy cluster setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You can get started with one of our AI/ML blueprints.
  • If you want to see a broader set of reference implementations, benchmarks and recipes, go to the AI Hypercomputer GitHub.

Why it matters
Deploying and managing AI applications is tough. You need to choose the right infrastructure, control costs, and reduce delivery bottlenecks. AI Hypercomputer helps you deploy AI applications quickly, easily, and with more efficiency relative to just buying the raw hardware and chips. 

Take Moloco, for example. Using the AI Hypercomputer architecture they achieved 10x faster model training times and reduced costs by 2-4x.

Let’s dive deeper into each use case.

Read More  Google Cloud Next 2019 | Why Media And Entertainment Companies Are Going G Suite

1. Reliable AI inference

According to Futurum, in 2023 Google had ~3x fewer outage hours vs. Azure, and ~3x fewer than AWS. Those numbers fluctuate over time, but maintaining high availability is a challenge for everyone. The AI Hypercomputer architecture offers fully integrated capabilities for high-reliability inference.

Many customers start with GKE Autopilot because of its 99.95% pod-level uptime SLA. Autopilot enhances reliability by automatically managing nodes (provisioning, scaling, upgrades, repairs) and applying security best practices, freeing you from manual infrastructure tasks. This automation, combined with resource optimization and integrated monitoring, minimizes downtime and helps your applications run smoothly and securely.

There are several configurations available, but in this reference architecture we use TPUs with the JetStream Engine to accelerate inference, plus JAX, GCS Fuse, and SSDs (like Hyperdisk ML) to speed up the loading of model weights.  As you can see, there are two notable additions to the stack that get us to high reliability: Service Extensions and custom metrics.

Custom metrics, utilizing the Open Request Cost Aggregation (ORCA) protocol, allow applications to send workload-specific performance data (like model serving latency) to Cloud Load Balancer, which then uses this information to make intelligent routing and scaling decisions.

Service extensions allow you to customize the behavior of Cloud Load Balancer by inserting your own code (written as plugins) into the data path, enabling advanced traffic management and manipulation.

Try it yourself. Start by defining your Load Balancing Metrics, create a plugin using Service Extensions, or spin up a fully-managed Kubernetes cluster with Autopilot. For more ideas, check out this blog on the latest networking enhancements for generative AI applications

Read More  Three Ukrainian-Language Linux Foundation Courses Are Now Available

2. Large scale AI training

Training large AI models demands massive, efficiently scaled compute. Hypercompute Cluster is a supercomputing solution built on AI Hypercomputer that lets you deploy and manage a large number of accelerators as a single unit, using a single API call. Here are a few things that set Hypercompute Cluster apart:

  • Clusters are densely physically co-located for ultra-low-latency networking. They come with pre-configured and validated templates for reliable and repeatable deployments, and with cluster-level observability, health monitoring, and diagnostic tooling.
  • To simplify management, Hypercompute Clusters are designed for integrating with orchestrators like GKE and Slurm, and are deployed via the Cluster Toolkit. GKE provides support for over 50,000 TPU chips to train a single ML model. 

In this reference architecture, we use GKE Autopilot and A3 Ultra VMs.

A3 Ultra uses NVIDIA H200 GPUs with twice the GPU-to-GPU network bandwidth and twice the high bandwidth memory (HBM) compared to A3 Mega GPUs. They are built with our new Titanium ML network adapter and incorporate NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience, perfect for large multi-node workloads on GPUs.

GKE supports up to 65,000 nodes — we believe this is more than 10X larger scale than the other two largest public cloud providers.

Try it yourself: Create a Hypercompute Cluster with GKE or try this pretraining NVIDIA GPU recipe.

3. Affordable AI inference

Serving AI, especially large language models (LLMs), can become prohibitively expensive. AI Hypercomputer combines open software, flexible consumption models and a wide range of specialized hardware to minimize costs.

  • Cost savings are everywhere, if you know where to look. Beyond the tutorials, there are two cost-efficient deployment models you should know. GKE Autopilot reduces the cost of running containers by up to 40% compared to standard GKE by automatically scaling resources based on actual needs, while Spot VMs can save up to 90% on batch or fault-tolerant jobs. You can combine the two to save even more — “Spot Pods” are available in GKE Autopilot to do just that.
Read More  Reduce Scaling Costs By Up To 50% In Cloud Spanner With Doubled Provisioned Storage

In this reference architecture, after training with JAX, we convert into NVIDIA’s Faster Transformer format for inferencing. Optimized models are served via NVIDIA’s Triton on GKE Autopilot. Triton’s multi-model support allows for easy adaptation to evolving model architectures, and a pre-built NeMo container simplifies setup.

Try it yourself: Start by learning how to serve a model with a single NVIDIA GPU in GKE. You can also serve Gemma open models with Hugging Face TGI, or LLMs like DeepSeek-R1 671B and Llama 3.1 405B.

4. Easy cluster setup and deployment

You need tools that simplify, not complicate, your infrastructure setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You get easy integration with JAX, PyTorch, and Keras. Platform teams get simplified management with Slurm, GKE, and Google Batch, plus flexible consumption models like Dynamic Workload Scheduler and a wide range of hardware options. In this reference architecture, we set up an A3 Ultra cluster with Slurm:

Try it yourself. You can select one of our easy-to-use AI/ML blueprints, available through our GitHub repo, and use it to set up a cluster. We also offer a variety of resources to help you get started, including documentation, quickstarts, and videos.

By: Duncan Campbell (Developer Advocate, Google Cloud) and Jarrad Swain (Product Marketing, Google Cloud)
Originally published at: Google Cloud Blog

Source: zedreviews.com


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • AI
  • Google Cloud
  • Hypercomputer
You May Also Like
View Post
  • Gears
  • Technology

Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection

  • June 15, 2026
View Post
  • Technology

The consequences of relying on AI for accurate news

  • June 10, 2026
View Post
  • Gears
  • Technology

WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements

  • June 8, 2026
View Post
  • Technology

IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery

  • June 4, 2026
View Post
  • Technology

Banks race to patch new cyber vulnerabilities, and other cybersecurity news

  • May 25, 2026
pope-leo-xiv-cq5dam-1500.844
View Post
  • Technology

Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May

  • May 22, 2026
View Post
  • Technology

Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work

  • May 20, 2026
reMarkable Paper Pure
View Post
  • Gears
  • Technology

Everything The reMarkable Paper Pure Actually Does

  • May 14, 2026

Stay Connected!
LATEST
  • digital-nomad-freelancer-worker-2151205464 1
    One paperwork problem – Get your Digital Nomad Visa employment documents fast from UK, EU or Singapore
    • June 16, 2026
  • 2
    Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection
    • June 15, 2026
  • 3
    You Do Not Need to Invest in the IPO of SpaceX, Anthropic, and OpenAI
    • June 10, 2026
  • 4
    The consequences of relying on AI for accurate news
    • June 10, 2026
  • 5
    Connecting AI agents with unstructured data using Google Cloud Storage MCP Servers
    • June 10, 2026
  • 6
    WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements
    • June 8, 2026
  • 7
    IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery
    • June 4, 2026
  • Data center 8
    Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency
    • June 3, 2026
  • 9
    Ink vs Pixels. What you miss versus what you are actually missing.
    • June 1, 2026
  • 10
    Banks race to patch new cyber vulnerabilities, and other cybersecurity news
    • May 25, 2026
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • pope-leo-xiv-cq5dam-1500.844 1
    Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May
    • May 22, 2026
  • 2
    Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work
    • May 20, 2026
  • reMarkable Paper Pure 3
    Everything The reMarkable Paper Pure Actually Does
    • May 14, 2026
  • 4
    Scaling cloud and AI: Microsoft Azure’s commitment to Europe’s digital future
    • May 11, 2026
  • Anthropic Institute 5
    Introducing The Anthropic Institute
    • March 11, 2026
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.