Scaling Machine Learning Inference With NVIDIA TensorRT And Google Dataflow

  • aster.cloud
  • January 26, 2023
  • 3 minute read

A collaboration between Google Cloud and NVIDIA has enabled Apache Beam users to maximize the performance of ML models within their data processing pipelines, using NVIDIA TensorRT and NVIDIA GPUs alongside the new Apache Beam TensorRTEngineHandler.

The NVIDIA TensorRT SDK provides high-performance neural network inference. It lets developers optimize and deploy trained ML models on NVIDIA GPUs for high throughput and low latency while preserving model prediction accuracy. TensorRT was designed to support multiple classes of deep learning models, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformer-based models.



Deploying and managing end-to-end ML inference pipelines while maximizing infrastructure utilization and minimizing total costs is a hard problem. Integrating ML models in a production data processing pipeline to extract insights requires addressing challenges associated with the three main workflow segments:

  1. Preprocess large volumes of raw data from multiple data sources to use as inputs to trained ML models that infer or predict results, and then feed the model outputs downstream into business processes.
  2. Call ML models within data processing pipelines while supporting different inference use-cases: batch, streaming, ensemble models, remote inference, or local inference. Pipelines are not limited to a single model and often require an ensemble of models to produce the desired business outcomes.
  3. Optimize the performance of the ML models to deliver results within the application’s accuracy, throughput, and latency constraints. For pipelines that use complex, compute-intensive models for use cases like NLP, or that require multiple ML models together, the response time of these models often becomes a performance bottleneck. This can cause poor hardware utilization and require more compute resources to deploy your pipelines in production, leading to potentially higher operating costs.
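
As a rough illustration of the first two segments, the sketch below chains a preprocessing step, an ensemble of two models, and a downstream decision step using plain Python functions. Everything in it is hypothetical: the record format, the two toy "models", the averaging rule, and the 0.5 threshold are assumptions made for the example and are not taken from Apache Beam, Dataflow, or TensorRT. The third segment, performance optimization, is not covered here.

```python
from statistics import mean
from typing import Callable, Dict, List

# A "model" here is just a function from a feature list to a score (assumption).
Model = Callable[[List[float]], float]


def preprocess(raw: str) -> List[float]:
    """Segment 1: turn a raw comma-separated record into numeric features."""
    return [float(field) for field in raw.split(",")]


def ensemble_predict(features: List[float], models: Dict[str, Model]) -> float:
    """Segment 2: call every model on the same features and average the scores."""
    return mean(model(features) for model in models.values())


def postprocess(score: float, threshold: float = 0.5) -> str:
    """Downstream step: map a score to a (hypothetical) business decision."""
    return "review" if score >= threshold else "approve"


# Two toy stand-ins for trained models.
models: Dict[str, Model] = {
    "mean_model": lambda f: sum(f) / len(f),
    "max_model": lambda f: max(f),
}

raw_records = ["0.1,0.2,0.3", "0.9,0.7,0.8"]
decisions = [
    postprocess(ensemble_predict(preprocess(record), models))
    for record in raw_records
]
print(decisions)  # ['approve', 'review']
```

In a real pipeline each of these functions would become a distributed step, and the model calls would typically dominate the running time, which is where the optimization concern in item 3 comes in.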

Google Cloud Dataflow is a fully managed runner for stream or batch processing pipelines written with Apache Beam. To enable developers to easily incorporate ML models in data processing pipelines, Dataflow recently announced support for Apache Beam’s generic machine learning prediction and inference transform, RunInference. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use models in production pipelines without needing lots of boilerplate code.

The following code sample shows how it is used with Apache Beam. Note that the engine_handler is passed as configuration to the RunInference transform, which shields the user from the implementation details of running the model.

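import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy

# Load a prebuilt TensorRT engine from Cloud Storage; batches are fixed at 4 elements.
engine_handler = TensorRTEngineHandlerNumPy(
    min_batch_size=4,
    max_batch_size=4,
    engine_path='gs://gcp_bucket/single_tensor_features_engine.trt')

# `pipeline` (a beam.Pipeline) and SINGLE_FEATURE_EXAMPLES (the input elements)
# are assumed to be defined elsewhere.
pcoll = pipeline | beam.Create(SINGLE_FEATURE_EXAMPLES)
predictions = pcoll | RunInference(engine_handler)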
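
To make the handler idea more concrete, here is a minimal plain-Python sketch of the pattern: a handler object knows how to load a model and how to run it on one batch, while a generic function takes care of batching and of calling the handler. This is not Apache Beam's implementation. The class SketchModelHandler, the helpers batches and run_inference, and the toy doubling "model" are all hypothetical names invented for this illustration.

```python
from typing import Callable, Iterable, Iterator, List

# A loaded "model" is a function from a batch of inputs to a batch of outputs.
BatchModel = Callable[[List[float]], List[float]]


class SketchModelHandler:
    """Hypothetical stand-in for a model handler (not Apache Beam's class)."""

    def __init__(self, load_fn: Callable[[], BatchModel], max_batch_size: int):
        self._load_fn = load_fn
        self.max_batch_size = max_batch_size

    def load_model(self) -> BatchModel:
        # Framework-specific loading would live here.
        return self._load_fn()

    def run_inference(self, batch: List[float], model: BatchModel) -> List[float]:
        # Framework-specific execution would live here.
        return model(batch)


def batches(items: Iterable[float], size: int) -> Iterator[List[float]]:
    """Group items into lists of at most `size` elements."""
    batch: List[float] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_inference(items: Iterable[float], handler: SketchModelHandler) -> List[float]:
    """Generic driver: load the model once, then feed it batch by batch."""
    model = handler.load_model()
    results: List[float] = []
    for batch in batches(items, handler.max_batch_size):
        results.extend(handler.run_inference(batch, model))
    return results


def load_doubling_model() -> BatchModel:
    # Toy model that doubles each input.
    return lambda batch: [2 * x for x in batch]


handler = SketchModelHandler(load_doubling_model, max_batch_size=4)
predictions = run_inference([1.0, 2.0, 3.0, 4.0, 5.0], handler)
print(predictions)  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Because the driver only depends on the two handler methods, swapping in a different framework means writing a different handler and leaving the rest of the pipeline code unchanged, which is the separation the configuration-style API is aiming for.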

Along with the Dataflow runner and the TensorRT engine, Apache Beam enables users to address the three main challenges. The Dataflow runner takes care of pre-processing data at scale, preparing the data for use as model input. Apache Beam’s single API for batch and streaming pipelines means that RunInference is automatically available for both use cases. Apache Beam’s ability to define complex multi-path pipelines also makes it easier to create pipelines that have multiple models. With TensorRT support, Dataflow now also has the ability to optimize the inference performance of models on NVIDIA GPUs.

For more details and samples to start using this feature today, see the NVIDIA Technical Blog post “Simplifying and Accelerating Machine Learning Predictions in Apache Beam with NVIDIA TensorRT.” Documentation for RunInference is available on the Apache Beam documentation site and in the Dataflow docs.


By: Reza Rokni (Senior Staff Developer Advocate Dataflow) and Ruichao Ren (Deep Learning Specialist)
Source: Google Cloud Blog

