
Debunking Myths About Python On Dataflow

  • aster.cloud
  • November 19, 2021
  • 4 minute read

For many developers who come to Dataflow, Google Cloud’s fully managed data processing service, the first decision is which programming language to use. Dataflow developers author their pipelines with the open-source Apache Beam SDK and can choose among several languages: Java, Python, Go, SQL, Scala, and Kotlin. In this post, we’ll focus on one of our fastest-growing languages: Python.

The Python SDK for Apache Beam was introduced shortly after Dataflow’s general availability was announced in 2015, and is the primary choice for several of our largest customers. However, it has suffered from a reputation for being incomplete and inferior to its predecessor, the Java SDK. While historically there was some truth to this perception, the Python SDK’s feature set has caught up to the Java SDK, and it offers new capabilities that cater specifically to Python developers. We’ll spend the rest of this post inspecting some popular myths, and conclude with a brief review of the latest and greatest for the Python SDK.



Myth 1: Python doesn’t support streaming pipelines.

BUSTED. Streaming support for Python has been available for more than two years, released as part of Beam 2.16 in October 2019. This means the unique capabilities of streaming Dataflow, including Streaming Engine, update, drain, and snapshots, are all available to Python users.

Myth 2: SqlTransform isn’t available for Python.

BUSTED. Tired of writing tedious code to join data streams? Use SqlTransform for Python. Apache Beam introduced support for SqlTransform to the Python SDK last year as part of our advancements with multi-language pipelines (more on that later). Take a look at this example to get started.


Myth 3: State and Timer APIs aren’t available in Python.

BUSTED. Two of the most powerful features of the Beam SDK are the State and Timer APIs, which allow for more fine-grained control over aggregations than windows and triggers do. These APIs are available in Python, and offer parity with the Java SDK for the most common use cases. Reference the Beam programming guide for some examples of what you can do with these APIs.

Myth 4: There is support for only a limited set of I/Os in Python.

BUSTED. The most glaring disparity between the Java and Python SDKs has been the number of I/O connectors, which handle read and write operations for Dataflow pipelines. Our support for multi-language pipelines puts this myth to rest. With cross-language transforms, a Python pipeline can invoke a Java transform under the hood, which gives it access to the entire library of Java-based I/O connectors. In fact, that’s how we implemented the KafkaIO module for Python (see example). Developers can invoke their own cross-language transforms using the instructions in the patterns library.

Myth 5: There are fewer code samples in Python.

PLAUSIBLE. Apache Beam maintains several repos for Python examples: one for snippets, one for notebook samples, and one for complete examples. However, there are notable places where Python is missing, namely our Dataflow Templates repository. This is attributable to the fact that most of Dataflow’s initial users were Java developers. But this quick observation ignores two key factors: 1) the unique assets that are available only to Python developers, and 2) the tremendous momentum behind the Beam Python community.


Python developers love writing exploratory code in JupyterLab notebook environments. Beam offers an interactive module that lets you build and run your Beam Python pipelines interactively in a Jupyter notebook. We make deploying these notebooks really simple with Beam Notebooks, which spins up a managed notebook that contains all the Beam libraries required to prototype your pipelines. We also have a number of helpful examples and tutorials that show how you can sample data from a streaming source, or attach GPUs to your notebooks to accelerate your processing. The notebook also contains a learning track for new Beam developers that covers everything from basic operations to aggregations and streaming concepts. You can review the documentation here.

Over the past few years, we have seen a number of extensions built on top of the Beam Python SDK. Cruise Automation published the Terra library, which enables 70+ Cruise data scientists to submit jobs without having to understand the underlying infrastructure. Spotify open-sourced Klio, a framework built on top of Beam Python that simplifies common tasks required for processing audio and media files. I have even pointed customers to beam-nuggets, a lightweight collection of Beam Python transformations used for reading/writing from/to relational databases. Open-source developers and large organizations are doubling down on Beam Python, and these brief examples underscore that trend.

What’s new:

The Dataflow team has a slew of new capabilities that will help Python developers advance their use case. Here’s a quick run-down of the newest features:

  • Custom containers: Users can now specify their own container image when they launch their Dataflow job. This is a common ask from our Python audience, who like to package their pipeline code with their own libraries and dependencies. We’re excited to announce that this feature is generally available. Take a look at the documentation so you can try it for yourself!
  • GPUs: Dataflow recently announced the general availability of GPU support. You can now accelerate your data processing by provisioning GPUs on your Dataflow job, another common request from machine learning practitioners on Dataflow. You can review the details of the launch here.
  • Beam DataFrames: Beam DataFrames brings the magic of pandas to the Beam SDK, allowing developers to convert PCollections to DataFrames and use the standard methods available with the pandas DataFrame API. DataFrames gives developers a more natural way to interact with their datasets and create their pipelines, and will be a stepping stone to future efficiency improvements. Beam DataFrames is generally available starting with Beam 2.32, which was released in August 2021. Learn more about Beam DataFrames here.

We invite you to try out our new features using Beam Notebooks today!

Do you have an interesting idea that you want to share with our Beam community? You can reach out to us through various modes, all found here. We are excited to see what’s next for Beam Python.

 

 

By Mehran Nazir Product Manager, Dataflow
Source Google Cloud Blog

