aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Design
  • Engineering
  • Solutions
  • Technology

Building Trust In The Data With Dataplex

  • aster.cloud
  • October 19, 2022
  • 5 minute read

Analytics data is growing exponentially and so is the dependence on the data in making critical business and product decisions.  In fact, the best decisions are said to be the ones which are backed by data. In data, we trust!

But do we trust the data ?


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

As the data volumes have grown – one of the key challenges organizations are facing is how to maintain the data quality in a scalable and consistent way across the organization.  While data quality is not a newly found need,  the needs used to be  contained when the data footprint was small and data consumers were few. In such a world, data consumers knew who the producers were and producers knew what the consumers needed. But today, data ownership is getting distributed and data consumption is finding new users and use cases.  So the existing data quality approaches find themselves limited and are isolated to certain pockets of the organization.  This often exposes data consumers to inconsistent and inaccurate data which ultimately impacts the decisions made from that data.  As a result, organizations today are losing 10s of millions of dollars due to the low quality of data.

These organizations are looking for solutions that empower their data producers to consistently create high quality data cloud scale.

Building Trust with Dataplex data quality

Earlier this year,  at Google Cloud,  we launched Dataplex, an intelligent data fabric that enables governance and data management across distributed data at scale.  One of the key things Dataplex enables out-of-box is for data producers to build trust in the data with a built-in data quality.

Dataplex data quality task delivers a declarative, data-ops centric experience for validating data across BigQuery and Google Cloud Storage.  Producers can now easily build and publish quality reports or can easily include data validations as part of their data production pipeline.  Reports can be aggregated across various data quality dimensions and the execution is entirely serverless.

Read More  Multicloud Strategy For The Public Sector: Optimize Return, Mitigate Risk

 

Dataplex data quality task provides  –

  1. A declarative approach for defining “what good looks like” that can be managed as part of a CI/CD workflow.
  2. A serverless and managed execution with no infrastructure to provision.
  3. Ability to validate across data quality dimensions like  freshness, completeness, accuracy and validity.
  4. Flexibility in execution – either by using Dataplex serverless scheduler (at no extra cost) or executing the data validations as part of a pipeline (e.g. Apache Airflow).
  5. Incremental execution – so you save time and money by validating new data only.
  6. Secure and performant execution with zero data-copy from BigQuery environments and projects.
  7. Programmatic consumption of quality metrics for Dataops workflows.

Users can also execute these checks on data that is stored in BigQuery and Google Cloud Storage but is not yet organized with Dataplex.  For Google Cloud Storage data that is managed by Dataplex, Dataplex auto-detects and auto-creates tables for structured and semi-structured data. These tables can be referenced with the Dataplex data quality task as well.

Behind the scenes – Dataplex makes use of an open source data quality engine – Cloud Data Quality Engine – to run these checks. Providing an open platform is one of our key goals and we have made contributions to this engine to integrate seamlessly with Dataplex’s  metadata and serverless environment.

You can learn more about this in our product documentation.

Building enterprise trust at American Eagle Outfitters

One of our enterprise customers  – American Eagle Outfitters (AEO) – is continuing to build trust in their critical data using Dataplex Data Quality Task.  Kanhu Badtia, lead data engineer from AEO, shares their rationale and experience with Dataplex data quality task:

“AEO is a leading global specialty retailer offering high-quality & on-trend clothing under its American Eagle® and Aerie® brands. Our company operates stores in the United States, Canada, Mexico, and Hong Kong, and ships to 81 countries worldwide through its websites.

Read More  Google Cloud And VMware Expand Global Partnership To Help Enterprises Accelerate App Modernization And Cloud Transformation

We are a data-driven organization that utilizes data from physical and digital store fronts, from social media channels, from logistics/delivery partners and many other sources through established compliant processes. We have a team of data scientists and analysts who create models, reports and dashboards that inform responsible business decision-making on such matters as inventory, promotions, new product launches and other internal business reviews. As the data engineering team at AEO, our goal is to provide highly trusted data for our internal data consumers.

Before Dataplex – AEO had methods for maintaining data quality that were effective for their purpose. However, those methods were not scalable with the continual expansion of data volume and demand for quality results from our data consumers.  Internal data consumers identified and reported quality issues where ‘bad data’ was impacting business critical dashboards/reports . As a result, our teams were often in “fire-fighting” mode – finding & fixing bad data. We were looking for a solution that would standardize and scale data quality across the production data pipelines.

The majority of AEO’s business data is in Google’s BigQuery or in Google Cloud Storage (GCS). When Dataplex launched the data quality capabilities, we immediately started a proof-of-concept. After a careful evaluation, we decided to use it as the central data quality framework for production pipelines. We liked that –

 

  1. It provides an easy declarative (YAML) & flexible way of defining data quality. We were able to parameterize it to use across multiple tables.
  2. It allows validating data in any BigQuery table with a completely serverless and native execution using existing slot reservations.
  3. It allows executing these checks as part of the ETL pipelines using DataPlex Airflow Operators. This is a huge win as pipelines can now pause further processing if critical rules do not pass.
  4. Data quality checks are executed in parallel which gives us the required execution efficiency in pipelines.
  5. Data quality results are stored centrally in BigQuery & can be queried to identify which rules failed/succeeded and how many rows failed. This enables defining custom thresholds for success.
  6. Organizing data in Dataplex Lakes is optional when using Dataplex data quality.
Read More  Introducing Media CDN—the Modern Extensible Platform For Delivering Immersive Experiences

 

Our team truly believes that data quality is an integral part of any data-driven organization and Dataplex DQ capabilities align perfectly with that fundamental principle.

For example, here is a sample Google Cloud Composer / Airflow DAG  that loads & validates the “item_master” table and stops downstream processing if the validation fails.

 

It includes simple rules for uniqueness, completeness and more complex rules for referential integrity or business rules such as checking daily price variance. We publish all data quality results centrally to a BigQuery table, such as this:

Sample data quality output table

 

We query this output table for data quality issues & fail the pipeline in case of critical rule failure. This stops low quality data from flowing downstream.

We now have a repeatable process for data validation that can be used across the key data production pipelines. It standardizes the data production process and effectively ensures that bad data doesn’t break downstream reports and analytics.”

Learn more

Here at Google – we are excited to enable our customer’s journey to high quality, trusted data.  To learn more about our current data quality capabilities please refer to –

  • Dataplex Data Quality Overview
  • Sample Airflow DAG with Dataplex Data Quality task

 

By: Sandeep Karmarkar (Product Manager, Google Cloud) and Kanhu Badtia (Sr. Data Engineer, American Eagle Outfitters)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • American Eagle Outfitters
  • Dataplex
  • Google Cloud
You May Also Like
View Post
  • Gears
  • Technology

Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection

  • June 15, 2026
View Post
  • Technology

The consequences of relying on AI for accurate news

  • June 10, 2026
View Post
  • Gears
  • Technology

WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements

  • June 8, 2026
View Post
  • Technology

IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery

  • June 4, 2026
Data center
View Post
  • Data
  • Public Cloud

Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency

  • June 3, 2026
View Post
  • Technology

Banks race to patch new cyber vulnerabilities, and other cybersecurity news

  • May 25, 2026
pope-leo-xiv-cq5dam-1500.844
View Post
  • Technology

Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May

  • May 22, 2026
View Post
  • Technology

Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work

  • May 20, 2026

Stay Connected!
LATEST
  • 1
    Expectations vs. Reality: The AI We Thought We’d Have in 10 Years
    • June 19, 2026
  • digital-nomad-freelancer-worker-2151205464 2
    One paperwork problem – Get your Digital Nomad Visa employment documents fast from UK, EU or Singapore
    • June 16, 2026
  • 3
    Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection
    • June 15, 2026
  • 4
    You Do Not Need to Invest in the IPO of SpaceX, Anthropic, and OpenAI
    • June 10, 2026
  • 5
    The consequences of relying on AI for accurate news
    • June 10, 2026
  • 6
    Connecting AI agents with unstructured data using Google Cloud Storage MCP Servers
    • June 10, 2026
  • 7
    WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements
    • June 8, 2026
  • 8
    IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery
    • June 4, 2026
  • Data center 9
    Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency
    • June 3, 2026
  • 10
    Ink vs Pixels. What you miss versus what you are actually missing.
    • June 1, 2026
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    Banks race to patch new cyber vulnerabilities, and other cybersecurity news
    • May 25, 2026
  • pope-leo-xiv-cq5dam-1500.844 2
    Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May
    • May 22, 2026
  • 3
    Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work
    • May 20, 2026
  • reMarkable Paper Pure 4
    Everything The reMarkable Paper Pure Actually Does
    • May 14, 2026
  • 5
    Scaling cloud and AI: Microsoft Azure’s commitment to Europe’s digital future
    • May 11, 2026
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.