aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Data
  • Engineering
  • Technology
  • Tools

Better Data For Better AI: New Speech Datasets And Benchmarks For Data

  • aster.cloud
  • December 25, 2021
  • 2 minute read

AI researchers and engineers need better data to enable better AI solutions. The quality of an AI solution is determined by both the learning algorithm (such as a deep-neural network model) and the datasets used to train and evaluate that algorithm. Historically, AI research has focused much more on algorithms than datasets, despite their vital importance. As a result, many algorithms are freely available as starting points, but many important problems lack large, high-quality open datasets. Further, creating new datasets is expensive and error-prone.

Recently, the data-centric AI movement has emerged, which aims to develop new methodologies and tools for constructing better datasets to fix this problem. Conferences, workshops,  challenges, and platforms are being launched to support improving data quality and to foster data excellence. Thought leaders such as Andrew Ng at Landing.AI and Chris Re at Stanford University are encouraging AI developers to focus more on iterative data engineering than they do tuning their learning algorithms. Our CHI-best-paper-award-winning paper, “Everyone wants to do the model work, not the data work” highlighted the significance of data quality in the practice of ML.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

At Google, we are excited to contribute to data-centric AI. Today, Google Cloud is adding a new high value dataset to the Public Dataset Program, and Google researchers are announcing DataPerf, a new multi-organizational effort to develop benchmarks for data quality and data centric algorithms.

Google Cloud is committed to helping users improve their data quality, starting with supporting better public data. The Public Datasets program provides high quality datasets pre-configured on GCP for easy access. Google Cloud is adding a new high-value dataset developed by the MLCommons™ Association (which Google co-founded) to the Public Datasets program: The Multilingual Spoken Words Corpus: a rich audio speech dataset with more than 340,000 keywords in 50 languages with upwards of 23.4 million examples.

Read More  How Kitabisa Re-Structured Its Fundraising Platform To Drive "Kindness At Scale" On Google Cloud

This new public dataset is aligned with the MLCommons Association vision for “open” datasets – accessible by all – that are “living” – continually being improved to raise quality and increase representation and diversity.

Google researchers, in collaboration with multiple organizations, are announcing the DataPerf effort at the NeurIPS Data-Centric AI workshop today, to develop benchmarks to improve data quality. Much like the the MLPerf™ benchmarking effort which is now the industry standard for machine learning hardware/software speed, DataPerf brings together the originators of prior efforts including: CATS4ML, Data-Centric AI Competition, DCBench, Dynabench, and the MLPerf benchmarks to define clear metrics that catalyze rapid innovation. DataPerf will measure the utility of training and test data for common problems, and algorithms for working with datasets such as: selecting core sets, correcting errors, identifying under-optimized data slices, and valuing datasets prior to labeling.

Together, supporting open, living datasets for core ML tasks, and the development of benchmarks to direct the rapid evolution of those datasets will empower the researchers and engineers who use Google Cloud to do even more amazing things – and we can’t wait to see what they create!


Acknowledgements: In collaboration with Lora Aroyo and Praveen Paritosh.

 

 

By: Peter Mattson (Staff Engineer)
Source: Google Cloud Blog


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • Artificial Intelligence
  • Benchmark
  • Google Cloud
You May Also Like
View Post
  • Gears
  • Technology

Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection

  • June 15, 2026
View Post
  • Technology

The consequences of relying on AI for accurate news

  • June 10, 2026
View Post
  • Gears
  • Technology

WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements

  • June 8, 2026
View Post
  • Technology

IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery

  • June 4, 2026
Data center
View Post
  • Data
  • Public Cloud

Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency

  • June 3, 2026
View Post
  • Technology

Banks race to patch new cyber vulnerabilities, and other cybersecurity news

  • May 25, 2026
pope-leo-xiv-cq5dam-1500.844
View Post
  • Technology

Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May

  • May 22, 2026
View Post
  • Technology

Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work

  • May 20, 2026

Stay Connected!
LATEST
  • 1
    Expectations vs. Reality: The AI We Thought We’d Have in 10 Years
    • June 19, 2026
  • digital-nomad-freelancer-worker-2151205464 2
    One paperwork problem – Get your Digital Nomad Visa employment documents fast from UK, EU or Singapore
    • June 16, 2026
  • 3
    Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection
    • June 15, 2026
  • 4
    You Do Not Need to Invest in the IPO of SpaceX, Anthropic, and OpenAI
    • June 10, 2026
  • 5
    The consequences of relying on AI for accurate news
    • June 10, 2026
  • 6
    Connecting AI agents with unstructured data using Google Cloud Storage MCP Servers
    • June 10, 2026
  • 7
    WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements
    • June 8, 2026
  • 8
    IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery
    • June 4, 2026
  • Data center 9
    Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency
    • June 3, 2026
  • 10
    Ink vs Pixels. What you miss versus what you are actually missing.
    • June 1, 2026
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • 1
    Banks race to patch new cyber vulnerabilities, and other cybersecurity news
    • May 25, 2026
  • pope-leo-xiv-cq5dam-1500.844 2
    Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May
    • May 22, 2026
  • 3
    Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work
    • May 20, 2026
  • reMarkable Paper Pure 4
    Everything The reMarkable Paper Pure Actually Does
    • May 14, 2026
  • 5
    Scaling cloud and AI: Microsoft Azure’s commitment to Europe’s digital future
    • May 11, 2026
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.