
How Spam Detection Taught Us Better Tech Support

  • aster.cloud
  • November 15, 2021
  • 5 minute read

Information Technology teams, especially in help desk and support, need a way to track what problems people are having. Ideally, they can also see how those problems change over time, especially when technology or policy shifts.

Imagine you are in charge of sending a newspaper delivery team to different neighborhoods. Each person has a bicycle, so you give them a route, and they leave the papers at the right doors. But the roads change. Every day they change. It’s chaos.



What do you do when routes are changing constantly? How do you provide the information needed when the context is shifting all the time?

In an IT context we run into similar challenges with traditional problem management frameworks such as ITIL 4, which tend to assume a fixed, well-defined catalog of services. With such a catalog, every issue that the IT folks solve is tracked and accounted for. That connection back to the catalog allows insight into what’s causing issues, or where outages or incidents may be impacting a large group of employees.

At Google we don’t have that. In part, that is because we put the user first, so getting people back to a productive state is job #1. It is also because products, services and issues are always shifting: just like the roads, the route is never the same, even if the goal remains consistent. That means users come into our IT service desk with new problem types every day.

Our tech support team, called Techstop, acts as the one-stop shop for all IT issues, and supports people across chat, email and video channels. They need to remain adaptable to new problems Googlers experience and new products they use. In order to track what problems might be on the rise, the Techstop team needs a way to catalog what tools, applications and services are in use at Google.

Thinking back to the newspaper delivery routes, we used a rough approximate map, rather than a very detailed one, giving us a taxonomy of services that was “good enough” for most of our use cases. We got some useful data out of it, but it didn’t give us very granular insight.


Need for innovation

Covid-19 put a new focus on scalable problem understanding, specifically for everyday employee IT issues. With so much of the workforce moved to a work-from-home model, we really needed to know where employees were experiencing technology pain. It’s as if whole new neighborhoods popped into existence overnight, but our newspaper delivery crew was the same. More ground to cover, with totally novel street maps.

Furthermore, products used every day for productivity, such as Google Meet, began to see exponential growth in usage, causing scaling issues and outages. These product teams looked to the Techstop organization to help them prioritize the ever-increasing list of feature requests and bugs being filed every day.

Ultimately the “good enough” problem taxonomy failed to produce truly helpful insights. We could find out which products were being affected the most, but not what issues people were having with those products. Even worse, new issues that were unique to the work-from-home model were being hidden by the fact that the catalog could not update in time to catch the rapidly changing problem space underneath it.

Borrowing spam tech

Taking a look around other efforts at Google, the Techstop team found examples of solving a similar problem: detecting new patterns quickly in rapidly changing data.

Gmail handles spam filtering for over a billion people. Its engineers had thought through the question “how do we detect a new spam campaign quickly?” Spammers rapidly send bulk messages with slight variations in content (noise, misspellings, etc.). Most classification attempts become a game of cat and mouse, since classifiers take some time to learn new patterns.


Invoking a trend identification engine using unsupervised density clustering on unstructured text unlocked the ability for Gmail to detect ephemeral spam campaigns more quickly.

The Techstop problem had a similar shape. Rapidly changing products produced highly dynamic user journeys, both for employees and for the IT professionals troubleshooting their issues. The tickets filed, like the spam emails, were similar to one another, with slight differences in spelling and word choice.


Density clustering

In contrast to more rigid approaches, such as centroid-based algorithms like k-means, density-based clustering is better suited to large and highly heterogeneous data sets, which may contain clusters of drastically different sizes. This flexibility helps us tackle the task of problem identification across the entire scope of the company, which requires the ability to detect and distinguish small-but-significant perturbations in the presence of large-but-stable patterns.

Our implementation uses ClustOn, an in-house technology with a hybrid approach that incorporates density-based clustering. But a more time-tested algorithm such as DBSCAN — an open-source implementation of which is available via scikit-learn’s clustering module — could be leveraged to similar effect.
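As a rough illustration of the DBSCAN route (not the in-house ClustOn approach), the sketch below clusters a handful of made-up ticket summaries with scikit-learn. The ticket texts, the character n-gram TF-IDF features, and the `eps` and `min_samples` values are all assumptions chosen for this toy example, and `cluster_tickets` is a hypothetical helper name.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical ticket summaries; near-duplicates differ in spelling and wording.
tickets = [
    "vpn keeps disconnecting on home wifi",
    "vpn keeps disconecting on home wifi",
    "vpn keeps disconnecting on my home wifi",
    "meet camera not detected after update",
    "meet camera not detectd after update",
    "meet camera is not detected after update",
    "request a new laptop charger",
]


def cluster_tickets(texts, eps=0.5, min_samples=2):
    """Return one DBSCAN label per ticket; -1 marks noise."""
    # Character n-grams tolerate misspellings better than whole-word tokens.
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    vectors = vectorizer.fit_transform(texts)
    # Cosine distance is a common choice for sparse TF-IDF vectors.
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine")
    return model.fit_predict(vectors)


labels = cluster_tickets(tickets)
for label, text in zip(labels, tickets):
    print(label, text)
```

DBSCAN labels points that belong to no dense region as -1, so a one-off request is left out of every cluster rather than forced into a bucket. The number of clusters is also not specified up front, unlike k-means.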

Middle of the road solution using ML

Building on what Gmail was able to do with density clustering techniques, the Techstop team built a robust way to track problems that avoids the rigid taxonomy problem. With density clustering, the taxonomy buckets are redefined as trending clusters and provide an index of issues happening in real time within the company. Importantly, these buckets emerge naturally, rather than being defined ahead of time by the Engineering or Tech Support teams.

By using the technology built for billions of email accounts, we knew we could handle the scale of Google’s support requests. And the solutions would be more flexible than a tightly defined taxonomy, without compromising on relevance or granularity.

The team took it one step further by modeling cluster behavior with Poisson regression and implementing anomaly detection measures that alert operations teams in real time about ongoing outages or poorly executed changes. With a lightweight operations team and this new technology, Techstop was able to find granular insights that would otherwise have taken an entire dedicated team manually combing through and aggregating each incident.


The combination of ML and Operations transformed Techstop data into a valuable reference for product managers and engineering teams looking to understand the issues users face with their products in an enterprise environment.

How it works

To bring it all together, we built an ML pipeline that we call Support Insights, so we could automatically distill summary data from the many interactions and tickets we received. The Support Insights pipeline combines machine learning, human validation and probabilistic analysis in a single systems dynamics approach.

As data moves through this pipeline, it is:

  1. Extracted – Uses the BigQuery API to store, extract, train on and load support data, which lets us ingest 1M+ items of IT-related support data.
  2. Processed – Applies part-of-speech tagging, PII redaction and TF-IDF transformations to model support data for our clustering algorithms.
  3. Clustered – Centroid-based clustering runs in timed batches, with persistent snapshotting of previous run states to maintain cluster IDs and track the behavior of clusters over time.
  4. Scored – Uses Poisson regression to model both long-term and short-term behavior of cluster trends and calculates the difference between the two to measure deviation. This deviation score is used to detect anomalous behavior within a trend.
  5. Operationalized – Trends with an anomaly score over a certain threshold trigger a bug through the IssueTracker API. This bug is then picked up by operations teams for a relevant deep dive and incident tracking.
  6. Resampled – Uses statistical methods to estimate proportions of customer user journeys (CUJs) within trends.
  7. Categorized/mapped – We work with the Operations teams to map trend proportions to User Journey Segments.
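The Clustered step mentions keeping cluster IDs stable across batch runs, but the article does not say how that is done. One possible approach, sketched below purely as an assumption, is to match each new cluster to the snapshot cluster it overlaps most (by Jaccard similarity of ticket IDs) and to mint a fresh ID when nothing overlaps enough. This presumes that consecutive batches re-cluster a sliding window and therefore share tickets. The `carry_over_ids` name and the `min_overlap` value are hypothetical.

```python
from itertools import count


def carry_over_ids(previous, current, min_overlap=0.5):
    """Assign stable IDs to the clusters of a new batch run.

    previous: dict mapping cluster ID -> set of ticket IDs (last snapshot)
    current:  list of sets of ticket IDs (clusters from the new run)
    Returns a dict mapping cluster ID -> set of ticket IDs.
    """
    fresh_ids = count(max(previous, default=-1) + 1)
    assigned, used = {}, set()
    for members in current:
        best_id, best_score = None, 0.0
        for cluster_id, old_members in previous.items():
            if cluster_id in used:
                continue  # each old ID is inherited at most once
            union = members | old_members
            score = len(members & old_members) / len(union) if union else 0.0
            if score > best_score:
                best_id, best_score = cluster_id, score
        if best_id is None or best_score < min_overlap:
            best_id = next(fresh_ids)  # no good match: treat as a new trend
        else:
            used.add(best_id)
        assigned[best_id] = members
    return assigned


# Made-up snapshot and new run, identified by ticket IDs.
snapshot = {0: {"t1", "t2", "t3"}, 1: {"t4", "t5"}}
new_run = [{"t2", "t3", "t6"}, {"t7", "t8"}, {"t4", "t5", "t9"}]
print(carry_over_ids(snapshot, new_run))
```

The matching here is greedy and order-dependent, which keeps the sketch short. With stable IDs, the daily counts for a given trend can be followed across runs.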

In our next post we’ll detail what technologies and methods we used for these seven steps, and walk through how you could use a similar pipeline yourself. To get started, load your data into BigQuery and use BigQuery ML to cluster your support data.

 

 

By Nicholaus Jackson, Efficiency Solutions Engineer | Max Saltonstall, Senior Developer Relations Engineer, Google Cloud
Source: Google Cloud Blog

