
Compute And Storage Should Be Decoupled For Log Management At Scale

  • aster.cloud
  • June 7, 2021
  • 5 minute read

Guest post originally published on The New Stack by Tito George, co-founder, logic.ai

Most log management solutions store log data in a database and make it searchable by maintaining an index of that data. As the database grows, so does the cost of managing the index. At a small scale, this isn’t a problem. But in large-scale deployments, organizations end up spending significant compute, storage and human resources just to manage their indexes, in addition to the data itself. When a company handles terabytes of data every day, a database-backed log management system becomes untenable.


Another common issue is that most log solutions don’t store just one copy of the data. Many DIY log management implementations use popular databases such as MongoDB, Elasticsearch and Cassandra. Take Elasticsearch as an example. An Elasticsearch cluster keeps several replicas of the data in the hot tier to ensure high availability. Even with compression, that replication dramatically increases the total storage required. The problem is magnified once you account for the storage needed for indexes.

Clustering also increases management complexity and requires users to understand how to handle node failures and data recovery. Even with replication, a replacement instance cannot be spun up instantly when one goes down. In most cases, there is some downtime during which the log analytics system is unavailable. Meanwhile, data keeps arriving because logs are generated in real time. Catching up requires provisioning additional resources, and because the incoming data never stops, the system can struggle to close the gap. One-click elasticity is critical to managing this at scale.
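
To see why catching up is hard, consider a rough back-of-the-envelope model. This is only a sketch: it assumes constant ingest and processing rates, and the function name and the numbers are illustrative rather than taken from any real deployment. The backlog built up during an outage can only be drained with whatever capacity is left over after handling live traffic.

```python
def catch_up_hours(outage_hours, ingest_rate, processing_rate):
    """Hours needed to drain the backlog after an outage.

    Rates are in the same unit (for example GB per hour).
    Assumes both rates stay constant, which real systems rarely do.
    """
    if processing_rate <= ingest_rate:
        # No spare capacity: the backlog never shrinks.
        return float("inf")
    backlog = outage_hours * ingest_rate
    spare_capacity = processing_rate - ingest_rate
    return backlog / spare_capacity


# Hypothetical numbers: a 2-hour outage, 40 GB/h arriving, 50 GB/h capacity.
hours = catch_up_hours(2, 40, 50)
print(hours)  # 8.0
```

With only 25 percent headroom, a two-hour outage takes eight hours to recover from in this model, which is why the ability to add capacity quickly matters.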

The challenges outlined above are classic examples of the hidden “storage operations tax” that any DIY solution has to pay. The larger the scale, the higher the tax! A company ingesting around one terabyte of data per day would need multiple terabytes of storage, and a proportional amount of RAM, to keep 30 days’ worth of log data searchable.
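
A quick estimate shows how the pieces multiply. The sketch below is not a sizing guide: the copy count, compression ratio and index overhead are assumed values chosen for illustration, and real figures vary widely by database and data shape.

```python
def hot_storage_tb(ingest_tb_per_day, retention_days, copies,
                   compression_ratio, index_overhead):
    """Estimate hot-tier storage in TB.

    copies: total copies of the data (primary plus replicas)
    compression_ratio: stored size divided by raw size (0.5 halves the data)
    index_overhead: index size as a fraction of the stored data
    """
    raw = ingest_tb_per_day * retention_days
    stored_per_copy = raw * compression_ratio * (1 + index_overhead)
    return stored_per_copy * copies


# Assumed inputs: 1 TB/day for 30 days, 2 copies,
# 2:1 compression, index adding 20% on top of the stored data.
estimate = hot_storage_tb(1, 30, copies=2,
                          compression_ratio=0.5, index_overhead=0.2)
print(f"{estimate:.1f} TB")  # 36.0 TB
```

Even with compression halving the raw data, replication and indexing push the total above the 30 TB that was ingested.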

The way to solve this problem is to move away from databases and use a scalable API storage layer. An API storage layer like Amazon Web Services’ S3, which has traditionally been used for cold storage, fits this requirement quite well. It provides high availability and durability, virtually unlimited scale and a low price per GB, and it effectively takes your storage operations tax to zero. To make this work, however, applications must not be exposed to the higher latency that is typical of cold storage.

 

Are You Keeping 30 Days’ Worth of Data?

Enterprises think they are keeping 30 days’ worth of log data in hot storage, but they aren’t actually doing so. Most queries are periodic reports rather than interactive searches by a user sitting at a console. This is especially true at scale, where it is not uncommon to ingest hundreds of megabytes or even gigabytes of log data per minute. Interactive workflows in such environments focus on identifying relevant events and data patterns, which are then encoded as rules that send timely notifications to the administrator. This means most data does not need to be in hot storage at all; it can be processed in-line during ingest or asynchronously at a later point.
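
As a rough illustration of in-line processing, the sketch below applies pattern rules to log lines as they arrive, so a notification can be raised without the line ever being stored in a searchable database. The `Rule` class, the rule name and the sample log lines are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: re.Pattern


def process_inline(lines, rules):
    """Yield (rule name, line) for each incoming line that matches a rule."""
    for line in lines:
        for rule in rules:
            if rule.pattern.search(line):
                yield rule.name, line


# A pattern an operator might have identified during an interactive session.
rules = [Rule("payment-failure",
              re.compile(r"payment .* failed", re.IGNORECASE))]

incoming = [
    "2021-06-07T10:00:01Z INFO checkout started",
    "2021-06-07T10:00:02Z ERROR payment 8841 failed: card declined",
]

alerts = list(process_inline(incoming, rules))
for name, line in alerts:
    print(f"[{name}] {line}")
```

Only the second line matches, so only one alert is produced; the first line can go straight to cheaper storage.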

There’s another good reason that companies move data into S3-compatible or other cold storage quickly. Shortening the time data spends in a database separates storage from compute and makes it easier for organizations to scale their storage and recover from crashed clusters. Cold storage is dramatically cheaper than a database, and it is easier to scale.

This approach, however, creates a new problem: data must now be split across two tiers, hot and cold. Moving and managing data between the tiers requires expertise. Decisions about what to tier, how often to move data and when to hydrate the hot tier from the cold tier become business as usual. The “storage operations tax” just went up.
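
Even the simplest tiering policy adds decisions that someone has to own. The sketch below is a minimal, hypothetical age-based policy; the seven-day hot window and the function names are assumptions made for illustration, not a recommendation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: data younger than 7 days stays in the hot tier.
HOT_WINDOW = timedelta(days=7)


def tier_for(chunk_end, now):
    """Decide which tier a chunk of log data belongs in, based on its age."""
    return "hot" if now - chunk_end <= HOT_WINDOW else "cold"


def needs_rehydration(query_start, now):
    """A query reaching back past the hot window must pull data from cold."""
    return now - query_start > HOT_WINDOW


now = datetime(2021, 6, 30, tzinfo=timezone.utc)
print(tier_for(datetime(2021, 6, 28, tzinfo=timezone.utc), now))  # hot
print(tier_for(datetime(2021, 6, 1, tzinfo=timezone.utc), now))   # cold
print(needs_rehydration(datetime(2021, 6, 10, tzinfo=timezone.utc), now))  # True
```

Every constant and branch here is something operators must tune, monitor and revisit as data volumes change.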

 

What if I Need Long-Term Data Retention?

In highly regulated environments, short-term retention is usually not an option, as businesses must store data, index it and keep it searchable for several years. The same problems exist, at an even larger scale. The choice is between vast amounts of expensive primary storage and a tiered storage architecture. With such requirements, it is common to have a tiered implementation with most of the data in the cold tier and a significant amount still in the hot tier (e.g., 30-day retention). The “storage operations tax” isn’t going away; it is only increasing.

 

Eliminating Legacy Storage Architecture and Data Tiering

Companies use a tiered approach to storage because they fear losing the ability to search data once it is in cold storage. When a search of cold data is necessary, an arduous request process makes accessing the logs slow and challenging, and running real-time searches on older data is impossible. For some application types, this isn’t a big deal. But for revenue-producing, critical-path applications, it’s crucial to have quick, real-time access to logs and to be able to extract information from them at a moment’s notice. Maintaining separate “hot” and “cold” stores creates cost and management overhead, particularly for Day 2 operations. Moving everything to a hot store would be extremely expensive. So what if you could make cold storage your primary store?

 

Making S3 Searchable or ‘Zero Storage Operations Tax’

What if we could make S3-compatible storage just as searchable as a database? Companies keep their log data in a database to enable real-time searches. In practice, though, most organizations do not keep nearly as much historical data in databases as their official retention policies dictate. If any S3-compatible store can be just as searchable as a database, organizations can dramatically cut the amount of data stored in databases and the compute resources needed to manage it. The most recent data (say, one minute’s worth) can be kept on disk, and after a minute everything moves to S3. There is no longer any need to run multiple instances of a database for high availability: if the cluster goes down, a new one can be spun up and pointed at the same S3-compatible bucket.
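
The idea can be sketched in a few lines. The example below is a toy model, not a description of any product: `InMemoryObjectStore` is a hypothetical stand-in for an S3-compatible bucket, the key layout is an assumption, one flush per minute is assumed (a second flush for the same minute would overwrite the first), and the search is a brute-force scan rather than a real query engine.

```python
from datetime import datetime, timezone


class InMemoryObjectStore:
    """Stand-in for an S3-compatible bucket: put, list by prefix, get."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))

    def get(self, key):
        return self._objects[key]


def minute_key(ts):
    """Time-partitioned key, so a time range maps to a key prefix."""
    return ts.strftime("logs/%Y/%m/%d/%H/%M.log")


def flush(store, ts, lines):
    """Write one minute of buffered lines as a single object."""
    store.put(minute_key(ts), "\n".join(lines))


def search(store, time_prefix, needle):
    """Scan every object under a time prefix for a substring."""
    hits = []
    for key in store.list(time_prefix):
        hits.extend(line for line in store.get(key).split("\n")
                    if needle in line)
    return hits


store = InMemoryObjectStore()
flush(store, datetime(2021, 6, 7, 10, 0, tzinfo=timezone.utc),
      ["INFO start", "ERROR disk full"])
flush(store, datetime(2021, 6, 7, 10, 1, tzinfo=timezone.utc),
      ["INFO ok"])

hits = search(store, "logs/2021/06/07/10/", "ERROR")
print(hits)  # ['ERROR disk full']
```

Because `search` keeps no state of its own, any freshly started process that can reach the store can answer the same query, which is the property that makes a crashed cluster cheap to replace.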

Moving log data directly to cold storage while ensuring real-time searchability makes it easier to scale, increases the log data’s availability and dramatically decreases costs for both storage and compute. When log data is accessed directly in cold storage, users don’t have to worry about managing indexes across hot and cold tiers, rehydrating data or building complex policies. It also means companies can actually follow their data retention plans, so developers can access logs and use them to debug critical applications.

