
Cloud Storage As A File System In AI Training

  • aster.cloud
  • November 23, 2021
  • 4 minute read

Cloud Storage is a common choice for Vertex AI and AI Platform users to store their training data, models, checkpoints and logs. Now, with Cloud Storage FUSE, training jobs on both platforms can access their data on Cloud Storage as files in the local file system.

This post introduces Cloud Storage FUSE for Vertex AI Custom Training. The feature works very similarly on AI Platform Training.



Cloud Storage FUSE provides three benefits over the traditional ways of accessing Cloud Storage:

  • Training jobs can start quickly without downloading any training data.
  • Training jobs can perform I/O easily at scale, without the friction of calling the Cloud Storage APIs, handling the responses, or integrating with client-side libraries.
  • Training jobs can leverage the optimized performance of Cloud Storage FUSE.

The problems

Traditionally, training jobs have two ways to use data from Cloud Storage.

  1. They can use gsutil to download the entire dataset before training. This may take hours depending on the dataset size, which significantly delays job start-up.
  2. They can call the Cloud Storage APIs directly or through an integrated client library. This adds considerable complexity to the training code and raises the cost of development and maintenance.

Cloud Storage FUSE

Cloud Storage FUSE is a Filesystem in Userspace (FUSE) mounted on Vertex AI systems.

When you start a custom training job, the job sees a directory /gcs that contains all the Cloud Storage buckets as subdirectories. The job can access a subdirectory (i.e., a bucket) when it has been granted the required permissions.

For instance, a training job can read the file /gcs/example-bucket/data.csv to get the training data stored in the object gs://example-bucket/data.csv:

with open('/gcs/example-bucket/data.csv', 'r') as f:
  lines = f.readlines()

Training jobs can also write to the bucket:

with open('/gcs/example-bucket/epoch3.log', 'a') as f:
  f.write('success!\n')
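
Because every bucket appears as a subdirectory of /gcs, translating an object URI into a local path is a plain string operation. The helper below is a minimal sketch of that mapping; the name gs_uri_to_fuse_path is hypothetical and not part of any Google library, and the /gcs mount point is taken from this post.

```python
from pathlib import PurePosixPath

GCS_FUSE_ROOT = "/gcs"  # mount point described in this post


def gs_uri_to_fuse_path(uri: str) -> str:
    """Map gs://bucket/object to /gcs/bucket/object (hypothetical helper)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    remainder = uri[len(prefix):]
    if not remainder or remainder.startswith("/"):
        raise ValueError(f"missing bucket name in {uri!r}")
    # Join the bucket and object name onto the mount point.
    return str(PurePosixPath(GCS_FUSE_ROOT) / remainder)
```

A training script that receives its data location as a gs:// URI could pass it through this helper and then open the result with the built-in open().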

 

Permissions

Users can assign a service account to a training job to configure its permissions for Cloud Storage buckets.

  • If no service account is assigned to the training job, it can access all the buckets owned by the same project.
  • If the training job is assigned a service account that has Cloud Storage roles, it has the permissions granted by those roles.

For instance, you may create a service account with the following roles:

  • storage.objectAdmin on bucket A, and
  • storage.objectViewer on bucket B.

If you assign it to your training job, your training job will be able to

  • read and write in bucket A, and
  • read only in bucket B.

The training job will fail with a “permission denied” error if it tries to write to bucket B.
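
If a failed write should not end the whole job, the training code can catch the error itself. The sketch below assumes that the denial reaches Python as a PermissionError, which is how the operating system's "permission denied" error is normally reported; the function name try_append is hypothetical.

```python
def try_append(path: str, text: str) -> bool:
    """Append text to path; return False if permission is denied."""
    try:
        # The file is also closed inside the try block, so an error
        # raised while closing is caught as well.
        with open(path, "a") as f:
            f.write(text)
    except PermissionError as err:
        # Assumption: a read-only bucket surfaces as "permission denied".
        print(f"cannot write to {path}: {err}")
        return False
    return True
```

With the roles above, a call such as try_append('/gcs/bucket-b/epoch3.log', 'success!\n') would be expected to return False rather than raise, where bucket-b is a placeholder for the real name of bucket B.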

Performance

I/O is often a bottleneck for training jobs with large datasets. Here are some tips to improve the read throughput of Cloud Storage FUSE:

  • Store data in large files to reduce the number of files used in training. Fewer files mean less lookup overhead for locating and opening objects in Cloud Storage.
  • Use multiple threads. Higher concurrency makes better use of the available bandwidth.
  • Keep the files warm. Frequently accessed (i.e., warm) files are generally cached better and are faster to read.
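
The multi-threading tip can be sketched with the standard library's thread pool. This is an illustrative example rather than a tuned data loader; the function names and the default of eight workers are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def read_file(path: str) -> bytes:
    """Read one file completely and return its bytes."""
    with open(path, "rb") as f:
        return f.read()


def read_files_concurrently(paths, max_workers: int = 8) -> dict:
    """Read several files in parallel and return {path: contents}."""
    paths = list(paths)  # allow any iterable; it is traversed twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input paths.
        contents = pool.map(read_file, paths)
        return dict(zip(paths, contents))
```

Threads suit this workload because each one spends most of its time waiting on I/O. The best worker count depends on file sizes and the machine, so it is worth measuring.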

Restrictions

Cloud Storage FUSE is not a POSIX-compliant file system. Some operations that work on a POSIX file system therefore produce unwanted results and should be avoided.

Directories:

  • The root directory /gcs is not readable. If you run ls /gcs, you will get an “Input/output error”. However, reading a bucket root, such as ls /gcs/example-bucket, works.
  • Renaming a directory is not atomic. An interrupted rename leaves a partial result, with some files in the new directory and others still in the old one. A directory containing too many direct and indirect files cannot be renamed.

Files:

  • Hard links are not supported.
  • File metadata, such as ownership, permissions, mtime, and extended attributes, is not supported. Do not rely on file metadata for training logic.
  • Flushing a file pushes the entire file to Cloud Storage, which is expensive. Closing a file triggers a flush, so avoid frequent file closes and flushes.
  • Concurrent writes to the same file lead to data corruption.
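
One way to limit closes and flushes is to collect log lines in memory and write them out in a single open/close cycle, for example once per epoch. The class below is a minimal sketch of that idea; BufferedLog is a hypothetical name, and lines still in the buffer are lost if the job stops before flush() is called.

```python
class BufferedLog:
    """Collect log lines in memory and write them in one open/close cycle."""

    def __init__(self, path: str):
        self.path = path
        self.lines = []

    def log(self, message: str) -> None:
        # Normalize so that every buffered entry ends with one newline.
        self.lines.append(message.rstrip("\n") + "\n")

    def flush(self) -> None:
        """Append all buffered lines to the file, then empty the buffer."""
        if not self.lines:
            return
        with open(self.path, "a") as f:
            f.writelines(self.lines)
        self.lines.clear()
```

Giving each worker its own log file name is one way to keep two workers from writing to the same file at once.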

Logs

You can use the logs from Cloud Storage FUSE to diagnose errors in training.

  • First, follow the link to the Logs Explorer on the training job’s page in the Google Cloud console. In the explorer, you can run queries to inspect the logs generated by your training job.
  • Second, view the logs that have “gcsfuse” in the resource.labels.taskName property. For instance, the task name “workerpool0-0.gcsfuse” indicates that the log comes from the Cloud Storage FUSE mounted for the first worker “0” in the first worker pool “workerpool0”.
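
When logs are exported and processed offline, the task name can be split into its parts. The parser below is a sketch that assumes every task name follows the pattern of the single example above, workerpool<N>-<M> with an optional .<component> suffix; that pattern is inferred from the example, not taken from a specification.

```python
import re
from typing import NamedTuple, Optional


class TaskName(NamedTuple):
    worker_pool: int          # index of the worker pool
    worker: int               # index of the worker within the pool
    component: Optional[str]  # e.g. "gcsfuse", or None if absent


# Assumed format, inferred from "workerpool0-0.gcsfuse".
_TASK_NAME = re.compile(r"^workerpool(\d+)-(\d+)(?:\.(\w+))?$")


def parse_task_name(task_name: str) -> TaskName:
    """Split a task name into pool index, worker index, and component."""
    match = _TASK_NAME.match(task_name)
    if match is None:
        raise ValueError(f"unrecognized task name: {task_name!r}")
    pool, worker, component = match.groups()
    return TaskName(int(pool), int(worker), component)
```

Filtering parsed entries on component == "gcsfuse" then isolates the Cloud Storage FUSE logs for a given worker.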

What’s next

You can find more information on Cloud Storage FUSE in the documentation:

  • https://cloud.google.com/vertex-ai/docs/training/code-requirements#fuse
  • https://cloud.google.com/storage/docs/gcs-fuse

You can also find code samples using Cloud Storage FUSE for Vertex AI Custom Training:

  • https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/master/community-content

 

By: Oliver Zhuang (Software Engineer)
Source: Google Cloud Blog

