aster.cloud

Use Graphs For Smarter AI With Neo4j And Google Cloud Vertex AI

  • aster.cloud
  • January 16, 2022
  • 6 minute read

In this blog post, we’re going to show you how to use two technologies together: Google Cloud Vertex AI, an ML development platform, and Neo4j, a graph database. Together these technologies can be used to build and deploy graph-based machine learning models.

The code underlying this blog post is available in a notebook here.


Why should you use graphs for data science?

Many critical business problems use data that can be expressed as graphs. Graphs are data structures that describe the relationships between data points as much as the data themselves.

An easy way to think about graphs is by analogy with nouns and verbs. Nodes, the nouns, are things such as people, places, and items. Relationships, the verbs, are how they’re connected: people know each other, and items are sent to places. The signal in those relationships is powerful.
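
To make the analogy concrete, here is a minimal sketch of a tiny property graph in plain Python. The node names, labels and relationship types are hypothetical, and a graph database such as Neo4j stores and indexes this far more efficiently. The point is only that relationships are first-class data alongside the nodes.

```python
# Hypothetical toy property graph. Nodes (the nouns) carry a label;
# relationships (the verbs) connect two nodes and carry a type.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "parcel_42": {"label": "Item"},
    "london": {"label": "Place"},
}

relationships = [
    ("alice", "KNOWS", "bob"),
    ("parcel_42", "SENT_TO", "london"),
]


def neighbours(node, rel_type):
    """Return the end nodes of the node's outgoing relationships of one type."""
    return [end for start, kind, end in relationships
            if start == node and kind == rel_type]


print(neighbours("alice", "KNOWS"))        # ['bob']
print(neighbours("parcel_42", "SENT_TO"))  # ['london']
```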

Graph data can be huge and messy to deal with. It is also very hard to use directly in traditional machine learning tasks, which expect each example to be a fixed set of features.

Google Cloud and Neo4j offer scalable, intelligent tools for making the most of graph data. Neo4j Graph Data Science and Google Cloud Vertex AI make building AI models on top of graph data fast and easy.

Dataset – Identify Fraud with PaySim

Graph-based machine learning has numerous applications. One common application is combating fraud in its many forms: credit card companies identify fake transactions, insurers face false claims, and lenders look out for stolen credentials.

Statistics and machine learning have been used to fight fraud for decades. A common approach is to build a classification model on the individual features of payments and users. For example, data scientists might train an XGBoost model to predict whether a transaction is fraudulent using the transaction amount, its date and time, the origin and target accounts, and the resulting balances.

These models miss a lot of fraud. By channeling transactions through a network of fraudulent actors, fraudsters can beat checks that look only at a single transaction. A successful model needs to understand the relationships between fraudulent transactions, legitimate transactions and actors.
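
The toy sketch below illustrates the point. The accounts, amounts and threshold are all hypothetical. Every transfer stays under a per-transaction limit, so a rule that looks at one transaction at a time flags nothing. Walking the relationships back from a known fraudster, however, surfaces the whole chain. This is only an illustration, not the model built later in this post.

```python
from collections import deque

# Hypothetical transfers (sender, receiver, amount). Each hop stays under the
# per-transaction threshold, so single-transaction rules flag nothing.
transfers = [
    ("acct_A", "acct_B", 9_500),
    ("acct_B", "acct_C", 9_400),
    ("acct_C", "acct_D", 9_300),
    ("acct_X", "acct_Y", 120),
]
THRESHOLD = 10_000
known_fraudsters = {"acct_D"}


def single_transaction_flags(txs, threshold):
    """Flag transactions one at a time, ignoring who they connect."""
    return [t for t in txs if t[2] >= threshold]


def within_hops_of_fraud(txs, fraudsters, max_hops):
    """Accounts that send money to a known fraudster within max_hops transfers."""
    senders = {}  # reverse adjacency: receiver -> set of senders
    for src, dst, _amount in txs:
        senders.setdefault(dst, set()).add(src)
    flagged, seen = set(), set(fraudsters)
    queue = deque((f, 0) for f in fraudsters)  # breadth-first search backwards
    while queue:
        acct, depth = queue.popleft()
        if depth == max_hops:
            continue
        for src in senders.get(acct, ()):
            if src not in seen:
                seen.add(src)
                flagged.add(src)
                queue.append((src, depth + 1))
    return flagged


print(single_transaction_flags(transfers, THRESHOLD))                 # []
print(sorted(within_hops_of_fraud(transfers, known_fraudsters, 3)))  # ['acct_A', 'acct_B', 'acct_C']
```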

Graph techniques are perfect for these kinds of problems. In this example, we’ll show you how graphs apply in this situation. Then we’ll show you how to construct an end-to-end pipeline that trains a complete model using Neo4j and Vertex AI. For this example, we’re using a variation on the PaySim dataset from Kaggle that includes graph features.

Loading Data into Neo4j

First off, we need to load the dataset into Neo4j. For this example, we’re using AuraDS. AuraDS offers Neo4j Graph Database and Neo4j Graph Data Science running as a managed service on top of GCP. It’s currently in a limited preview that you can sign up for here.

AuraDS is a great way to get started on GCP because the service is fully managed. To set up a running database with the PaySim data, all we need to do is click through a few screens and load the database dump file.

Once the data is loaded, there are many ways to explore it with Neo4j. One is to use the Python API in a notebook to run queries.

For instance, we can see the node labels by running the query:

CALL db.labels() YIELD label 
CALL apoc.cypher.run('MATCH (:`'+label+'`) RETURN count(*) as freq', {}) 
YIELD value 
RETURN label, value.freq AS freq

In our notebook, this returns one row per node label, along with the number of nodes that carry that label.

The notebook also includes other queries, such as ones that summarize relationship types and transaction types. You can explore those yourself here.
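
Here is a minimal sketch of running the label-count query from Python. It assumes the official `neo4j` driver package and placeholder connection details. The helper names `fetch_label_counts` and `records_to_counts` are hypothetical, and the query needs APOC, just like the Cypher above.

```python
LABEL_COUNTS_QUERY = """
CALL db.labels() YIELD label
CALL apoc.cypher.run('MATCH (:`' + label + '`) RETURN count(*) AS freq', {})
YIELD value
RETURN label, value.freq AS freq
"""


def records_to_counts(records):
    """Turn rows that have 'label' and 'freq' keys into a {label: count} dict."""
    return {record["label"]: record["freq"] for record in records}


def fetch_label_counts(uri, user, password):
    """Run the label-count query against a Neo4j (e.g. AuraDS) instance."""
    # Imported lazily so records_to_counts works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # Consume the result inside the session, before it closes.
            return records_to_counts(session.run(LABEL_COUNTS_QUERY))
    finally:
        driver.close()
```

A call such as `fetch_label_counts("neo4j+s://<your-instance>", "neo4j", "<password>")` would return a plain dictionary that is easy to print or load into pandas.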

Generating Embeddings with Neo4j

After you’ve explored your dataset, a common next step is to use the algorithms in Neo4j Graph Data Science to engineer features that encode complex, high-dimensional graph data into values that tabular machine learning algorithms can use.

Many users start with basic graph algorithms to identify patterns. You can look at weakly connected components to find disjoint communities of account holders sharing common logins. The Louvain method is useful for finding rings of fraudsters laundering money. PageRank can be used to figure out which accounts are most important. However, these techniques require you to know exactly which pattern you’re looking for.

visualization of two weakly connected components
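
GDS computes weakly connected components at scale with its `gds.wcc` procedures. The plain-Python union-find sketch below only illustrates the idea on hypothetical data in which clients are linked to the logins they use. Edge direction is ignored, which is what makes the components "weak".

```python
def weakly_connected_components(edges):
    """Group nodes into weakly connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:  # direction is ignored
        union(a, b)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


# Hypothetical client-to-login edges: two separate login-sharing communities.
shared_logins = [
    ("client_1", "login_a"),
    ("client_2", "login_a"),
    ("client_3", "login_b"),
    ("client_4", "login_b"),
]
components = weakly_connected_components(shared_logins)
print(len(components))  # 2
```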

A different approach is to use Neo4j to generate graph embeddings. Graph embeddings boil down the complex topological information in your graph into a fixed-length vector per node, where related nodes in the graph have nearby vectors. Graph topology may be an important signal, for example who fraudsters interact with and how they behave. If it is, the embeddings will capture it, and previously undetected fraudsters can be identified because their embeddings are similar to those of known fraudsters.

graph embeddings, showing how graph topology translates into a fixed dimensional vector space

Some techniques use the embeddings on their own, for instance plotting them with t-SNE to find clusters visually, or computing raw similarity scores. The magic really happens when you combine your embeddings with Google Cloud Vertex AI to train a supervised model.
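
A raw similarity score can be as simple as cosine similarity between embedding vectors. The sketch below scores a candidate by its highest similarity to any known fraudster. The three-dimensional vectors are hypothetical stand-ins for real 16-dimensional embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_similarity_to_fraudsters(candidate, fraudster_embeddings):
    """Highest cosine similarity between one embedding and any known fraudster's."""
    return max(cosine_similarity(candidate, f) for f in fraudster_embeddings)


known_fraudsters = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical embeddings
score = max_similarity_to_fraudsters([0.9, 0.1, 0.0], known_fraudsters)
print(round(score, 3))  # 0.994
```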

For our PaySim example, we can create a graph embedding with the following call:

CALL gds.fastRP.mutate('client_graph',{ 
 relationshipWeightProperty:'amount', 
 iterationWeights: [0.0, 1.00, 1.00, 0.80, 0.60], 
 featureProperties: ['num_transactions', 'total_transaction_amnt'], 
 propertyRatio: 0.25, 
 nodeSelfInfluence: 0.15, 
 embeddingDimension: 16, 
 randomSeed: 1, 
 mutateProperty:'embedding' 
})

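That creates a 16-dimensional graph embedding using the Fast Random Projection (FastRP) algorithm. One neat feature here is the nodeSelfInfluence parameter, which controls how much each node’s own initial random vector contributes to its final embedding. How much nodes further out in the graph influence the embedding is tuned by iterationWeights, which sets a weight for each successive iteration.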
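
To make those two parameters concrete, here is a minimal, simplified sketch in plain Python of the idea behind FastRP. It is not the GDS implementation. Each node starts from a very sparse random vector. Each iteration averages the neighbours' vectors from the previous iteration. The final embedding is `self_influence` times the starting vector plus the `iteration_weights`-weighted sum of the iteration results. The sketch ignores relationship weights, node properties (featureProperties/propertyRatio) and GDS's normalization options, and the function and parameter names are hypothetical.

```python
import math
import random


def sparse_random_vector(dim, rng, s=3.0):
    """Very sparse random projection: +sqrt(s) or -sqrt(s), each with
    probability 1/(2s), otherwise 0."""
    vec = []
    for _ in range(dim):
        r = rng.random()
        if r < 1 / (2 * s):
            vec.append(math.sqrt(s))
        elif r < 1 / s:
            vec.append(-math.sqrt(s))
        else:
            vec.append(0.0)
    return vec


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fastrp_sketch(adjacency, dim, iteration_weights, self_influence, seed=1):
    """Simplified FastRP-style embeddings.

    adjacency maps every node to a list of neighbours (list both directions
    for undirected edges).
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    initial = {n: sparse_random_vector(dim, rng) for n in nodes}
    # The node's own random vector enters the result scaled by self_influence.
    final = {n: [self_influence * x for x in initial[n]] for n in nodes}
    current = initial
    for weight in iteration_weights:
        # Averaging the neighbours' previous vectors means iteration k
        # mixes in information from walks of length k.
        nxt = {}
        for n in nodes:
            acc = [0.0] * dim
            for m in adjacency[n]:
                for i, x in enumerate(current[m]):
                    acc[i] += x
            if adjacency[n]:
                acc = [x / len(adjacency[n]) for x in acc]
            nxt[n] = l2_normalize(acc)
        current = nxt
        for n in nodes:
            for i, x in enumerate(current[n]):
                final[n][i] += weight * x
    return final


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
embeddings = fastrp_sketch(graph, 16, [0.0, 1.00, 1.00, 0.80, 0.60], 0.15)
```

In this sketch, raising `self_influence` pulls each node toward its own random starting point, while later entries in `iteration_weights` give more say to nodes several hops away.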

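With the embedding calculated, we can now load it into a pandas DataFrame, write that to a CSV file, and push the file to a Cloud Storage bucket where Vertex AI can work with it. Because mutate mode stores the embedding in the in-memory projected graph rather than in the database, it has to be streamed out of that projection first. As before, these steps are detailed in the notebook here.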
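
Here is a minimal sketch of the CSV and upload steps. It assumes the embeddings have already been streamed out of the projection as (node id, label, embedding) rows. It uses the standard `csv` module rather than pandas so it runs anywhere. The column names other than `is_fraudster` and the helper names are hypothetical. The upload helper assumes the `google-cloud-storage` package and default credentials.

```python
import csv
import io


def write_embedding_csv(rows, fileobj, dimension):
    """Write (node_id, is_fraudster, embedding) rows, one column per embedding dimension.

    The node_id and embedding_<i> column names are hypothetical.
    """
    writer = csv.writer(fileobj)
    writer.writerow(["node_id", "is_fraudster"]
                    + [f"embedding_{i}" for i in range(dimension)])
    for node_id, is_fraudster, embedding in rows:
        if len(embedding) != dimension:
            raise ValueError(f"node {node_id}: expected {dimension} values, got {len(embedding)}")
        writer.writerow([node_id, int(is_fraudster), *embedding])


def upload_to_gcs(local_path, bucket_name, blob_path):
    """Copy a local file into a Cloud Storage bucket (needs google-cloud-storage)."""
    from google.cloud import storage  # imported lazily so the CSV helper runs without it

    client = storage.Client()
    client.bucket(bucket_name).blob(blob_path).upload_from_filename(local_path)


# Example with two hypothetical 2-dimensional embeddings written to memory.
buffer = io.StringIO()
write_embedding_csv([("c1", True, [0.5, -0.25]), ("c2", False, [0.0, 1.0])], buffer, 2)
csv_text = buffer.getvalue()
```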

Machine Learning with Vertex AI

Now that we’ve encoded the graph dynamics into vectors, we can use tabular methods in Google Cloud’s Vertex AI to train a machine learning model.

First, we pull the data from a Cloud Storage bucket and use it to create a dataset in Vertex AI. The Python call looks like this:

import os
from google.cloud import aiplatform

# STORAGE_BUCKET, STORAGE_PATH and TRAINING_FILENAME are assumed to be defined earlier in the notebook
dataset = aiplatform.TabularDataset.create(
 display_name="paysim", 
 gcs_source=os.path.join(
  "gs://", STORAGE_BUCKET, STORAGE_PATH, TRAINING_FILENAME
 ), 
)
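
One caveat: `os.path.join` uses the host operating system's separator, so on Windows it would put backslashes into the object path. A small helper like the hypothetical `gcs_uri` below always builds the URI with forward slashes. It could be passed as `gcs_source=gcs_uri(STORAGE_BUCKET, STORAGE_PATH, TRAINING_FILENAME)`.

```python
def gcs_uri(bucket, *parts):
    """Build a gs:// URI with forward slashes on every OS, skipping empty parts."""
    segments = [bucket.strip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    return "gs://" + "/".join(segments)


print(gcs_uri("my-bucket", "paysim/", "train.csv"))  # gs://my-bucket/paysim/train.csv
```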

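With the dataset created, we can then train a model on it. That Python call looks like this:

# AutoML tabular training job; "classification" because is_fraudster is a binary label.
# The job display name is illustrative.
job = aiplatform.AutoMLTabularTrainingJob(
 display_name="paysim-training-job",
 optimization_prediction_type="classification",
)

model = job.run(
 dataset=dataset, 
 target_column="is_fraudster", 
 training_fraction_split=0.8, 
 validation_fraction_split=0.1, 
 test_fraction_split=0.1, 
 model_display_name="paysim-prediction-model", 
 disable_early_stopping=False, 
 budget_milli_node_hours=1000,  # 1,000 milli node hours = 1 node hour
)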

You can view the results of that call in the notebook. Alternatively, you can log in to the GCP console and view the results in the Vertex AI GUI.

The console view is nice because it includes things like ROC curves and the confusion matrix, which help you understand how the model is performing.
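
The console computes these metrics for you. As a reminder of what they mean, here is a minimal sketch that derives a confusion matrix, precision, recall and a single ROC point from hypothetical labels and scores, where 1 means fraudster. Sweeping the threshold from 1 down to 0 traces out the full ROC curve.

```python
def confusion_matrix(y_true, y_pred):
    """Counts for a binary classifier where 1 means fraudster."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def precision_recall(cm):
    """Precision: flagged accounts that are fraudsters. Recall: fraudsters that were flagged."""
    flagged = cm["tp"] + cm["fp"]
    actual = cm["tp"] + cm["fn"]
    precision = cm["tp"] / flagged if flagged else 0.0
    recall = cm["tp"] / actual if actual else 0.0
    return precision, recall


def roc_point(scores, y_true, threshold):
    """One (false positive rate, true positive rate) point of the ROC curve."""
    cm = confusion_matrix(y_true, [1 if s >= threshold else 0 for s in scores])
    negatives = cm["fp"] + cm["tn"]
    positives = cm["tp"] + cm["fn"]
    fpr = cm["fp"] / negatives if negatives else 0.0
    tpr = cm["tp"] / positives if positives else 0.0
    return fpr, tpr


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
cm = confusion_matrix(y_true, [1 if s >= 0.5 else 0 for s in scores])
precision, recall = precision_recall(cm)
fpr, tpr = roc_point(scores, y_true, 0.5)
```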

Vertex AI also offers helpful tooling for deploying the trained model. The dataset can be loaded into a Vertex AI Feature Store, an endpoint can be deployed, and new predictions can then be computed by calling that endpoint. This is detailed in the notebook here.
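
Here is a sketch of the endpoint step with the Vertex AI Python SDK. It assumes `model` is the `aiplatform.Model` returned by `job.run`, and that the training CSV used hypothetical feature columns `embedding_0`, `embedding_1`, and so on. The instance keys must match whatever columns your model was actually trained on. The machine type is a placeholder, and the Feature Store step is not shown.

```python
def build_instance(embedding, prefix="embedding_"):
    """Map one embedding onto hypothetical feature columns embedding_0, embedding_1, ..."""
    # Values are sent as strings, as in Google's tabular prediction samples;
    # check this against your own model's input schema.
    return {f"{prefix}{i}": str(value) for i, value in enumerate(embedding)}


def deploy_and_predict(model, embeddings, machine_type="n1-standard-4"):
    """Deploy a trained aiplatform.Model to a new endpoint and score embeddings.

    machine_type is a placeholder; deploying creates billable resources.
    """
    endpoint = model.deploy(machine_type=machine_type)
    response = endpoint.predict(instances=[build_instance(e) for e in embeddings])
    return endpoint, response.predictions
```

The endpoint is returned alongside the predictions so that it can be reused for later calls, or undeployed when it is no longer needed.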

Future Work

Working on this notebook, we quickly realized that there is an enormous amount of potential work that could be done in this area. Machine learning with graphs is a relatively new field, particularly when compared to the study of methods for tabular data.

Specific areas we’d like to explore in future work include:

Improved Dataset – For data privacy reasons, it’s very difficult to share fraud datasets publicly. That led us to use PaySim, a synthetic dataset, in this example. From our investigation of both the dataset and the generator that creates it, there seems to be very little information in the data. A real dataset would likely have more structure to explore.

In future work we’d like to explore the graph of SEC EDGAR Form 4 transactions. Those forms show the trades that officers of public companies make. Many of those people are officers at multiple companies, so we anticipate the graph being quite interesting. We’re planning workshops for 2022 where attendees can explore this data together using Neo4j and Vertex AI. There is already a loader that pulls that data into Google BigQuery here.

Boosting and Embedding – Graph embeddings like Fast Random Projection duplicate the data because copies of subgraphs end up in each tabular data point. XGBoost and other boosting methods also duplicate data to improve results, and Vertex AI is using XGBoost. The result is that the models in this example likely have excessive data duplication. It’s quite possible we’d see better results with other machine learning methods, such as neural networks.

Graph Features – In this example we automatically generated graph features using the embedding. It’s also possible to manually engineer new graph features. Combining these two approaches would probably give us richer features.
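
Here is a sketch of what manually engineered features could look like. The code computes per-account out-degree, in-degree, total amount sent and number of distinct counterparties from hypothetical (sender, receiver, amount) transfers, then appends a node's embedding to form one row. These particular features are illustrative choices, not ones from the notebook.

```python
from collections import defaultdict


def manual_graph_features(transfers):
    """Hand-engineered per-account features from (sender, receiver, amount) tuples."""
    out_degree = defaultdict(int)
    in_degree = defaultdict(int)
    total_sent = defaultdict(float)
    counterparties = defaultdict(set)
    for sender, receiver, amount in transfers:
        out_degree[sender] += 1
        in_degree[receiver] += 1
        total_sent[sender] += amount
        counterparties[sender].add(receiver)
        counterparties[receiver].add(sender)
    return {
        acct: {
            "out_degree": out_degree[acct],
            "in_degree": in_degree[acct],
            "total_sent": total_sent[acct],
            "distinct_counterparties": len(counterparties[acct]),
        }
        for acct in sorted(counterparties)
    }


def feature_row(features, embedding):
    """Concatenate the hand-engineered features with a node's embedding."""
    return [
        features["out_degree"],
        features["in_degree"],
        features["total_sent"],
        features["distinct_counterparties"],
        *embedding,
    ]


example = manual_graph_features([("a", "b", 10.0), ("a", "c", 5.0), ("b", "a", 3.0)])
row = feature_row(example["a"], [0.1, 0.2])  # [2, 1, 15.0, 2, 0.1, 0.2]
```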

Next Steps

If you found this blog post interesting and want to learn more, please sign up for the AuraDS preview here. Learn more about Vertex AI here. The notebook we’ve worked through is here. We hope you fork it and modify it to meet your needs. Pull requests are always welcome!

By: Ben Lackey (Director, Global Cloud Channel Architecture at Neo4j)
Source: Google Cloud Blog