Exploring the Contemporary Place of Graph Databases

Many organizations continue to avoid the additional complexity of graph databases by forcing relationship workloads into relational databases, even though graph databases have been around for decades. The benefits of a graph database for relationship/network/tree-based workloads are substantial, even if an additional database platform is required. 

Graph databases excel at avoiding complex joins. If you’ve ever used a self-referential join to simulate a graph traversal, you’ll find that the same query can be written far more concisely as a native graph traversal, often shrinking from dozens of lines of code to two or three. Graph databases are also excellent at capturing dependencies between individual entities.
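To make the idea of a traversal concrete, here is a minimal sketch in plain Python rather than in any particular graph database's query language. The `MANAGES` data and the `all_reports` helper are hypothetical names chosen for illustration. The loop follows edges outward from a starting node, which is the work a chain of self-referential joins would otherwise have to simulate one level at a time.

```python
from collections import deque

# Hypothetical "manages" relationships: manager -> direct reports.
MANAGES = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": [],
    "dee": ["eli"],
    "eli": [],
}

def all_reports(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

if __name__ == "__main__":
    print(all_reports(MANAGES, "ana"))
```

A graph database performs this kind of walk natively, so the depth of the hierarchy does not need to be known in advance.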

A maturity model for the use of graph databases has emerged, whether for property graphs or semantic/RDF graphs. Visualization is among the most highly regarded capabilities of graph databases for data exploration and serves as a starting point for many. The use of graph algorithms such as Betweenness, PageRank, Closeness, Eigenvector Centrality, and Clustering Coefficient follows shortly thereafter.
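As an illustration of one of these algorithms, the following is a simplified sketch of PageRank using power iteration in plain Python. It is not the implementation used by any graph database; the `LINKS` graph, the damping factor of 0.85, and the fixed iteration count are assumptions made for the example.

```python
# Hypothetical directed graph: node -> nodes it links to.
# Every node must appear as a key, even if it has no outgoing links.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank scores with a fixed number of power iterations."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not graph[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for u in nodes:
            for v in graph[u]:
                # Each node shares its rank equally among its out-links.
                new_rank[v] += damping * rank[u] / len(graph[u])
        rank = new_rank
    return rank

if __name__ == "__main__":
    scores = pagerank(LINKS)
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 3))
```

In this toy graph, node "c" receives links from three other nodes and ends up with the highest score, while "d", which nothing links to, ends up with the lowest.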

A mature use of graphs is turning to them for artificial intelligence/machine learning. This is when we can truly gain insights, for instance by identifying subcommunities, resolving entities, or determining the important nodes. Importance can be highly context-dependent; for example, in a supply chain or telecommunications network, important may refer to nodes whose failure would jeopardize the entire network. In a social network, it could mean identifying influencers to communicate with.
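One way to read "nodes whose failure would jeopardize the entire network" is as nodes whose removal disconnects the graph. The sketch below checks this by brute force in plain Python: it removes each node in turn and tests whether the rest is still connected. This is a deliberately simple stand-in, not the linear-time articulation-point algorithm a graph library would typically use, and the `NETWORK` topology and function names are hypothetical.

```python
# Hypothetical undirected network: each edge is listed in both directions.
NETWORK = {
    "hub": ["r1", "r2", "r3"],
    "r1": ["hub", "r2"],
    "r2": ["hub", "r1"],
    "r3": ["hub", "edge"],
    "edge": ["r3"],
}

def is_connected(adjacency, removed=None):
    """Check whether the graph stays connected when one node is left out."""
    nodes = [n for n in adjacency if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)

def critical_nodes(adjacency):
    """Return the nodes whose removal splits the network, sorted by name."""
    return sorted(n for n in adjacency if not is_connected(adjacency, removed=n))

if __name__ == "__main__":
    print(critical_nodes(NETWORK))
```

Here "hub" and "r3" are the critical nodes: losing either one cuts "edge" (or more) off from the rest of the network, whereas "r1" and "r2" back each other up.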

This use is really graph-based probabilistic machine learning model training. Models based on graph data can provide more precise results than those based on standard datasets. 

The use cases for graph databases are many but fall into the realm of relationship/network/tree applications. Fraud detection (specifically in the healthcare and financial verticals), entity resolution, network attack prevention, online shipping support, insurance risk, media spend analysis, preventative maintenance, and pharmaceutical research are some of the top uses of graph databases.

Graph database vendors are developing natural language interfaces for graphs and tools for creating graphs from unstructured data. These combine the domain-specific, structured knowledge in a graph with the general, unstructured knowledge in a large language model (LLM).

Graph databases combine domain-specific knowledge from a graph with general knowledge from an LLM by connecting the two through relationships. For instance, a graph may link the properties of an entity from the knowledge graph to the entity’s definition from the LLM. In this way, the domain-specific context from the graph can be supplemented with general knowledge from the LLM to enhance understanding of a given situation.

If there is an impending threat to the continued use of graph databases, it is vector databases. Embeddings are the vectors that vector databases store and compare to find similar items quickly; graph embeddings are the variant computed from graph data. An embedding compresses an item’s attributes into a vector, often with hundreds of dimensions, that is calculated by a machine learning algorithm. Embeddings focus on performance, not explainability. As a low-level representation of an item, embeddings are ideal for “fuzzy” match problems.

Vector databases are commonly used in machine learning recommendation systems and similarity search applications. Items that are near each other in the embedding space are considered similar to each other in the real world. Graph databases are less suited to managing this high-dimensional data and do not scale for it as well as vector databases do.
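The notion of items being "near each other" in an embedding space can be sketched with cosine similarity and a brute-force nearest-neighbor search. The three-dimensional vectors in `EMBEDDINGS` are made up for illustration (real embeddings usually have hundreds of dimensions), and production vector databases generally rely on approximate indexes rather than the exhaustive scan shown here.

```python
import math

# Hypothetical, hand-written embeddings; real ones come from a trained model.
EMBEDDINGS = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail shoes": [0.8, 0.2, 0.1],
    "espresso maker": [0.0, 0.1, 0.9],
    "coffee grinder": [0.1, 0.2, 0.8],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, embeddings, k=2):
    """Return the names of the k items most similar to the query vector."""
    scored = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

if __name__ == "__main__":
    shoe_like_query = [0.85, 0.15, 0.05]
    print(nearest(shoe_like_query, EMBEDDINGS, k=2))
```

A query vector that points in roughly the same direction as the two shoe items returns those two as its nearest neighbors, even though no explicit relationship between them was ever stored.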

Embedding calculations differ in the data they take into account. Some may only consider customer purchases from the previous year; others may consider the customer’s lifetime purchases and searches since their initial visit to the website. Since embeddings consume valuable RAM, there is a trade-off, and they should be avoided for infrequently compared items.

We see room for both graph databases and vector databases in the architectures being designed today. Graph databases are better suited to processing data with complex relationships, whereas vector databases are better suited to managing high-dimensional data, such as images and video.

Graph databases are designed for relationship-based queries, whereas vector databases excel at similarity searches. Graph databases employ graph traversal techniques to identify relationships between nodes and are well suited to managing complex relationships and interconnected data. Vector databases utilize algorithms such as k-Nearest Neighbors (k-NN) to identify comparable vectors. We will need to see where the vector database vendors take their roadmaps to understand the impact on graph databases.

With graph databases today, it’s all about the use case. Many are a good fit for graph, such as real-time recommendations, fraud detection and risk, network and IT operations, entity resolution, and identifying the relative importance of nodes.