A Guide to Graph Databases
Graph Databases: How They Work, When to Use Them & the Advantages They Offer
Many business domains involve data relationships that evolve over time. In an interconnected world, data is dynamic and constantly changing, and it increasingly describes networks of people and the relationships between them. To support their communities and offer the best possible user experience, companies now need to track relationships between millions of people interacting in a myriad of ways.
Financial services companies track behaviors and money flows across their account holders to upsell their services, detect fraud and prevent loss. Project managers track the interrelationships between vendors and timelines to plan and implement their project goals. Virtually every industry has some form of interaction or interrelationship that benefits from tracking flows of data and resources across various channels in an interconnected framework.
In this article you will learn how graph databases can be used to simplify handling these relationships between data while also making it easier for developers and data analysts to use that data to drive business decisions.
What is a graph database?
A graph database is a specialized NoSQL database designed for storing and querying data that is connected via defined relationships. Data points in a graph database are called nodes and these nodes are connected to related data via edges. The data attached to each node are known as properties. Graph databases aren’t restricted by predefined schema like relational databases, and this flexibility allows for data to be connected naturally through the life of an application.
Because of their simplicity and ease of use, graph databases are quickly becoming one of the fastest-growing categories in data management.
Graph database use cases
Developers and analysts use graph databases for a range of scenarios. As you use relationships to process the transactions in your graph database, you can detect cases where a single purchase relates to other data about a customer, products, regions, and more.
Fraud detection
With a graph database, you can process purchase and financial transactions in near real time, which helps you prevent fraud. For example, you can easily detect whether a certain email address and credit card are related to other fraudulent charges.
With fraud detection, you can also differentiate accounts where a single email address is being used for multiple people. You can find scenarios where various people are associated with a single IP address, even though they have multiple physical addresses in different accounts.
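As a minimal sketch of the shared-identifier check described above, the following Python snippet models accounts and the identifiers they use as a small in-memory graph. The account data and the helper name `shared_identifiers` are hypothetical and exist only for illustration; a real graph database would express this as a traversal query in its own language.

```python
from collections import defaultdict

# Hypothetical sample data: each account is linked to the identifiers it uses.
accounts = {
    "acct-1": {"email": "pat@example.com", "ip": "203.0.113.7"},
    "acct-2": {"email": "pat@example.com", "ip": "198.51.100.4"},
    "acct-3": {"email": "lee@example.com", "ip": "203.0.113.7"},
    "acct-4": {"email": "sam@example.com", "ip": "192.0.2.55"},
}


def shared_identifiers(accounts):
    """Return identifiers (kind, value) that connect two or more accounts."""
    linked = defaultdict(set)
    for account_id, identifiers in accounts.items():
        for kind, value in identifiers.items():
            # Each (kind, value) pair acts as a node; the account is its neighbor.
            linked[(kind, value)].add(account_id)
    return {key: ids for key, ids in linked.items() if len(ids) > 1}


suspicious = shared_identifiers(accounts)
for (kind, value), ids in sorted(suspicious.items()):
    print(kind, value, sorted(ids))
```

Any identifier that links more than one account is a candidate for review, which is the same question a graph query asks by walking from an email or IP node to its connected accounts.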
Master data management (MDM)
Master data management (MDM) is the practice of maintaining a record of everything that’s essential about your company’s operations. That master data could include accounts, business units, customers, locations, partners, products, and users. With a graph database, you can connect all that master data to solve pressing business questions. Because you’re able to manage your connected data better and understand your networks, you can gain a competitive advantage.
Network and IT operations
You can easily connect your monitoring tools across your network and IT operations with a graph database. Not only can you gain valuable performance insights, but you can also better gauge vulnerabilities, troubleshoot problems, conduct capacity planning, and prepare your organization with impact analysis based on user guides.
Identity and access management (IAM)
You can identify and manage changing authorizations, groups, roles, and products with a graph database. As these interrelationships become ever-more complex, you can track all the data and better control access for your native graph with real-time results. With the interconnected nature of a graph database, you can support an intuitive access management relationship. You can be faster and more accurate while ensuring a greater level of efficiency across your organization.
Recommendation engines
You can easily store a customer’s friends, interests, and purchase history with a graph database. Based on your analysis of what you can see about the relationships between these variables, you can offer a recommendation engine that will offer ideas of what the user will like and prefer. For example, you can extrapolate with a high degree of accuracy that a customer might like products like those another user bought if/when they have the same purchasing history and behavior.
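The idea of recommending products based on overlapping purchase history can be sketched in a few lines of Python. The customers, products, and the scoring rule (count of shared purchases) are assumptions made for this illustration, not a description of how any particular recommendation engine works.

```python
# Hypothetical purchase edges: customer -> set of products bought.
purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove", "sleeping bag"},
    "cy": {"novel"},
}


def recommend(customer, purchases):
    """Suggest products bought by customers who share purchases with `customer`."""
    owned = purchases[customer]
    scores = {}
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # number of shared purchases
        if overlap == 0:
            continue
        for product in items - owned:
            # Weight each candidate by how similar the other customer is.
            scores[product] = scores.get(product, 0) + overlap
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))


print(recommend("ana", purchases))
```

In a graph database the same logic is a two-hop traversal: from a customer to the products they bought, then to other customers who bought those products, and finally to what else those customers purchased.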
Why should you use a graph database?
A graph database allows you to quickly and easily store data and analyze the relationships among data, so you can better understand the myriad of possible outcomes.
Graphs are everywhere
The most obvious example of a graph is a social network, but you can also see graphs in business transactions, recommendations based on connections, routing, and the logistics of finding optimal paths in areas like supply chain management.
Support simple modeling
With a graph database, you model based on understanding the problem, so it’s much cleaner and more simplified. It’s an easy-to-understand model which you can use to represent and store complex data.
Use structured or unstructured data
With a graph database, you can support a range of data demands with structured data, unstructured data, or a hybrid of both.
While almost any graph query could be performed on a relational database using SQL, the query would be extremely complex. Most graph databases have query languages built around the idea of working with edges and nodes and traversing a graph structure. The result is simpler queries that are faster to write and easier to understand.
Here’s an example showing the difference in query complexity between standard SQL and the Cypher query language used with the Neo4j graph database. The query retrieves territory descriptions together with the last names of the sales reps assigned to those territories.
SELECT e.LastName, t.Description
FROM Employee AS e
JOIN EmployeeTerritory AS et ON (et.EmployeeID = e.EmployeeID)
JOIN Territory AS t ON (et.TerritoryID = t.TerritoryID);
MATCH (t:Territory)<-[:IN_TERRITORY]-(e:Employee)
RETURN t.description, collect(e.lastName);
The Cypher query is only 2 lines compared to 4 lines of SQL. This difference in length and complexity only grows when you want to retrieve information across more relationships.
Joins are also expensive in terms of performance. Joining values across many tables can result in very slow queries on large data sets. In comparison, these types of queries on a graph database typically remain fast even at large scale.
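To try the relational side of this comparison, here is a small sketch using Python's built-in `sqlite3` module. The table and column names follow the SQL query above, with the description read from the Territory table; the sample rows and the added `ORDER BY` (used only to make the output deterministic) are assumptions for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Territory (TerritoryID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE EmployeeTerritory (EmployeeID INTEGER, TerritoryID INTEGER);
    INSERT INTO Employee VALUES (1, 'Davolio'), (2, 'Fuller');
    INSERT INTO Territory VALUES (10, 'Boston'), (20, 'Seattle');
    INSERT INTO EmployeeTerritory VALUES (1, 10), (2, 10), (2, 20);
    """
)

# Two joins are needed to walk Employee -> EmployeeTerritory -> Territory.
rows = conn.execute(
    """
    SELECT e.LastName, t.Description
    FROM Employee AS e
    JOIN EmployeeTerritory AS et ON (et.EmployeeID = e.EmployeeID)
    JOIN Territory AS t ON (et.TerritoryID = t.TerritoryID)
    ORDER BY e.LastName, t.Description;
    """
).fetchall()

for last_name, description in rows:
    print(last_name, description)
```

The `EmployeeTerritory` table exists only to connect the other two tables. In a graph model that connection is stored directly as an edge, which is why the Cypher version needs no join.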
Queries directly from relations
With a graph database, you can query directly from one relation. So, instead of creating three queries, you can more quickly arrive at your answer without the hassle of multiple steps.
Achieve better performance
Graph databases use a simple index, so you see improved efficiency with query performance. Since the queries are broken into sub-queries, they run concurrently to achieve high throughput and low latency. And because graph databases are designed for running graph traversals, they are more efficient in terms of required hardware resources.
Get a visualization
With a graph database, it’s important to visualize the data to understand it better and draw conclusions. You can see parts of the stored relations and entities along with associated properties. Most graph databases will provide a variety of tools or integrations to make visualizing your data easy.
You can quickly and easily add properties to the relations with graph databases. While there are other database models that you could select, graph databases continue to offer the high-quality solution you need to deliver on time and on budget. They're also a great way to avoid the monumental headache of figuring out how to achieve the same results with other methods.
What are the types of graph databases?
Graph databases are often broken down into two main types by their data model: RDF graphs and property graphs. The RDF graph focuses on data integration, while the property graph involves queries and analytics. These database types are similar in that they both consist of points (vertices) with the interrelationships between those points (edges).
RDF graphs (RDF stands for Resource Description Framework) are designed to conform to W3C (World Wide Web Consortium) standards. This model is a shift from storing data in relational tables. It expresses information in graphs as triples made of 3 parts: subject, predicate, and object.
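The three-part structure of an RDF statement can be illustrated with plain Python tuples. This is only a sketch of the data shape: the triples and the `match` helper are hypothetical, and real RDF stores use IRIs for identifiers and are queried with SPARQL.

```python
# Hypothetical triples: (subject, predicate, object).
triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Who works for acme?
employees = [s for (s, _, _) in match(triples, predicate="worksFor", obj="acme")]
print(employees)
```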
Property graphs are more versatile representations, so they’re more commonly used across various industries. A property graph models relationships among data points, with detailed information about the subject and how that data interrelates.
How do graphs and graph databases work?
Graphs and graph databases work on relationship principles. You can follow those connections through the data lifecycle since your connected data is equally or more important than any single data point. You start with the idea, move to design, and then implement and operate with your query language. Since you’re not inferring data connections, your data is more expressive and simpler to work with than in relational database structures.
Components of a graph database
There are 3 main components of a graph database. The first component is nodes, which represent entities like a product, user, event, or place. The second component is properties, which can be added to nodes to give more context. For example, a user node might have properties like username, email address, and interests. The third component is the edges, or relationships, that connect nodes in the graph. These edges can be directed or undirected. For example, it might make sense to use a directed edge to connect a manager to their direct report. Edges can also have values attached. In a map where each edge represents a road between two cities, the edge value could be the number of miles between them.
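The three components can be sketched as simple Python data classes. The class names, fields, sample users, and the approximate mileage are all assumptions made for this illustration and do not correspond to any specific graph database's storage model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # kind of entity, e.g. "User"
    properties: dict = field(default_factory=dict)   # extra context for the node


@dataclass
class Edge:
    source: str          # key of the start node
    target: str          # key of the end node
    kind: str            # relationship type
    directed: bool = True
    properties: dict = field(default_factory=dict)   # values attached to the edge


nodes = {
    "maria": Node("User", {"username": "maria", "email": "maria@example.com"}),
    "omar": Node("User", {"username": "omar"}),
    "austin": Node("City", {"name": "Austin"}),
    "dallas": Node("City", {"name": "Dallas"}),
}

edges = [
    Edge("maria", "omar", "MANAGES"),  # directed: manager -> direct report
    # Undirected road; the mileage is an approximate, illustrative value.
    Edge("austin", "dallas", "ROAD", directed=False, properties={"miles": 195}),
]


def neighbors(key, kind):
    """Keys reachable from `key` over edges of `kind`, honoring direction."""
    found = []
    for edge in edges:
        if edge.kind != kind:
            continue
        if edge.source == key:
            found.append(edge.target)
        elif edge.target == key and not edge.directed:
            found.append(edge.source)
    return found


print(neighbors("maria", "MANAGES"), neighbors("dallas", "ROAD"))
```

Note that the directed `MANAGES` edge can only be followed from manager to report, while the undirected `ROAD` edge can be followed from either city.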
Graph database architecture and design
From a design perspective graph databases provide better performance due to a variety of optimizations compared to more general purpose databases. The most obvious is how data is mapped in-memory compared to stored on disk.
A native graph database uses what is called index-free adjacency. This means that, on disk, each node stores pointers to its connected nodes. As a result, the database doesn’t need to keep a large index in RAM to perform well, because the connections are already available via the node itself. This also means that traversal performance stays roughly the same regardless of how large your graph is, because it depends only on how many nodes you are traversing.
In comparison, if you were using a relational database, you would have to join tables together at query time, and this would get slower as the tables get larger. The alternative would be to keep a massive index in memory, but this is also costly.
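The contrast between scanning an edge table and following stored references can be sketched in Python. This is a conceptual illustration only: the `NodeRecord` class and sample edges are hypothetical, and Python object references merely stand in for the on-disk pointers a native graph database would use.

```python
# Hypothetical edges stored as rows, as a relational table might hold them.
edge_table = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]


def neighbors_by_scan(node):
    """Table-style lookup: examines every row, so cost grows with table size."""
    return [dst for (src, dst) in edge_table if src == node]


class NodeRecord:
    """A node that stores direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []


def build(edge_table):
    """Create node records once and wire up direct references between them."""
    records = {}
    for src, dst in edge_table:
        a = records.setdefault(src, NodeRecord(src))
        b = records.setdefault(dst, NodeRecord(dst))
        a.neighbors.append(b)
    return records


def two_hops(start):
    """Names reachable in exactly two hops by following stored references only."""
    return sorted({far.name for near in start.neighbors for far in near.neighbors})


records = build(edge_table)
print(neighbors_by_scan("a"), two_hops(records["a"]))
```

Once a starting record is in hand, `two_hops` never consults `edge_table` or any lookup structure. The work it does depends on how many neighbors it visits, not on how many edges exist in total.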
What are the advantages of graph databases?
When you’re dealing with data that is highly relational in nature, a graph database offers greater performance, with a consistency that’s essential as your data continues to grow. A graph database is a great solution when you have real-time queries involving big data analysis, even as your data continues to expand.
With a graph database, you’re better able to solve problems in ways that are just not practical with relational databases. Before you commit, consider whether your interconnected data and the questions you need to ask of it are a practical fit for a graph database.
AI and machine learning friendly
Graph databases are a natural fit for use with machine learning and artificial intelligence. By using a graph database you can find valuable business insights by finding patterns and connections between your data that might otherwise be missed. By using a graph database you have a scalable data store that can quickly be used to train models and create predictions on your data.
Some examples of problems that can be solved by combining a graph database with machine learning would be finding valuable steps in customer acquisition journeys, personalizing services and platforms, finding users across multiple platforms, fraud prevention by finding non-obvious but connected behavior, and much more.
With a graph database, there are no hidden assumptions. The semantics are clear and explicit. With object-oriented thinking, you have the fine control to keep the data in place without hidden assumptions.
With a graph database, you have a flexible platform to discover connections. You can analyze your data based on quality or strength compared to other data in your database. You also have the ability to simply add more properties or types of nodes as your application grows, without having to worry about schema changes.
Accessible recursive path queries
You can find direct and indirect connections between data with real-life queries with a graph database. That level of accessibility is important as you bundle the queries together and look for patterns related to your product and how it interconnects with your audience data.
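Finding an indirect connection amounts to a path search. The sketch below uses a breadth-first search over a hypothetical "follows" graph; the data and the function name `shortest_path` are assumptions for illustration, and a graph query language would typically express this as a variable-length path pattern instead.

```python
from collections import deque

# Hypothetical "follows" relationships: person -> people they follow.
graph = {
    "ana": ["ben"],
    "ben": ["cy", "dee"],
    "cy": ["eve"],
    "dee": [],
    "eve": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest node list from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection, direct or indirect


print(shortest_path(graph, "ana", "eve"))
```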
You can manage big data with a graph database by combining and hierarchizing multiple dimensions. So, you could segment a group based on different dimensions: time, demographics, geo dimensions, and more.
With a graph database, you can easily aggregate and group relevant data in a way that would be impractical with relational databases. So, business analysts and data scientists can conduct virtually any analytical query on a graph database.
What are the disadvantages of graph databases?
There are always tradeoffs with any technology. Graph databases are not perfect, and you should understand their disadvantages and limitations. Here are a few reasons why you might not want to use a graph database.
No standard query language
With graph databases, there is no standardized query language. The language depends on the platform used, which could be an advantage or disadvantage depending on your situation. This generally means developers will need to learn a new query language, which increases both the time needed to adopt a graph database and the onboarding time for new hires.
This situation may change in the near future. In 2019, a proposal for a standard language called GQL (Graph Query Language) was approved by an ISO/IEC committee. GQL is intended to be a declarative language similar to SQL that borrows features from current graph query languages like Cypher and GSQL.
Graph databases are generally not the right choice for apps built around high volumes of simple transactional operations, because they are not efficient at processing that kind of data. They also struggle to process queries that span the entire database.
Graph databases have a relatively small user base compared to relational databases, so it may be difficult to find the support you need to optimize further, maintain, or grow your graph database as your company continues to develop.
Graph database examples
Neo4j is currently the most popular graph database on the market. It is open source and provides great performance along with the very productive Cypher query language to make working with your data easy. Neo4j offers cloud and self-hosted enterprise versions of its database in addition to its open source product. It also integrates tightly with the data science ecosystem and provides a data science platform that allows you to build custom models or use 65 pre-built algorithms and models to get insights into your data.
TigerGraph is a proprietary graph database provided by a company of the same name. TigerGraph has built-in support for creating visualizations, doing common tasks related to working with graph data, and also has features for common data science tasks. It has its own query language, known as GSQL, for accessing your data. Performance is TigerGraph’s key selling point: the company claims to support queries that traverse 10 or more hops and to scale to trillions of edges.
AWS Neptune is a hosted graph database provided by Amazon Web Services. It supports both types of graph data models, property and RDF. It automatically provides read replicas, backups, and replication across data centers. For querying, Neptune supports Gremlin and SPARQL.
Graph database FAQ
How do graph databases and graph analytics work?
Graph databases are perfectly in sync with graph analytics. Graph analytics, or network analysis, explores relationships between customers, devices, operations, and products. You can then leverage that information to gather insights that are valuable in your sales and marketing efforts and in how you interact with your audience via social media.
The graph analytics market is expected to reach more than $2 billion by 2024. With such dramatic market growth, graph analytics are more important than ever. You can use graph analytics to extrapolate projections for your company’s growth potential, with direct correlations to impacts on your supply chain.
Graph database vs relational database
The fundamental difference between graph databases and relational databases is how their data is stored and formatted. The most important thing to keep in mind is that one isn’t necessarily better than the other. They both make tradeoffs to better serve their intended use case.
A strength of relational databases is that their structure of columns is known by the database, which brings a number of benefits. On the other hand, this also means that making changes to that structure isn’t as easy as it is with a graph database or any other schemaless database.
A relational database will be better for workloads where you are often looking up specific values or doing searches for data that fit some sort of category or value. A graph database will be useful in situations where you would be doing the sort of queries that involve joining tables in a relational database.
Why are graph databases growing in popularity?
Graph databases continue to grow in popularity as they become the foundation of modern data analytics capabilities. Some experts project that they might make up as much as 80% of current data and analytics innovations. That’s a trend that’s expected to continue as organizations continue to search for ways to better leverage data to their advantage via relationships or edges between data points or nodes.
With their efficacy and scalability across networks, a graph database, graph technologies, and graph relationships will continue to prove their value and become ever-more enmeshed and integral for business use in the technological landscape. A graph database is ideal for your storage of data, so you can more easily retrieve the data that is independent but still related in multiple ways.