Neo is a network-oriented database for semi-structured information.
Too complicated, let us try again. Neo handles data in networks
– nodes, relationships and properties – instead of tables. This means entirely new solutions for data that is difficult to handle in static tables. It could mean we can go agile all the way into the persistence layer.
The relational database represents one of the most important developments in the
history of computer science. Upon its arrival some 30 years ago, it revolutionized the
way the industry views data management and today it is practically ubiquitous.
In fact, it is so taken for granted that we, as an industry, have stopped thinking. Could
there be a better way to represent and store our data? In some cases the answer is
– yes, absolutely. The relational database is showing its age. Some telltale signs:
- The mismatch between relational data and object oriented programming.
- Schema evolution – updating your data model when the domain model changes
– is just so much manual labor.
- Semi-structured or network-oriented data is difficult to handle.
The Neo Database
Neo is a database that is built with a different philosophy. Where the relational
database squeezes all data into static tables, Neo uses a flexible and dynamic network
model. This model allows data to evolve more naturally as business requirements
change. There’s no need for “alter table…” on your production databases after you
introduce a new version of the business layer and no need to rewire and migrate
your O/R mapping configurations. The network will evolve along with your business
logic. This spells agility.
Neo is an embedded persistence engine, which means it’s a small, lightweight and
non-intrusive Java library that is easy to include in your development environment.
It has been designed for performance and scalability and has been proven to handle
large networks of data (more than 100 million nodes, relationships and properties).
Neo is a newly founded open source project, but the software is robust. It has
been in commercial production in a highly demanding 24/7 environment for almost
four years and has full support for enterprise-grade features such as distributed
ACID transactions, configurable isolation levels and full transaction recovery.
But so much for sweet talk, let’s cut to some code!
Model and Code
In the Neo model, everything is represented by nodes, relationships and properties.
A relationship connects two nodes and has a well-defined, mandatory type. Properties
are key-value pairs that are attached to both nodes and relationships. When
you combine nodes, relationships between them and properties on both nodes and
relationships, they form a node space – a coherent network representing your business
domain data.
This may sound fancy, but it’s all very intuitive. Here is how a simple social network
might be modeled:
Figure 1: An example of a social network from a somewhat famous movie. Note the different type on the relation between Agent Smith and his creator The Architect.
Note how all nodes have integer identifiers and how all relationships have a type
(KNOWS or CODED_BY). In this example, all nodes have a “name” property. But
some nodes have other properties, for example, an “age” property (node 1) or a
“last name” property (node 3). There’s no overall schema that forces all nodes to
look the same. This allows Neo to capture so-called semi-structured information:
information that has a small number of mandatory attributes but many optional
attributes. Furthermore, the relationships have properties as well. In this example, all
relationships have an “age” property to describe how long two people have known
each other and some relationships have a “disclosure” property to describe whether
the acquaintance is secret.
Working with nodes and relationships is easy. The basic operations are as follows:
This is an intuitive representation of a network and probably similar to many other
implementations that want to represent a network of data in an object-oriented language.
It’s worth noting, however, that relationships in this model are full-blown objects
and not just implicit associations between nodes. If you have another look at the
social network example, you’ll see that there’s more information in the relationships
between nodes than in the nodes themselves. The value of a network is in the
connections between the nodes and Neo’s model captures that.
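To make the model concrete, here is a minimal sketch in plain Java of nodes and relationships as first-class objects that both carry properties. It is not Neo's API: the class names (ModelSketch, PropertyContainer, SketchNode, SketchRelationship) are assumptions made for illustration, the node ids are arbitrary, and the "3 days" value is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; this is not Neo's API.
class ModelSketch {
    static SketchRelationship buildExample() {
        SketchNode anderson = new SketchNode(1);
        anderson.setProperty("name", "Thomas Anderson");
        anderson.setProperty("age", 29);          // optional property, only on this node

        SketchNode morpheus = new SketchNode(2);
        morpheus.setProperty("name", "Morpheus"); // no "age" here: no schema forces it

        SketchRelationship knows = anderson.createRelationshipTo(morpheus, "KNOWS");
        knows.setProperty("age", "3 days");       // made-up value; relationships carry properties too
        return knows;
    }

    public static void main(String[] args) {
        SketchRelationship knows = buildExample();
        System.out.println(knows.start.getProperty("name") + " -" + knows.type + "-> "
                + knows.end.getProperty("name"));
    }
}

// Shared behaviour: both nodes and relationships hold key-value properties.
class PropertyContainer {
    private final Map<String, Object> properties = new HashMap<>();

    void setProperty(String key, Object value) { properties.put(key, value); }
    Object getProperty(String key) { return properties.get(key); }
    boolean hasProperty(String key) { return properties.containsKey(key); }
}

class SketchNode extends PropertyContainer {
    final int id;
    final List<SketchRelationship> relationships = new ArrayList<>();

    SketchNode(int id) { this.id = id; }

    // Creates a relationship object and registers it on both end nodes.
    SketchRelationship createRelationshipTo(SketchNode other, String type) {
        SketchRelationship rel = new SketchRelationship(this, other, type);
        relationships.add(rel);
        other.relationships.add(rel);
        return rel;
    }
}

class SketchRelationship extends PropertyContainer {
    final SketchNode start;
    final SketchNode end;
    final String type;

    SketchRelationship(SketchNode start, SketchNode end, String type) {
        if (type == null) throw new IllegalArgumentException("type is mandatory");
        this.start = start;
        this.end = end;
        this.type = type;
    }
}
```

Because the relationship is an object in its own right, it can be handed around, inspected and given properties just like a node.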
Creating a Node Space
And now, finally, some code. Here’s how we would create the Matrix social network
from figure 1:
Transaction tx = Transaction.begin();
EmbeddedNeo neo = ... // Get factory
// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus, MatrixRelationshipTypes.KNOWS );
// Create Trinity, Cypher, Agent Smith, Architect similarly
As the code above shows, it is rather easy to construct the node space for our
Matrix example. And, of course, our network is made persistent once we commit.
Traversing a Node Space
Now that we know how to represent our domain model in the node space, how do
we get information out of it? Unlike a relational database, Neo does not support a
declarative query language. Instead, Neo provides an object-oriented traverser
framework that allows us to express complex queries in plain Java.
Working with the traverser framework is very straightforward. The core abstraction
is, unsurprisingly, the Traverser interface. A Traverser is a Java Iterable that
encapsulates a “query” – i.e. a traversal on the node space such as “give me all
Morpheus’ friends and his friends’ friends” or “does Trinity know someone who is
acquainted with an agent?”. The most complex part of working with a Traverser is
instantiating it. Here’s an example of how we would create a Traverser that will
return all the (transitive) friends of the “Thomas Anderson” node of the example above:
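// Instantiate a traverser that returns all mrAnderson's friends
Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST, StopEvaluator.END_OF_NETWORK,
    ReturnableEvaluator.ALL_BUT_START_NODE, MatrixRelationshipTypes.KNOWS, Direction.OUTGOING );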
Here we can see that traversers are created by invoking the traverse() method
on a start node with a number of parameters. The parameters control the traversal
and in this example they tell Neo to traverse the network breadth-first (rather
than depth-first), to traverse until it has covered all reachable nodes in the network
(StopEvaluator.END_OF_NETWORK), to return all nodes except the first
(ReturnableEvaluator.ALL_BUT_START_NODE), and to traverse all OUTGOING
relationships of type KNOWS.
How would we go about listing the output of this traversal? Once we’ve created
a Traverser, working with it is as easy as working with any Java Iterable:
// Traverse the node space and print out the result
for ( Node friend : friendsTraverser )
{
    System.out.println( friend.getProperty( "name" ) + " at depth " +
        friendsTraverser.currentPosition().depth() );
}
Running the traversal above on the Matrix example would yield the following output:
Morpheus at depth 1
Trinity at depth 1
Cypher at depth 2
Agent Smith at depth 3
As you can see, the Traverser has started at the “Thomas Anderson” node and run
through the entire network along the KNOWS relationship type, breadth first, and
returned all nodes except the first one. “The Architect” is missing from this output
since the relationship connecting him is of a different type, CODED_BY. This is a
small, contrived example. But the code would work equally well on a network with
hundreds of millions of nodes, relationships and properties.
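What the traverser does in this example can be approximated in a few lines of plain Java: a breadth-first walk that follows only outgoing edges of one type, records the depth of each node, and leaves out the start node. This is a sketch under assumptions, not Neo's implementation. The Edge class, the string node names and the edge list are hypothetical, and the edges were chosen to be consistent with the output listed earlier, since the figure's full edge set is not spelled out in the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TraversalSketch {
    // Hypothetical directed, typed edge; not a Neo type.
    static final class Edge {
        final String from, type, to;
        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Assumed edge set, chosen to match the depths in the article's output.
    static List<Edge> matrixEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("Thomas Anderson", "KNOWS", "Morpheus"));
        edges.add(new Edge("Thomas Anderson", "KNOWS", "Trinity"));
        edges.add(new Edge("Morpheus", "KNOWS", "Trinity"));
        edges.add(new Edge("Morpheus", "KNOWS", "Cypher"));
        edges.add(new Edge("Cypher", "KNOWS", "Agent Smith"));
        edges.add(new Edge("Agent Smith", "CODED_BY", "The Architect"));
        return edges;
    }

    // Breadth-first walk along outgoing edges of one type.
    // Returns node name -> depth in visiting order, with the start node excluded.
    static Map<String, Integer> traverse(List<Edge> edges, String start, String type) {
        Map<String, Integer> depthOf = new LinkedHashMap<>();
        depthOf.put(start, 0);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Edge e : edges) {
                boolean follows = e.from.equals(current) && e.type.equals(type);
                if (follows && !depthOf.containsKey(e.to)) {
                    depthOf.put(e.to, depthOf.get(current) + 1);
                    queue.add(e.to);
                }
            }
        }
        depthOf.remove(start); // mirror "all nodes except the first"
        return depthOf;
    }

    public static void main(String[] args) {
        traverse(matrixEdges(), "Thomas Anderson", "KNOWS")
            .forEach((name, depth) -> System.out.println(name + " at depth " + depth));
    }
}
```

With the assumed edges, the walk visits Morpheus and Trinity at depth 1, Cypher at depth 2 and Agent Smith at depth 3, and never reaches The Architect because the only edge leading there is of type CODED_BY.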
Now, let’s look at a more complex traversal. Going with our example, suppose
that we wanted to find all “hackers of the Matrix,” where we define a hacker of the
Matrix as any node that you reach through a CODED_BY relationship. How would
we create a Traverser that gives us those nodes?
First off, we want to traverse both our relationship types (KNOWS and
CODED_BY). Secondly, we want to traverse until the end of the network and lastly, we
want to return only nodes which we came to through a CODED_BY relationship.
Here’s the code:
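// Instantiate a traverser that returns all hackers of the Matrix
Traverser hackerTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST, StopEvaluator.END_OF_NETWORK,
    new ReturnableEvaluator() {
        public boolean isReturnableNode( TraversalPosition pos ) {
            return !pos.isStartNode() && pos.lastRelationshipTraversed().isType( MatrixRelationshipTypes.CODED_BY );
        }
    }, MatrixRelationshipTypes.KNOWS, Direction.OUTGOING, MatrixRelationshipTypes.CODED_BY, Direction.OUTGOING );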
Now it’s getting interesting! The ReturnableEvaluator.ALL_BUT_START_NODE
constant from the previous example was actually a convenience implementation of the
ReturnableEvaluator interface. This interface contains a single method and you
can supply a custom implementation of it to the traverser framework. It turns out
that this is a simple but powerful way to express complex queries.
Setting aside the anonymous inner class cruft surrounding the body of
isReturnableNode, we basically pass in a snippet of code that checks whether we
traversed a relationship of type CODED_BY to get to the current node. If this
statement evaluates to “true” then the current node will be included in the set of
nodes that is returned from the traverser.
When executed with a simple print loop, the above code prints a single node,
The Architect, since it is the only node reached through a CODED_BY relationship.
StopEvaluators work the same way. In our experience, writing custom evaluators
is very easy. Even the most advanced applications we have developed with Neo
– applications that traverse extremely large and complex networks – are based on
evaluators that are rarely more than a few lines of code.
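The evaluator idea itself is small enough to sketch in plain Java: the traversal hands each visited position to a caller-supplied predicate, and the predicate decides whether that node is returned. Everything below is an assumption made for illustration (the Edge and Position classes, the Returnable interface, the edge list). It mimics the shape of a returnable evaluator but is not Neo's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EvaluatorSketch {
    // Hypothetical directed, typed edge; not a Neo type.
    static final class Edge {
        final String from, type, to;
        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Where the traversal currently is and which edge led there (null at the start node).
    static final class Position {
        final String node;
        final Edge lastEdge;
        final int depth;
        Position(String node, Edge lastEdge, int depth) {
            this.node = node;
            this.lastEdge = lastEdge;
            this.depth = depth;
        }
    }

    // Single-method callback, analogous in spirit to a returnable evaluator.
    interface Returnable {
        boolean isReturnable(Position pos);
    }

    // Assumed edge set for the Matrix example.
    static final List<Edge> EDGES = Arrays.asList(
        new Edge("Thomas Anderson", "KNOWS", "Morpheus"),
        new Edge("Thomas Anderson", "KNOWS", "Trinity"),
        new Edge("Morpheus", "KNOWS", "Trinity"),
        new Edge("Morpheus", "KNOWS", "Cypher"),
        new Edge("Cypher", "KNOWS", "Agent Smith"),
        new Edge("Agent Smith", "CODED_BY", "The Architect"));

    // Breadth-first over all outgoing edges; the evaluator filters what is returned.
    static List<String> traverse(String start, Returnable evaluator) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start);
        Deque<Position> queue = new ArrayDeque<>();
        queue.add(new Position(start, null, 0));
        while (!queue.isEmpty()) {
            Position pos = queue.remove();
            if (evaluator.isReturnable(pos)) {
                result.add(pos.node);
            }
            for (Edge e : EDGES) {
                // seen.add returns true only the first time a node is encountered
                if (e.from.equals(pos.node) && seen.add(e.to)) {
                    queue.add(new Position(e.to, e, pos.depth + 1));
                }
            }
        }
        return result;
    }

    // Return only nodes that were reached through a CODED_BY edge.
    static List<String> hackers() {
        return traverse("Thomas Anderson",
            pos -> pos.lastEdge != null && pos.lastEdge.type.equals("CODED_BY"));
    }

    public static void main(String[] args) {
        System.out.println(hackers()); // prints "[The Architect]"
    }
}
```

The null check on lastEdge matters: the start node is reached through no edge at all, so an evaluator that inspects the last edge has to handle that case first. A stop condition could be plugged in the same way, as a second predicate consulted before a node's edges are expanded.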
Neo is not a silver bullet and some areas need to improve, for instance tooling,
standardization of the model and a query language.
However, if your data is naturally ordered in a network or is semi-structured or you
just need to go truly agile, give the Neo database a try. We hope
you find it, as we do, to be an elegant, flexible and robust alternative.
Emil Eifrém, Neo Technology
Björn Granvik, Jayway
Originally published in JayView.