Neo – a netbase

Neo is a network-oriented database for semi-structured information.
Too complicated, let us try again. Neo handles data in networks – nodes,
relationships and properties – instead of tables. This means entirely new
solutions for data that is difficult to handle in static tables. It could mean we
can go agile all the way into the persistence layer.

The relational database represents one of the most important developments in the
history of computer science. Upon its arrival some 30 years ago, it revolutionized the
way the industry views data management and today it is practically ubiquitous.

In fact, it is so taken for granted that we, as an industry, have stopped thinking. Could
there be a better way to represent and store our data? In some cases the answer is
– yes, absolutely. The relational database is showing its age. Some telltale signs:

  • The mismatch between relational data and object oriented programming.
  • Schema evolution – updating your data model when the domain model changes
    – is just so much manual labor.

  • Semi-structured or network-oriented data is difficult to handle.

The Neo Database

Neo is a database that is built with a different philosophy. Where the relational
database squeezes all data into static tables, Neo uses a flexible and dynamic network
model. This model allows data to evolve more naturally as business requirements
change. There’s no need for “alter table…” on your production databases after you
introduce a new version of the business layer and no need to rewire and migrate
your O/R mapping configurations. The network will evolve along with your business
logic. This spells agility.
Neo is an embedded persistence engine, which means it’s a small, lightweight and
non-intrusive Java library that is easy to include in your development environment.
It has been designed for performance and scalability and has been proven to handle
large networks of data (more than 100 million nodes, relationships and properties).
Neo is a newly founded open source project, but the software is robust. It has
been in commercial production in a highly demanding 24/7 environment for almost
four years and has full support for enterprise-grade features such as distributed
ACID transactions, configurable isolation levels and full transaction recovery.
But so much for sweet talk, let’s cut to some code!

Model and Code

Representation

In the Neo model, everything is represented by nodes, relationships and properties.
A relationship connects two nodes and has a well-defined, mandatory type. Properties
are key-value pairs that can be attached to both nodes and relationships. When
you combine nodes, relationships between them and properties on both nodes and
relationships, they form a node space – a coherent network representing your
business domain data.
This may sound fancy, but it’s all very intuitive. Here is how a simple social network
might be modeled:

Figure 1

Figure 1: An example of a social network from a somewhat famous movie. Note the different type on the relation between Agent Smith and his creator The Architect.

Note how all nodes have integer identifiers and how all relationships have a type
(KNOWS or CODED_BY). In this example, all nodes have a “name” property. But
some nodes have other properties, for example, an “age” property (node 1) or a
“last name” property (node 3). There’s no overall schema that forces all nodes to
look the same. This allows Neo to capture so-called semi-structured information:
information that has a small number of mandatory attributes but many optional
attributes. Furthermore, the relationships have properties as well. In this example, all
relationships have an “age” property to describe how long two people have known
each other and some relationships have a “disclosure” property to describe whether
the acquaintance is secret.

Working with nodes and relationships is easy. The basic operations are as follows:

Figure 2
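
As a rough sketch of these operations, the following self-contained Java models nodes, typed relationships and properties in memory. It is not the Neo library: the class ModelSketch, its nested types and all method names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory stand-in for the model; this is not the Neo library.
class ModelSketch {

    /** Shared by nodes and relationships: both carry key-value properties. */
    static class PropertyContainer {
        private final Map<String, Object> properties = new LinkedHashMap<>();

        void setProperty(String key, Object value) { properties.put(key, value); }
        Object getProperty(String key) { return properties.get(key); }
        boolean hasProperty(String key) { return properties.containsKey(key); }
    }

    static class Node extends PropertyContainer {
        final int id;
        final List<Relationship> relationships = new ArrayList<>();

        Node(int id) { this.id = id; }

        /** Connects this node to another one with a mandatory type. */
        Relationship createRelationshipTo(Node other, String type) {
            Relationship rel = new Relationship(this, other, type);
            relationships.add(rel);        // visible from the start node...
            other.relationships.add(rel);  // ...and from the end node
            return rel;
        }
    }

    /** A relationship is a full object with its own type and properties. */
    static class Relationship extends PropertyContainer {
        final Node start;
        final Node end;
        final String type;

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = Objects.requireNonNull(type, "a relationship needs a type");
        }

        Node getOtherNode(Node node) { return node == start ? end : start; }
    }

    public static void main(String[] args) {
        Node smith = new Node(1);
        smith.setProperty("name", "Agent Smith");
        Node architect = new Node(2);
        architect.setProperty("name", "The Architect");

        Relationship codedBy = smith.createRelationshipTo(architect, "CODED_BY");
        System.out.println(smith.getProperty("name") + " --" + codedBy.type + "--> "
                + codedBy.getOtherNode(smith).getProperty("name"));
    }
}
```

Giving nodes and relationships a common property container keeps the two symmetrical, which is the point of treating relationships as objects in their own right.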

This is an intuitive representation of a network and probably similar to many other
implementations that want to represent a network of data in an object-oriented
language.
It’s worth noting, however, that relationships in this model are full-blown objects
and not just implicit associations between nodes. If you have another look at the
social network example, you’ll see that there’s more information in the relationships
between nodes than in the nodes themselves. The value of a network is in the
connections between the nodes and Neo’s model captures that.

Creating a Node Space

And now, finally some code. Here’s how we would create the Matrix social network
from figure 1:
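
A minimal stand-in, assuming a hypothetical in-memory model rather than the Neo library, could look as follows. The class MatrixNodeSpace and its methods are invented for illustration. The "Cypher" node, the exact edges and all property values are assumptions and not a transcription of figure 1. Transactions and persistence are left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; names and signatures are illustrative only.
class MatrixNodeSpace {
    enum RelType { KNOWS, CODED_BY }

    static class Node {
        final int id;
        final Map<String, Object> props = new LinkedHashMap<>();
        final List<Rel> outgoing = new ArrayList<>();

        Node(int id) { this.id = id; }
        Node set(String key, Object value) { props.put(key, value); return this; }

        Rel relateTo(Node end, RelType type) {
            Rel rel = new Rel(this, end, type);
            outgoing.add(rel);
            return rel;
        }
    }

    static class Rel {
        final Node start;
        final Node end;
        final RelType type;
        final Map<String, Object> props = new LinkedHashMap<>();

        Rel(Node start, Node end, RelType type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
        Rel set(String key, Object value) { props.put(key, value); return this; }
    }

    final List<Node> nodes = new ArrayList<>();

    /** Creates a node with the next integer identifier and a "name" property. */
    Node createNode(String name) {
        Node node = new Node(nodes.size() + 1).set("name", name);
        nodes.add(node);
        return node;
    }

    /** Builds the example network. Edges and property values are assumptions. */
    static MatrixNodeSpace build() {
        MatrixNodeSpace space = new MatrixNodeSpace();
        Node anderson = space.createNode("Thomas Anderson").set("age", 29);
        Node morpheus = space.createNode("Morpheus");
        Node cypher = space.createNode("Cypher").set("last name", "Reagan");
        Node trinity = space.createNode("Trinity");
        Node smith = space.createNode("Agent Smith");
        Node architect = space.createNode("The Architect");

        anderson.relateTo(morpheus, RelType.KNOWS).set("age", "3 days");
        anderson.relateTo(trinity, RelType.KNOWS).set("age", "3 days");
        morpheus.relateTo(trinity, RelType.KNOWS).set("age", "12 years");
        morpheus.relateTo(cypher, RelType.KNOWS).set("age", "5 years");
        cypher.relateTo(smith, RelType.KNOWS)
              .set("age", "6 months").set("disclosure", "secret");
        smith.relateTo(architect, RelType.CODED_BY);
        return space;
    }

    public static void main(String[] args) {
        // Print every relationship as "start --TYPE--> end".
        for (Node node : build().nodes) {
            for (Rel rel : node.outgoing) {
                System.out.println(node.props.get("name") + " --" + rel.type + "--> "
                        + rel.end.props.get("name"));
            }
        }
    }
}
```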

As the code above shows, it is rather easy to construct the node space for our
Matrix example. And, of course, our network is made persistent once we commit.

Traversing a Node Space

Now that we know how to represent our domain model in the node space, how do
we get information out of it? Unlike a relational database, Neo does not support a
declarative query language. Instead, Neo provides an object-oriented traverser
framework that allows us to express complex queries in plain Java.
Working with the traverser framework is very straightforward. The core abstraction
is, unsurprisingly, the Traverser interface. A Traverser is a Java Iterable that
encapsulates a “query” – i.e. a traversal on the node space such as “give me all
Morpheus’ friends and his friends’ friends” or “does Trinity know someone who is
acquainted with an agent?”. The most complex part of working with a Traverser is
instantiating it. Here’s an example of how we would create a Traverser that will
return all the (transitive) friends of the “Thomas Anderson” node of the example above:

Here we can see that traversers are created by invoking the traverse(...) method
on a start node with a number of parameters. The parameters control the traversal
and in this example they tell Neo to traverse the network breadth-first (rather
than depth-first), to traverse until it has covered all reachable nodes in the network
(StopEvaluator.END_OF_NETWORK), to return all nodes except the first
(ReturnableEvaluator.ALL_BUT_START_NODE), and to traverse all OUTGOING
relationships of type KNOWS.
How would we go about listing the output of this traversal? After we’ve created a
Traverser, working with it is as easy as working with any Java Iterable:
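
The traversal and the print loop can be sketched together in self-contained Java. This is a hypothetical in-memory model, not Neo's Traverser: the class FriendsTraversal, the Hit type and the traverse signature are assumptions, as are the "Cypher" node and the edges of the example network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory sketch of the traversal; this is not Neo's Traverser.
class FriendsTraversal {
    enum RelType { KNOWS, CODED_BY }

    static class Node {
        final String name;
        final List<Rel> outgoing = new ArrayList<>();
        Node(String name) { this.name = name; }
        void relateTo(Node end, RelType type) { outgoing.add(new Rel(end, type)); }
    }

    static class Rel {
        final Node end;
        final RelType type;
        Rel(Node end, RelType type) { this.end = end; this.type = type; }
    }

    /** A returned node together with its distance from the start node. */
    static class Hit {
        final Node node;
        final int depth;
        Hit(Node node, int depth) { this.node = node; this.depth = depth; }
    }

    /**
     * Breadth-first walk along outgoing relationships of one type. It runs until
     * no unvisited node is reachable and returns every node except the start node.
     */
    static List<Hit> traverse(Node start, RelType type) {
        List<Hit> returned = new ArrayList<>();
        Set<Node> visited = new HashSet<>();
        Queue<Hit> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(new Hit(start, 0));
        while (!queue.isEmpty()) {
            Hit current = queue.remove();
            if (current.node != start) {
                returned.add(current);
            }
            for (Rel rel : current.node.outgoing) {
                // Set.add returns false for nodes that were already seen.
                if (rel.type == type && visited.add(rel.end)) {
                    queue.add(new Hit(rel.end, current.depth + 1));
                }
            }
        }
        return returned;
    }

    /** Builds the example network (edges are assumptions); returns the start node. */
    static Node matrix() {
        Node anderson = new Node("Thomas Anderson");
        Node morpheus = new Node("Morpheus");
        Node trinity = new Node("Trinity");
        Node cypher = new Node("Cypher");
        Node smith = new Node("Agent Smith");
        Node architect = new Node("The Architect");
        anderson.relateTo(morpheus, RelType.KNOWS);
        anderson.relateTo(trinity, RelType.KNOWS);
        morpheus.relateTo(trinity, RelType.KNOWS);
        morpheus.relateTo(cypher, RelType.KNOWS);
        cypher.relateTo(smith, RelType.KNOWS);
        smith.relateTo(architect, RelType.CODED_BY);
        return anderson;
    }

    public static void main(String[] args) {
        // Working with the result is an ordinary for-each loop.
        for (Hit hit : traverse(matrix(), RelType.KNOWS)) {
            System.out.println("At depth " + hit.depth + " => " + hit.node.name);
        }
    }
}
```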

Running the traversal above on the Matrix example would yield the following output:

As you can see, the Traverser has started at the “Thomas Anderson” node and run
through the entire network along the KNOWS relationship type, breadth first, and
returned all nodes except the first one. “The Architect” is missing from this output
since the relationship connecting him is of a different type, CODED_BY. This is a
small, contrived example. But the code would work equally well on a network with
hundreds of millions of nodes, relationships and properties.
Now, let’s look at a more complex traversal. Going with our example, suppose
that we wanted to find all “hackers of the Matrix,” where we define a hacker of the
Matrix as any node that you reach through a CODED_BY relationship. How would
we create a Traverser that gives us those nodes?
First off, we want to traverse both our relationship types (KNOWS and CODED_BY).
Secondly, we want to traverse until the end of the network and lastly, we
want to return only nodes which we came to through a CODED_BY relationship.
Here’s the code:
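
The idea can be sketched in self-contained Java with a hypothetical in-memory model. Only the name ReturnableEvaluator comes from the text. The Position type, the isReturnable method, the traverse signature and the example edges (including the "Cypher" node) are assumptions, not Neo's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory sketch of a custom evaluator; not the Neo library.
class HackerTraversal {
    enum RelType { KNOWS, CODED_BY }

    static class Node {
        final String name;
        final List<Rel> outgoing = new ArrayList<>();
        Node(String name) { this.name = name; }
        void relateTo(Node end, RelType type) { outgoing.add(new Rel(end, type)); }
    }

    static class Rel {
        final Node end;
        final RelType type;
        Rel(Node end, RelType type) { this.end = end; this.type = type; }
    }

    /** Where the traversal currently is and how it got there. */
    static class Position {
        final Node node;
        final Rel lastRel; // null at the start node
        final int depth;
        Position(Node node, Rel lastRel, int depth) {
            this.node = node;
            this.lastRel = lastRel;
            this.depth = depth;
        }
    }

    /** Single-method callback deciding whether the current node is returned. */
    interface ReturnableEvaluator {
        boolean isReturnable(Position position);
    }

    /** Breadth-first walk over outgoing relationships of every type. */
    static List<Position> traverse(Node start, ReturnableEvaluator evaluator) {
        List<Position> returned = new ArrayList<>();
        Set<Node> visited = new HashSet<>();
        Queue<Position> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(new Position(start, null, 0));
        while (!queue.isEmpty()) {
            Position current = queue.remove();
            if (evaluator.isReturnable(current)) {
                returned.add(current);
            }
            for (Rel rel : current.node.outgoing) {
                if (visited.add(rel.end)) {
                    queue.add(new Position(rel.end, rel, current.depth + 1));
                }
            }
        }
        return returned;
    }

    /** Returns only nodes that were reached through a CODED_BY relationship. */
    static List<Position> hackers(Node start) {
        return traverse(start, new ReturnableEvaluator() {
            @Override
            public boolean isReturnable(Position position) {
                Rel last = position.lastRel;
                return last != null && last.type == RelType.CODED_BY;
            }
        });
    }

    /** Builds the example network (edges are assumptions); returns the start node. */
    static Node matrix() {
        Node anderson = new Node("Thomas Anderson");
        Node morpheus = new Node("Morpheus");
        Node trinity = new Node("Trinity");
        Node cypher = new Node("Cypher");
        Node smith = new Node("Agent Smith");
        Node architect = new Node("The Architect");
        anderson.relateTo(morpheus, RelType.KNOWS);
        anderson.relateTo(trinity, RelType.KNOWS);
        morpheus.relateTo(trinity, RelType.KNOWS);
        morpheus.relateTo(cypher, RelType.KNOWS);
        cypher.relateTo(smith, RelType.KNOWS);
        smith.relateTo(architect, RelType.CODED_BY);
        return anderson;
    }

    public static void main(String[] args) {
        for (Position hit : hackers(matrix())) {
            System.out.println("At depth " + hit.depth + " => " + hit.node.name);
        }
    }
}
```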

Now it’s getting interesting! The ReturnableEvaluator.ALL_BUT_START_NODE constant
from the previous example was actually a convenience implementation of the
ReturnableEvaluator interface. This interface contains a single method and you
can supply a custom implementation of it to the traverser framework. It turns out
that this is a simple but powerful way to express complex queries.
Setting aside the anonymous inner class cruft, we basically pass in a snippet of
code that checks whether we traversed a relationship of type CODED_BY to get to
the current node. If this check evaluates to “true”, the current node is included
in the set of nodes returned from the traverser.
When executed with a simple print loop, the above code prints the following:

StopEvaluators work the same way. In our experience, writing custom evaluators
is very easy. Even the most advanced applications we have developed with Neo
– applications that traverse extremely large and complex networks – are based on
evaluators that are rarely more than a few lines of code.

Conclusion

Neo is not a silver bullet and some areas need to improve, for instance tooling,
standardization of the model and a query language.
However, if your data is naturally ordered in a network or is semi-structured or you
just need to go truly agile, give the Neo database a try. We hope you find it, as we
do, to be an elegant and flexible alternative that is both robust and fast.

Emil Eifrém, Neo Technology
Björn Granvik, Jayway

Links

Neo specification
www.neo4j.org

Originally published in JayView.
