Neo4j.rb 2.0 – An Overview

On the 14th of May 2008 the first commit was made. Now, four years, 2045 RSpecs and 1480 commits by 31 contributors later, Neo4j.rb has reached version 2.0. Here is some of the news:

  • The neo4j gem has been split up into three gems, neo4j-core, neo4j-wrapper and neo4j
  • Lots of refactoring (which probably also means better performance)
  • More consistent API
  • Cypher DSL with Rule integration

In this blog I will give you an overview of Neo4j.rb.

    Neo4j.rb – a New Type of Object Database?

    Graph and object databases (ODBMS) have many similar advantages. They are both good choices for data containing many relationships and composite objects. It’s natural to map objects in an object oriented programming language to nodes, properties and relationships in a graph. That means you can use a graph database as if it were an object database. For example, instead of using a database schema you can specify the structure of the graph database by using Ruby classes. You can also visualize the graph database in order to understand how your objects are related by simply following relationships between nodes.

    Another reason for choosing a graph database is for solving graph related problems like finding critical paths in projects, implementing spatial algorithms or route planning. Some of these problems are very hard to solve efficiently using a different type of database. In this blog I will show how the graph database Neo4j can be used to implement a recommendation algorithm, as well as how to use it like an object database for creating complex domain models.

    The first question to ask before using Neo4j is which flavour of Neo4j you want – the embedded or the standalone server version?

    Embedded or Server Neo4j?

    There are two different ways of using Neo4j – as a standalone database with a REST protocol or using it as an embedded database with a programmatic API.

    One advantage of using it as a standalone database is that it can be used from any language, and by multiple clients, as long as they can talk REST (see for example the neography gem). It is also available on Heroku as an add-on. In this blog we are going to use the “unplugged” embedded Neo4j version, which can only be used from JRuby.

    The advantage of the embedded Neo4j is better performance due to the direct use of the Java API. This means you can write queries in plain Ruby! Another advantage is that, since it’s an embedded database, there is one less piece of infrastructure (the database server) to install. The embedded database runs in the same process as your (Rails) application. Since JRuby has real threads and can utilize all available CPU cores, there is no need to start several instances of the database or of the Ruby runtime. There is actually no need to start the database explicitly at all, as it will be started automatically when needed. Note that it’s still possible to use the REST protocol or the web admin interface from an embedded Neo4j, see the neo4j-admin gem.

    So which should I choose? Well, if you can’t use JRuby or you don’t need an Active Model compliant Neo4j binding, then the Neo4j Server is a good choice. Otherwise I would suggest using the embedded Neo4j.rb gem (but I’m a bit biased :-) ).

    Installation of Neo4j.rb

    Before installing Neo4j.rb make sure you have installed JRuby. The easiest way to install JRuby is using RVM. Neo4j.rb has been split up into three gems: neo4j-core, neo4j-wrapper and neo4j. In this example we are only going to use nodes, relationships and properties, which means that it’s enough to install the neo4j-core gem.

    gem install neo4j-core

    You have now installed both the JRuby binding to the Java API and the embedded Neo4j database.

    Nodes, Properties and Relationships

    The basic building blocks of a graph database are nodes, properties and relationships instead of tables and columns.

    Now, open an IRB session and let’s create two nodes:

    All modifications to the database must be wrapped in transactions. Neo4j supports ACID transactions. Both nodes and relationships can have properties. Neo4j is schema free. That means that you do not predefine types and relationships. You can at any time change a property from one type to a different type. Example of setting properties on a node:

    In a graph database, relationships are first class citizens. (By the way, I like this tweet: “With neo4j it is the relationships, not the data, that count”.) Relationships in Neo4j have a type and a direction. Let’s create an outgoing relationship from node alice to node bob of type friends with one property.

    We have now created the following graph:
    Relationship between two nodes

    To retrieve all nodes reachable from node alice by following outgoing relationships of type friends, at depth one:
    alice.outgoing(:friends)

    To navigate the same relationship type but in the opposite direction:
    bob.incoming(:friends)

    There are many more methods for doing advanced traversals of relationships of any depth.
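
    The ideas in this section (nodes with properties, typed and directed relationships, navigation in both directions) can be mimicked in a few lines of plain Ruby. The sketch below is only an in-memory model for illustration and is not the Neo4j.rb API: the TinyNode and TinyRel classes and the connect method are hypothetical names, the property values are arbitrary, and nothing here is transactional or persisted.

```ruby
# A relationship has a type, a direction (from -> to) and its own properties.
TinyRel = Struct.new(:type, :from, :to, :props)

# Hypothetical in-memory node, NOT the Neo4j::Node class.
class TinyNode
  attr_reader :props, :rels

  def initialize(props = {})
    @props = props
    @rels = [] # every relationship this node takes part in
  end

  # Schema free property access: any key, any value type.
  def [](key)
    @props[key]
  end

  def []=(key, value)
    @props[key] = value
  end

  # Create a directed relationship from self to other.
  def connect(type, other, props = {})
    rel = TinyRel.new(type, self, other, props)
    @rels << rel
    other.rels << rel unless other.equal?(self)
    rel
  end

  # Nodes at depth one, following relationships of the given type outwards.
  def outgoing(type)
    @rels.select { |r| r.type == type && r.from.equal?(self) }.map(&:to)
  end

  # Same relationship type, opposite direction.
  def incoming(type)
    @rels.select { |r| r.type == type && r.to.equal?(self) }.map(&:from)
  end
end

alice = TinyNode.new(:name => 'alice')
bob = TinyNode.new(:name => 'bob')
alice[:age] = 35        # a property can be added at any time (arbitrary value)
alice[:age] = 'unknown' # and later hold a value of a different type

rel = alice.connect(:friends, bob, :since => 2008)

puts alice.outgoing(:friends).map { |n| n[:name] }.inspect
puts bob.incoming(:friends).map { |n| n[:name] }.inspect
puts rel.props[:since]
```

    Storing each relationship on both of its end nodes is what makes navigation cheap in either direction: a node only has to look at its own relationships, never at the whole graph.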

    An Example of a Recommendation Algorithm

    By using the traversal methods it is easy to solve problems like finding recommendations. Here is a very simple implementation of finding recommendations for Twitter users. It compares similarities in tags used in tweets. Let’s say we have the following users who have tweeted using the tags #neo4j, #jruby and #rails.

    A Database Model for Twitter Users

    We now want to recommend Alice a new user to follow, based on similarities in hash tags. Here is a rather naive algorithm. Let’s first find all users who have used the same tag as Alice. We do that by traversing outgoing used_tags and incoming used_tags. We stop at depth two and find users Ted, Bob and Carol. Since Alice already follows Ted we exclude him. For the remaining users we find which tags they have used and compare them with Alice’s tags. Since Alice and Bob have both used the tags #neo4j and #rails, while Carol and Alice only have #rails in common, we recommend that Alice follow Bob.
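
    The naive algorithm above can be sketched in plain Ruby, with hashes and sets standing in for the graph. This is not the Neo4j.rb traversal API. The sample data is an assumption chosen to match the description (the exact tags of each user are not given in the text), and the recommend method name is hypothetical.

```ruby
require 'set'

# Hypothetical sample data, chosen to be consistent with the description above.
USED_TAGS = {
  'alice' => Set['neo4j', 'rails'],
  'bob'   => Set['neo4j', 'rails', 'jruby'],
  'carol' => Set['rails', 'jruby'],
  'ted'   => Set['neo4j']
}.freeze
FOLLOWS = { 'alice' => Set['ted'] }.freeze

def recommend(user, used_tags, follows)
  my_tags = used_tags.fetch(user)
  already_followed = follows.fetch(user, Set.new)

  # Depth two: user -used_tags-> tag <-used_tags- other user.
  candidates = used_tags.keys.select do |other|
    other != user &&
      !already_followed.include?(other) &&
      !(used_tags[other] & my_tags).empty?
  end

  # Pick the candidate sharing the most tags with the user.
  candidates.max_by { |other| (used_tags[other] & my_tags).size }
end

puts recommend('alice', USED_TAGS, FOLLOWS)
```

    With this data Ted is excluded because Alice already follows him, Carol shares one tag and Bob shares two, so Bob is recommended.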

    A complete Rails example using this algorithm can be found here. It’s often both easier and faster to use the declarative Cypher query language – an SQL-like language for graph matching.

    Here is an example of using the new Cypher DSL support in Neo4j.rb 2.0 using the same algorithm.

    You can play around with the generated Cypher query here using the Neo4j REPL console. Another alternative is using the excellent pacer gem.

    Performance Benefits of Neo4j

    Neo4j is optimized for fast traversals of the graph by navigating incoming and outgoing relationships. The cost of a traversal depends mainly on how much of the graph it visits, not on the total size of the database. This is a huge advantage over a relational database (RDBMS), where the data size and the number of join operations have a strong negative impact on performance. This is a benefit that graph databases and object databases share.

    How Does Neo4j Map to Ruby?

    In the example above we have shown how easy and efficient it is to use Neo4j to solve graph related problems. The next thing I want to show is how Neo4j.rb maps to the database. Neo4j.rb consists of a three-layer API:

  • Layer 1 (neo4j-core) is used for interacting with nodes and relationships using the native Java (!) Neo4j::Node and Neo4j::Relationship classes.
  • Layer 2 (neo4j-wrapper) enables wrapping the native layer 1 objects using the Neo4j::NodeMixin and Neo4j::RelationshipMixin modules.
  • Layer 3 (neo4j) contains an implementation of Rails Active Model and a subset of the Active Record API using the Neo4j::Rails::Model and Neo4j::Rails::Relationship classes.

    Layer 3 creates a transaction automatically. However, if you need better performance you can always access the unwrapped Neo4j::Node objects. Another reason to use layer 2 or 3 is to declare properties and relationships. This makes it more convenient, as Neo4j.rb will generate accessor methods. Using layer 3 also means it works very well together with Rails 3 and many other Rails gems (see for example Rails 3 Scaffolds and Generators or the older blog post Neo4j 1.0.0 and Rails 3).

    Example:

    The p.friends method is the same as p.outgoing(:friends). Notice that it’s optional to declare types on properties and relationships.
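
    To give a feeling for how declarations can turn into accessor methods, here is a toy version in plain Ruby. It is not how Neo4j.rb is implemented and it does not touch a database: the Declarations module is hypothetical, and its property and has_n class methods only borrow the names of the declarations, storing everything in ordinary instance variables.

```ruby
# Hypothetical mixin, NOT Neo4j::NodeMixin: it only shows the metaprogramming idea.
module Declarations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Generate a reader and a writer for each declared property.
    def property(*names)
      names.each do |name|
        define_method(name) { @props[name] }
        define_method("#{name}=") { |value| @props[name] = value }
      end
    end

    # Generate a method returning the objects connected by the given type.
    def has_n(type)
      define_method(type) { @rels[type] }
    end
  end

  def initialize(props = {})
    @props = props.dup
    @rels = Hash.new { |hash, type| hash[type] = [] }
  end
end

class Person
  include Declarations
  property :name
  has_n :friends
end

person = Person.new(:name => 'alice')
person.friends << Person.new(:name => 'bob')
puts person.name
puts person.friends.map(&:name).inspect
```

    The declarations are just class methods that call define_method, which is why a single line in the class body is enough to get both a getter and a setter.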

    Example of Domain Modelling

    One of the advantages of an object database over an RDBMS is the ability to persist composite objects and relationships. Using an RDBMS for complex domains can cause an explosion of smaller tables, which will degrade performance because more tables have to be joined. The next example will show how to persist composite objects and data structures using Neo4j.rb. Let’s say we want to create a model for the following domain:

    A student can enroll in one or more seminars.
    A seminar has students and a waiting list for students.

    There are many ways to implement this in Neo4j.rb. I’ve chosen to represent the enrollment as a relationship object. The Enrollment object connects the student node to the seminar node. The waiting list can be represented as a linked list of nodes. Here is an example of what the database would look like:

    Neo4j Database Model - Linked List

    In the next code example we will create this database. Notice how easy it is to implement a data structure like a linked list in Neo4j. Implementing varying-sized structures for an RDBMS is often very hard.
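
    As a rough illustration of the shape of this model, here is a plain Ruby sketch in which ordinary objects stand in for nodes and relationships. It is not Neo4j.rb code and nothing is persisted. The class names follow the domain description, while the enroll, wait and waiting_list method names and the student names are assumptions made for the example.

```ruby
Student = Struct.new(:name)

# The enrollment is an object of its own, connecting a student to a seminar.
Enrollment = Struct.new(:student, :seminar, :enrolled_at)

# One entry in the waiting list; next_entry plays the role of the
# relationship pointing to the next node in the linked list.
WaitingEntry = Struct.new(:student, :next_entry)

class Seminar
  attr_reader :number, :enrollments

  def initialize(number)
    @number = number
    @enrollments = []
    @head = nil
    @tail = nil
  end

  def enroll(student, time = Time.now)
    enrollment = Enrollment.new(student, self, time)
    @enrollments << enrollment
    enrollment
  end

  # Append to the end of the linked list in constant time.
  def wait(student)
    entry = WaitingEntry.new(student, nil)
    if @tail
      @tail.next_entry = entry
    else
      @head = entry
    end
    @tail = entry
  end

  # Walk the list from the head, following one link at a time.
  def waiting_list
    students = []
    entry = @head
    while entry
      students << entry.student
      entry = entry.next_entry
    end
    students
  end
end

seminar = Seminar.new(6001)
seminar.enroll(Student.new('alice'))
seminar.wait(Student.new('bob'))
seminar.wait(Student.new('carol'))
puts seminar.waiting_list.map(&:name).inspect
```

    Because the enrollment is an object in its own right, it has room for data that belongs to neither the student nor the seminar, such as the time of enrollment.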

    How about querying a data structure like the linked list above? One very interesting approach is to build a search tree of nodes connected to the nodes in the list. Let’s say you want to find nodes in the list based on a time interval. Instead of simply traversing each node in the list (Neo4j can traverse 1-2 million hops between nodes per second) you search using the tree instead.
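
    The difference between walking the whole list and searching an index can be sketched in plain Ruby. Note the substitution: instead of a tree of nodes, the sketch uses a time-ordered array with binary search, which gives the same logarithmic lookup idea without any database. The data and the scan and lookup method names are made up for the example.

```ruby
# Hypothetical list entries: [timestamp in seconds, payload], kept in time order.
ENTRIES = (0...1000).map { |i| [i * 60, "entry-#{i}"] }.freeze

# Linear scan: looks at every entry, like following each link in the list.
def scan(entries, from, to)
  entries.select { |time, _| time >= from && time <= to }
end

# Index lookup: two binary searches locate the boundaries of the interval,
# so only the matching entries are ever read.
def lookup(entries, from, to)
  start = entries.bsearch_index { |time, _| time >= from }
  return [] if start.nil?

  stop = entries.bsearch_index { |time, _| time > to } || entries.size
  entries[start...stop]
end

puts lookup(ENTRIES, 120, 300).map(&:last).inspect
```

    Both methods return the same entries. The scan inspects all 1000 of them, while the lookup needs roughly ten comparisons per boundary.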

    Neo4j Database Model - Linked List and a Tree

    How about finding the seminar with number 6001? One way of doing that is by using the reference node which can always be found (Neo4j.ref_node). If you create a relationship between the reference node and the seminar node then you can find it. In fact, Neo4j.rb automatically creates class nodes which are connected to the reference node. All instances of a class are automatically linked to the class node. The all method on the Seminar class returns all instances, which can be used like this:


    Seminar.all.find { |seminar| seminar.number == 6001 }

    Another way of finding the seminar node is to use the Lucene search engine which is included in Neo4j. All that is needed is to declare an index, for example:

    You can then use the Lucene query syntax or a convenient finder method (similar to the Active Record API), for example:

    Object Database vs. Graph Database

    Does a graph database have the same disadvantages which are often found in object databases?

  • Schema Changes? There is a clear but simple separation between the code base and the database model. It is therefore possible to update a class without updating all nodes that represent that class.
  • Language Dependence? Since Neo4j runs on the JVM it can be accessed from any JVM language. There is also a REST protocol available.
  • Ad-Hoc Queries? Yes, using the Cypher query language.
  • Lack of standard? It’s hard to create a standard for object databases since the object oriented database language has to be part of that standard. For graph databases, TinkerPop Blueprints is a good example of a minimal common API. This API is also used in the pacer Ruby gem.

    Conclusions

    Are graph databases the right tool for you? In an awful lot of cases, yes. But an RDBMS might be a better choice if you don’t have complex data. A graph database becomes more attractive as the complexity of the data grows. Sometimes you don’t even have a choice, if you need to implement things like a recommendation algorithm. I think a graph database is a good fit for many different domains, and I believe graph databases will become more popular as people discover that they are not just for solving graph related problems or for social networks. There is still a lot more to explore on how best to use a graph database. You are all welcome not only as users of Neo4j.rb but also as contributors! Graph databases are the future and Neo4j.rb is one path to it.

    Comments

    1. Cyprian Kowalczyk

      I like Neo4j a lot, but for Ruby development the inability to have simultaneous access from both a running Rails app and the console is a show stopper (not to mention situations where there are more clients accessing the db, like background workers).
      The concept and all the stuff is great though, good work, pity Neo4j is designed like this.

    2. Andreas Ronge

      Yes, I agree. There is one github issue on this – https://github.com/andreasronge/neo4j/issues/99
      I will try to get some help on that one.

    3. Andreas Ronge

      There is now support for simultaneous access from both rails app and console – check https://github.com/andreasronge/neo4j/wiki/Neo4j%3A%3ARails-Config

    4. Toni

      Great job, sir!
      I’m planning to adopt your work in my next project, and wish this project will get better and better.
      Since I’m all new to Ruby/Rails, I wonder if there is anything I can do to help the project?

      But anyway, thank you for this great project!

    5. Stephen

      If one uses the JRuby buildpack on Heroku, is it possible to talk directly to the Neo4j add-on through the Neo4j.rb gem without going through the REST interface? Or does the add-on only provide REST access?

    6. Andreas Ronge

      No it is not possible, only REST access is provided to the neo4j server.

    7. Stephen

      Thanks, Andreas. What if you wanted a small, read-only Neo4j database? (Heroku permits “slug” sizes up to 200MB.) If you kept that out of the tmp directory, could you read that using a Neo4j buildpack on Heroku?
