The Cypher Ruby DSL for the Neo4j Graph Database

This week I released version 1.0.0 of the neo4j-cypher Ruby gem. Cypher queries can now be written in Ruby instead of as a long string. The DSL uses common Ruby conventions and an API influenced by neo4j.rb. The neo4j-cypher gem contains adaptors for both the neo4j.rb and neography gems. The DSL supports the cypher language implemented in the 1.8 release of Neo4j, which has just been released.

Introduction to Neo4j and Graph Databases

Using a graph database means that you store your data in a graph (instead of in tables or documents).
This means that your data is organized as nodes with relationships to other nodes, where both nodes and relationships have properties. What’s really nice about this is that you can keep the structure you have probably already drawn on the whiteboard.
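To make the data model concrete, here is a tiny in-memory sketch in plain Ruby. It is not the Neo4j or neo4j.rb API: the Node and Relationship structs and the sample data are made up for illustration. It only shows that nodes and relationships both carry their own properties, and that a relationship has a type and a direction.

```ruby
# Toy property graph in plain Ruby; NOT the Neo4j/neo4j.rb API.
Node = Struct.new(:id, :props)
Relationship = Struct.new(:type, :from, :to, :props)

andreas = Node.new(1, { name: 'andreas' })
kalle   = Node.new(2, { name: 'kalle' })
acme    = Node.new(3, { name: 'Acme' })

# Relationships have a type, a direction (from -> to) and properties too.
RELATIONSHIPS = [
  Relationship.new(:knows, andreas, kalle, { since: 1994 }),
  Relationship.new(:employed_by, andreas, acme, {})
].freeze

# Print each relationship roughly like an arrow on the whiteboard.
RELATIONSHIPS.each do |r|
  props = r.props.map { |k, v| "#{k}: #{v}" }.join(', ')
  puts "(#{r.from.props[:name]})-[:#{r.type} {#{props}}]->(#{r.to.props[:name]})"
end
```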

A relational database is great when your data is well defined and doesn’t have overly complex relationships. But designing tables and writing queries for simple things like queues, hierarchies or tags is rather complicated, at least compared to using a graph database. Another huge advantage of a graph database is that no slow SQL JOIN operations are involved when navigating relationships. The graph database is optimized for very fast traversal of relationships of any depth. This means that you can implement algorithms for things like recommendations, fraud detection and routing that are impractical with an RDBMS.

The Neo4j graph database is schema free. That means that your data describes itself. One way of describing data is with different types of relationships. Examples of relationship types between nodes are knows, employed by and has comments. The type is typically the same as the text you have already written on the whiteboard above the arrows between entities. Another way to add meaning to your data is with properties on nodes and relationships. For example, when using the neo4j.rb gem every node has a _classname property holding the name of the Ruby class it represents.

This means that every single node describes itself by its relationships and properties!

Query Using the Traversal API

Ok, but how can I query a complex graph structure where each node can have any number of properties and different types of relationships? The answer is that you simply traverse the graph by navigating incoming or outgoing relationships of different types.
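Traversal itself is easy to picture without a database. The following plain-Ruby sketch is not the neo4j-core API; the RELS triples and the outgoing/incoming helpers are hypothetical. It follows relationships of a given type in either direction, and chains two steps to reach friends of friends.

```ruby
# Hypothetical relationships as [from, type, to] triples; plain Ruby only.
RELS = [
  [:me, :friends, :kalle],
  [:me, :friends, :sune],
  [:kalle, :friends, :lisa],
  [:me, :employed_by, :acme]
].freeze

# End nodes of outgoing relationships of the given type.
def outgoing(node, type)
  RELS.select { |from, t, _to| from == node && t == type }.map { |_from, _t, to| to }
end

# Start nodes of incoming relationships of the given type.
def incoming(node, type)
  RELS.select { |_from, t, to| to == node && t == type }.map { |from, _t, _to| from }
end

p outgoing(:me, :friends)    # => [:kalle, :sune]
p incoming(:kalle, :friends) # => [:me]

# Two steps deep: friends of my friends.
p outgoing(:me, :friends).flat_map { |f| outgoing(f, :friends) } # => [:lisa]
```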

By using the JRuby neo4j-core gem you can do it like this:

me.outgoing(:friends)

This will return all nodes reached by navigating outgoing relationships of type friends from node me. But how can I create a query when I don’t have a start node? For example, how can I find people with the name ‘andreas’?

There are several ways of doing it. One is a Lucene index lookup, for example from neo4j.rb: Person.find_by_name('andreas').
Notice that if you don’t declare a Lucene index in neo4j.rb, it will instead try to find the node using cypher!
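Why an index matters when there is no start node: without one, the only option is to look at every node. In this plain-Ruby sketch a Hash built with group_by stands in for the index. It is only an analogy, not how neo4j.rb or Lucene work internally, and the people data is made up.

```ruby
people = [
  { id: 1, name: 'andreas' },
  { id: 2, name: 'kalle' },
  { id: 3, name: 'andreas' }
]

# Without an index: scan every node (cost grows with the size of the graph).
by_scan = people.select { |n| n[:name] == 'andreas' }.map { |n| n[:id] }

# With an index: build it once, then look nodes up directly by property value.
name_index = people.group_by { |n| n[:name] }
by_index = name_index.fetch('andreas', []).map { |n| n[:id] }

p by_scan  # => [1, 3]
p by_index # => [1, 3]
```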

Query using Cypher

The Neo4j cypher language is great when you don’t use a JVM based language or don’t want to program at all (e.g. when doing ad-hoc reporting). Another advantage is that you focus on what should be retrieved rather than how it should be retrieved.

So, what does the cypher query language look like? Here is an example of finding people I know:

START me=node(42) MATCH (me)-[:knows]->(friend) RETURN friend

There is even an online console where you can play around with queries and visualize the graph; check this.

A query usually consists of path patterns and filters. A path describes which types and directions of relationships should be navigated. For example, to match paths of outgoing relationships of type friends from node a to node b:

a-[:friends]->b

In the Ruby cypher DSL this can be written in two ways:

or using neo4j.rb style syntax:

The difference between the last two examples is what is returned (the path or the end node).
To learn more about pattern matching read this.

Why use a DSL for Cypher ?

The cypher query language looks really great, so why should I use a DSL ? Here are some reasons:

  • Better readability for (Ruby) programmers
  • Familiar syntax, since it’s Ruby and has an API similar to the neo4j-core gem
  • No need to inline ruby objects in strings, e.g. "START me=#{node.neo_id} ..."
  • It works great together with both neography and the neo4j gem

The Ruby Cypher DSL

The neo4j-cypher gem works with any Ruby implementation, unlike the neo4j gem, which only works on JRuby. The API consists of one method, Neo4j::Cypher.query, which takes a block and converts it into a cypher query (call to_s on it to get the string).
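To illustrate the idea of turning a block into a query string, here is a deliberately tiny, hypothetical MiniCypher class. It is not how neo4j-cypher is implemented and only supports node(...). It just shows how instance_eval lets a block call DSL methods, how identifiers v1, v2, ... can be generated, and how the block’s last value can become the RETURN clause.

```ruby
# Hypothetical toy builder, NOT the neo4j-cypher implementation.
class MiniCypher
  def self.query(&block)
    new.build(&block)
  end

  def initialize
    @starts = []  # START clause entries, e.g. "v1=node(42)"
    @counter = 0  # used to generate identifiers v1, v2, ...
  end

  # Registers a start node (or several ids) and returns its identifier.
  def node(*ids)
    @counter += 1
    var = "v#{@counter}"
    @starts << "#{var}=node(#{ids.join(',')})"
    var
  end

  def build(&block)
    # Evaluate the block with this builder as self; its last value is returned.
    returned = instance_eval(&block)
    "START #{@starts.join(',')} RETURN #{returned}"
  end
end

puts MiniCypher.query { node(42) }             # START v1=node(42) RETURN v1
puts MiniCypher.query { node(42); node(43) }   # START v1=node(42),v2=node(43) RETURN v2
```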

To install and test the DSL:

rvm use 1.9.3 (for example)
gem install neo4j-cypher
irb
require 'neo4j-cypher'
Neo4j::Cypher.query { node(42) > :knows > :people}.to_s
=> "START v1=node(42) MATCH v2 = (v1)-[:knows]->(people) RETURN v2"

See below how to use it from neography or neo4j.rb to also execute the query.

Return Values

The DSL works like Ruby – the last value evaluated is the return value:

Example from the IRB console:

1.9.3p194 :002 > Neo4j::Cypher.query{ node(42) }.to_s
=> "START v1=node(42) RETURN v1"

You can specify what should be returned by using the ret method. This method is available both globally and on objects like nodes and paths. Use this to express that something else should be returned instead of the last evaluated expression.

Example:

node(42).ret; node(43)
=> "START v1=node(42),v2=node(43) RETURN v1"

Example of using the global ret method:

ret node(42), node(43)
=> "START v1=node(42),v2=node(43) RETURN v1,v2"
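The interplay between ret and the last evaluated value can be mimicked with another small toy. RetDemo is hypothetical and not part of neo4j-cypher; since its node method returns a plain identifier string, only the global form ret(...) is modelled here, not node(42).ret.

```ruby
# Hypothetical toy, NOT the neo4j-cypher implementation.
class RetDemo
  def initialize
    @starts = []     # START entries such as "v1=node(42)"
    @counter = 0     # for generated identifiers v1, v2, ...
    @explicit = nil  # identifiers passed to ret, if it was called
  end

  def node(id)
    @counter += 1
    var = "v#{@counter}"
    @starts << "#{var}=node(#{id})"
    var
  end

  # An explicit ret wins over the block's last evaluated value.
  def ret(*vars)
    @explicit = vars
  end

  def to_cypher(&block)
    last = instance_eval(&block)
    returned = @explicit || [last]
    "START #{@starts.join(',')} RETURN #{returned.join(',')}"
  end
end

puts RetDemo.new.to_cypher { node(42); node(43) }            # ... RETURN v2
puts RetDemo.new.to_cypher { a = node(42); node(43); ret a } # ... RETURN v1
puts RetDemo.new.to_cypher { ret node(42), node(43) }        # ... RETURN v1,v2
```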

To learn more about return values, read this.

Automatic Creation of Identifiers

When you reference parts of the pattern, you do so by naming them. The names you give the different parts are called identifiers. With the DSL you can avoid specifying identifiers, either by using method chaining or by letting the DSL create them automatically when needed.

For example: the DSL node(42) creates an identifier v1 which is also returned, as shown in the example above. Sometimes you need to specify the name of the identifier in the DSL. This is typically needed when you fetch the result from the query.

Example:

node(42).as(:foo); ret node(:foo)
=> "START foo=node(42) RETURN foo"
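The naming rule can be sketched with a small hypothetical IdentifierPool in plain Ruby: generated names v1, v2, ... unless an explicit name is supplied, the way as(:foo) does. One assumption of this sketch, not taken from the gem, is that explicit names do not use up a generated number.

```ruby
# Hypothetical sketch of identifier naming; not taken from neo4j-cypher.
class IdentifierPool
  def initialize
    @counter = 0
  end

  # Return the explicit name if given (like as(:foo)), else generate v1, v2, ...
  # Assumption: explicit names do not consume a generated number.
  def name_for(explicit = nil)
    return explicit.to_s if explicit

    @counter += 1
    "v#{@counter}"
  end
end

pool = IdentifierPool.new
names = [pool.name_for, pool.name_for(:foo), pool.name_for]
p names # => ["v1", "foo", "v2"]
puts "START #{names[1]}=node(42) RETURN #{names[1]}" # START foo=node(42) RETURN foo
```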

The as method is available on many of the DSL’s objects, such as paths and functions.

Filters

You can filter the query by using the Ruby operators ==, !=, <, >, ! and =~.
For example, find which of the nodes with id 1, 2 or 3 have a name property equal to ‘andreas’:

node(1,2,3)[:name] == 'andreas'

This corresponds to the following cypher string:

START v1=node(1,2,3) WHERE v1.name = "andreas" RETURN v1

Regular expressions use normal Ruby syntax:
Example: node(1,2,3)[:name] =~ /andreas/
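What these filters compute can be mimicked in plain Ruby over made-up node properties (a sketch, not the DSL). One caveat: Ruby’s regular expressions match anywhere in the string, while Cypher may require the pattern to match the whole property value, so treat the regex line as an approximation.

```ruby
# Made-up properties for nodes 1, 2 and 3.
nodes = {
  1 => { name: 'andreas' },
  2 => { name: 'kalle' },
  3 => { name: 'andreas ronge' }
}

# Like WHERE v1.name = "andreas"
equal_ids = nodes.select { |_id, props| props[:name] == 'andreas' }.keys

# Like WHERE v1.name =~ /andreas/ (Ruby's match? finds the pattern anywhere)
regex_ids = nodes.select { |_id, props| props[:name].match?(/andreas/) }.keys

p equal_ids # => [1]
p regex_ids # => [1, 3]
```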

Filters can also be expressed in a block which allows you to write the query in one line.
For example, find the friends I made in 1994 by checking the since property on the relationships:

node(42).outgoing(rel(:friends).where{|r| r[:since] == 1994})
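In plain Ruby terms, the relationship filter keeps only the outgoing friends relationships whose since property is 1994, then returns their end nodes. A sketch over invented data, not the DSL itself:

```ruby
# Hypothetical relationships from node 42: [type, end_node, properties].
rels_from_42 = [
  [:friends, :kalle, { since: 1994 }],
  [:friends, :sune,  { since: 2001 }],
  [:friends, :lisa,  { since: 1994 }],
  [:knows,   :pelle, { since: 1994 }]
]

# Follow only :friends relationships with since == 1994 and keep the end nodes.
friends_1994 = rels_from_42
  .select { |type, _end_node, props| type == :friends && props[:since] == 1994 }
  .map { |_type, end_node, _props| end_node }

p friends_1994 # => [:kalle, :lisa]
```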

To learn more about filters, read this.

Cypher Functions

Cypher has a number of functions for things like returning the length of a path or checking if a condition is true for a collection.

Example: the cypher extract function:
(node(3) >> node(4) >> node(1)).nodes.extract(&:age)

This generates the following cypher string:
START v2=node(3),v3=node(4),v4=node(1) MATCH v1 = (v2)-->(v3)-->(v4) RETURN extract(x in nodes(v1) : x.age)

In the example above the property age will be returned for all nodes in the path. Notice that the cypher extract function works like the Ruby map method.
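Since extract works like map, the same computation in plain Ruby, over made-up ages for the three nodes on the path, looks like this:

```ruby
# Hypothetical nodes on the path node(3) -> node(4) -> node(1).
path_nodes = [
  { id: 3, age: 30 },
  { id: 4, age: 25 },
  { id: 1, age: 41 }
]

# extract(x in nodes(v1) : x.age) corresponds to Ruby's map:
ages = path_nodes.map { |n| n[:age] }
p ages # => [30, 25, 41]
```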

To learn more about cypher functions, read this.

Aggregation

One example of a cypher aggregation is the count function.
Example: count the number of end nodes from node 42.

node(42).outgoing.count
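Roughly speaking, counting the end nodes reached by outgoing relationships means counting the matches, and restricting the relationship type is just a predicate. A plain-Ruby sketch over hypothetical data (it does not reproduce the exact query the DSL generates):

```ruby
# Hypothetical outgoing relationships of node 42 as [type, end_node_id] pairs.
outgoing_from_42 = [[:knows, 1], [:friends, 2], [:knows, 3]]

# One match per outgoing relationship, of any type.
p outgoing_from_42.count # => 3

# Restricted to one relationship type, e.g. :knows.
p outgoing_from_42.count { |type, _end_node| type == :knows } # => 2
```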

To learn more about cypher aggregations, read this.

Advanced Examples

So far the examples have been trivial and rather uninteresting. Here is one from the Neo4j data modelling examples:

To find people similar to me based on the taggings of their favorited items, one approach could be:

  • Determine the tags associated with what I favorite.
  • What else is tagged with those tags?
  • Who favorites items tagged with the same tags?
  • Sort the result by how many of the same things these people like.
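The four steps above can be sketched in plain Ruby over a small hypothetical dataset (the people, items and tags are invented). Like count(*) in the Cypher query, it counts every me, favorite, tag, their favorite, person path. To keep the counting simple, the sketch assumes nobody else favorites the same items as me.

```ruby
# Hypothetical data: who favorites what, and how items are tagged.
FAVORITES = {
  'Joe'   => %w[item1 item2],
  'Sara'  => %w[item3],
  'Derek' => %w[item3 item4],
  'Jill'  => %w[item5]
}.freeze

TAGS = {
  'item1' => %w[ruby],
  'item2' => %w[graphs],
  'item3' => %w[ruby graphs],
  'item4' => %w[ruby],
  'item5' => %w[cooking]
}.freeze

def similar_favorites(me)
  # Step 1: tags on my favorites, counted once per (favorite, tag) pair.
  my_tag_counts = Hash.new(0)
  FAVORITES.fetch(me).each do |item|
    TAGS.fetch(item).each { |tag| my_tag_counts[tag] += 1 }
  end

  scores = Hash.new(0)
  FAVORITES.each do |person, items|
    next if person == me # WHERE NOT(me=people)

    # Steps 2 and 3: items sharing my tags, and the people who favorite them.
    items.each do |item|
      TAGS.fetch(item).each { |tag| scores[person] += my_tag_counts[tag] }
    end
  end

  # Step 4: drop people with nothing in common, sort by count descending.
  scores.reject { |_person, n| n.zero? }.sort_by { |person, n| [-n, person] }
end

p similar_favorites('Joe') # => [["Derek", 3], ["Sara", 2]]
```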

Here is the original query:

START me=node:node_auto_index(name = "Joe")
MATCH me-[:favorite]->myFavorites-[:tagged]->tag<-[:tagged]-theirFavorites<-[:favorite]-people
WHERE NOT(me=people)
RETURN people.name as name, count(*) as similar_favs
ORDER BY similar_favs DESC

Here is the query using the Ruby DSL:

This can also be written in one line (maybe not a good idea ...).

The query can be tested here. For more examples, read this.

Using neo4j-cypher gem

You probably don’t want to use the Neo4j::Cypher.query method directly, since you also want to execute the query. The neo4j-cypher gem includes adaptors for both the neo4j-core and neography gems.

Using neo4j-cypher from neography

Example:

require 'neo4j-cypher/neography'
...
@neo.execute_cypher(a_neography_node) { |me| me > :knows > :people }['data']

See this for a complete example.

Using neo4j-cypher from neo4j.rb

To use the cypher DSL from the neo4j.rb or neo4j-core gem, use the Neo4j.query method:

Example:

Neo4j.query(User.find_by_name('andreas')){|me| me > :knows > :people }

How to fetch the result from a query

As shown in the examples above the DSL will generate identifiers for you.
For example the DSL node(42) > :knows > :people will return an identifier named :v2 for the path.
You need to know the identifier when you fetch the result.

Example of printing all the properties of the first path (node,relationship,node,relationship...) found in the match.

Neo4j.query{node(1) > :knows > :people }.first[:v2].each {|x| puts x.props}
{"_classname"=>"Person"}
{"_neo_id"=>5, "_classname"=>"Neo4j::Rails::Relationship"}
{"_neo_id"=>3, "name"=>"kalle", "_classname"=>"Person"}

By using the as method, you can tell the DSL which identifier to use, for example:

(node(42) > :knows > :people).as(:my_identifier)
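Because result rows are keyed by identifier, fetching is just a Hash lookup. In this plain-Ruby sketch the row layout is an assumption modeled on the printed output above (a path as alternating node and relationship property hashes); it is not the adaptor’s actual result class.

```ruby
# Hypothetical result: one Hash per row, keyed by identifier. The path is
# modelled as alternating node and relationship property hashes.
rows = [
  {
    my_identifier: [
      { '_classname' => 'Person' },
      { '_neo_id' => 5, '_classname' => 'Neo4j::Rails::Relationship' },
      { '_neo_id' => 3, 'name' => 'kalle', '_classname' => 'Person' }
    ]
  }
]

path = rows.first[:my_identifier]
# Every other element is a node: take the first of each (node, rel) pair.
path_nodes = path.each_slice(2).map(&:first)
p path_nodes.map { |n| n['name'] } # => [nil, "kalle"]

# Using the wrong identifier (e.g. the generated :v2) finds nothing.
p rows.first[:v2] # => nil
```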

Summary

The cypher DSL allows you to write queries in many different ways. If you think it’s more readable, you can write the query over several lines. Remember that you have full access to the Ruby language, which you can use to make your queries more DRY (e.g. using Ruby variables and Ruby methods). There is no need to declare start nodes first or to specify clauses in a certain order. I believe this flexibility gives the programmer more power to create really readable cypher queries.

There is much more to explore with the DSL, like how to update the graph or chain queries together; see this. The documentation for the Ruby cypher DSL can be found here and the github project here.
