Neo4j .NET Client over HTTP using REST and json

Here it is: a Proof of Concept of the world’s first Neo4j .NET Client. In other words, what follows is a discussion of how to create a client library that communicates with a graph database over REST.

UPDATE: There is now a live CodePlex project realizing this concept: A .NET Client Library for Neo4j over HTTP using REST and JSON, http://neo4jrestsharp.codeplex.com/

Intro

Neo Technology has come out with a Neo4j REST API for its popular Neo4j graph database. I work at Jayway, as do a couple of colleagues who work with Neo, and a little bird by the name of Peter Neubauer tweeted in my ear that a free T-shirt was on offer for making a working .NET sample that communicated with the Neo4j server. Not only was that a handsome offer; I also wanted to find out what the architecture of such a client library for a graph database would look like.

I’ve recently been working on client library abstractions and good architecture for the RESTful interface to Windows Azure Storage. (I’ve written more about Azure and storage on my blog.) Enough sidetracking (sorry)!

No sooner said than done: we set out to code up a working sample for .NET. I hacked the client code, and the Neo guys cheered me on and offered very helpful assistance whenever I ran into trouble or was just plain stumped on how to proceed with the API. One very nice team effort later, including a remote team debugging session, we have a small, running POC.

When I tweeted about this effort, a notable number of people answered with interest in potential applications of the idea in the .NET world. I’ve been approached both on Twitter (@noopman) and on LinkedIn with questions about it.

This blog post presents our findings to the world, in the hope that someone will be inspired to get going with a real .NET client library for Neo4j.

Architecture

First, let’s look at the architecture for creating a client library for a RESTful service. This is a more general discussion that leads up to our client library POC below. If you are mainly here for the actual sample, scroll down.

When building a client library, abstractions and domain thinking come into play. We want the often-forgotten users of the library (developers) to work with their normal domain objects in business code, through an API that makes sense in their domain, while talking to a highly specific graph database backend store. This means keeping all the RESTfulness, HTTP and json away from the domain code. Let me show you a sketch and see if I can make this explanation a bit less wordy:

[Image: architecture sketch]

On the other side of the cloud from our .NET code sits a Java-based graph database. It is friendly enough to expose a RESTful API that we can talk to.

To do so, we want to create a client library that is capable of a couple of things:
1) Taking care of serializing objects to json (and deserializing them back again).
2) Knowing about HTTP requests and how to handle them. In our sample we used the HttpWebRequest class to handle things like HTTP headers, content types, URIs, etc.
3) Finally, exposing an interface to your code that you can use from a specialized Data Access Component.

To protect the business layer from knowing anything about the specifics of the Neo4j API, you really should use a specialized Data Access Component. No law is broken if you don’t, but a good rule of thumb is to add abstraction layers that shield the domain code in your business components from the specifics of an underlying storage structure. Serializing objects to json and making requests over HTTP are not a natural fit for the business problem you are solving.

The code that does this separation is code you write yourself, in your application; only you know how to adapt a foreign library and technology to your domain. Its sole purpose is to hide the technologies used to persist and query data from your business logic. This component translates between two worlds: your business logic and the techniques used to store data in a graph database. Mind you, most of the technical details of storage are wrapped up in the Client Lib, but the Data Access Layer (DAL) knows about this problem domain. It understands that the persistence is indeed a REST-based service over HTTP, with all that this entails. This component can decide whether calls are made synchronously or asynchronously and can handle specific errors or faults that may occur when using the storage. All of this is done in a language fairly close to the API that Neo4j exposes, and in the end the component translates it into terms useful to the domain in which your business operates. Instead of calling .PostNode() in your business code, perhaps you expose a .StorePerson() action.
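To make the StorePerson/PostNode split concrete, here is a minimal sketch in Java (the POC itself is C#, but the shape carries over directly). The client-lib contract is named after the POC’s NodeProtocol, but its Java signature, like `PersonStore` and `Neo4jPersonStore`, is hypothetical: the lower contract mirrors the REST operation one-to-one, while the DAL speaks the domain’s language and hides the translation.

```java
// Hypothetical client-lib contract: mirrors the REST operation POST /node one-to-one
// and returns the id of the created node.
interface NodeProtocol {
    long postNode(String jsonBody);
}

// Plain domain object used by business code.
class Person {
    final String name;

    Person(String name) { this.name = name; }
}

// Domain-facing DAL contract: speaks the business language, not REST.
interface PersonStore {
    long storePerson(Person person);
}

// DAL implementation: translates a domain call into a client-lib call.
class Neo4jPersonStore implements PersonStore {
    private final NodeProtocol protocol;

    Neo4jPersonStore(NodeProtocol protocol) { this.protocol = protocol; }

    @Override
    public long storePerson(Person person) {
        // Naive serialization, just for the sketch: no escaping, no serializer library.
        String json = "{\"name\":\"" + person.name + "\"}";
        return protocol.postNode(json);
    }
}
```

Because `NodeProtocol` has a single method, a test can pass a lambda that records the json instead of sending it, so the DAL can be exercised without a server.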

What we achieve is a good and sound (and classic) layering model: Business Logic – DAL – Client Lib.

There is one more observation to make in the picture, and that is the matter of IoC (Inversion of Control). To keep your code testable, maintainable and untangled from the implementations in lower layers, you should always apply Inversion of Control: for instance, your business code should not reference the DAL directly but should use a common contract instead. The Dependency Inversion Principle states, briefly, that a higher layer should never depend directly on a lower layer but should instead depend on an abstraction. If that holds, you are able to test your business logic without using the actual underlying storage. This is key in, for instance, a TDD (Test-Driven Development) style.

Code that depends on abstractions can be tested very quickly, and its tests do not depend on specific data being present in a data store in order to verify the intended behavior. The Business Logic uses a contract for the DAL, and the DAL implements it. You then use something called an IoC container (read more about it on Wikipedia) to create the DAL instances and hand them to the Business Logic, using the technique of DI (Dependency Injection). This separation enables you to modify, redeploy or even exchange DAL components without touching your Business Layer at all.

If separating the Business Logic from the DAL using contracts is important, it is almost even more important to do so in the lower layer that, at run time, speaks directly to the data store! You never want to end up in a situation where testing your DAL code requires access to a specific data store with specific data in it. Not only is that an inconvenience; it also means your tests run only as fast as data can travel over the Internet, rather than as fast as your CPU can execute code. Experience shows that tests that are slow to run are, in fact, not run. The conclusion from all of these IoC discussions can only be that it is imperative that your Client Library comes with an abstraction layer.
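A minimal Java sketch of constructor injection, assuming the hypothetical types `PersonStore`, `CrewService` and `InMemoryPersonStore`. No IoC container is used; the wiring is done by hand, which is exactly the step a container would automate.

```java
import java.util.ArrayList;
import java.util.List;

class Person {
    final String name;

    Person(String name) { this.name = name; }
}

// The abstraction the business layer depends on (Dependency Inversion Principle).
interface PersonStore {
    long storePerson(Person person);
}

// Business logic: receives its DAL through the constructor and never
// creates a concrete DAL itself.
class CrewService {
    private final PersonStore store;

    CrewService(PersonStore store) { this.store = store; }

    List<Long> recruit(List<String> names) {
        List<Long> ids = new ArrayList<>();
        for (String name : names) {
            ids.add(store.storePerson(new Person(name)));
        }
        return ids;
    }
}

// In-memory fake for tests: no HTTP, no Neo4j server, runs at CPU speed.
class InMemoryPersonStore implements PersonStore {
    final List<Person> stored = new ArrayList<>();

    @Override
    public long storePerson(Person person) {
        stored.add(person);
        return stored.size(); // ids 1, 2, 3, ... like a fresh database
    }
}
```

In a unit test, `new CrewService(new InMemoryPersonStore())` needs no network at all; production code would hand in a REST-backed `PersonStore` instead, and `CrewService` would not change.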

What did we do with the Architecture in our POC?

All of these architectural considerations are, of course, premature in a POC. This is why our sample does not really separate the DAL code from the Client Lib. However, to prove the point of IoC and DI, I did separate the Business Logic from the Data Access logic using a simple contract and an equally simple Factory Method Pattern. Instead of newing up a DAL component in my code, I called a factory method to get an abstraction of the DAL.
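A sketch of that idea in Java, assuming a static factory method (`DalFactory.createPersonStore`, a hypothetical name) that returns the `PersonStore` abstraction rather than the concrete REST-backed class. The HTTP call itself is deliberately left out.

```java
// Contract the business logic codes against.
interface PersonStore {
    long storePerson(String name);
}

// Concrete DAL; business code never names this class directly.
class RestPersonStore implements PersonStore {
    private final String baseUri;

    RestPersonStore(String baseUri) { this.baseUri = baseUri; }

    String baseUri() { return baseUri; }

    @Override
    public long storePerson(String name) {
        // The actual HTTP call is out of scope for this sketch.
        throw new UnsupportedOperationException("would POST a node to " + baseUri + "/node");
    }
}

// Factory method: the one place that decides which PersonStore implementation
// the business logic gets, so swapping the DAL means editing only this class.
final class DalFactory {
    private DalFactory() {}

    static PersonStore createPersonStore(String baseUri) {
        return new RestPersonStore(baseUri);
    }
}
```

Business code then only ever sees `PersonStore store = DalFactory.createPersonStore(baseUri);` and stays unaware of which implementation it got.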

Our POC sample

Our code sample creates a small node graph with a few relationships. Then it traverses the graph to find everyone who “knows” someone according to the relationships. Finally, it deletes the graph again.

Here is the graph we create in our sample: http://wiki.neo4j.org/content/The_Matrix (in our simple sample we only create names on nodes and types on relationships).

And here is the code, so that you can download it and follow along with our explanation yourself:

First, in accordance with the architectural considerations above, we get hold of our DAL using a factory method. We also create a new Person object and store him (in this case Thomas Anderson, a.k.a. Neo):

As you can see, our business code just instantiates standard .NET classes and stores a “Neo” Person in a very normal way. The business logic contains no real details on how the storage happens. There is, of course, a “node” concept, and you can decide for yourself whether that should be hidden from business code too.

Since this is a POC, the implementation of .StoreNode(neo) is very trivial. In a real client library we would have included something like a library to serialize the neo Person node into a json string. Here we just build the string statically:
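A minimal Java sketch of statically built json for a node, assuming the node carries only a `name` property as in the sample. `NodeJson` is a hypothetical helper; unlike plain string concatenation, it escapes quotes, backslashes and control characters so that names containing them still produce valid json. A real client library would use a serializer instead.

```java
// Hand-built json for a node with a single "name" property,
// in the spirit of the POC's static string creation.
final class NodeJson {
    private NodeJson() {}

    static String nameProperty(String name) {
        return "{\"name\":\"" + escape(name) + "\"}";
    }

    // Minimal json string escaping: quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

For the sample, `NodeJson.nameProperty("Thomas Anderson")` yields `{"name":"Thomas Anderson"}`.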

The interesting part here is the existence of a NodeProtocol implementation. When working with a remote API like Windows Azure Storage, and also with this Neo4j REST API, I’ve found it useful to implement C# methods that correspond very closely to the available API methods. In our case the REST API has a PostNode operation, so the C# implementation of that HTTP call is a .PostNode() method.

I won’t bore you with the exact details of all the helper classes; to dig into those, it’s better to download the code and look at the details yourself.

As you can see, we create an HttpWebRequest and add some headers to it (in the CreateWebRequest method), such as request.Accept = "application/json";, meaning we accept a json-formatted response. We also specify that the body we send is formatted as json with request.ContentType = "application/json";. Finally, we make the call and return the id of the node we just created.
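For comparison, here is roughly the same request sketched in Java with `java.net.http` (Java 11+), since HttpWebRequest is .NET-only. `NodeProtocolSketch` and its methods are hypothetical names; the `/node` path and the Accept/Content-Type headers come from the text, while expecting a `201 Created` status with a `Location` header is an assumption about the server. Like the POC, the call is synchronous and does no retries.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class NodeProtocolSketch {
    private NodeProtocolSketch() {}

    // Counterpart of the CreateWebRequest helper: a POST to {baseUri}/node
    // that sends json and asks for json back.
    static HttpRequest createPostNodeRequest(String baseUri, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUri + "/node"))
                .header("Accept", "application/json")       // we accept a json response
                .header("Content-Type", "application/json") // the body we send is json
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Synchronous call with minimal error handling; returns the new node's URI
    // taken from the Location header.
    static String postNode(HttpClient client, String baseUri, String jsonBody)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
                createPostNodeRequest(baseUri, jsonBody), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 201) {
            throw new IOException("Expected 201 Created but got " + response.statusCode());
        }
        return response.headers().firstValue("Location")
                .orElseThrow(() -> new IOException("Missing Location header"));
    }
}
```

Only `createPostNodeRequest` can be checked offline; `postNode(HttpClient.newHttpClient(), baseUri, json)` needs a running Neo4j REST server at `baseUri`.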

Again, this solution borders on the trivial, but it does get to the point: it makes a REST request and receives a response. Note that there is no error handling here, and the request is made synchronously.

What do the actual HTTP request and json response look like for this simple sample?

Request:

Response (the line breaks in the response body were added by me):

As you can see, the Neo4j REST API is self-explanatory: creating a node gives you back a response listing everything you can now do with the new node over the API. Since this is quite a lot of data when the only thing that really matters is in the header (Location: http://try.neo4j.org:9999/node/1), I’ve already suggested to the Neo team a simple URI switch (like APIDocumentation=0) to turn off the detailed response. That said, I do like that you get back so many URIs that you can use directly. Depending on your needs, this is either good or too verbose a payload to get back in your response. As you can see, the node we created got the id 1.
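Since the part of the response that really matters is the Location header, a client can derive the node id from it. A minimal Java sketch, assuming node URIs end in `/node/{id}` as in the Location value quoted above; `NodeIds.fromLocation` is a hypothetical helper.

```java
import java.net.URI;

final class NodeIds {
    private NodeIds() {}

    // Takes the last path segment of a node URI such as .../node/1 and parses it as the id.
    static long fromLocation(String location) {
        String path = URI.create(location).getPath(); // e.g. "/node/1"
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        String last = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        return Long.parseLong(last); // throws NumberFormatException for non-numeric ids
    }
}
```

With the header from the example, `NodeIds.fromLocation("http://try.neo4j.org:9999/node/1")` returns 1.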

Next we create all the other nodes and, in a very similar fashion, the relationships between them. Here is the relationship that says “Neo knows Trinity” (Trinity has node id 2). This is a POST operation to /node/1/relationships:
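A hedged Java sketch of the path and body for such a relationship request. The path comes from the text; the body shape, with a `to` field holding the end node’s URI and a `type` field naming the relationship type, is an assumption about the Neo4j REST API and may differ between server versions. `RelationshipJson` is a hypothetical helper.

```java
// Builds the pieces of a "create relationship" request; nothing is sent.
final class RelationshipJson {
    private RelationshipJson() {}

    // Relationships collection of the start node, e.g. /node/1/relationships.
    static String relationshipsPath(long fromNodeId) {
        return "/node/" + fromNodeId + "/relationships";
    }

    // Assumed body shape: the end node is referenced by its URI, plus a relationship type.
    static String body(String baseUri, long toNodeId, String type) {
        return "{\"to\":\"" + baseUri + "/node/" + toNodeId + "\",\"type\":\"" + type + "\"}";
    }
}
```

For “Neo knows Trinity”, this would be a POST of `body(baseUri, 2, "KNOWS")` to `baseUri + relationshipsPath(1)`.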

Once the node graph for the Matrix is created, we traverse it. We want to find everyone in the graph who knows someone.

The request looks like this in REST (after a path through the code similar to the one above, from business logic to DAL and on to the client lib that makes the actual REST call):

Again, we kept things simple by limiting the traversal to a “max depth” of 10 relationships into the graph, which works fine for our sample.
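A Java sketch of how such a traversal request could be assembled, with loud caveats: the `/node/{id}/traverse/node` path and the `order`, `relationships` and `max_depth` field names are assumptions about the REST traversal API that should be checked against the server version you run, and breadth-first order is an arbitrary choice. `TraversalRequest` is a hypothetical helper.

```java
// Builds the pieces of a traversal request; nothing is sent.
final class TraversalRequest {
    private TraversalRequest() {}

    // Assumed path: traverse from a start node and return nodes.
    static String path(long startNodeId) {
        return "/node/" + startNodeId + "/traverse/node";
    }

    // Assumed field names: follow outgoing relationships of one type up to maxDepth hops.
    static String body(String relationshipType, int maxDepth) {
        return "{\"order\":\"breadth_first\","
                + "\"relationships\":[{\"type\":\"" + relationshipType + "\",\"direction\":\"out\"}],"
                + "\"max_depth\":" + maxDepth + "}";
    }
}
```

For our sample, that would be a POST of `body("KNOWS", 10)` to `path(1)`, starting from Neo’s node.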

The result is a list of all the nodes that match. Because of the self-explanatory API mentioned above, the result also lists the actions you can perform on each returned node, which makes the response body quite long. Rather than pasting the whole thing into this blog post, I’ll let you try it yourself: download the application and you will get exactly this result from the app:

[Image: screenshot of the traversal result in the demo app]

Neo knows Trinity and Morpheus, who knows Cypher, who in turn knows Agent Smith.
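To see what that traversal computes, here is an in-memory Java sketch that walks outgoing KNOWS edges breadth-first up to a maximum depth, visiting each node once. It is not how the Neo4j server implements traversals; `KnowsGraph` is a hypothetical class, and the usage below only adds the edges named in the sentence above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KnowsGraph {
    private final Map<String, List<String>> knows = new LinkedHashMap<>();

    void addKnows(String from, String to) {
        knows.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first walk along outgoing KNOWS edges, at most maxDepth hops,
    // visiting each node once; the start node itself is not included.
    List<String> reachableFrom(String start, int maxDepth) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String friend : knows.getOrDefault(node, List.of())) {
                    if (seen.add(friend)) { // add() returns false if already visited
                        result.add(friend);
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

With Neo→Trinity, Neo→Morpheus, Morpheus→Cypher and Cypher→Agent Smith added, `reachableFrom("Neo", 10)` returns `[Trinity, Morpheus, Cypher, Agent Smith]`, while a max depth of 1 stops at `[Trinity, Morpheus]`.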

Difficulties

We had only two real challenges in coding this POC.

The first was an issue with streams and the optional, so-called Byte Order Mark (BOM) that can be placed at the head of a stream. I’ve posted on this here: “Would you like a Byte Order Mark to go with that?”
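A small Java sketch of defensive BOM handling, assuming UTF-8 bodies: detect the three-byte UTF-8 BOM (EF BB BF), skip it when decoding, and make sure none is sent. `Bom` is a hypothetical helper. Java’s `String.getBytes(UTF_8)` never writes a BOM on its own, whereas in .NET a StreamWriter created with `Encoding.UTF8` writes one at the start of the stream (`new UTF8Encoding(false)` does not).

```java
import java.nio.charset.StandardCharsets;

final class Bom {
    private Bom() {}

    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithBom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // For response bodies: decode as UTF-8, skipping a leading BOM so it does not
    // end up as an invisible U+FEFF character in front of the json.
    static String fromUtf8SkippingBom(byte[] data) {
        int offset = startsWithBom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    // For request bodies: encode as UTF-8 without a BOM, dropping a stray leading
    // U+FEFF character if one slipped into the string.
    static byte[] toUtf8WithoutBom(String text) {
        String clean = text.startsWith("\uFEFF") ? text.substring(1) : text;
        return clean.getBytes(StandardCharsets.UTF_8);
    }
}
```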

The other challenge was that I had written a small Windows app to show our results, but the guys on the Neo team are certainly not Windows users; they wanted to run the code on Mono. To be sure of compatibility, I scaled back to a good old Windows Forms demo app on .NET Framework 3.0. This worked great on Mono, which was our intent. The drawback was that the library I’d found to handle object serialization didn’t work well with that older version of the framework. That’s why I replaced the library with statically created strings (as you can see in the implementation of the .StoreNode() method above).

What have we learned from this?

First of all, creating a client library for .NET that speaks REST and json over HTTP to Neo4j is very doable: quite easy and straightforward.

Second (and this is just a word of warning): keep your layers separate. This goes for all applications, I guess, but if you start putting code that creates web requests with web methods, content types and such into your business code, you will very quickly end up in a tangled, messy situation that is painful and costly to maintain.

Finally, we have considered looking into WCF Data Services (and OData, the Open Data Protocol), and possibly LINQ (Language Integrated Query) on the client side, to make the library even more powerful and useful. And we do need an object-to-json serializer library.

Oh, and here is another post on using the Neo4j REST API, by the team itself: http://blog.neo4j.org/2010/04/neo4j-rest-server-part1-get-it-going.html

That’s it!

We hope this is inspiring enough for you to get going with your own samples, or even client libraries, in .NET that speak to Neo4j. If you do, please get in touch with us!

Cheers,

M.

6 Comments

  1. John C

    Just wanted to let you know that I needed a full .net wrapper around the neo4j rest services, so I’m using your code as a starting point and filling it out. I’ll post it on CodePlex when it’s usable… should be soon. I’ll give you full props for your initial work

  2. Tom M.

    @John C
    Where is it? ;-) I need it :-)

  3. Happy deleting!

  4. Thank you for this. I am building an Azure based Neo4J solution, will be using Autofac on this, will see how it goes, this is a good kick start article, thanks!

  5. NEX-5N

    Hello there. I found this website through Google while searching for a similar topic, and the website came up. It looks good. I have bookmarked it in Google Bookmarks to come back later.

  6. I visit everyday some sites and blogs to read content, except this webpage offers quality
    based posts.
