<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jayway Team Blog &#187; persistence</title>
	<atom:link href="http://blog.jayway.com/tag/persistence/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.jayway.com</link>
	<description>Sharing Experience</description>
	<lastBuildDate>Sat, 11 Feb 2012 10:33:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Qi4j and domain model persistence</title>
		<link>http://blog.jayway.com/2009/09/24/qi4j-and-domain-model-persistence/</link>
		<comments>http://blog.jayway.com/2009/09/24/qi4j-and-domain-model-persistence/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 04:06:37 +0000</pubDate>
		<dc:creator>Rickard Öberg</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[domain model]]></category>
		<category><![CDATA[frameworks]]></category>
		<category><![CDATA[persistence]]></category>
		<category><![CDATA[qi4j]]></category>

		<guid isPermaLink="false">http://blog.jayway.com/?p=1975</guid>
		<description><![CDATA[The JavaZone 2009 conference is over, and although I couldn't make it this year due to our project, StreamFlow, going into production soon, the Qi was definitely flowing there. I've been watching the videos from the conference (available <a href="http://tcs.java.no/tcs/">here</a>, and many kudos for making them available so soon), and there are a number of presentations which either explicitly or implicitly relate to Qi4j. It seems that so many of the issues that Qi4j has been designed to deal with are things that are becoming known and annoying to a majority of developers. So, I'll try to outline below just how the topics covered at JavaZone relate to Qi4j, and how Qi4j can help you deal with those problems.]]></description>
			<content:encoded><![CDATA[<p><em>The following entry was originally posted on Rickard's blog. Jayway is a founding company of the Qi4j project.<br />
</em></p>
<p>The JavaZone 2009 conference is over, and although I couldn't make it this year due to our project, StreamFlow, going into production soon, the Qi was definitely flowing there. I've been watching the videos from the conference (available <a href="http://tcs.java.no/tcs/">here</a>, and many kudos for making them available so soon), and there are a number of presentations which either explicitly or implicitly relate to Qi4j. It seems that so many of the issues that Qi4j has been designed to deal with are things that are becoming known and annoying to a majority of developers. So, I'll try to outline below just how the topics covered at JavaZone relate to Qi4j, and how Qi4j can help you deal with those problems.</p>
<h3>Persisting domain models</h3>
<p>I'll start with Randy Stafford's presentation "Patterns for persisting large and rich domain models", in which he describes the criteria for the topic, and then the top ten things to be aware of. I basically agree with the criteria, and will go directly into the top ten list and how it relates to what we are doing.</p>
<h3>Ripple loading</h3>
<p>The first problem that Randy delves into is the one of "ripple loading", whereby a single request to an application might result in many requests to the underlying database. If "many" is on the order of hundreds or thousands, then that is going to negatively impact the time it takes to respond to the request.</p>
<p>There are many ways in which Qi4j helps here. First of all, our basic philosophy for EntityStores is that their primary purpose is to efficiently store and load the domain model for the purpose of handling application requests, and for this reason we have focused on implementing EntityStores that are based on key-value stores, where lookups of entities are just one map-call away, and where "mapping" is pretty much non-existent since the value is the serialized state of the Entity. We have chosen to use JSON as the default serialization format, so if you want to "look at the data" without using Qi4j, that is possible.</p>
<p>In any case, the first way in which this helps is by simply bringing down the cost of all those requests. If the key-value store is in-process rather than across-network as an RDBMS might typically be, then the overhead for making a call is radically reduced. And since an entire entity can be loaded by one map-lookup, the number of requests is reduced. Both of these factors help deal with Randy's axiom "IPC affects app response time".</p>
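<p>As a rough illustration of the "one map-call away" idea, here is a minimal sketch that keeps each entity's serialized JSON state under its identity in a plain in-process map. The names <code>JsonMapEntityStore</code>, <code>storeState</code> and <code>loadState</code> are made up for this example and are not the Qi4j EntityStore SPI.</p>

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the Qi4j SPI: entity identity -> serialized JSON state.
final class JsonMapEntityStore
{
    private final Map<String, String> states = new ConcurrentHashMap<>();

    // Storing an entity is a single put of its whole serialized state.
    void storeState( String identity, String json )
    {
        states.put( identity, json );
    }

    // Loading an entity is a single lookup; there is no mapping step.
    Optional<String> loadState( String identity )
    {
        return Optional.ofNullable( states.get( identity ) );
    }

    public static void main( String[] args )
    {
        JsonMapEntityStore store = new JsonMapEntityStore();
        store.storeState( "task-1", "{\"description\":\"Write report\",\"done\":false}" );
        System.out.println( store.loadState( "task-1" ).orElse( "missing" ) );
    }
}
```

<p>Because the value is just a JSON string, the stored data stays readable without the framework, which is the point made above.</p>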
<p>In my current project, StreamFlow, we are going one step further. We are implementing CommandQuerySeparation as described by Greg Young in the DDD-world. What this means is that all changes generate events, and those events can be listened to and can cause further events. This infrastructure can, for example, be used to drive a denormalization of the database, specifically, to generate the specific views that I know clients will be asking for. If a particular field in the domain model is changed, that causes an event, and if that field is being shown in the UI in some specific view, I can therefore consume that event and generate a new event that eagerly creates that view as a DTO and stores it. When the client makes a request for the view I am then guaranteed to only have to make one request to the underlying datastore to get that view. Reading becomes super-efficient, and the read/write ratio in the database becomes more even. The query engine, which we have separated from domain persistence as there is generally no query support in key-value stores, is also relieved, as standard client views no longer use it. Instead the query engine can be used for ad hoc querying, and only that. This improves the response time of the query engine. This strategy is a tradeoff, however: more disk space is required as we are eagerly generating views, and there might be a few of them, and the code to keep the views updated might be trickier than the queries needed to create them on the fly. But at least both options are available to us.</p>
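<p>To make the event-driven denormalization a bit more concrete, here is a minimal sketch of a listener that eagerly rebuilds a view whenever a change event arrives, so that reading the view is a single lookup. Everything in it (<code>TaskViewProjection</code>, <code>DescriptionChanged</code>, <code>EventBus</code>, the view format) is an assumption made for illustration and is not taken from StreamFlow or Qi4j.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of event-driven denormalization; names are illustrative only.
final class TaskViewProjection
{
    // A change in the domain model, published as an event.
    static final class DescriptionChanged
    {
        final String taskId;
        final String description;

        DescriptionChanged( String taskId, String description )
        {
            this.taskId = taskId;
            this.description = description;
        }
    }

    // Minimal in-process event bus.
    static final class EventBus
    {
        private final List<Consumer<DescriptionChanged>> listeners = new ArrayList<>();

        void subscribe( Consumer<DescriptionChanged> listener )
        {
            listeners.add( listener );
        }

        void publish( DescriptionChanged event )
        {
            listeners.forEach( listener -> listener.accept( event ) );
        }
    }

    // Precomputed views keyed by task id; reading a view is one lookup.
    private final Map<String, String> viewsByTaskId = new HashMap<>();

    TaskViewProjection( EventBus bus )
    {
        bus.subscribe( this::on );
    }

    // Consumes the change event and eagerly regenerates the stored view.
    private void on( DescriptionChanged event )
    {
        viewsByTaskId.put( event.taskId, "Task " + event.taskId + ": " + event.description );
    }

    String view( String taskId )
    {
        return viewsByTaskId.get( taskId );
    }

    public static void main( String[] args )
    {
        EventBus bus = new EventBus();
        TaskViewProjection projection = new TaskViewProjection( bus );
        bus.publish( new DescriptionChanged( "42", "Review contract" ) );
        System.out.println( projection.view( "42" ) ); // Task 42: Review contract
    }
}
```

<p>The tradeoff described above shows up directly: the view map takes extra space and the <code>on</code> method has to be kept in sync with every event that can affect the view.</p>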
<p>Since EntityStores are always invoked in UnitOfWorks, which are specified using Usecases, you can also attach metainfo about what the Usecase will be loading, which can be used by EntityStore implementations to do eagerloading. We don't have any examples of this right now, as we have focused on key-value stores where eagerloading is not really applicable, but the infrastructure is there. For any kind of network-based EntityStores this would be useful to have.</p>
<h3>Scalability</h3>
<p>The second problem Randy discusses is that of scalability. The issues he describes mostly have to do with having too many domain objects on the heap. In Qi4j, domain objects, i.e. entities, deal with this in many ways. First of all, an entity can be composed of many mixins, each dealing with a separate use-case. The mixins are lazy-loaded, so if a particular usecase only deals with 1 out of 10 mixins, then only that mixin will be instantiated. Second, references between entities are always done through proxies, so there's no way to have hard references to entities that will fill up your heap. Third, entity references are only valid within a UoW, so once the UoW is done those references can be GC'ed, although the persistence implementation can choose to cache the state for use with later UoW's. This also properly deals with Randy's third problem, transaction isolation.</p>
<p>Now, onto the ten "pat-let"s that Randy describes:</p>
<h3>1. Application Transaction</h3>
<p>Application Transactions are used to deal with the fact that some interactions with users are so long that they are not appropriate to map to database transactions. Thinktime can be very long, so it just wouldn't work to use transactions. The proper way to handle this, Randy describes, is to use the UnitOfWork concept to build up a set of changes, and then apply them either along the way or at the end of the UoW.</p>
<p>In Qi4j the ONLY way to deal with entities is through a UnitOfWork, whether you use it for "short" or "long" transactions. A UoW will contain all the entities that have been referenced within that UoW, with the state that has been read, and it can be paused and resumed (i.e. removed from its association with the current thread, much like a regular transaction), which is used to handle "think time". A Qi4j UoW can be applied(), which will send all changes down to the EntityStore while still keeping the UoW alive, and at the end of the UoW you can choose to complete() or discard() it, depending on what you want to do. This handles the Application Transaction "pat-let" as Randy describes it.</p>
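<p>The apply/complete/discard lifecycle can be sketched in a few lines. The class below is only a toy under my own assumptions: <code>SketchUnitOfWork</code> buffers changes in a local map on top of a shared map, and it is not the Qi4j UnitOfWork API (pausing and resuming are left out).</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the unit-of-work idea; this is not the Qi4j UnitOfWork API.
final class SketchUnitOfWork
{
    private final Map<String, String> store;                     // shared backing store
    private final Map<String, String> pending = new HashMap<>(); // changes local to this UoW
    private boolean open = true;

    SketchUnitOfWork( Map<String, String> store )
    {
        this.store = store;
    }

    // Reads see this UoW's own pending changes first, then the store.
    String get( String identity )
    {
        requireOpen();
        return pending.containsKey( identity ) ? pending.get( identity ) : store.get( identity );
    }

    // Changes are only buffered until apply() or complete().
    void set( String identity, String state )
    {
        requireOpen();
        pending.put( identity, state );
    }

    // Sends buffered changes to the store while keeping the UoW usable.
    void apply()
    {
        requireOpen();
        store.putAll( pending );
        pending.clear();
    }

    // Sends buffered changes to the store and ends the UoW.
    void complete()
    {
        apply();
        open = false;
    }

    // Drops buffered changes and ends the UoW.
    void discard()
    {
        requireOpen();
        pending.clear();
        open = false;
    }

    private void requireOpen()
    {
        if( !open )
        {
            throw new IllegalStateException( "UnitOfWork is closed" );
        }
    }

    public static void main( String[] args )
    {
        Map<String, String> store = new HashMap<>();
        SketchUnitOfWork uow = new SketchUnitOfWork( store );
        uow.set( "task-1", "draft" );
        System.out.println( store.containsKey( "task-1" ) ); // false: nothing applied yet
        uow.complete();
        System.out.println( store.get( "task-1" ) );         // draft
    }
}
```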
<h3>2. Editing copy</h3>
<p>Different application transactions need to have their own copies of the entities, or else they will be doing "dirty reads" where they are seeing changes being made by other transactions.</p>
<p>In Qi4j, simply because all entities are accessed through a UoW, and all those references will be local to that UoW, this is handled properly. No two application transactions can or may share references to entities. The underlying EntityStore might provide a copy-on-write holder of the state, for efficiency reasons, but this will still enforce semantics such that the UoW's are logically separated.</p>
<p>The color commentary on this is interesting, in that it differentiates between copy-on-read and copy-on-write. As it is, all EntityStores in Qi4j use copy-on-read, but the SPI is such that it allows copy-on-write. From Randy's description of the issues, especially with regard to scalability, we should probably make a default implementation that does copy-on-write, so that all stores based on the standard MapEntityStoreMixin implementation get that for free.</p>
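<p>For readers unfamiliar with the distinction, here is a minimal sketch of a copy-on-write state holder: reads go to a shared snapshot, and a private copy is made only on the first write. With copy-on-read the copy would instead be made in the constructor. The class name <code>CopyOnWriteState</code> and its methods are assumptions for this example and do not exist in the Qi4j SPI.</p>

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a copy-on-write state holder; not part of the Qi4j SPI.
final class CopyOnWriteState
{
    private final Map<String, Object> shared; // read-only view of state shared between units of work
    private Map<String, Object> local;        // private copy, created on first write only

    CopyOnWriteState( Map<String, Object> sharedSnapshot )
    {
        this.shared = Collections.unmodifiableMap( sharedSnapshot );
    }

    Object get( String property )
    {
        return ( local != null ? local : shared ).get( property );
    }

    void set( String property, Object value )
    {
        if( local == null )
        {
            local = new HashMap<>( shared ); // the copy is deferred until here
        }
        local.put( property, value );
    }

    boolean hasCopied()
    {
        return local != null;
    }

    public static void main( String[] args )
    {
        Map<String, Object> snapshot = new HashMap<>();
        snapshot.put( "name", "Foo" );

        CopyOnWriteState first = new CopyOnWriteState( snapshot );
        CopyOnWriteState second = new CopyOnWriteState( snapshot );

        first.set( "name", "Bar" );
        System.out.println( first.get( "name" ) );  // Bar
        System.out.println( second.get( "name" ) ); // Foo
        System.out.println( second.hasCopied() );   // false
    }
}
```

<p>Units of work that only read never pay for a copy, which is where the scalability benefit would come from.</p>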
<h3>3. Factory-Assigned identifier</h3>
<p>Entities in the domain model should have identifiers assigned by factories, probably using a UUID strategy.</p>
<p>Qi4j is designed from the ground up to use this approach, so we have an IdentityGenerator SPI that works with the UoW and EntityStore SPI to accomplish what Randy describes. Nothing much more to say on that, other than "I agree" with what Randy says.</p>
<p>There is one thing to consider though when combining DDD with the CQS patterns a la Greg Young and Udi Dahan. If you use that, then ALL Entities are created by other Entities, and all such creations generate events. The way to get around the question "so how do I create the first entity?" is to have your system either automatically create one on startup, with a known id, or have entities with known id's created on the fly. Udi has some content about this on his blog. The main thing I have realized though, is that if your domain model is all about consuming commands and generating events, then the id assignment for new entities has to happen in the command consumption, and not the event consumption, and the creation of the entity happens in the event consumption. The reason is that if you do a replay of events, then the same state must be created, and that is dependent on the same id's for new entities being used.</p>
<p>Simple example:</p>
<pre>// Command to create Foo
void createFoo()
{
  // Validate command is ok
  // Generate id for new Foo
  String id = ...;
  fooCreated(id);
}

void fooCreated(String id)
{
  ... create Foo entity in UoW using id ...
}
</pre>
<p>If you are using CQS and Event Sourcing, so that the commands are generating events, then in order to allow events (such as the call to fooCreated) to be replayed in the domain model, EVERYTHING that the event needs to determine the new state has to be included in the parameters, and this includes the id of the new entity. Otherwise, when you replay the event later it might(/will) cause another id to be generated, and then the whole thing falls apart. In StreamFlow we are doing something similar to the above, which makes command/event processing pretty straightforward and not so code-intensive. The events, such as "fooCreated(id)" are later used to generate reports, update the UI, create replicas on other cluster nodes, external integration, etc. Pretty darn cool actually.</p>
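<p>Here is a small runnable sketch of why the id has to travel with the event. The command picks the id and records the event; replaying the recorded events into a fresh model calls only the event method and therefore ends up with the same identities. The class <code>FooModel</code>, the in-memory event log and the use of <code>UUID</code> are assumptions for this example, not how StreamFlow stores events.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: id assignment in the command, entity creation in the event.
final class FooModel
{
    final Set<String> fooIds = new LinkedHashSet<>(); // state: identities of created Foos
    final List<String> eventLog = new ArrayList<>();  // recorded fooCreated events

    // Command: validate, pick the id, record and emit the event.
    void createFoo()
    {
        String id = UUID.randomUUID().toString();
        eventLog.add( id );
        fooCreated( id );
    }

    // Event: everything needed to rebuild state, including the id, is a parameter.
    void fooCreated( String id )
    {
        fooIds.add( id );
    }

    // Replaying the log into a fresh model calls only the event method.
    static FooModel replay( List<String> events )
    {
        FooModel model = new FooModel();
        for( String id : events )
        {
            model.fooCreated( id );
        }
        return model;
    }

    public static void main( String[] args )
    {
        FooModel original = new FooModel();
        original.createFoo();
        original.createFoo();

        FooModel rebuilt = replay( original.eventLog );
        System.out.println( rebuilt.fooIds.equals( original.fooIds ) ); // true
    }
}
```

<p>If <code>fooCreated</code> generated the id itself, the rebuilt model would contain different identities and the comparison would fail.</p>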
<h3>4. Protected persistence variation</h3>
<p>You may need to change the persistence implementation in your project later on, so use some kind of abstraction to allow this to happen.</p>
<p>This, I think, is one of the strengths of the Entity model in Qi4j. We have an EntityStore SPI that can be implemented by key-value stores, document stores, relational stores, graph stores, etc., without changing the domain code at all. One of the key reasons we can do this, I think, is because we separated out indexing and querying from the EntityStore SPI responsibilities. Because of this there are very few things that the EntityStore needs to be able to do (create/load/store), which allows many interesting implementations. Currently we have in-memory, JDBM, Preferences API (for service configuration storage), Amazon S3, and various sandbox implementations for Coherence, JGroups, JavaSpaces, BerkeleyDB. I think that all of these new and hip key-value stores that don't have any application API's to make them easily usable should look at simply leveraging the Qi4j EntityStore SPI, which immediately gives them an application API that people can use. Our in-memory version also makes it easy for people to run tests of their domain models without having to have the entire infrastructure available, which is crucial. As one of the main trends today seems to be to explore these kinds of non-relational storage models, I would recommend that you look at using Qi4j as your application API.</p>
<h3>5. Distributed caching</h3>
<p>Applications often need to use distributed caches for performance, and these need to be consistent.</p>
<p>To a large degree this would be an implementation detail in EntityStores in Qi4j, as we don't do any caching at all in the framework (specifically to allow this to be an implementation detail in the EntityStores). The only thing to say here is that if you are using a local key-value store and then events to keep replicas up-to-date, you don't really need a caching solution, since the key-value store is basically doing that for you. So, whether this is necessary at all is probably dependent on whether you are using a centralized database or not as the persistence solution.</p>
<h3>6. Disconnected domain model</h3>
<p>If domain model entities refer to each other using pointers, then serialization of graphs becomes an issue. The solution is to use reference by identifiers instead.</p>
<p>In Qi4j, references between entities are done on two levels: when an entity is loaded in the UoW it will be a strongly typed Java object, and any references to other entities will also be regular Java objects, with interfaces. However, those references are to Entity Composites, which internally maintain the identifier of the entity. So, whenever a reference in one entity is set to another entity, at the end of the UoW the EntityStore SPI will store the reference using the identifier. So, on the EntityStore SPI level there are only properties and identifier references in entities. This deals properly with the problem that Randy describes.</p>
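<p>The two levels can be illustrated with a deliberately simplified sketch: domain code hands over an object, but only the identity is recorded, and the object is looked up again through that identity when needed. The names (<code>IdentityReferences</code>, <code>Person</code>, <code>managerId</code>) are invented for this example; real Qi4j entities are composites behind interfaces and do not look like this.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an entity stores the identity of its association, not a pointer.
final class IdentityReferences
{
    static final class Person
    {
        final String identity;
        final String name;
        String managerId; // stored form of the "manager" association

        Person( String identity, String name )
        {
            this.identity = identity;
            this.name = name;
        }
    }

    // Entities known to this (toy) unit of work, keyed by identity.
    private final Map<String, Person> loaded = new HashMap<>();

    void add( Person person )
    {
        loaded.put( person.identity, person );
    }

    // Domain code sets an object reference; only the identity is recorded.
    void setManager( Person employee, Person manager )
    {
        employee.managerId = manager.identity;
    }

    // The reference is resolved through the identity when it is needed.
    Person managerOf( Person employee )
    {
        return employee.managerId == null ? null : loaded.get( employee.managerId );
    }

    public static void main( String[] args )
    {
        IdentityReferences uow = new IdentityReferences();
        Person ann = new Person( "p1", "Ann" );
        Person bob = new Person( "p2", "Bob" );
        uow.add( ann );
        uow.add( bob );
        uow.setManager( bob, ann );
        System.out.println( uow.managerOf( bob ).name ); // Ann
    }
}
```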
<p>One of the main problems that we still have, and which Randy describes, is that with the key-value stores, where all data from an entity is stored as a serialized JSON string, if you want to only add/remove an entity to a collection that collection is going to be loaded/stored as a whole. The actual references will not be deserialized during a "collection add" scenario, but the state will be loaded/stored. The only way to really get around that would be to implement the EntityStore SPI using the graph database Neo4j, where massive associations would be handled natively by the Neo4j API and so collection add/remove could be done as an O(1) operation. We have discussed a proper implementation using Neo4j, but have not finished this yet.</p>
<p>If you go outside the plain Qi4j API, and use events as I'm doing in the StreamFlow project, then solving this would be easy actually. On a collection change you would generate an event "fooAdded(ref)", but the implementation wouldn't actually do anything. The reference would however logically have been added to the collection. So, an event consumer could then asynchronously get this event and update whatever view needs the actual list of entities for clients to read. Problem solved.</p>
<h3>7. Most frequently used cache contents</h3>
<p>Cache stuff that you use a lot.</p>
<p>Again, Qi4j doesn't cache anything per se, so the only zen comment I could make would be "you don't need to cache if you already have a cache", which would be true if you are using a local key-value EntityStore implementation, which has the access performance of a cache, and yet also serves as the actual store so it's not technically a cache.</p>
<h3>8. Cache warming</h3>
<p>Use cache warming so that first users don't get slow responses.</p>
<p>For Qi4j, see the previous comment.</p>
<h3>9. Query design</h3>
<p>The way your application queries for the data it needs, especially with a large and rich domain model, is a first-order determinant of its performance.</p>
<p>This is something that I spent a long time working on in the product that the Qi4j ideas are based on, the SiteVision CMS/portal product. In there, the domain model was running in the client, and so when the UI in the Swing applet client needed to load data it went to the server to get it, pretty much using the key/value ideas. However, in the first version this was pretty much using lazy-loading, and therefore became superslow as it caused a massive amount of roundtrips to display a simple tree (one HTTP request per mixin per entity). The way to get around this was to let each UI define a usecase specific loading policy, which was then used to optimize and eagerload the state. Instead of thousands of roundtrips there would be one or a couple of roundtrips, each getting a specific object but also the related objects that we knew would be used within the usecase.</p>
<p>In Qi4j, and in StreamFlow which is the first production example of using Qi4j, contrary to what Randy said I have not yet implemented this, although it is prepared for it. The reason I haven't implemented it is because I have chosen to not let the domain model execute on the client, for various reasons, mostly related to security. Instead, the Swing UI (which I still think is a GREAT way to do rich clients) uses a REST API to access DTO's which are optimized for the view that the client needs. In a sense we have specialized the server API to embody these usecase specific loading policies. The problem is then pushed to the REST resource on the server, which in turn could have the same issue when communicating with the EntityStore. But there we have cheated, and are using a low-latency key-value store, so even if we make lots of loads, the cost is manageable. We are also using events to preemptively generate the DTO's on the server, further reducing this cost.</p>
<p>All that being said, Qi4j *is* prepared for implementation of usecase-optimal queries, as a UoW can be associated with a Usecase, and a Usecase can specify metadata such as query policies. I just haven't needed it myself for the moment, for the above reasons. If anyone needs it there is a clear path for how and where to do it.</p>
<p>As for the color commentary, the Query API in Qi4j is such that we allow the domain model to be used to express queries, which is one of the patterns Randy recommends. For those queries that cannot be expressed using the domain model, yes, we do have a feature explicitly for named queries. In neither case will you have SQL in your domain model, which is the key idea. The Query API also has an implementation that can run against an Iterable, so if you want to run a query against an in-memory collection, such as a cache, then that is possible.</p>
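<p>To show what "running a query against an Iterable" means in the simplest possible terms, here is a sketch that evaluates a query object against any in-memory collection. It uses a plain <code>java.util.function.Predicate</code> as the query, which is an assumption for illustration only; the Qi4j Query API expresses queries differently, and <code>IterableQuery.find</code> is not part of it.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of running a query object against any Iterable; not the Qi4j Query API.
final class IterableQuery
{
    // Returns, in iteration order, every element of the source that matches the query.
    static <T> List<T> find( Iterable<T> source, Predicate<? super T> where )
    {
        List<T> result = new ArrayList<>();
        for( T candidate : source )
        {
            if( where.test( candidate ) )
            {
                result.add( candidate );
            }
        }
        return result;
    }

    public static void main( String[] args )
    {
        List<String> cachedNames = Arrays.asList( "Foo", "Bar", "Baz" );
        Predicate<String> startsWithB = name -> name.startsWith( "B" );
        System.out.println( find( cachedNames, startsWithB ) ); // [Bar, Baz]
    }
}
```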
<h3>10. Balancing by cache affinity</h3>
<p>Do loadbalancing on application servers so that the caches on each instance are likely to have the data needed.</p>
<p>Again, this is not really something that Qi4j as a framework can do anything about. This is more related to whatever framework you are using on top to do loadbalancing. For StreamFlow we are using Restlet for the REST API implementation, and it has support for these kinds of things, where we can plug in algorithms for selecting the node to use.</p>
<h3>Summary</h3>
<p>To summarize, I think Qi4j deals with all the problems and pat-lets that Randy describes, or in some cases at least allows other frameworks to deal with them without Qi4j getting in the way. I also agree with Randy's initial statement that "persistence is where it's at". It is impossible to not talk about persistence when dealing with DDD, and yet at the same time we want these persistence issues to stay out of our domain models as much as possible. This is one of the key benefits of the Qi4j approach I think: instead of starting with a persistence technology and providing an abstraction over that, we instead started with our domain models and decided how we wanted to express them in code. That then drove the EntityStore SPI, which in turn drove the implementations of our persistence extensions. This way we are allowing the domain model to "pull" what it needs from persistence instead of having the persistence "push" its view of the world onto the domain model. This is a crucial point. "pull" - good. "push" - bad. For those of you who have studied the Toyota Production System or Lean, or even better the Systems Thinking variation, this will probably seem familiar.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jayway.com/2009/09/24/qi4j-and-domain-model-persistence/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Neo – a netbase</title>
		<link>http://blog.jayway.com/2007/02/01/neo-%e2%80%93-a-netbase/</link>
		<comments>http://blog.jayway.com/2007/02/01/neo-%e2%80%93-a-netbase/#comments</comments>
		<pubDate>Thu, 01 Feb 2007 10:32:01 +0000</pubDate>
		<dc:creator>Björn Granvik</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[frameworks]]></category>
		<category><![CDATA[graph db]]></category>
		<category><![CDATA[innovation]]></category>
		<category><![CDATA[persistence]]></category>

		<guid isPermaLink="false">http://blog.jayway.com/?p=3617</guid>
		<description><![CDATA[Neo is a network-oriented database for semi-structured information. Too complicated, let us try again. Neo handles data in networks – nodes, relationships and properties – instead of tables. This means entirely new solutions for data that is difficult to handle in static tables. It could mean we can go agile all the way into [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Neo is a network-oriented database for semi-structured information.<br />
Too complicated, let us try again. Neo handles data in networks – nodes, relationships and properties – instead of tables. This means entirely new solutions for data that is difficult to handle in static tables. It could mean we can go agile all the way into the persistence layer. </strong></p>
<p>The relational database represents one of the most important developments in the history of computer science. Upon its arrival some 30 years ago, it revolutionized the way the industry views data management and today it is practically ubiquitous. </p>
<p>In fact, it is so taken for granted that we, as an industry, have stopped thinking. Could there be a better way to represent and store our data? In some cases the answer is – yes, absolutely. The relational database is showing its age. Some telltale signs: </p>
<ul>
<li>The mismatch between relational data and object oriented programming.</li>
<li>Schema evolution – updating your data model when the domain model changes – is just so much manual labor.</li>
<li>Semi-structured or network-oriented data is difficult to handle.</li>
</ul>
<h2>The Neo Database</h2>
<p>Neo is a database that is built with a different philosophy. Where the relational database squeezes all data into static tables, Neo uses a flexible and dynamic network model. This model allows data to evolve more naturally as business requirements change. There’s no need for “alter table...” on your production databases after you introduce a new version of the business layer and no need to rewire and migrate your O/R mapping configurations. The network will evolve along with your business logic. This spells agility.</p>
<p>Neo is an embedded persistence engine, which means it’s a small, lightweight and non-intrusive Java library that is easy to include in your development environment. It has been designed for performance and scalability and has been proven to handle large networks of data (more than 100 million nodes, relationships and properties).</p>
<p>Neo is a newly founded open source project, but the software is robust. It has been in commercial production in a highly demanding 24/7 environment for almost four years and has full support for enterprise-grade features such as distributed ACID transactions, configurable isolation levels and full transaction recovery.</p>
<p>But so much for sweet talk, let’s cut to some code! </p>
<h2>Model and Code</h2>
<h3>Representation</h3>
<p>In the Neo model, everything is represented by nodes, relationships and properties. A relationship connects two nodes and has a well-defined, mandatory type. Properties are key-value pairs that are attached to both nodes and relationships. When you combine nodes, relationships between them and properties on both nodes and relationships they form a node space – a coherent network representing your business domain data.</p>
<p>This may sound fancy, but it’s all very intuitive. Here is how a simple social network might be modeled: </p>
<p><img src="http://blog.jayway.com/wordpress/wp-content/uploads/2009/12/Picture-89.png" alt="Figure 1" title="Figure 1" width="412" height="211" class="alignnone size-full wp-image-3618" /></p>
<p><strong>Figure 1:</strong> An example of a social network from a somewhat famous movie. Note the different type on the relation between Agent Smith and his creator The Architect. </p>
<p>Note how all nodes have integer identifiers and how all relationships have a type (KNOWS or CODED_BY). In this example, all nodes have a “name” property. But some nodes have other properties, for example, an “age” property (node 1) or a “last name” property (node 3). There’s no overall schema that forces all nodes to look the same. This allows Neo to capture so-called semi-structured information: information that has a small number of mandatory attributes but many optional attributes. Furthermore, the relationships have properties as well. In this example, all relationships have an “age” property to describe how long two people have known each other and some relationships have a “disclosure” property to describe whether the acquaintance is secret. </p>
<p>Working with nodes and relationships is easy. The basic operations are as follows: </p>
<p><img src="http://blog.jayway.com/wordpress/wp-content/uploads/2009/12/Picture-90.png" alt="Figure 2" title="Figure 2" width="387" height="114" class="alignnone size-full wp-image-3619" /></p>
<p>This is an intuitive representation of a network and probably similar to many other implementations that want to represent a network of data in an object-oriented language.</p>
<p>It’s worth noting, however, that relationships in this model are full-blown objects and not just implicit associations between nodes. If you have another look at the social network example, you’ll see that there’s more information in the relationships between nodes than in the nodes themselves. The value of a network is in the connections between the nodes and Neo’s model captures that. </p>
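<p>To make the model tangible without any library, here is a minimal in-memory sketch in plain Java in which both nodes and relationships carry properties and every relationship has a mandatory type. The classes <code>NodeSpaceSketch</code>, <code>PropertyContainer</code>, <code>Node</code> and <code>Relationship</code> below are written for this illustration and are not the Neo API; the "public" value for the “disclosure” property is also just an assumed example value.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of nodes, typed relationships and properties; not the Neo API.
final class NodeSpaceSketch
{
    // Shared by nodes and relationships: both can carry key-value properties.
    static class PropertyContainer
    {
        private final Map<String, Object> properties = new HashMap<>();

        void setProperty( String key, Object value )
        {
            properties.put( key, value );
        }

        Object getProperty( String key )
        {
            return properties.get( key );
        }
    }

    static final class Node extends PropertyContainer
    {
        final List<Relationship> outgoing = new ArrayList<>();

        Relationship createRelationshipTo( Node other, String type )
        {
            Relationship relationship = new Relationship( this, other, type );
            outgoing.add( relationship );
            return relationship;
        }
    }

    // A relationship is a full object: it has a mandatory type and its own properties.
    static final class Relationship extends PropertyContainer
    {
        final Node start;
        final Node end;
        final String type;

        Relationship( Node start, Node end, String type )
        {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public static void main( String[] args )
    {
        Node mrAnderson = new Node();
        mrAnderson.setProperty( "name", "Thomas Anderson" );
        Node morpheus = new Node();
        morpheus.setProperty( "name", "Morpheus" );

        Relationship knows = mrAnderson.createRelationshipTo( morpheus, "KNOWS" );
        knows.setProperty( "disclosure", "public" ); // assumed example value

        System.out.println( knows.start.getProperty( "name" ) + " -" + knows.type + "-> "
            + knows.end.getProperty( "name" ) );
    }
}
```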
<h3>Creating a Node Space</h3>
<p>And now, finally some code. Here’s how we would create the Matrix social network from figure 1: </p>
<pre>Transaction tx = Transaction.begin();
EmbeddedNeo neo = ... // Get factory
// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus,
   MatrixRelationshipTypes.KNOWS );
// Create Trinity, Cypher, Agent Smith, Architect similarly
...
tx.commit();
</pre>
<p>As you can see in the code above, it is rather easy to construct the node space for our Matrix example. And, of course, our network is made persistent once we commit. </p>
<h3>Traversing a Node Space</h3>
<p>Now that we know how to represent our domain model in the node space, how do we get information out of it? Unlike a relational database, Neo does not support a declarative query language. Instead, Neo provides an object-oriented <em>traverser framework</em> that allows us to express complex queries in plain Java.</p>
<p>Working with the traverser framework is very straightforward. The core abstraction is, unsurprisingly, the Traverser interface. A Traverser is a Java <code>Iterable</code> that encapsulates a “query” – i.e. a traversal on the node space such as <em>“give me all Morpheus’ friends and his friends’ friends”</em> or <em>“does Trinity know someone who is acquainted with an agent?”</em>. The most complex part of working with a Traverser is instantiating it. Here’s an example of how we would create a Traverser that will return all the (transitive) friends of the “Thomas Anderson” node of the example above: </p>
<pre>// Instantiate a traverser that returns all mrAnderson’s friends
Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    MatrixRelationshipTypes.KNOWS,
    Direction.OUTGOING );</pre>
<p>Here we can see that traversers are created by invoking the <code>traverse(...)</code> method on a start node with a number of parameters. The parameters control the traversal and in this example they tell Neo to traverse the network breadth-first (rather than depth-first), to traverse until it has covered all reachable nodes in the network (<code>StopEvaluator.END_OF_NETWORK</code>), to return all nodes except the first (<code>ReturnableEvaluator.ALL_BUT_START_NODE</code>), and to traverse all <strong>OUTGOING</strong> relationships of type <strong>KNOWS</strong>.</p>
<p>How would we go about listing the output of this traversal? After we’ve created a Traverser, working with it is as easy as working with any Java <code>Iterable</code>:</p>
<pre>// Traverse the node space and print out the result
for ( Node friend : friendsTraverser )
{
    System.out.println( friend.getProperty( "name" ) + " at depth " +
        friendsTraverser.currentPosition().getDepth() );
}
</pre>
<p>Running the traversal above on the Matrix example would yield the following output:</p>
<pre>$ bin/run-neo-example
Morpheus at depth 1
Trinity at depth 1
Cypher at depth 2
Agent Smith at depth 3
$
</pre>
<p>As you can see, the Traverser has started at the “Thomas Anderson” node and run through the entire network along the <strong>KNOWS</strong> relationship type, breadth first, and returned all nodes except the first one. “The Architect” is missing from this output since the relationship connecting him is of a different type, <strong>CODED_BY</strong>. This is a small, contrived example, but the code would work equally well on a network with hundreds of millions of nodes, relationships and properties.<br />
Now, let’s look at a more complex traversal. Going with our example, suppose that we wanted to find all “hackers of the Matrix,” where we define a hacker of the Matrix as any node that you reach through a <strong>CODED_BY</strong> relationship. How would we create a Traverser that gives us those nodes?<br />
First, we want to traverse both our relationship types (<strong>KNOWS</strong> and <strong>CODED_BY</strong>). Second, we want to traverse until the end of the network. Lastly, we want to return only nodes that we reached through a <strong>CODED_BY</strong> relationship. Here’s the code: </p>
<pre>
// Instantiate a traverser that returns all hackers of the Matrix
Traverser hackerTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    new ReturnableEvaluator()
    {
        public boolean isReturnableNode( TraversalPosition pos )
        {
            <strong>return pos.getLastRelationshipTraversed().
                isType( MatrixRelationshipTypes.CODED_BY );</strong>
        }
    },
    MatrixRelationshipTypes.CODED_BY,
    Direction.OUTGOING,
    MatrixRelationshipTypes.KNOWS,
    Direction.OUTGOING );</pre>
<p>Now it’s getting interesting! The <code>ReturnableEvaluator.ALL_BUT_START_NODE</code> constant from the previous example was actually a convenience implementation of the <code>ReturnableEvaluator</code> interface. This interface contains a single method, and you can supply a custom implementation of it to the traverser framework. It turns out that this is a simple but powerful way to express complex queries.<br />
Setting aside the anonymous inner class cruft surrounding the code in bold, we basically pass in a snippet of code that checks whether we traversed a relationship of type <strong>CODED_BY</strong> to get to the current node. If this check evaluates to “true”, the current node is included in the set of nodes returned from the traverser.<br />
When executed with a simple print loop, the above code prints the following: </p>
<pre>$ bin/run-neo-example
The Architect
$
</pre>
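<p>To make the role of the returnable evaluator concrete without depending on Neo, here is a minimal, self-contained sketch in plain Java. It is not Neo code: the <code>Rel</code> class, the <code>traverse</code> helper and the class name are hypothetical stand-ins, and the hard-coded relationship list is our assumption of what the Matrix example contains. The sketch walks the network breadth-first along the given relationship types and returns a node only when the relationship that led to it passes a caller-supplied check.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, self-contained sketch; none of these types are part of the Neo API.
final class ReturnableEvaluatorSketch
{
    // One directed, typed relationship between two named nodes.
    static final class Rel
    {
        final String from, type, to;

        Rel( String from, String type, String to )
        {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Relationships assumed from the Matrix example and its printed output.
    static final List<Rel> MATRIX = Arrays.asList(
        new Rel( "Thomas Anderson", "KNOWS", "Morpheus" ),
        new Rel( "Thomas Anderson", "KNOWS", "Trinity" ),
        new Rel( "Morpheus", "KNOWS", "Trinity" ),
        new Rel( "Morpheus", "KNOWS", "Cypher" ),
        new Rel( "Cypher", "KNOWS", "Agent Smith" ),
        new Rel( "Agent Smith", "CODED_BY", "The Architect" ) );

    // Breadth-first walk over outgoing relationships of the given types.
    // A node is returned when the relationship that first led to it passes
    // the check; the start node has no such relationship and is never returned.
    static List<String> traverse( String start, Set<String> types,
        Predicate<Rel> returnable )
    {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        visited.add( start );
        queue.add( start );
        while ( !queue.isEmpty() )
        {
            String current = queue.remove();
            for ( Rel rel : MATRIX )
            {
                // visited.add returns false for nodes we have already reached
                if ( rel.from.equals( current ) && types.contains( rel.type )
                    && visited.add( rel.to ) )
                {
                    if ( returnable.test( rel ) )
                    {
                        result.add( rel.to );
                    }
                    queue.add( rel.to );
                }
            }
        }
        return result;
    }

    // The "hackers of the Matrix" query: follow both types, return CODED_BY hits.
    static List<String> hackers()
    {
        Set<String> both = new HashSet<>( Arrays.asList( "KNOWS", "CODED_BY" ) );
        return traverse( "Thomas Anderson", both,
            rel -> rel.type.equals( "CODED_BY" ) );
    }

    public static void main( String[] args )
    {
        System.out.println( hackers() );
    }
}
```

<p>Under these assumptions, <code>main</code> prints <code>[The Architect]</code>. Restricting the types to <strong>KNOWS</strong> and passing a check that always returns true gives the friends query from the first example instead.</p>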
<p>StopEvaluators work the same way. In our experience, writing custom evaluators is very easy. Even the most advanced applications we have developed with Neo, applications that traverse extremely large and complex networks, are based on evaluators that are rarely more than a few lines of code. </p>
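<p>As an illustration of what a custom stop evaluator decides, the following plain-Java sketch prunes a breadth-first walk at a given depth. Again, this is not the Neo API: the class, the <code>friends</code> helper and the <code>stopAtDepth</code> parameter are hypothetical names, and the <strong>KNOWS</strong> relationships are assumed from the Matrix example.</p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.IntPredicate;

// Hypothetical sketch; these names are not part of the Neo API.
final class StopEvaluatorSketch
{
    // Outgoing KNOWS relationships, assumed from the Matrix example.
    static final Map<String, List<String>> KNOWS = new HashMap<>();
    static
    {
        KNOWS.put( "Thomas Anderson", Arrays.asList( "Morpheus", "Trinity" ) );
        KNOWS.put( "Morpheus", Arrays.asList( "Trinity", "Cypher" ) );
        KNOWS.put( "Cypher", Arrays.asList( "Agent Smith" ) );
    }

    // Breadth-first walk that maps each reached node to its depth.
    // stopAtDepth plays the role of a stop evaluator: when it returns true
    // for a node's depth, the walk does not expand beyond that node.
    static Map<String, Integer> friends( String start, IntPredicate stopAtDepth )
    {
        Map<String, Integer> depths = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        depths.put( start, 0 );
        queue.add( start );
        while ( !queue.isEmpty() )
        {
            String current = queue.remove();
            int depth = depths.get( current );
            if ( stopAtDepth.test( depth ) )
            {
                continue; // prune: do not follow relationships from this node
            }
            List<String> known =
                KNOWS.getOrDefault( current, Collections.emptyList() );
            for ( String friend : known )
            {
                if ( !depths.containsKey( friend ) )
                {
                    depths.put( friend, depth + 1 );
                    queue.add( friend );
                }
            }
        }
        depths.remove( start ); // return everything except the start node
        return depths;
    }

    public static void main( String[] args )
    {
        // Stop once we are two relationships away from the start node.
        System.out.println( friends( "Thomas Anderson", depth -> depth >= 2 ) );
    }
}
```

<p>With the depth limit of two, the sketch prints <code>{Morpheus=1, Trinity=1, Cypher=2}</code>. A check that never stops corresponds to traversing to the end of the network and also reaches Agent Smith at depth 3.</p>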
<h2>Conclusion</h2>
<p>Neo is not a silver bullet and some areas need to improve, for instance tooling, standardization of the model and a query language.<br />
However, if your data is naturally ordered in a network or is semi-structured, or you just need to go truly agile, give the Neo database a try. We hope you find it, as we do, to be an elegant and flexible alternative that is both robust and fast. </p>
<p>Emil Eifrém, Neo Technology<br />
Björn Granvik, Jayway</p>
<h2>Links </h2>
<p>Neo specification<br />
<a href="http://www.neo4j.org">www.neo4j.org</a></p>
<p><em>Originally published in <a href="http://jayway.se/jayview">JayView</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jayway.com/2007/02/01/neo-%e2%80%93-a-netbase/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

