Finding Out Who Changed What with Datomic

Imagine that you one day were asked to implement the following requirement:

“As an Administrator, I want to know what has changed in the database since a given timestamp, and also who made the changes, so that I can feel secure that nobody will screw up the database.”

In most projects, this would give you a major headache. There is no notion of time in a database. There might be a transaction log you could look in, or perhaps some audit functionality that you could enable, but in most systems, you would have to make significant changes to the data model. Even if you already have implemented some history functionality, you have most likely only done it for just a few of your entities, not all of them. In Datomic, however, this is a one-liner. I’m serious. The notion of time is built into Datomic, and getting a database from back in time is a single function call away.

In this article, I will first describe how to get the individual entities that have changed since a given time, and then how to add provenance information, such as who performed a change. Let’s start by creating a database with some sample data. If you want to follow along, you can download Datomic here, and look at the Datomic Tutorial. I’ll demonstrate this in Clojure, but it works perfectly fine with Java as well. You can quite easily translate it into Java with some help of the tutorial. Queries are identical, and the function calls are very similar; it’s the manipulation of the collections that differs, mostly.

Setup

I’ll unzip the Datomic installation zip and change to the Datomic directory:

$ unzip datomic-0.1.3164.zip
$ cd datomic-0.1.3164

First I’ll start an interactive REPL:

$ bin/repl-jline
Clojure 1.4.0
user=>

The user=> part is the prompt. Now I’ll import the functions I’ll need, making two of them directly callable as q and db, while the rest will be available under the d namespace (d/transact, d/entity, etc):

user=> (use '[datomic.api :only (q db) :as d])

From the REPL, I’ll now create an in-memory database, create a connection to it, load a schema, and then load some sample data. The function d/transact returns a Future, and in Clojure, instead of calling get() on a Future, we can “dereference” it, either using the deref function, as in (deref future), or the equivalent reader macro @, as in @future.

user=> (def uri "datomic:mem://seattle")
user=> (d/create-database uri)
user=> (def conn (d/connect uri))
user=> @(d/transact conn (read-string (slurp "samples/seattle/seattle-schema.dtm")))
user=> @(d/transact conn (read-string (slurp "samples/seattle/seattle-data0.dtm")))

As a test, I’ll query for all communities with a name that begins with an ‘A’. First we’ll get all entities that have a community name attribute, and bind the entity ids to ?c and the attribute value to ?n. Note that it’s perfectly fine to call regular Java or Clojure functions from within a query. Here I call the Java String compareTo method on the community name ?n, and bind the result to ?res. Then I call the less-than, <, Clojure predicate function with the result from the compareTo call, limiting the result to those community names that are less than ‘B’, ie those that begin with an ‘A’.

Let’s first see how many communities we have that begin with an ‘A’:

user=> (count (q '[:find ?c ?n
                   :where [?c :community/name ?n]
                          [ (.compareTo ?n "B") ?res ]
                          [ (< ?res 0) ] ] (db conn)))
9

OK, just nine. Let's pretty-print them. The result set is a regular Java HashSet. In order for it to pretty-print nicely, I'll turn the set into a Clojure sequence using the function seq:

user=> (use 'clojure.pprint)
user=> (pprint (seq (q '[:find ?c ?n
                         :where [?c :community/name ?n]
                                [ (.compareTo ?n "B") ?res ]
                                [ (< ?res 0) ] ] (db conn))))
([17592186045463 "Aurora Seattle"]
 [17592186045454 "All About South Park"]
 [17592186045448 "Alki News/Alki Community Council"]
 [17592186045446 "Alki News"]
 [17592186045460 "At Large in Ballard"]
 [17592186045457 "ArtsWest"]
 [17592186045451 "All About Belltown"]
 [17592186045440 "15th Ave Community"]
 [17592186045443 "Admiral Neighborhood Association"])
nil

Seeing What Changed

Every attribute value belongs to a transaction. It's easy to query for the associated transaction and get its timestamp. I'll just add a fourth ?tx element to the community name data clause, join with the transaction entity, and retrieve the transaction time as ?t:

user=> (pprint (seq (q '[:find ?c ?n ?t
                         :where [?tx :db/txInstant ?t]
                                [?c :community/name ?n ?tx]
                                [ (.compareTo ?n "B") ?res ]
                                [ (< ?res 0) ] ] (db conn))))
([17592186045451 "All About Belltown"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045463 "Aurora Seattle"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045460 "At Large in Ballard"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045454 "All About South Park"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045457 "ArtsWest"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045440 "15th Ave Community"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045446 "Alki News"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045443 "Admiral Neighborhood Association"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045448 "Alki News/Alki Community Council"
  #inst "2012-06-25T10:33:16.436-00:00"])
nil

We see that all these were added at 10:33:16. Note that #inst "some time..." is the Clojure literal for a timestamp. No need to fight with SimpleDateFormat or GregorianCalendar. We can treat it as an instance of a java.util.Date:

user=> (class #inst "2012-06-25T10:33:16.436-00:00")
java.util.Date
user=> (.getTime #inst "2012-06-25T10:33:16.436-00:00")
1340620396436

Now let's make a change to one of the attributes. We'll rename 17592186045460 to "At Large in Ballard renamed":

user=> @(d/transact conn [{:db/id 17592186045460 :community/name "At Large in Ballard renamed"}])

A transaction is a list containing maps with changes. In this case, it's a single map with the id of the entity to change, and the new value. Let's print the nine communities and see what happened:

user=> (pprint (seq (q '[:find ?c ?n ?txInstant
                         :where [?tx :db/txInstant ?txInstant]
                                [?c :community/name ?n ?tx]
                                [ (.compareTo ?n "B") ?res ]
                                [ (< ?res 0) ] ] (db conn))))
([17592186045451 "All About Belltown"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045463 "Aurora Seattle"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045460 "At Large in Ballard renamed"
  #inst "2012-06-25T10:55:47.363-00:00"]
 [17592186045454 "All About South Park"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045457 "ArtsWest"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045440 "15th Ave Community"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045446 "Alki News"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045443 "Admiral Neighborhood Association"
  #inst "2012-06-25T10:33:16.436-00:00"]
 [17592186045448 "Alki News/Alki Community Council"
  #inst "2012-06-25T10:33:16.436-00:00"])
nil

We see that the community name was actually changed to "At Large in Ballard renamed" at 10:55:47. Now to the one-liner I mentioned in the beginning of this article. Here is the code for listing the entities that have changed since 10:33:16.436:

user=> (distinct (map :e (d/datoms (d/since (db conn) #inst "2012-06-25T10:33:16.436-00:00") :eavt)))
(13194139534683 17592186045460)

We get the current database using (db conn), pass that into d/since together with a timestamp to get the database as it was after that time. We pass that database value to d/datoms together with the sort order we want (:eavt = entity, attribute, value, transaction). That will give us a sequence of index elements. We then use map to get only the entities, and finally we remove duplicates using distinct.

We can make it easier to read, though. Using Clojure's threading macros, it would look like this:

user=> (->> (-> (db conn)
                (d/since #inst "2012-06-25T10:33:16.436-00:00")
                (d/datoms :eavt))
            (map :e)
            distinct)
(13194139534683 17592186045460)

The threading macros that look like arrows turn normal forms inside-out, making them read more like a sequence of events. Instead of (h (g (f x))), we get (-> x f g h), as in "pass x to f, then pass the result of that to g, then pass the result of that to h". The difference between -> and ->> is just where x is placed in case the functions take more than one argument.

Using the utility macroexpand-all and a few functions that take a number as an argument, we can see how the two macros differ. The -> macro places x and each subsequent result as the first argument, while ->> places x and each subsequent result as the last argument:

user=> (use '[clojure.walk :only (macroexpand-all)])
user=> (macroexpand-all '(-> x (h 3) (g 2) (f 1)))
(f (g (h x 3) 2) 1)
user=> (macroexpand-all '(->> x (h 3) (g 2) (f 1)))
(f 1 (g 2 (h 3 x)))

To prove it, here is the expansion of the re-written one-liner:

user=> (macroexpand-all
         '(->> (-> (db conn)
                   (d/since #inst "2012-06-25T10:33:16.436-00:00")
                   (d/datoms :eavt))
               (map :e)
               distinct))
(distinct (map :e (d/datoms (d/since (db conn) #inst "2012-06-25T10:33:16.436-00:00") :eavt)))

Now, let's name those changed entities and have a closer look at them:

user=> (def changed (->> (-> (db conn)
                             (d/since #inst "2012-06-25T10:33:16.436-00:00")
                             (d/datoms :eavt))
                         (map :e)
                         distinct))

For all of these, we'll call d/entity to get the entity, turn it into a sequence so we'll see all attribute values, and pretty-print it:

user=> (pprint (map #(seq (d/entity (db conn) %)) changed))
(([:db/txInstant #inst "2012-06-25T10:55:47.363-00:00"])
 ([:community/type :community.type/blog]
  [:community/orgtype :community.orgtype/commercial]
  [:community/category #{"news" "human interest"}]
  [:community/neighborhood {:db/id 17592186045459}]
  [:community/url "http://blog.seattlepi.com/ballard/"]
  [:community/name "At Large in Ballard renamed"]))
nil

We can see the transaction and what the entity looks like from the given timestamp.

Provenance

"Provenance, from the French provenir, "to come from", refers to the chronology of the ownership or location of a historical object."

Now let's add an attribute that we can use for keeping track of who made a change. We'll call it :audit/user and it's a string:

user=> @(d/transact conn
         [{:db/id #db/id[:db.part/db]
           :db/ident :audit/user
           :db/valueType :db.type/string
           :db/cardinality :db.cardinality/one
           :db/index true
           :db.install/_attribute :db.part/db}])

With this attribute in place, we can start sending user info in all our transactions. Here is one that resets the community name that we changed earlier:

user=> @(d/transact conn
         [{:db/id 17592186045460 :community/name "At Large in Ballard"}
          [:db/add (d/tempid :db.part/tx) :audit/user "ulsa" ]])

We can again look at what's changed:

user=> (def changed (->> (-> (db conn)
                             (d/since #inst "2012-06-25T10:33:16.436-00:00")
                             (d/datoms :eavt))
                         (map :e)
                         distinct))
user=> changed
(0 72 13194139534683 13194139534684 13194139534685 17592186045460)

Oh, that was more than we thought we would get. Of course, if we use the old timestamp, we get all the changes we've made since then, like adding the new attribute, for example. Let's use a more recent timestamp, say 11:13:23.642, to get just the latest change:

user=> (def changed (->> (-> (db conn)
                             (d/since #inst "2012-06-25T11:13:23.642-00:00")
                             (d/datoms :eavt))
                         (map :e)
                         distinct))
user=> changed
(13194139534685 17592186045460)

That looks better. Let's look at those entities:

user=> (pprint (map #(seq (d/entity (db conn) %)) changed))
(([:audit/user "ulsa"]
  [:db/txInstant #inst "2012-06-25T11:13:23.643-00:00"])
 ([:community/type :community.type/blog]
  [:community/orgtype :community.orgtype/commercial]
  [:community/category #{"news" "human interest"}]
  [:community/neighborhood {:db/id 17592186045459}]
  [:community/url "http://blog.seattlepi.com/ballard/"]
  [:community/name "At Large in Ballard"]))
nil

Here we see that the community name has been reset to "At Large in Ballard", and that my user name was included in the transaction.

Conclusion

The notion of time is built into Datomic. Datomic provides functions for retrieving a database value "since" or "as of" a given time. This enables powerful features such as to easily retrieve the individual changes made since a given time.

Schemas in Datomic controls only the attributes, not the entities. There are no restrictions on what attributes can be added to an entity. This gives the developer a welcome flexibility, at the same time as it provides just enough structure.

Transactions are regular entities in Datomic. Together with the flexible schema, this makes it trivial to add any kind of data to each transaction, such as who made the change, why it was changed, using which client, and where the data comes from.

References

Thanks to Stuart Halloway and Jonas Enlund for inspiring code examples.

This Post Has 4 Comments

  1. Paul Evans

    Thanks for this post Ulrik. Couple questions if that’s okay:
    * Why in the last output does the :db/txInstant appear with the audit-user entity, but not with the community entity? (or, why does it show at all with the audit-user entity?)
    * What about retractions. I.e., if instead of renaming the community entity, what if you retracted it? Would that knowledge reflect in the (-> (db/conn) (d/since …) (d/datoms :eavt)) call?

    1. Ulrik Sandberg

      Very good questions, Paul. Here’s my take on them.

      > Why in the last output does the :db/txInstant appear with the audit-user entity, but not with the community entity? (or, why does it show at all with the audit-user entity?)

      The first entity in the sequence you refer to is actually the transaction entity. Every transaction gets an entity created for it. It’s the transaction entity that holds the time in Datomic, so the :db/instant attribute is always connected to that. What I did was connect also the :audit/user attribute to the transaction entity. I did that with the code in the second snippet in the Provenance section:


      [:db/add (d/tempid :db.part/tx) :audit/user “ulsa” ]]

      Note that the id is created in the :db.part/tx partition. That tells Datomic to add the attribute and value on the transaction entity for the transaction itself.

      > What about retractions. I.e., if instead of renaming the community entity, what if you retracted it? Would that knowledge reflect in the (-> (db/conn) (d/since …) (d/datoms :eavt)) call?

      No, it wouldn’t directly reflect that. The datoms function shows the index as it is now, and retractions are facts that have been removed, and thus are not included in the index. However, it will show the transaction entity of the retraction, since the transaction itself was a new fact.

      If you want to see the complete history of the database, you can use the history function. It will give you a special database containing all assertions and retractions across time. You can feed that database into datoms, extract the entity ids, and the result will contain the id of the entity which had its attribute retracted.

      (-> (db conn) (d/since …) d/history (d/datoms :eavt))

  2. Lars Jägare

    Really nice post Ulrik!
    In order to fullfil the requirement to list all changes since a given time and also see who did the change there is one more query needed to make use of the metadata you added to the transaction. Here you got it:

    (pprint (seq (q ‘[:find ?c ?n ?txInstant ?who
    :in $ ?since
    :where
    [?tx :db/txInstant ?txInstant]
    [?tx :audit/user ?who]
    [ (.after ?txInstant ?since)]
    [?c :community/name ?n ?tx]
    ] (db conn) #inst “2016-04-19T14:48:42.000-00:00”)))

    Giving the output (with a timestamp of today, when I verified it):

    ([17592186045457
    “At Large in Ballard moddad”
    #inst “2016-04-19T14:48:42.217-00:00”
    “ulsa”])

    Cheers /Lars

    1. Ulrik Sandberg

      You’re right. Thanks for calling me out and going the final stretch.

Leave a Reply