Posts Tagged ‘clojure’

My TTC OC fellow and lead Henshin developer Christian Krause has posted a simple but nice performance benchmark for graph transformation tools in this blog post.

Given models conforming to the metamodel shown in the following image, the task is to create one Couple node for every pair of persons that acted together in at least three movies.

The Movies Metamodel

The Henshin implementation of this transformation is included in the Henshin Example Plugin (currently only in the nightly builds).

In this blog post, I’m going to compare the Henshin solution with my own solution which is implemented with the in-place transformation API of my model querying and transformation library FunnyQT.

The Transformation Specification

First, let’s have a look at the transformation specifications of both solutions.

The Henshin Transformation

Henshin is a visual graph transformation language for EMF models, i.e., rules are defined as diagrams with a quite good Eclipse-based visual editor.

The transformation consists of one single transformation rule shown below, and one Java class that acts as a test-driver applying the rule to a set of increasingly large models. (In fact, there are actually two more rules that are used to create the test models, but those aren’t important here.)

The Henshin Transformation Rule

I don’t want to recap everything Christian said, so here’s a brief overview of the rule’s concepts.

Basically, a rule mimics the structure to be matched in the model using node and edge symbols annotated with either preserve or require stereotypes. During the pattern matching process, an isomorphic mapping from node and edge symbols in the pattern to actual nodes and edges in the model is computed. If such a match can be found, new nodes and edges are created in the model as defined by the elements annotated with create stereotype in the pattern.

That all annotations are starred, e.g., require*, tells Henshin to apply the rule to all matches in one go. The usual graph transformation semantics is to apply a rule just once to some arbitrary match which is found first, and then maybe to iterate the rule application until no more matches can be found. In that case, however, we’d need negative application conditions (stereotype forbid) in order not to create new Couple nodes for persons that are already coupled. Therefore, such forall-rules perform better than iterating normal rules, but you can only use them if a rule’s effect does not interfere with what’s matched by the rule, e.g., if a rule can invalidate a later match, a forall-rule won’t do.

The difference between preserve and require is that the former would produce a separate match for any combination of three movies the two persons have in common. Here, however, we are only interested in the existence of at least three common movies, and require expresses exactly that.

However, since both persons are (and need to be because there are incident create edges) annotated with preserve, the Henshin solution actually creates twice as many Couple nodes as are needed because of the symmetry of the persons. That is, if two persons P1 and P2 act in at least three movies, one couple node is created for the match (P1, P2) and another one for (P2, P1).

Christian suggests enforcing an alphabetic order using a constraint on attribute values to circumvent that issue. However, the minimal metamodel doesn’t define any attributes for the Person class (or any other class). Of course, a realistic metamodel like the IMDB-based one he talks about would do so.

The FunnyQT Transformation

FunnyQT is a model querying and transformation library implemented in the functional Lisp-dialect Clojure. FunnyQT has several sub-APIs (namespaces) for different querying and transformation tasks. For example, there’s a usual out-place model-to-model transformation API (namespace funnyqt.model2model), a bidirectional transformation API (namespace funnyqt.bidi), and several more.

FunnyQT supports EMF models just like Henshin, but it also supports JGraLab TGraphs. Furthermore, it’s designed with extensibility in mind, so most parts are realized generically using Clojure protocols that can be extended dynamically to other model representations without having to touch FunnyQT’s internals (or the classes of the other model representation).

Since FunnyQT is a Clojure library, queries and transformations are just Clojure code. But as a Lisp-dialect, Clojure provides strong metaprogramming capabilities (macros) that FunnyQT uses to provide several task-oriented embedded (or internal) DSLs to the transformation developer.

Ok, that said, here goes the transformation specification. The solution project is also published in this github project.

The first thing one usually does is to define a namespace for the transformation which requires the needed parts of FunnyQT.

(ns ^{:pattern-expansion-context :emf}
  funnyqt-movie-couples.core
  (:require [clojure.set      :as set]
            [funnyqt.emf      :as emf]
            [funnyqt.in-place :as ip]))

So the name of the transformation namespace is funnyqt-movie-couples.core, and we’re requiring the Clojure namespace clojure.set plus the two FunnyQT namespaces funnyqt.emf and funnyqt.in-place. clojure.set provides functions on sets like intersection, funnyqt.emf provides functions for accessing EMF model elements, and funnyqt.in-place provides constructs for graph transformation-like in-place transformations.

The :as clauses define short aliases for the required namespaces. So their functions need to be qualified (making it obvious where a function called in this namespace originates from), but the qualification can be done with the short alias instead of the complete fully qualified namespace name.

The strange notation ^{:pattern-expansion-context :emf} is a Clojure metadata annotation attached to the namespace name. This concrete one tells FunnyQT that all transformation rules defined in this namespace should expand to pattern matching code suitable for EMF models.

Like the Henshin solution, the FunnyQT solution consists of one single rule shown in the next listing.

(ip/defrule ^:forall make-couples [m]
  [p1<Person> -<movies>-> <> -<persons>-> p2
   :when (and (not (identical? p1 p2))
              (three-common-movies? p1 p2))
   :as #{p1 p2}
   :distinct]
  (emf/ecreate! m 'Couple :p1 p1 :p2 p2))

The ip/defrule macro defines an in-place transformation rule given a name (make-couples), an argument vector ([m]) where the first argument has to be the model, a pattern vector ([p1...]), and arbitrarily many actions that should be executed on a match ((emf/ecreate! ...)).

The ^:forall metadata annotation attached to the rule’s name is FunnyQT’s equivalent to the starred stereotypes in Henshin, that is, when this rule gets applied it takes action on all matches in the model m in one go.

The structural part of the pattern (p1<Person> -<movies>-> <> -<persons>-> p2) defines that the rule matches a Person node p1 that’s connected to some other node p2 via a path of first a movies reference to some arbitrary intermediate node and then a persons reference to p2. In simple words, it defines that the persons p1 and p2 act together in some movie. Note that except for p1 there are no types declared because those are clear from the metamodel anyway. Adding the type Movie to the anonymous intermediate node and Person to p2 wouldn’t change any semantics.

In contrast to Henshin, FunnyQT’s pattern matching facilities default to homomorphic matching rather than isomorphic matching. That is, matches where p1 is matched to the very same node in the model as p2 are perfectly valid. Therefore, the pattern contains a :when constraint ensuring that p1 and p2 are not identical.

Furthermore, the structural part of the pattern just says that there is at least one movie where p1 and p2 act together, not three of them. So that additional constraint is handled by another predicate three-common-movies? that’s going to be discussed below.

You still remember that the Henshin solution created twice as many couples as needed because of the symmetry of the two persons? The last two lines of the pattern solve that issue for the FunnyQT solution. The :as clause defines that matches of the pattern should be represented as a set #{p1 p2} (rather than a tuple) thus making the order insignificant. The :distinct modifier then defines that the rule should only be applied to distinct matches.
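The effect of representing matches as sets and keeping only distinct ones can be imitated in plain Clojure. The following is only a sketch on made-up toy data (the acted-in map and all names in it are assumptions for illustration, not FunnyQT API or the benchmark model):

```clojure
;; Toy data, not the benchmark model: person -> set of movies acted in.
(def acted-in
  {:p1 #{:m1 :m2 :m3}
   :p2 #{:m1 :m2 :m3}
   :p3 #{:m1}})

;; Ordered pairs of different persons that share at least one movie.
;; Every couple shows up twice here, once as [a b] and once as [b a].
(def ordered-matches
  (for [p1 (keys acted-in)
        p2 (keys acted-in)
        :when (and (not= p1 p2)
                   (some (acted-in p1) (acted-in p2)))]
    [p1 p2]))

;; The same matches represented as sets and then deduplicated.
(def unordered-matches
  (distinct (map set ordered-matches)))

(println (count ordered-matches))    ; number of ordered tuples
(println (count unordered-matches))  ; half as many sets
```

Since #{:p1 :p2} and #{:p2 :p1} are equal values, the two symmetric tuples collapse into one match once they are turned into sets.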

Such a rule definition done with the defrule macro expands into a plain Clojure function at compile-time. Thus applying the rule to a given model is just a matter of (make-couples my-model).

The missing part of the solution is the three-common-movies? predicate that returns true only if the given two persons act in at least three common movies.

(defn three-common-movies? [p1 p2]
  (>= (count (set/intersection
              (into #{} (emf/eget-raw p1 :movies))
              (into #{} (emf/eget-raw p2 :movies))))
      3))

The predicate simply checks if the size of the intersection of p1’s movies and p2’s movies is greater than or equal to 3. funnyqt.emf/eget-raw is a function that gets an EObject and a structural feature name given as keyword and returns its value as-is (e.g., as an EList, here). One would usually use funnyqt.emf/eget which does the same but returns the value as a persistent, immutable Clojure data structure in case it’s a collection. Concretely, if the value is an EList, eget would return a Clojure vector. Since we’re coercing to sets anyway, using eget-raw avoids a double conversion.
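The set coercion works on raw Java lists directly, which is why the intermediate vector is not needed. Here is a small sketch without FunnyQT, where java.util.ArrayList stands in for an EList (EList extends java.util.List); the function name common-count and the sample data are made up for illustration:

```clojure
(require '[clojure.set :as set])

(defn common-count
  "Number of elements that both collections have in common."
  [movies1 movies2]
  (count (set/intersection (into #{} movies1)
                           (into #{} movies2))))

;; Stand-ins for the raw feature values of two persons.
(def raw1 (java.util.ArrayList. [:m1 :m2 :m3 :m4]))
(def raw2 (java.util.ArrayList. [:m2 :m3 :m4 :m5]))

(println (common-count raw1 raw2))
(println (>= (common-count raw1 raw2) 3))
```

into accepts anything seqable, so the Java list is poured straight into a set in a single pass.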

Performance Comparison

Both the Henshin and the FunnyQT solution perform about equally well. They both scale linearly for the provided test models. Most probably their pattern matching approach is pretty similar. Although not clearly stated in Christian’s blog post linked above, it sounds to me that Henshin’s Giraph code generator generates code that does a breadth- or depth-first search starting at some person node.

FunnyQT patterns are also matched by transforming the textual pattern DSL into a comprehension that effectively does a depth-first search anchored at the node that occurs first in the pattern, i.e., for every Person p1 all nodes p2 connected via one movies reference followed by a persons reference are searched one after the other.
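To give an idea of what such a comprehension looks like, here is a hand-written sketch over plain maps. It is not the code FunnyQT generates; the maps movies-of and persons-of and the function matches are assumptions standing in for the movies and persons references of a real model:

```clojure
;; Toy model: the two references as plain lookup maps.
(def movies-of  {:p1 [:m1 :m2], :p2 [:m1 :m2], :p3 [:m2]})
(def persons-of {:m1 [:p1 :p2], :m2 [:p1 :p2 :p3]})

(defn matches
  "All [p1 movie p2] paths, found by a nested, depth-first style iteration."
  []
  (for [p1 (keys movies-of)       ; anchor: every person
        m  (movies-of p1)         ; follow the movies reference
        p2 (persons-of m)         ; follow the persons reference
        :when (not= p1 p2)]       ; rule out homomorphic matches
    {:p1 p1 :m m :p2 p2}))

(println (count (matches)))
(println (count (distinct (map (fn [{:keys [p1 p2]}] #{p1 p2})
                               (matches)))))
```

The comprehension is lazy, so matches are only computed as far as they are consumed.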

The following table shows the pure transformation runtimes (excluding model load time, and after a warmup run) for the provided test models ranging from the size of 170,000 nodes to 1,700,000 nodes.

All tests were run on my five-year-old ThinkPad with a dual-core 2.1 GHz CPU (but neither solution is multi-threaded anyway), and the JVM process was given a maximum heap size of 2 GB.

Model Size (#nodes)   Henshin Time   FunnyQT Time
170,000               3.3 sec        2.2 sec
340,000               7.3 sec        4.4 sec
510,000               5.7 sec        6.9 sec
680,000               7.2 sec        10.7 sec
850,000               14.1 sec       12.8 sec
1,020,000             17.4 sec       14.7 sec
1,190,000             19.4 sec       17.4 sec
1,360,000             13.2 sec       14.1 sec
1,530,000             15.9 sec       17.0 sec
1,700,000             17.8 sec       17.6 sec

Wow, quite good for both. So which tool you choose for solving this kind of in-place transformation problem seems to be mostly a matter of whether you prefer visual or textual transformation languages.

(Of course, visual languages are for kiddies… SCNR :-P)

Gunzipping files with Clojure

Posted: February 17, 2012 in Clojure, Lisp

This is just a quicky that might be useful to others, too. The following function unzips the input to the output.

Update: As Ben pointed out, this will only work correctly for gzipped text files encoded in UTF-8 as input (ASCII, ISO-8859-1 will also be fine).

(ns foobar
  (:require [clojure.java.io :as io]))

(defn gunzip
  [fi fo]
  (with-open [i (io/reader
                 (java.util.zip.GZIPInputStream.
                  (io/input-stream fi)))
              o (java.io.PrintWriter. (io/writer fo))]
    (doseq [l (line-seq i)]
      (.println o l))))
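Regarding the update above: if the gzipped content is not text, a byte-oriented variant avoids any decoding. This is a minimal sketch, not part of the original function; the name gunzip-bytes is made up, and it relies only on clojure.java.io/copy and java.util.zip.GZIPInputStream:

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip GZIPInputStream))

(defn gunzip-bytes
  "Decompresses the gzipped input fi to the output fo, byte by byte,
  without interpreting the content as text."
  [fi fo]
  (with-open [i (GZIPInputStream. (io/input-stream fi))
              o (io/output-stream fo)]
    (io/copy i o)))
```

Like the line-based version, fi and fo can be anything io/input-stream and io/output-stream accept, e.g., file names or File objects.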

With a lot of help from Ambrose, I managed to make Clojure’s core.logic library work with my custom Java data structures. In this posting, I’ll explain the code.  I assume that you are already familiar with Clojure in general, and you know core.logic and relational programming at least from a user’s point of view.

Ok, so let’s start.  My custom data structures are TGraphs that we develop here at our institute.  You don’t need to know more than that a graph consists of vertices, and vertices can be connected by edges.  Furthermore, both vertices and edges may have attributes.  When you use such a graph, the graph itself, every vertex, and every edge is one Java object in your memory that implements the interface Graph, Vertex, and Edge, respectively.

I wrote a nice functional Clojure API for working with TGraphs (funtg on clojars; don’t use it for serious purposes, I’m constantly changing things without thinking about compatibility at all).  So probably, if you are reading this, you are in the same situation that I was in: You have a cool data structure, you have a cool API for it, and you are totally curious what you could do with core.logic on your data structure.  So how do I get core.logic to work with my stuff???

The answer is: you have to write relations that use your existing API to access your data structure.  The topic of this posting is mainly how to do that in a way that core.logic wants.  So let’s start with the namespace declaration for the code:

(ns extend.example
  (:refer-clojure :exclude [==])
  (:use [clojure.core.logic])
  ;; The following two are my functional API
  (:require [ :as core])
  (:require [ :as funql]))

Nothing special here, except that you can see that I require my functional TGraph API using prefixes.  So when you see core/foo or funql/bar in the following, you know that I’m calling my functional API there.  Next, I added some helper functions for testing if a logic variable is fresh or ground.  Ignore the comment about being walk-ed for now.

(defn fresh?
  "Returns true, if `x' is fresh.
  `x' must have been `walk'ed before!"
  [x]
  (lvar? x))

(defn ground?
  "Returns true, if `x' is ground.
  `x' must have been `walk'ed before!"
  [x]
  (not (lvar? x)))

Then, I defined a constant +graph+ that holds some example TGraph (some route map like graph).  I decided to keep the graph as a var of the namespace instead of making it a parameter of relations, because my API has no way to enumerate all graphs that happen to be in memory.  If the graph was a parameter of relations, I couldn’t be fully relational, e.g., giving only fresh logic variables to my relations would have to signal an error.

(def +graph+ (core/load-graph "/home/horn/Repos/uni/funtg/test/"))

Now we’ll come to the actual first relation.  vertexo is a relation where v is a vertex in the graph +graph+.

(defn vertexo
  "A relation where `v' is a vertex."
  [v]
  (fn [a]                                 ;; (1)
    (let [gv (walk a v)]
      (if (fresh? gv)
        (to-stream                        ;; (2)
         (->> (map #(unify a v %)
                   (funql/vseq +graph+))
              (remove not)))
        (if (.containsVertex +graph+ gv)
          a
          (fail a))))))

Basically, the parts marked with (1) and (2) are what’s important. (1) A relation must return a function which gets a so-called substitution a. You can think of it as something like an environment map which knows what logical variables are ground, and if so, what value they have. (walk a v) gets you the value of the variable (or value) v. If v is ground, then you get a value back. If v is fresh, you get a logical variable back. So now the functions fresh? and ground? above make sense, right?

The other important part (2) is that the function returned by a relation has to return a substitution again: A relation returns a function that gets a substitution and returns a substitution. to-stream turns a seq into a choice, which essentially says which possible values are allowed for the relation’s parameters. We declare such a possible binding using unify. funql/vseq returns the lazy seq of the graph’s vertices, each of which may be unified with the parameter v in the substitution a. If v was ground, then it could only be unified with the vertex it is already bound to. Because of that, it’s a good idea to filter out false values in the sequence before giving it to to-stream.

If I had put (2) directly into (1), it would work exactly the same. The additional code is for performance purposes. If v is already ground, there’s no reason to try to unify it with every vertex in the graph just to check if it is contained. Instead, a simple check (via the Java API) to see if that vertex is in the graph is enough. If it’s contained, then the substitution a is correct, so I return it “unchanged” (quotes, because of course we don’t mutate in Clojure). Else, the vertex is not in +graph+, so the binding is invalid and we fail.
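The shape of such a relation can be illustrated with a deliberately simplified toy model that does not use core.logic at all. Here a substitution is just a map from variable keywords to values, and a relation returns a function from one substitution to a sequence of substitutions. All names (vertices, toy-vertexo) are made up, and the real library works with logic variables, walk, unify, and lazy streams instead:

```clojure
;; Toy stand-in for the graph's vertex sequence.
(def vertices [:v1 :v2 :v3])

(defn toy-vertexo
  "Toy relation: `v' names a variable that has to be a vertex."
  [v]
  (fn [subst]
    (if (contains? subst v)
      ;; Ground: a single membership check, then keep or drop subst.
      (if (some #{(subst v)} vertices)
        [subst]
        [])
      ;; Fresh: one extended substitution per vertex in the graph.
      (map #(assoc subst v %) vertices))))

(println ((toy-vertexo :q) {}))
(println ((toy-vertexo :q) {:q :v2}))
(println ((toy-vertexo :q) {:q :v9}))
```

The ground branch corresponds to the containsVertex shortcut above, the fresh branch to unifying v with every vertex.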

That’s it. Now let’s turn to edges which are accessed with a relation of 3 parameters denoting the edge itself, its start vertex, and its end vertex.

(defn edgeo
  "A relation where `e' is an edge from `alpha' to `omega'."
  [e alpha omega]
  (fn [a]
    (let [ge     (walk a e)
          galpha (walk a alpha)
          gomega (walk a omega)]
      (cond
        (ground? ge) (unify a [alpha omega]
                            [(core/alpha ge) (core/omega ge)])
        (ground? galpha) (to-stream
                           (->> (map #(unify a [e omega] [% (core/omega %)])
                                     (funql/iseq galpha nil :out))
                                (remove not)))
        (ground? gomega) (to-stream
                           (->> (map #(unify a [e alpha] [% (core/alpha %)])
                                    (funql/iseq gomega nil :in))
                                (remove not)))
        :else (to-stream
                (->> (for [edge (funql/eseq +graph+)]
                       (unify a [e alpha omega]
                              [edge (core/alpha edge) (core/omega edge)]))
                     (remove not)))))))

Again, basically the relation would work fine if we delete everything except the expression of the :else clause.  That unifies all edges in the graph including their start and end vertices with the parameters of the relation.  The three first clauses of the cond are for performance only.  The first says, if an edge is already given, then we only check the start and end vertex, which is a constant time operation.  The second and third clause say, if either the start or end vertex are already given, then we don’t need to unify all edges in the graph with the parameters but only the edges incident to the given vertex (eseq vs. iseq), which is again much faster.
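The idea of narrowing the candidate edges depending on what is already known can be sketched on plain data. This is only an illustration with made-up names (edges, out-index, in-index, candidate-edges); group-by builds the incidence lookups that a TGraph provides natively:

```clojure
;; Toy edge list with start (alpha) and end (omega) vertices.
(def edges [{:id :e1 :alpha :v1 :omega :v2}
            {:id :e2 :alpha :v1 :omega :v3}
            {:id :e3 :alpha :v2 :omega :v3}])

;; Incidence lookups: outgoing and incoming edges per vertex.
(def out-index (group-by :alpha edges))
(def in-index  (group-by :omega edges))

(defn candidate-edges
  "Edges that still have to be checked, given the known vertices."
  [{:keys [alpha omega]}]
  (cond
    alpha (out-index alpha [])   ; only edges starting at alpha
    omega (in-index omega [])    ; only edges ending at omega
    :else edges))                ; nothing known: all edges

(println (map :id (candidate-edges {:alpha :v1})))
(println (map :id (candidate-edges {:omega :v3})))
(println (map :id (candidate-edges {})))
```

The candidates still have to be unified with all parameters afterwards; the lookup only reduces how many edges need to be considered.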

The last relation I’ll show is concerned with attributes.

(defn valueo
  "A relation where `ae' has value `val' for its `at' attribute."
  [ae at val]
  (fn [a]
    (let [gae (walk a ae)
          gat (walk a at)
          gval (walk a val)]
      (cond
        (and (ground? gae)
             (ground? gat)) (or (unify a [ae at val]
                                       [gae gat (core/value gae gat)]))
        (ground? gae) (to-stream
                        (->> (for [attr (seq (.getAttributeList
                                               (core/attributed-element-class gae)))
                                   :let [an (keyword (.getName attr))]]
                                (unify a [ae at val]
                                      [gae an (core/value gae an)]))
                             (remove not)))
        :else (to-stream
                (->> (for [elem (concat (funql/vseq +graph+)
                                        (funql/eseq +graph+))
                           attr (seq (.getAttributeList
                                       (core/attributed-element-class elem)))
                           :let [an (keyword (.getName attr))]]
                        (unify a [ae at val]
                               [elem an (core/value elem an)]))
                     (remove not)))))))

Again, only the :else part is necessary.  I unify every vertex and every edge together with every attribute that is defined for them with the given parameters.  Clearly, that’s quite some effort.  The first clause of the cond simply looks up the attribute value if the element and the attribute name are already given.  The second clause deals with the case when at least the graph element is given, in which case only all its attribute/value pairs have to be unified.

So here are some example applications.

What are the 3 first vertices in the graph?

(run 3 [q]
  (vertexo q))
;=> (#<v1: localities.Village>
     #<v2: localities.Village>
     #<v3: localities.Town>)

Is this a vertex of the graph?

(let [v1 (core/vertex +graph+ 1)]
  (run* [q]
    (vertexo v1)))
;=> (_.0)     ; succeeded, so yes, it's a vertex of +graph+

What are the first 3 edges of the graph?

(run 3 [q]
  (fresh [a o]
    (edgeo q a o)))
;=> (#<e1: connections.Footpath>
     #<e2: connections.Footpath>
     #<e3: connections.Footpath>)

What edges end at the vertex v1?

(let [v1 (core/vertex +graph+ 1)]
  (run* [q]
    (fresh [o]
      (edgeo q o v1))))
;=> (#<e-22: localities.ContainsLocality>)

Which attribute of what element has the value 251?

(run* [q]
  (fresh [e a]
    (valueo e a 251)
    (== q [e a])))
;=> ([#<v1: localities.Village> :inhabitants])

So that village has 251 inhabitants.  Pretty small.  What’s the name of that village?

(run* [q]
  (fresh [e a]
    (valueo e a 251)
    (valueo e :name q)))
;=> ("Kammerforst")

Ah, it’s Kammerforst.  Oh, wikipedia says its population has increased to 253. ;-)

Ok, that’s it.  I hope this helps you to make your custom data structures work with core.logic. Have fun!

defmacro! revisited

Posted: September 28, 2011 in Clojure

In my last post, I introduced the defmacro! macro, which is just like defmacro, except that it guarantees that all of the arguments are evaluated only once.

However, in contrast to the defmacro! that Doug Hoyte introduced in Let over Lambda, my macro expanded into a normal defmacro form that in turn expanded into a form where all args were evaluated exactly once.

Clearly, this was totally flawed, because in, say, new control structures, you may want to have some argument evaluated never.

So here’s a better version which allows for better control over evaluation. All args with a trailing ! (BANG, in Clojure speak) will be evaluated exactly once, and the rest of the args stay under the programmer’s control (note that this version also takes a mandatory docstring):

(defn bang-symbol?
  "Returns true, if sym is a symbol with name ending in an exclamation
  mark (bang)."
  [sym]
  (and (symbol? sym)
       (= (last (name sym)) \!)))

(defmacro defmacro!
  "Defines a macro name with the given docstring, args, and body.
  All args ending in an exclamation mark (!, bang) will be evaluated only once
  in the expansion, even if they are unquoted at several places in body.  This
  is especially important for args whose evaluation has side-effects or which
  are expensive to evaluate."
  [name docstring args & body]
  (let [bang-syms (filter bang-symbol? args)
        rep-map (apply hash-map
                       (mapcat (fn [s] [s `(quote ~(gensym))])
                               bang-syms))]
    `(defmacro ~name
       ~docstring
       ~args
       `(let ~~(vec (mapcat (fn [[s t]] [t s]) rep-map))
          ~(clojure.walk/postwalk-replace ~rep-map ~@body)))))

Using that, we can now easily implement the numeric if, nif, which you can find in On Lisp and Let over Lambda:

(defmacro! nif
  "Numeric if: evals test! (only once) and executes either pos, zero, or neg
  depending on the result."
  [test! pos zero neg]
  `(cond
    (pos? ~test!)  ~pos
    (zero? ~test!) ~zero
    :else          ~neg))

When evaluating (nif 1 (println "pos") (println "zero") (println "neg")), now there’s only “pos” printed. With the previous defmacro! version, “pos”, “zero”, and “neg” were printed.
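For comparison, this is roughly what one would write by hand with plain defmacro and an auto-gensym to get the same evaluation behavior. The name nif-by-hand and the log atom are made up for this sketch:

```clojure
(defmacro nif-by-hand
  "Numeric if written without defmacro!: test is bound once to a gensym,
  and only the selected branch is evaluated."
  [test pos zero neg]
  `(let [t# ~test]
     (cond
       (pos? t#)  ~pos
       (zero? t#) ~zero
       :else      ~neg)))

;; Record which forms actually get evaluated.
(def log (atom []))

(nif-by-hand (do (swap! log conj :test) 0)
             (swap! log conj :pos)
             (swap! log conj :zero)
             (swap! log conj :neg))

(println @log)
```

The test form shows up once in the log, followed by the one branch that was selected.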

UPDATE: Stefan Kamphausen noticed that defmacro! doesn’t work as intended if destructuring is done in the argument list. So here’s yet another version that flattens the argument list when collecting the bang-symbols.

(defmacro defmacro!
  "Defines a macro name with the given docstring, args, and body.
  All args ending in an exclamation mark (!, bang) will be evaluated only once
  in the expansion, even if they are unquoted at several places in body.  This
  is especially important for args whose evaluation has side-effects or which
  are expensive to evaluate."
  [name docstring args & body]
  (let [bang-syms (filter bang-symbol? (flatten args)) ;; <==
        rep-map (apply hash-map
                       (mapcat (fn [s] [s `(quote ~(gensym))])
                               bang-syms))]
    `(defmacro ~name
       ~docstring
       ~args
       `(let ~~(vec (mapcat (fn [[s t]] [t s]) rep-map))
          ~(clojure.walk/postwalk-replace ~rep-map ~@body)))))

Using that, you can define a strange nif variant that wants a vector, where the first entry is a vector containing the test, and the second entry is a vector of the pos, zero, neg entries.

(defmacro! strange-nif
  "Like nif, but with strange destructuring"
  [[[test!] [pos zero neg]]]
  `(cond
    (pos? ~test!)  ~pos
    (zero? ~test!) ~zero
    :else          ~neg))

;; Trying it...
user> (strange-nif [[1] [:pos :zero :neg]])
:pos
user> (macroexpand '(strange-nif [[1] [:pos :zero :neg]]))
(let [G__1974 1]
     (cond (pos? G__1974) :pos (zero? G__1974) :zero :else :neg))
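Why flattening the argument list matters can be checked in isolation. The sketch below repeats bang-symbol? so that it is self-contained; destructured-args is simply the quoted argument vector of strange-nif:

```clojure
(defn bang-symbol?
  "Returns true, if sym is a symbol with name ending in an exclamation mark."
  [sym]
  (and (symbol? sym)
       (= (last (name sym)) \!)))

;; The argument vector of strange-nif as data.
(def destructured-args '[[[test!] [pos zero neg]]])

;; Without flatten, the only top-level element is a vector, not a symbol.
(println (filter bang-symbol? destructured-args))

;; With flatten, the nested symbols become visible.
(println (filter bang-symbol? (flatten destructured-args)))
```

Note that flatten only descends into sequential collections, so bang-symbols inside a map destructuring form would presumably still go unnoticed.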

Once-only evaluation for Clojure macros

Posted: September 23, 2011 in Clojure

When writing macros, you often want a macro’s arguments to be evaluated only once. Let’s have a look at a simple example:

user> (defmacro square [x] `(* ~x ~x))
user> (square 5)
25

At a first glance, it seems to work. But see what happens here:

user> (def c (let [a (atom 4)] #(swap! a inc)))

This defines a counter function, which will increase its count on every call. Now let’s feed that to our macro:

user> (square (c))
30

Oh, shouldn’t that be 25, because we’re increasing our counter to 5? No, because the macroexpansion is

user> (macroexpand-1 '(square (c)))
(clojure.core/* (c) (c))

So our counter is increased twice resulting in (* 5 6). The lesson to be learned is that when writing a macro, one should take care that every argument is evaluated once, i.e., if you feel the need to unquote an argument more than once, you have to let-bind its value to some gensym and use that later on. This is a fixed version of the square macro.

user> (defmacro square [x] `(let [x# ~x] (* x# x#)))
user> (macroexpand-1 '(square (c)))
(clojure.core/let [x__5139__auto__ (c)]
  (clojure.core/* x__5139__auto__ x__5139__auto__))

As you can see, now our counter is incremented only once, and its new value is bound to a generated variable which is used in the body.
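The difference between the two variants can also be measured directly by counting how often the argument form is evaluated. In this sketch, the names evaluations, noisy-five, square-twice, and square-once are made up; the two macros mirror the two square definitions above:

```clojure
;; Counts how often the argument expression gets evaluated.
(def evaluations (atom 0))

(defn noisy-five
  "Returns 5 and records that it has been called."
  []
  (swap! evaluations inc)
  5)

;; Naive version: the argument form is spliced in twice.
(defmacro square-twice [x] `(* ~x ~x))

;; Fixed version: the argument form is bound once to a gensym.
(defmacro square-once [x] `(let [x# ~x] (* x# x#)))

(reset! evaluations 0)
(square-twice (noisy-five))
(def twice-count @evaluations)

(reset! evaluations 0)
(square-once (noisy-five))
(def once-count @evaluations)

(println twice-count once-count)
```

Both calls return 25 here because noisy-five always returns the same value; only the number of evaluations differs.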

But why not do exactly that by default? Here’s a macro that does that:

(defmacro defmacro!
  "Defines a macro in which all args are evaled only once."
  [name args & body]
  (let [rep-map (apply hash-map
                       (mapcat (fn [s] [s `(quote ~(gensym))])
                               args))]
    `(defmacro ~name ~args
       `(let ~~(vec (mapcat (fn [[s t]] [t s]) rep-map))
          ~(clojure.walk/postwalk-replace ~rep-map ~@body)))))

Using this macro-writing macro, we can now safely use our first implementation of square:

user> (defmacro! square [x]
  `(* ~x ~x))
user> (macroexpand-1 '(square (c)))
(clojure.core/let [G__5491 (c)] (clojure.core/* G__5491 G__5491))
user> (square (c))

Great, seems to work.

UPDATE: This version of defmacro! has one serious design flaw. The completely expanded form will evaluate all arguments exactly once, but often you want to be able to have some argument evaluated never. See this sequel post for an enhanced version.