Gunzipping files with Clojure

Posted: February 17, 2012 in Clojure, Lisp
Tags: ,

This is just a quicky that might be useful to others, too. The following function unzips the input to the output.

Update: As Ben pointed out, tis will only work correctly for gzipped text files encoded in UTF-8 as input (ASCII, ISO-5589-1 will also be fine).

(ns foobar
  (:require [ :as io]))

(defn gunzip
  [fi fo]
  (with-open [i (io/reader
                  (io/input-stream fi)))
              o ( (io/writer fo))]
    (doseq [l (line-seq i)]
      (.println o l))))

With a lot of help from Ambrose, I managed to make Clojure’s core.logic library work with my custom Java data structures. In this posting, I’ll explain the code.  I assume that you are already familiar with Clojure in general, and you know core.logic and relational programming at least from a user’s point of view.

Ok, so let’s start.  My custom data structures are TGraphs that we develop here at our institute.  You don’t need to know more than that a graph consists of vertices, and vertices can be connected by edges.  Furthermore, both vertices and edges may have attributes.  When you use such a graph, the graph itself, every vertex, and every edge is one Java object in your memory that implements the interface Graph, Vertex, and Edge, respectively.

I wrote a nice functional Clojure API for working with TGraphs (funtg on clojars; don’t use it for serious purposes, I’m constantly changing things without thinking about compatibility at all).  So probably, if you are reading this, you are in the same situation that I was in: You have a cool data structure, you have a cool API for it, and you are totally curious what you could do with core.logic on your data structure.  So how do I get core.logic to work with my stuff???

The answer is: you have to write relations that use your existing API to access your data structure.  The topic of this posting is mainly how to doing that in a way that core.logic wants.  So let’s start with the namespace declaration for the code:

(ns extend.example
  (:refer-clojure :exclude [==])
  (:use [clojure.core.logic])
  ;; The following two are my functional API
  (:require [ :as core])
  (:require [ :as funql]))

Nothing special here, except that you can see that I require my functional TGraph API using prefixes.  So when you see core/foo or funql/bar in the following, you know that I’m calling my functional API there.  As next, I added some helper functions for testing if a logic variable is fresh or ground.  Ignore the comment about being walk-ed for now.

(defn fresh?
  "Returns true, if `x' is fresh.
  `x' must have been `walk'ed before!"
  (lvar? x))

(defn ground?
  "Returns true, if `x' is ground.
  `x' must have been `walk'ed before!"
  (not (lvar? x)))

Then, I’ll defined a constant +graph+ that holds some example TGraph (some route map like graph).  I decided to keep the graph as a var of the namespace instead of making it a parameter of relations, because my API has no way to enumerate all graphs that happen to be in memory.  If the graph was a parameter of relations, I couldn’t be fully relational, e.g., giving only fresh logic variables to my relations would have to error.

(def +graph+ (core/load-graph "/home/horn/Repos/uni/funtg/test/"))

Now we’ll come to the actual first relation.  vertexo is a relation where v is a vertex in the graph +graph+.

(defn vertexo
  "A relation where `v' is a vertex."
  (fn [a]                                 ;; (1)
    (let [gv (walk a v)]
      (if (fresh? gv)
        (to-stream                        ;; (2)
         (->> (map #(unify a v %)
                   (funql/vseq +graph+))
              (remove not)))
         (if (.containsVertex +graph+ gv)
          (fail a))))))

Basically, the stuff marked with (*) is what’s important. (1) A relation must return a function which gets a so-called substitution a. You can think of it as something like an environment map which knows what logical variables are ground, and if so, what value they have. (walk a v) gets you the value of the variable (or value) v. If v is ground, then you get a value back. If v is fresh, you get a logical variable back. So now the functions fresh? and ground? above make sense, right?

The other important part (2) is that the function returned by a relation has to return a substitution again: A relation returns a function that gets a substitution and returns a substitution. to-stream turns a seq into a choice, which essentially say which possible values are allowed for the relation’s parameters. We declare such a possible binding using unify. funql/vseq returns the lazy seq of the graph’s vertices, each of which may be unified with the parameter v in the substitution a. If v was ground, then it could only be unified with the vertex it is already bound to. Because of that, it’s a good idea to filter out false values in the sequence before giving it to to-stream.

If I had put (2) directly into (1), it would work exactly the same. The additional code is for performance purposes. If v is already ground, there’s no reason to try to unify it with every vertex in the graph just to check if it is contained. Instead, a simple check (via the Java API) to see if that vertex is in the graph is enough. If it’s contained, then the substitution a is correct, so I return it “unchanged” (quotes, because of course we don’t mutate in Clojure). Else, the vertex is not in +graph+, so the binding is invalid and we fail.

That’s it. Now let’s turn to edges which are accessed with a relation of 3 parameters denoting the edge itself, its start vertex, and its end vertex.

(defn edgeo
  "A relation where `e' is an edge from `alpha' to `omega'."
  [e alpha omega]
  (fn [a]
    (let [ge     (walk a e)
          galpha (walk a alpha)
          gomega (walk a omega)]
        (ground? ge) (unify a [alpha omega]
                            [(core/alpha ge) (core/omega ge)])
        (ground? galpha) (to-stream
                           (->> (map #(unify a [e omega] [% (core/omega %)])
                                     (funql/iseq galpha nil :out))
                                (remove not)))
        (ground? gomega) (to-stream
                           (->> (map #(unify a [e alpha] [% (core/alpha %)])
                                    (funql/iseq gomega nil :in))
                                (remove not)))
        :else (to-stream
                (->> (for [edge (funql/eseq +graph+)]
                       (unify a [e alpha omega]
                              [edge (core/alpha edge) (core/omega edge)]))
                     (remove not)))))))

Again, basically the relation would work fine if we delete everything except the expression of the :else clause.  That unifies all edges in the graph including their start and end vertices with the parameters of the relation.  The three first clauses of the cond are for performance only.  The first says, if an edge is already given, then we only check the start and end vertex, which is a constant time operation.  The second and third clause say, if either the start or end vertex are already given, then we don’t need to unify all edges in the graph with the parameters but only the edges incident to the given vertex (eseq vs. iseq), which is again much faster.

The last relation I’ll show is concerned with attributes.

(defn valueo
  "A relation where `ae' has value `val' for its `at' attribute."
  [ae at val]
  (fn [a]
    (let [gae (walk a ae)
          gat (walk a at)
          gval (walk a val)]
        (and (ground? gae)
             (ground? gat)) (or (unify a [ae at val]
                                       [gae gat (core/value gae gat)]))
        (ground? gae) (to-stream
                        (->> (for [attr (seq (.getAttributeList
                                               (core/attributed-element-class gae)))
                                   :let [an (keyword (.getName attr))]]
                                (unify a [ae at val]
                                      [gae an (core/value gae an)]))
                             (remove not)))
        :else (to-stream
                (->> (for [elem (concat (funql/vseq +graph+)
                                        (funql/eseq +graph+))
                           attr (seq (.getAttributeList
                                       (core/attributed-element-class elem)))
                           :let [an (keyword (.getName attr))]]
                        (unify a [ae at val]
                               [elem an (core/value elem an)]))
                     (remove not)))))))

Again, only the :else part is neccessary.  I unify every vertex and every edge together with every attribute that is defined for them with the given parameters.  Clearly, that’s quite some effort.  The first clause of the cond simply looks up the attribute value if the element and the attribute name are already given.  The second clause deals with the case when at least the graph element is given, in which case only all its attribute/value pairs have to be unified.

So here are some example applications.

What are the 3 first vertices in the graph?

(run 3 [q]
  (vertexo q))
;=> (#<v1: localities.Village>
     #<v2: localities.Village>
     #<v3: localities.Town>)

Is this a vertex of the graph?

(let [v1 (core/vertex +graph+ 1)]
  (run* [q]
    (vertexo v1)))
;=> (_.0)     ; succeeded, so yes, it's a vertex of +graph+

What are the first 3 edges of the graph?

(run 3 [q]
  (fresh [a o]
    (edgeo q a o)))
;=> (#<e1: connections.Footpath>
     #<e2: connections.Footpath>
     #<e3: connections.Footpath>)

What edges end at the vertex v1?

(let [v1 (core/vertex +graph+ 1)]
  (run* [q]
    (fresh [o]
      (edgeo q o v1))))
;=> (#<e-22: localities.ContainsLocality>)

Which attribute of what element has the value 251?

(run* [q]
  (fresh [e a]
    (valueo e a 251)
    (== q [e a])))
;=> ([#<v1: localities.Village> :inhabitants])

So that village has 251 inhabitants.  Pretty small.  What’s the name of that village?

(run* [q]
  (fresh [e a]
    (valueo e a 251)
    (valueo e :name q)))
;=> ("Kammerforst")

Ah, it’s Kammerforst.  Oh, wikipedia says its population has increased to 253. ;-)

Ok, that’s it.  I hope this helps you to make your custom data structures work with core.logic. Have fun!

Yay, I’m a celebrity now!

Posted: December 9, 2011 in Uncategorized

I’ve just received an email from wordpress congratulating me that my blog has gotten its first follower. I run that blog for about five years now, and I didn’t even know that there is some following functionality! Well, now I can see that button.  Prior to that, it was just one of the 378 other buttons in the wordpress interface I have no clue for what they are good for.

Seems like I am an old fart, totally Web 0.1… ;-)

A quick pop-up shell for emacs

Posted: October 12, 2011 in Emacs, Emacs Lisp

Some file managers provide a shortcut to quickly embed some terminal that’s initialized with the current directory as cwd. Wouldn’t that be cool to have in emacs?

For example, you edit some file, and then you want to quickly commit it. Using the code below, you can do that like this:

  • Edit the file and save: C-x C-s
  • Popup a shell: F12
  • Check changes and commit (using svn):
    $ svn diff | colordiff
    $ svn ci -m "some changes" RET
  • Close the popup shell again: F12
  • Here’s the code:

    (defvar th-shell-popup-buffer nil)
    (defun th-shell-popup ()
      "Toggle a shell popup buffer with the current file's directory as cwd."
      (unless (buffer-live-p th-shell-popup-buffer)
        (save-window-excursion (shell "*Popup Shell*"))
        (setq th-shell-popup-buffer (get-buffer "*Popup Shell*")))
      (let ((win (get-buffer-window th-shell-popup-buffer))
    	(dir (file-name-directory (or (buffer-file-name)
    				      ;; dired
    				      ;; use HOME
        (if win
    	(quit-window nil win)
          (pop-to-buffer th-shell-popup-buffer nil t)
          (comint-send-string nil (concat "cd " dir "\n")))))
    (global-set-key (kbd "<f12>") 'th-shell-popup)

    Using that, hitting F12 will popup a *Popup Shell* buffer, initialized in the directory containing the file your are currently editing. If the current buffer is not associated with a file, then the shell’s cwd is your HOME directory.

    Hitting F12 while the popup shell buffer is visible will hide it again. So you can use it as a simple toggle.

    The code ensures that the same shell buffer is reused over and over again (unless you kill it), so that you don’t end up with hundredth of them.

    And here’s a screenshot:


    • 2011-10-13 08:03: Made it work for dired buffers.
    • 2012-02-24 14:41: Use quit-window instead of delete-window.

    defmacro! revisited

    Posted: September 28, 2011 in Clojure
    Tags: ,

    In my last post, I’ve introduced the defmacro! macro, which is just like defmacro, except that it guarantees that all of the arguments are evaluated once only.

    However, in contrast to Doug Hoyte’s defmacro! he introduced in Let over Lambda, my macro expanded into a normal defmacro form that expanded into a form where all args were evaluated exactly once.

    Clearly, this was totally flawed, because in, say, new control structures, you may want to have some argument evaluated never.

    So here’s a better version which allows for better control about evaluation. All args with trailing ! (BANG, in Clojure speak) will be evaluated exactly once, and the rest of the args stays under the programmer’s control (note that this version also takes a mandatory docstring):

    (defn bang-symbol?
      "Returns true, if sym is a symbol with name ending in a exclamation
      mark (bang)."
      (and (symbol? sym)
           (= (last (name sym)) \!)))
    (defmacro defmacro!
      "Defines a macro name with the given docstring, args, and body.
      All args ending in an exclamation mark (!, bang) will be evaluated only once
      in the expansion, even if they are unquoted at several places in body.  This
      is especially important for args whose evaluation has side-effecs or who are
      expensive to evaluate."
      [name docstring args & body]
      (let [bang-syms (filter bang-symbol? args)
            rep-map (apply hash-map
                           (mapcat (fn [s] [s `(quote ~(gensym))])
        `(defmacro ~name
           `(let ~~(vec (mapcat (fn [[s t]] [t s]) rep-map))
              ~(clojure.walk/postwalk-replace ~rep-map ~@body)))))

    Using that, we can now easily implement the numeric if, nif, you can find in On Lisp and Let over Lambda:

    (defmacro! nif
      "Numeric if: evals test! (only once) and executes either pos, zero, or neg
      depending on the result."
      [test! pos zero neg]
        (pos? ~test!)  ~pos
        (zero? ~test!) ~zero
        :else          ~neg))

    When evaluating (nif 1 (println "pos") (println "zero") (println "neg")), now there’s only “pos” printed. With the previous defmacro! version, “pos”, “zero”, and “neg” were printed.

    UPDATE: Stefan Kamphausen noticed that defmacro! doesn’t work as intended if destructuring is done in the argument list. So here’s yet another version that flattens the argument list when collecting the bang-symbols.

    (defmacro defmacro!
      "Defines a macro name with the given docstring, args, and body.
      All args ending in an exclamation mark (!, bang) will be evaluated only once
      in the expansion, even if they are unquoted at several places in body.  This
      is especially important for args whose evaluation has side-effecs or who are
      expensive to evaluate."
      [name docstring args & body]
      (let [bang-syms (filter bang-symbol? (flatten args)) ;; <==
            rep-map (apply hash-map
                           (mapcat (fn [s] [s `(quote ~(gensym))])
        `(defmacro ~name
           `(let ~~(vec (mapcat (fn [[s t]] [t s]) rep-map))
              ~(clojure.walk/postwalk-replace ~rep-map ~@body)))))

    Using that, you can define a strange nif variant that wants a vector, where the first entry is a vector containing the test, and the second entry is a vector of the pos, zero, neg entries.

    (defmacro! strange-nif
      "Like nif, but with strange destructuring"
      [[[test!] [pos zero neg]]]
        (pos? ~test!)  ~pos
        (zero? ~test!) ~zero
        :else          ~neg))
    ;; Trying it...
    user> (strange-nif [[1] [:pos :zero :neg]])
    user> (macroexpand '(strange-nif [[1] [:pos :zero :neg]]))
    (let [G__1974 1]
         (cond (pos? G__1974) :pos (zero? G__1974) :zero :else :neg))