Gunzipping files with Clojure

Posted: February 17, 2012 in Clojure, Lisp

This is just a quickie that might be useful to others, too. The following function decompresses the gzipped input fi and writes the result to the output fo.

Update: As Ben pointed out, this will only work correctly for gzipped text files encoded in UTF-8 (plain ASCII is fine too, as is ISO-8859-1 text that contains only ASCII characters).

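(ns foobar
  (:require [clojure.java.io :as io]))

(defn gunzip
  "Decompresses the gzipped text in fi and writes it line by line to fo.
  fi and fo can be anything io/input-stream and io/writer accept.
  The text is decoded and re-encoded as UTF-8, the clojure.java.io default."
  [fi fo]
  (with-open [i (io/reader
                 (java.util.zip.GZIPInputStream.
                  (io/input-stream fi)))
              o (java.io.PrintWriter. (io/writer fo))]
    ;; line-seq is lazy, so the lines are consumed here, while i is still open
    (doseq [l (line-seq i)]
      (.println o l))))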
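
Since the function above relies on the UTF-8 default, here is a minimal sketch of a variant for text in another known encoding. The name gunzip-text and its encoding parameter are hypothetical and not part of the original post; the sketch only assumes that io/reader and io/writer accept an :encoding option.

```clojure
(ns gunzip-encoding-sketch
  (:require [clojure.java.io :as io])
  (:import (java.io BufferedWriter)
           (java.util.zip GZIPInputStream)))

;; Hypothetical variant of gunzip: the caller names the text encoding
;; instead of relying on the UTF-8 default of io/reader and io/writer.
(defn gunzip-text
  [fi fo encoding]
  (with-open [r (io/reader (GZIPInputStream. (io/input-stream fi))
                           :encoding encoding)
              ^BufferedWriter w (io/writer fo :encoding encoding)]
    ;; line-seq is consumed inside with-open, while the reader is still open
    (doseq [^String line (line-seq r)]
      (.write w line)
      ;; newLine writes the platform line separator, so the original
      ;; line endings are not preserved
      (.newLine w))))
```

A call might look like (gunzip-text "in.txt.gz" "out.txt" "ISO-8859-1"), where both file names are placeholders. Like the original, this normalizes line endings and always ends the output with a line separator.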
Comments
  1. Ben says:

    So, the bytes produced by unzipping fi, will always, reliably, be UTF-8 encoded text?

    (io/reader assumes “UTF-8” when it’s not told an :encoding.)

    • Tassilo Horn says:

      No, as I’ve written, it just did the job for me. For doing that reliably, one should read() and write() into/from a byte array. Of course, then you cannot use niceties such as line-seq.
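
The byte-array approach mentioned in this reply could look roughly like the following. This is a sketch, not code from the post or from the linked gist; the name gunzip-bytes and the 8192-byte buffer size are assumptions made for illustration.

```clojure
(ns gunzip-bytes-sketch
  (:require [clojure.java.io :as io])
  (:import (java.io OutputStream)
           (java.util.zip GZIPInputStream)))

;; Hypothetical helper: copies the decompressed bytes unchanged,
;; so no character encoding is ever assumed.
(defn gunzip-bytes
  [fi fo]
  (with-open [in (GZIPInputStream. (io/input-stream fi))
              ^OutputStream out (io/output-stream fo)]
    (let [buf (byte-array 8192)] ; buffer size is an arbitrary choice
      (loop []
        (let [n (.read in buf)]
          ;; read returns -1 at the end of the stream
          (when-not (neg? n)
            (.write out buf 0 n)
            (recur)))))))
```

The explicit loop can also be replaced by (io/copy in out), which performs the same kind of buffered copy between an InputStream and an OutputStream.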

      • Ben says:

        True enough, though line-seq is tricky.

        Since line-seq is lazy, you’ll need to make sure to consume it only within the dynamic scope of with-open. If the input isn’t huge and you want to actually do something with the lines read from input, you could use slurp and split-lines to make a fully realized sequence of lines, which you can work with at your leisure even after the underlying input has been closed. (See gunzip-text-lines at https://gist.github.com/1858654 (revised).)
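
The slurp and split-lines idea from this comment can be sketched as follows. This is not the gunzip-text-lines function from the gist, which is not reproduced here; the name gunzip-lines and the explicit encoding parameter are assumptions for illustration.

```clojure
(ns gunzip-lines-sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str])
  (:import (java.util.zip GZIPInputStream)))

;; Hypothetical helper: reads the whole decompressed text into memory and
;; returns a vector of lines, so nothing lazy escapes the with-open scope.
(defn gunzip-lines
  [fi encoding]
  (with-open [in (GZIPInputStream. (io/input-stream fi))]
    ;; slurp reads everything eagerly; split-lines returns a realized vector
    (str/split-lines (slurp in :encoding encoding))))
```

Because the result is fully realized, something like (count (gunzip-lines "in.txt.gz" "UTF-8")) is safe after the stream has been closed ("in.txt.gz" is a placeholder). The trade-off is that the whole decompressed text has to fit in memory.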

  2. Ben says:

    https://gist.github.com/1858654

    - Doesn’t assume the decompressed contents of input will be UTF-8 encoded text
    - Copies input to output without first converting it from bytes to characters and back again.

  3. Anthony Grimes says:

    I’ve been collecting decompression functions and utilities in fs: https://github.com/Raynes/fs

    There is a gunzip function in there. If you’ve got any improvements to those, be sure to shoot me a pull request.
