
Scala Stream Hygiene III: Scalaz EphemeralStream Fills Quite A Canyon

August 11th, 2014

The source code for this post is available on GitHub.

The following is the third part in a four-part series. Part I listed the coding rules that help you avoid memory leaks when using the standard Scala Stream class. Part II demonstrated that an optimizing JIT compiler and precise GC render most of those rules superfluous, and argued that circumventing Stream memoization in production code is dangerous. This part will discuss a readily available third-party alternative that enables representing potentially infinite data structures without the risk of leaking memory: Scalaz EphemeralStream.


Scalaz is an open-source Scala library that implements type classes and pure functional data structures. In particular, it provides a class (a trait, to be more precise) EphemeralStream, described as follows:

Like scala.collection.immutable.Stream, but doesn’t save computed values. As such, it can be used to represent similar things, but without the space leak problem frequently encountered using that type.

First of all, don’t let the scarcity of the method list in EphemeralStream documentation mislead you: an EphemeralStream can be implicitly converted to an Iterable, so most methods of the latter are in fact available (but not all of them, more on that below).

The "does not save computed values" part is somewhat misleading too. An EphemeralStream cell actually caches both the value and the next cell reference (once they get computed) using Java weak references.

Objects that only have weak references to them get garbage collected on first try. Therefore you can safely store an EphemeralStream in a val and pattern match on it as you please:

def tailAvg(xs: EphemeralStream[Int]): Option[Int] = {
  xs match {
    case y ##:: ys => Some(ys.sum / ys.length)
    case _ => None
  }
}
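
To make the weak-reference caching idea more concrete, here is a minimal sketch of a weakly memoized value. WeakMemo and WeakMemoDemo are hypothetical names made up for this illustration; this is not the Scalaz implementation, only the general technique it appears to rely on.

```scala
import java.lang.ref.WeakReference

// Hypothetical helper for illustration only; this is not Scalaz code.
final class WeakMemo[A <: AnyRef](compute: () => A) {
  // Starts out empty: a WeakReference whose referent is null.
  private var cache = new WeakReference[A](null.asInstanceOf[A])

  def apply(): A = {
    val cached = cache.get()
    if (cached != null) cached            // still strongly reachable elsewhere: reuse it
    else {
      val fresh = compute()               // never computed, or already collected: recompute
      cache = new WeakReference[A](fresh)
      fresh
    }
  }
}

object WeakMemoDemo {
  var computations = 0
  val memo = new WeakMemo[String](() => { computations += 1; new String("expensive") })
}
```

As long as some other strong reference keeps the value alive, repeated calls return the cached instance. Once nothing else refers to it, the GC is free to reclaim it, and the next call simply recomputes.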

Unfortunately, EphemeralStream suffers from numerous issues hindering its practical use:

  • It implements memoization using Java WeakReferences wrapped in Scala Options wrapped in closures, which all add memory overheads, so the GC gets invoked more frequently compared to the technique described in Part I. For instance, an EphemeralStream[Int] needs exactly twice as much memory as a Stream[Int] on the 32-bit Java HotSpot Server VM.

  • Because of all that wrapping, it is also way slower than a standard Stream even if each element is only accessed once. Depending on the number of elements, the EphemeralStream.length method, which does not access elements at all, was from three to six times slower than Stream.length in my tests. Computing sum of an EphemeralStream[Int] took approximately 3x more time compared to a Stream[Int].

  • It is poorly documented and comes with no usage examples. In fact, the only comment in its source is the (misleading) scaladoc comment quoted above in its entirety. Looks like an auxiliary class to me.

  • The scalaz-core jar is 9MB in size, which is a bit too big an overhead if you only need a single class. And it is not easy to extract EphemeralStream for standalone use, because it depends on several other Scalaz classes and traits.

There are also quite a few minor incompatibilities that prevent using scalaz.EphemeralStream as a drop-in replacement for the standard Scala Stream:

  • There is no empty EphemeralStream[Nothing] object, so you cannot match against a pattern similar to x #:: Stream.Empty. Workaround:

    case x ##:: xs if xs.isEmpty => ...

  • The fold methods have different signatures, with curried functions:

    def foldLeft[B](z: => B)(f: (=> B) => (=> A) => B): B
    def foldRight[B](z: => B)(f: (=> A) => (=> B) => B): B

    which means they won’t accept a shorthand such as _ + _ as the second parameter. Instead, you have to write:

    xs.foldLeft(0)(x => y => x + y)

  • EphemeralStream does not implement the convenience methods from and continually.

  • As of Scala 2.10, the scalac compiler seems to be unable to infer that an implicit conversion of an EphemeralStream[Int] to an Iterable[Int] yields an Iterable[Numeric], and that Ordering is defined, so methods such as sum and min are not available either.
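
The curried fold signatures mentioned in the list above can be tried out without Scalaz on the classpath. The sketch below mimics the parameter shape of foldLeft on a plain List; CurriedFold, its foldLeft, and the curried adapter are hypothetical names introduced for this example only.

```scala
object CurriedFold {
  // Hypothetical stand-in that mimics the parameter shape of
  // EphemeralStream.foldLeft, but works on a plain List.
  def foldLeft[A, B](xs: List[A])(z: => B)(f: (=> B) => (=> A) => B): B = {
    var acc = z
    var rest = xs
    while (rest.nonEmpty) {
      val soFar = acc
      val head = rest.head
      acc = f(soFar)(head)
      rest = rest.tail
    }
    acc
  }

  // Hypothetical adapter: turns an ordinary two-argument function
  // into the curried, by-name shape, so that _ + _ can be reused.
  def curried[A, B](g: (B, A) => B): (=> B) => (=> A) => B =
    b => a => g(b, a)

  // Passing _ + _ directly would not compile: it is a two-parameter function.
  val viaLambda: Int = foldLeft(List(1, 2, 3))(0)(x => y => x + y)
  val viaAdapter: Int = foldLeft(List(1, 2, 3))(0)(curried[Int, Int](_ + _))
}
```

An adapter like this keeps call sites short if you have many existing two-argument functions, at the cost of one more wrapper per call.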

If scalaz.EphemeralStream is not the solution, the only option left is to roll out our own non-leaky stream class. In Part IV we’ll try to do just that. Stay tuned!

Scala Stream Hygiene II: HotSpot Kicks In

July 25th, 2014

The source code for this post is available on GitHub.

Part I discussed the Scala Stream class usage rules that help you avoid memory leaks. I will list them below for your convenience:

  1. Define streams using def and never store them in vals.

  2. Consume streams in tail-recursive functions.

  3. Pass streams around via by-name parameters.

    Corollary: When defining stream-consuming functions in traits, wrap them in methods accepting streams as by-name parameters.

  4. Do not pattern match against streams outside the consuming functions.

  5. Only call the eagerly evaluated Stream methods that are marked as "optimized for GC".

It turns out, however, that rules 2 to 5 are superfluous in the presence of a precise garbage collector.

Consider rule #2:

  2. Consume streams in tail-recursive functions.

Let’s decompile the example from Part I again:

def sum(xs: Stream[Int], z: Int = 0): Int = 
  if (xs.isEmpty) z else sum(xs.tail, z + xs.head)

Decompiler output:

public int sum(Stream<Object> xs, int z) {
  for (;;) {
    if (xs.isEmpty()) return z;
    z += BoxesRunTime.unboxToInt(xs.head());
    xs = (Stream)xs.tail();
  }
}

The xs parameter gets overwritten at the end of each loop iteration with the remainder of the original stream.

However, it turns out that it was perfectly possible to consume a stream in an imperative loop in the first place. Remember that, unlike in Java, function parameters in Scala are essentially vals and hence cannot be reassigned. So a local var has to be introduced:

def sum(xs: Stream[Int]): Int = {
  var scan = xs
  var res = 0
  while (!scan.isEmpty) {
    res += scan.head
    scan = scan.tail
  }
  res
}

The xs parameter cannot be changed inside the function and therefore should hold a reference to the original stream, but, somehow, it does not?!

What happens here is that the JVM detects that the xs parameter is not used after the initial assignment to scan and therefore does not consider it a GC root after that assignment.

Wait a minute.

Here is the example from Rule #3:

def sum(xs: Stream[Int]): Int = {
  def loop(acc: Int, xs: Stream[Int]): Int =
    if (xs.isEmpty) acc else loop(acc+xs.head, xs.tail)
  loop(0, xs)
}

Here, the xs parameter of sum is also not used after the call of loop. Why does the JVM think otherwise? How is this different from the imperative implementation of sum above?

Let’s do a small experiment in the REPL:

scala> def ones = Stream.continually(1)
ones: scala.collection.immutable.Stream[Int]

scala> println((ones take 100000000).sum)
java.lang.OutOfMemoryError: Java heap space
        at scala.collection.immutable.Stream$.continually(Stream.scala:1129)
   .  .  .
        at scala.collection.immutable.Stream.foldLeft(Stream.scala:563)
        at scala.collection.TraversableOnce$class.sum(TraversableOnce.scala:203)
   .  .  .

scala> for (i <- 1 to 10000) {
     |   (ones take 10).sum
     | }

scala> println((ones take 100000000).sum)
100000000

The first call to (ones take 100000000).sum threw an OOM error, just as anyone who has read the first part of this series would have expected, but the second one magically worked!

What’s going on here?

As you may see, Stream mixes in the sum implementation from the TraversableOnce trait, where it is defined as:

def sum[B >: A](implicit num: Numeric[B]): B = foldLeft(num.zero)(num.plus)

The difference is that sum got JIT-compiled between the two println calls! It is the HotSpot compiler that is capable of calculating the lifetime of variables and parameters. In this particular case, it determines that sum’s receiver is not used after the foldLeft call.

The imperative version of sum contains a loop, so after some iterations the JVM considered it a "hot spot" and the JIT compiler kicked in. But even if a function itself does not contain a loop, applying it many times also triggers its JIT compilation. In the REPL session shown above, sum gets applied to ten-element streams 10,000 times, which happens to be the default threshold for the HotSpot Server VM (for the Client VM it is just 1,500).

That is the "magic" that causes sum to stop leaking memory. And of course, it would not have leaked memory at all if JIT compilation had been forced using the HotSpot -Xcomp option, or if it had been run on a JVM with a precise GC and no interpreter at all.

In fact, all "faulty" tests for Rules #2-5 pass on HotSpot with -Xcomp.

Which means that defining stream-consuming functions in traits makes no difference if the forwarders get JIT-compiled.

And also that the non-specialized TraversableOnce methods do not actually leak memory, but it takes an optimizing compiler working in collaboration with a precise GC to recognize that.

As far as pattern matching is concerned, you still have to make sure that pattern variables are not used after the call of a stream-consuming function. As you saw in Part I, those variables are implicit vals, and Rule #1 holds regardless of how sophisticated the underlying JVM is.

For instance, the following function leaks memory regardless of whether -Xcomp is present:

def tailAvg(xs: Stream[Int]): Option[Int] = {
  xs match {
    case Stream.Empty => None
    case y #:: Stream.Empty => None
    case y #:: ys => Some(ys.sum / ys.length)
  }
}

Square Peg, Round Hole

As I dug through the peculiarities of the Scala implementation and observed their interference with HotSpot optimizations, it has dawned on me that using "infinite" Scala Streams in production code is an inherently bad idea. After all, Stream is memoizing by design; it is designed to be a lazy equivalent of List, and we’ve been trying to circumvent the intent of its authors!

That said, using the standard Stream class to illustrate the concept of potentially infinite data structures in the context of an academic exercise is probably fine. All code is under your total control, and usually there is not that much code, so sticking to The Rules and/or enforcing JIT compilation is not hard. But teachers had better warn their students against applying this particular knowledge in production, because:

  • You would normally want your production code to be JVM-agnostic, especially if you are creating a library or framework that other people will use in arbitrary contexts and environments.

  • Without tool support, enforcing any sophisticated coding rules throughout the lifetime of a project larger than a student assignment is next to impossible.

  • The authors of third-party libraries and legacy code are likely to be unaware of these rules.

For instance, consider the following scenario: suppose your code, or a third-party library you are using, breaks one of the "optional" rules, but all your load tests trigger JIT compilation of the respective classes, one way or the other. Effectively, you will be shipping an app with a latent memory leak, isolating which may be quite tricky.

So, if Stream is not the solution, what are the alternatives?

There are two options that I am aware of, and I will consider them in Parts III and IV. Stay tuned!

Update 11-Aug-2014: Part III is available.

Scala Stream Hygiene I: Avoiding Memory Leaks

July 19th, 2014

The source code for this post is available on GitHub.

Update 25-Jul-2014: Part II is out with surprise findings — don’t miss it!

Update 11-Aug-2014: Part III is available too.

Lazy evaluation, also known as call-by-need, is commonly found in functional languages. Some of them go as far as to make it the default evaluation strategy; perhaps the most prominent example is Haskell. Language authors however seem to prefer eager (strict) evaluation, whether because it results in better performance in the majority of practical use cases, or because it plays better with the imperative features of their languages, such as I/O and exceptions, or because the authors find it easier to implement. So they add a number of features to the language and the standard library that enable the developers to use lazy evaluation if they really want to.

In Scala, the language features are the lazy modifiers for vals and by-name function parameters. And in the standard library, among others, there is the class Stream (scala.collection.immutable.Stream). It was the subject of my recent studies, the results of which I share in this series.


Lazy evaluation often goes hand-in-hand with memoization. Without memoization, many programs implemented using lazy evaluation would exhibit terrible performance characteristics. Stream implements memoization and hence can be reasoned about as a List of elements computed on-demand.
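
A quick way to observe the memoization is to count how many times the element-producing function actually runs. The following is a minimal sketch, assuming a Scala 2 release where Stream is available; MemoizationDemo and its members are names invented for this illustration.

```scala
object MemoizationDemo {
  var evaluations = 0

  // Each element bumps the counter when it is computed for the first time.
  val doubled: Stream[Int] = Stream.from(1).map { n => evaluations += 1; n * 2 }

  // The first traversal forces the first three elements.
  val firstPass: List[Int] = doubled.take(3).toList
  val afterFirstPass: Int = evaluations

  // The second traversal finds them already memoized in the stream cells.
  val secondPass: List[Int] = doubled.take(3).toList
  val afterSecondPass: Int = evaluations
}
```

The counter does not change between the two traversals: the second pass reads the cached cells instead of recomputing them, which is exactly what makes a Stream held in a val behave like a List.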

However, there are also scenarios in which memoization is highly undesirable. One benefit of lazy evaluation is the ability to define potentially infinite data structures. In case of streams, these can be a stream of natural numbers or a stream of data packets coming in from the network. Problem is, all computers commercially available today only support a finite amount of memory, so memoization of such data structures is very capable of making your program throw an OutOfMemoryError.

When you either have no need to use stream elements more than once, as in the network packet filtering scenario, or stream elements can be re-computed very cheaply (natural numbers), you had better ensure that they do not get memoized.

Below are the rules that will help you avoid memoization of Scala streams. I’ve collected them from various sources and confirmed them by compiling and decompiling test programs. If you know of any other techniques or edge cases, please post in the comments.

Rules for Avoiding Stream Memoization

  1. Define streams using def and never store them in vals.

    This should be obvious, because val ensures that memoization occurs – see Stream scaladoc – but obvious things are often worth stating explicitly.

  2. Consume streams in tail-recursive functions.

    Again, this is rather obvious – if the consuming function is recursive, but not tail-recursive, a reference to the original stream will remain on the call stack until the recursion completes, effectively holding the entire stream in memory. (Not to mention that such a function would likely throw a StackOverflowError when there is still plenty of memory available on the heap.)

    How do tail-recursive functions manage to avoid the OOM? Let’s decompile an example:

    def sum(xs: Stream[Int], z: Int = 0): Int = 
      if (xs.isEmpty) z else sum(xs.tail, z + xs.head)

    Here is what the decompiler produces:

    public int sum(Stream<Object> xs, int z) {
      for (;;) {
        if (xs.isEmpty()) return z;
        z += BoxesRunTime.unboxToInt(xs.head());
        xs = (Stream)xs.tail();
      }
    }

    Notice that the xs parameter is reused. It gets overwritten on each loop iteration, so it always holds a reference to the not-yet processed remainder of the original stream.

  3. Pass streams around via by-name parameters. (Make sure to read the corollary below.)

    Sometimes you need to pass a stream through intermediate functions before its consumption, but that would leave references to the stream on the call stack.

    Typical example:

    def sum(xs: Stream[Int]): Int = {
      def loop(acc: Int, xs: Stream[Int]): Int =
        if (xs.isEmpty) acc else loop(acc + xs.head, xs.tail)
      loop(0, xs)
    }

    Although the inner function loop is tail-recursive, the sum function that calls it will hold a reference to the head of the stream in its parameter.

    The advice commonly found on the Net is to pass the stream around in a "container", such as a single-element array or an AtomicReference, and nullify its contents in the consuming function. But this results in awkward-looking, impure code. I am not sure why the built-in language feature that achieves the same effect gets overlooked.

    In the above example, if you make xs a by-name parameter to sum, what actually gets passed is a function. It is evaluated right before the call to loop, so sum itself never holds a reference to the stream:

    def sum(xs: => Stream[Int]): Int = {
       .  .  .

    As you may have noticed when reading about Rule #2, you could also get rid of the outer function altogether using a default parameter value:

    def sum(xs: Stream[Int], z: Int = 0): Int = 
      if (xs.isEmpty) z else sum(xs.tail, z + xs.head)

    but that is not always possible.

    Corollary: When defining stream-consuming functions in traits, wrap them in functions accepting streams as by-name parameters.

    This one is subtle, and I would say the most unfortunate, because you have no control over the root cause of this restriction. The root cause is that trait methods are not called directly, but via a forwarder method generated by the compiler, even if the caller is a member of the same trait. The forwarder method will hold a reference to the entire stream, that is, unless the stream is passed as a by-name parameter.


    import scala.annotation.tailrec

    trait StreamConsumers {
      final def sum(xs: Stream[Int], z: Int = 0): Int = {
        if (xs.isEmpty) z else sum(xs.tail, z + xs.head)
      }
      def sumByName(xs: => Stream[Int]): Int = {
        @tailrec def loop(acc: Int, xs: Stream[Int]): Int =
          if (xs.isEmpty) acc else loop(acc+xs.head, xs.tail)
        loop(0, xs)
      }
       .  .  .
    }
    object Main extends StreamConsumers {
       .  .  .
    }

    And here are the forwarders in the decompiled code of Main:

    public final class Main$
      implements StreamConsumers
    {
      public static final Main$ MODULE$;
      public final int sum(Stream<Object> xs, int z) {
        return StreamConsumers.class.sum(this, xs, z);
      }
      public int sumByName(Function0<Stream<Object>> xs) {
        return StreamConsumers.class.sumByName(this, xs);
      }
       .  .  .
    }

    You may also notice that enclosing a tail-recursive function in a wrapper method relieves you of the need to declare that method as final.

  4. Do not pattern match against streams outside the consuming functions.

    It is perfectly okay to use pattern matching inside a tail-recursive function that consumes the stream:

    def sumPatMatInner(xs: => Stream[Int]): Int = {
      def loop(acc: Int, xs: Stream[Int]): Int =
        xs match {
          case Stream.Empty => acc
          case y #:: ys => loop(acc + y, ys)
        }
      loop(0, xs)
    }

    Hence a pattern matching addict might write something like this:

    def sumPatMat(xs: => Stream[Int]): Int = {
      def loop(acc: Int, xs: Stream[Int]): Int =
        xs match {
          case Stream.Empty => acc
          case y #:: ys => loop(acc + y, ys)
        }
      xs match {
        case Stream.Empty => 0
        case x #:: Stream.Empty => x
        case y #:: ys => loop(y, ys)
      }
    }

    Why can this lead to an OOM? Let’s consider a simpler example:

    createStream match {
      case x #:: xs => consumeStream(x, xs)
      case _ => println("No data to process")
    }

    As of Scala 2.10, this code is an exact equivalent of the following:

    val foo: Option[(A, Stream[A])] = Stream.#::.unapply(createStream)
    if (foo.isEmpty) println("No data to process")
    else {val x = foo.get._1; val xs = foo.get._2; consumeStream(x, xs)}

    where A is the type of stream elements and foo is a unique name not used anywhere else.

    As you may see, if the first pattern matches, val xs will hold a reference to the tail of the stream returned by createStream. In fact, the temporary val foo will contain a reference to the entire stream.

    StackOverflow user Daniel Martin described a solution for safely matching on stream tail, which I think is a nice demonstration of Scala implicits, but otherwise overkill, so I won’t reproduce it here.

  5. Only call the eagerly evaluated Stream methods that are marked as "optimized for GC". The methods foreach, foldLeft, and reduceLeft have been specialized for the class Stream. length is also "GC-safe". (Of course, they all loop forever if the receiver is an infinite stream.)

    However, the method /: was left with the default implementation from scala.collection.TraversableOnce, which simply calls foldLeft, effectively holding the reference to the receiver on its stack frame:

    def /:[B](z: B)(op: (B, A) => B): B = foldLeft(z)(op)

    This also applies to methods forall, exists, find, max, min, sum, product, and possibly others.
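
Rules #1 and #3 can be seen working together in a small self-contained sketch. ByNameDemo, ones, and sumFirst are hypothetical names used only for this illustration; the counter is there merely to show when the by-name argument actually gets evaluated.

```scala
import scala.annotation.tailrec

object ByNameDemo {
  var streamsCreated = 0

  // A def, so every call builds a fresh stream (Rule #1).
  def ones: Stream[Int] = { streamsCreated += 1; Stream.continually(1) }

  // xs is by-name: sumFirst receives a thunk, not the stream itself (Rule #3).
  def sumFirst(n: Int)(xs: => Stream[Int]): Int = {
    @tailrec def loop(acc: Int, left: Int, rest: Stream[Int]): Int =
      if (left == 0 || rest.isEmpty) acc
      else loop(acc + rest.head, left - 1, rest.tail)
    loop(0, n, xs) // the thunk is evaluated here, exactly once
  }
}
```

Calling ByNameDemo.sumFirst(1000)(ByNameDemo.ones) creates the stream only at the point where loop is invoked, so the frame of sumFirst holds a Function0 rather than a reference to the head of the stream.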

Now I have some news for you. Under certain circumstances, your program can get away with breaking Rules 2 to 5. Hop over to Part II for details.

Updated AddFLACs to Extract Track Metadata From Pathnames

May 17th, 2014

I have updated AddFLACs, my iTunes for Windows automation script that I wrote back in 2011. It helps you import FLAC files into iTunes in Apple Lossless format.

About a week ago, a reader complained in the comments to the original post:

does it still skip over FLAC with no meteadata. (None of my FLAC have metadata, i always go by filename).

Fact is, I already had half-baked code for extracting metadata from pathnames using regular expressions, so today I integrated that code and published AddFLACs 1.0.

Now, for instance, if your untagged FLACs collection is hierarchically organized by artist and then by album, you can import it into iTunes as follows:

AddFLACs -r ".*\\(.*)\\(.*)\\(.*)\.flac$" ^
  -artist "$1" -album "$2" -name "$3"

If file names begin with track numbers followed by a dash, you can extract track numbers too:

AddFLACs -r ".*\\(.*)\\(.*)\\(\d+)-(.*)\.flac$" ^
  -artist "$1" -album "$2" -tracknumber "$3" -name "$4"

Even if you are not a master of regular expressions, with some tooling you can go way further:

AddFLACs ^
  -r ".*\\(.*)\\(.*)\\(.*?)( \((\d{4})\))?\\(\d+)-(.*)\.flac$" ^
  -genre $1 -artist "$2" -album "$3" -year "$5" ^
  -tracknumber "$6" -name "$7"

I recommend running it with the --dry-run (-n) option first, to verify that all files match and that metadata gets extracted as intended.
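
If you would like to see what the capture groups pick up from a given path before touching your library, the pattern can be tried in isolation. Below is a minimal Scala sketch; PathMetadata and parse are hypothetical names, and it assumes that the regular expression flavor used by AddFLACs treats these particular constructs the same way java.util.regex does.

```scala
object PathMetadata {
  // The track-number pattern from the second example above.
  private val Pattern = """.*\\(.*)\\(.*)\\(\d+)-(.*)\.flac$""".r

  // Returns (artist, album, track number, name) if the whole path matches.
  def parse(path: String): Option[(String, String, String, String)] =
    path match {
      case Pattern(artist, album, track, name) => Some((artist, album, track, name))
      case _ => None
    }
}
```

For a path such as C:\Music\Artist\Album\01-Song.flac, the groups come out as Artist, Album, 01, and Song, while a path with fewer directory levels or a different extension does not match at all.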

Refer to AddFLACs documentation for details.


I have used Regular Expressions 101, a terrific online regular expression tester and debugger, to debug the last example.

Retrieving a Web Site for Offline View Using Wget

February 23rd, 2013

I’ve volunteered to do a few minor CSS tweaks on a third-party Web site, implemented in Java and hosted on Heroku. Given the tiny scope of my task, figuring out how to build the entire thing and run it on a staging instance or locally would have been overkill. So I’ve sought a way to create a static local mirror of the site. That turned out to be less straightforward than running wget --mirror Home-Page-URL.

First, the Web site in question has its stylesheets and other static files served from a CDN (content distribution network). It also relies on third-party services for Web fonts, video streaming, and chat.

Second, it has some really big downloads. Fortunately, they are served from a subdomain.

Third, it has a blog section, also in a subdomain, which uses a completely different stylesheet that I did not need to touch.

To cut a long story short, here is the wget command line that worked for me:

wget --mirror \
  --page-requisites \
  --convert-links \
  --span-hosts \
  --domains domain-list \
  --reject pattern-list \
  Home-Page-URL

and here is the explanation:

--mirror
Enable infinite recursion and time-stamping.
--page-requisites
Also download files required to view the web page: images, stylesheets and so on.
--convert-links
Edit links in the downloaded documents so as to enable offline viewing. This includes links to page requisites. As a result, links to the also-downloaded files point to local copies, while all other links get replaced with complete URLs.
--span-hosts
Permit recursion and retrieval of page requisites to span across hosts. (Use with caution or you’d download the entire Internet.)
--domains domain-list
Restrict the list of domains to download files from. In my case, those were the “www” subdomain of the Web site being mirrored and the domain of the CDN serving its static files.

Example: --domains www.example.com,somecdn.net

--reject pattern-list
Do not mirror certain files. pattern-list is a comma-separated list of file name suffixes or patterns.

Example: --reject mp3,ogg

KB2756872 Windows 8 Update Fails to Install? Remove Realtek Audio Drivers

December 22nd, 2012

My notebook, just upgraded to Windows 8, has spent quite some time trying to install the KB2756872 update and rolling back after failures.

For the record, I have run delmigprov.exe accompanying the standalone download of that update, but that alone did not help.

The solution in my case was the removal of Realtek audio drivers. (It is interesting that audio has continued to function, so now I am not sure whether I need those drivers at all.)

It’s Been a Long Time Since I Had To Patch a Binary Executable…

December 11th, 2012

Needed to push a file created on a notebook the other night to a Git repo on a desktop, both running Windows and connected to my home router. Thought the git protocol would work. It did not, due to a bug in msysgit. (TL;DR: the bug has been open since March 2010, but nobody so far has volunteered to find the root cause and remedy, or sponsor such an effort. The only sensible workaround is to recompile msysgit from source with the side-band-64k protocol capability disabled, as the older side-band does not exhibit the problem, but the newer, faster alternative always takes precedence if both client and server support it.)

Followed the advice from Eli Billauer’s blog and patched git.exe on my desktop, which plays the role of a “central” Git server.

Here is a patch script that worked for me. It requires Gsar for Windows from the GnuWin32 collection; Cygwin likely includes gsar too.

@echo off
copy /-y git.exe git.exe~
if errorlevel 1 goto copyfailed
gsar -o -sside-band-64k -rKW6YzEZbBv584 git.exe
if errorlevel 1 goto patchfailed
echo git.exe successfully patched
goto quit
:copyfailed
echo Could not create a backup copy of git.exe
goto quit
:patchfailed
echo Could not patch git.exe with GSAR
goto quit
:quit

The value of the -r option is just a random 13-character string generated by DuckDuckGo using the query password 13. You may wish to use different values on each Windows machine you may be pushing to using the git protocol.

Outlook Macro to Nicely Format Skype Chat Excerpts

December 1st, 2012

My day job involves a lot of communication, mostly via email and Skype IM. From time to time, I need to file an important excerpt from a Skype chat for later retrieval, or email it to a customer, partner, or colleague.

For years, I would select that excerpt, copy it to the clipboard, and paste it into a new Exchange Mail or Post item.

However, what got pasted was unformatted plain text, way harder to read than the original chat displayed in Skype:

[Screenshot: raw paste]

I used to format the lengthier excerpts manually, out of respect to the recipients and/or future readers. Tedious work.

Earlier this year, I had proposed to celebrate our company’s 13th anniversary with a hackathon. Excelsior Hack Day I was a success, and I used it as a chance to take one bit off the routine part of my work.

My solution

Skype IM Pretty Printer is a VBA Macro for Microsoft Outlook that takes a Skype chat from the clipboard, formats it nicely and pastes it into a new HTML message:

[Screenshot: paste using Skype IM Pretty Printer]

If you want to give Skype IM Pretty Printer a shot, I have open sourced it under the MIT/X11 license. You can fork it on GitHub or visit the official page for download and installation instructions.

Running Online Python Tutor in a Local Linux VM

October 29th, 2012

Online Python Tutor (OPT) enables first-year CS students to watch the nicely visualized execution of their Python programs step-by-step.

A fresh edX student very much liked OPT but had two problems with its online nature: sometimes the OPT Web site was not responding, sometimes she had no Internet connection. Fortunately, OPT is open sourced on GitHub, so I was able to set it up on her Windows notebook as follows:

OPT runs on Google App Engine, but there is a local development server in the GAE SDK. I’ve set it up on top of a small VirtualBox VM running Linux, so as to minimize interference with other software and simplify migration.

  1. Set up or clone a baseline Linux VM. I had a baseline Ubuntu 12.04 LTS disk image already, so just followed my own VirtualBox VM cloning recipe.
  2. In the meantime, download the Linux version of the Google App Engine SDK for Python:

    wget -c http://googleappengine.googlecode.com/files/google_appengine_1.7.3.zip

    (look up the current URL on the SDK download page)

  3. Fetch the latest version of OPT from GitHub:

    wget -c -O online-python-tutor.zip https://github.com/pgbovine/OnlinePythonTutor/zipball/master

  4. It turned out that Ubuntu 12.04 Server has Python installed even in the minimal configuration. I had to install unzip though:

    sudo apt-get install unzip

  5. Unpack both packages. I have chosen to put them under /opt (pun not intended – that is where FHS says you should put optional packages):

    cd /opt
    sudo unzip ~/google_appengine_1.7.3.zip
    sudo unzip ~/online-python-tutor.zip

  6. (optional) Rename the OPT directory:

    sudo mv pgbovine-OnlinePythonTutor-c4880ea online-python-tutor

  7. Try running OPT:

    sudo /opt/google_appengine/dev_appserver.py \
      -a \
      -p 80 \
      --skip_sdk_update_check \
      /opt/online-python-tutor/v3

    There will be warnings about the unavailability of some APIs and such, but OPT apparently does not use those, so you may ignore the warnings.

  8. Try connecting to the VM from your browser. You should see the main OPT screen and be able to use it.

    Check that it works, then get back to the VM console/terminal and press Ctrl-C to shut down the development server.

  9. Finally, make OPT start automatically on boot. On Ubuntu and other Upstart-enabled systems, add a .conf file to /etc/init:

    sudoedit /etc/init/pythontutor.conf

    with the following content (change installation directories if necessary):

    start on runlevel [2345]
    stop on runlevel [!2345]
    expect fork
    exec /opt/google_appengine/dev_appserver.py \
      --skip_sdk_update_check  \
      -a -p 80 \
      /opt/online-python-tutor/v3 &

  10. Start the pythontutor job:

    sudo start pythontutor

    If this time you cannot connect to OPT from your browser, look for clues in /var/log/upstart/pythontutor.

Now that everything is working, you may wish to reduce the amount of RAM allocated to the VM. 128MB is more than enough to run a copy of the OPT just for the user connecting from the host, but watch memory use if you install e.g. a shared copy for your class or something.

Running programs on Linux boot up

September 29th, 2012

The other day I needed to configure a Linux VM to run a few programs at system startup. It turned out that there is no single way to accomplish that that would work across all major Linux distros and Unix flavors.
