Overcoming Fear: groovy

Trampoline and Memoize

Goal

Groovy 1.8 introduces two new closure functions. Memoization is the automatic caching of closure results. Trampolining permits a form of declarative tail-call optimization. This article introduces the two concepts and demonstrates how to combine them in order to create cached, tail-recursive closures.

tl;dr

The example code from this article is available on github.

Simple Memoization

Creating a closure that caches the result of some calculation is as easy as appending .memoize() or one of the alternate memoize...(...) methods that allows more fine-grained control over cache sizes to a closure declaration. One benefit of memoization includes the caching of results of long-running calculations that have no side effects. Memory leaks are a potential pitfall, which is why a maximum cache size should generally be prefered.
The specification below contains a closure with a side-effect. This side effect happens just once, despite the closure being invoked twice.

SimpleMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class SimpleMemoizationSpec extends spock.lang.Specification {

  int count

  def identity = {
    count++
    it
  }.memoize()

  def "each call should be cached"() {
    when:
    def first = identity 0
    def second = identity 0

    then:
    count == 1
    first == second
    second == 0
  }
}

Recursive Memoization

Suppose we want to memoize the results of a recursive closure call. For example, we can unroll the call tree of this naive implementation of the fibonacci function.

  def fib = { n ->
    if(n == 0) 0
    else if(n == 1) 1
    else fib(n-1) + fib(n-2)
  }

The call trace for the fourth fibonacci number looks like this.

                       ___________fib 4___________
                      /                           \
               fib 3 + fib 2                 fib 2 + fib 1
             /               \                 /
      fib 2 + fib 1     fib 2 + fib 1     fib 1 + fib 0
       /
fib 1 + fib 0

NB: The closure here is entered nine times but we can see that it only needs to be entered 5 times because 4 calls are repeated. If we cache the results of the fibonacci calls, the complexity of even a naive implementation such as this will increase linearly, rather than exponentially.

Unfortunately, because, in Groovy 1.8, a closure cannot invoke another closure directly, memoizing this closure is not entirely straightforward. The example directly below does not work.

RecursiveMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class RecursiveMemoizationSpec extends spock.lang.Specification {

  int count

  def fib = { n ->
    count++
    if(n == 0) 0
    else if(n == 1) 1
    else fib(n-1) + fib(n-2)
  }.memoize()

  def "calls should be cached"() {
    when:
    def actual = fib 10

    then:
    actual == 55
    count == 11
  }
}

The stack trace when a closure calls itself.

groovy.lang.MissingMethodException: No signature of method: org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.doCall() is applicable for argument types: (java.lang.Integer) values: [9]
Possible solutions: call(), call([Ljava.lang.Object;), call(java.lang.Object), call([Ljava.lang.Object;), findAll(), equals(java.lang.Object)
  at timezra.groovy.trampoline_memoize.RecursiveMemoizationSpec.$spock_initializeFields_closure1(RecursiveMemoizationSpec.groovy:11)
  at groovy.lang.Closure.call(Closure.java:410)
  at groovy.lang.Closure.call(Closure.java:423)
  at timezra.groovy.trampoline_memoize.RecursiveMemoizationSpec.calls should be cached(RecursiveMemoizationSpec.groovy:16)

Since methods can invoke memoized closures, the solution is to invoke the call method on the closure.

RecursiveMemoizationSpec.groovy

class RecursiveMemoizationSpec extends spock.lang.Specification {
....
  def fib = { n ->
    count++
    if(n == 0) 0
    else if(n == 1) 1
    else fib.call(n-1) + fib.call(n-2)
  }.memoize()
....
}

NB: The un-memoized version enters the closure 177 times, but the memoized version enters just 11.

Trampoline

Declarative tail-call optimization is as simple as adding .trampoline() to a closure declaration and ensuring that recursive calls to the closure invoke the trampoline method on the closure instead of the closure itself. Some problems are more clearly solved with recursive solutions, but without automatic tail-call optimization in the JVM, recursion can quickly lead to an explosion in the size of the call stack. Trampolining is one work-around for this design tradeoff (or defect).
A tail-recursive fibonacci closure:

  def fib = { n, a = 0, b = 1 ->
    if(n == 0) a
    else fib n - 1, b, a + b
  }

By tracing the call stack, we can see its linear growth without memoization.

fib 4
  |
fib 3, 1, 1
  |
fib 2, 1, 2
  |
fib 1, 2, 3
  |
fib 0, 3, 5

In order to avoid a java.lang.StackOverflowError for sufficiently large inputs, the tail-recursive closure must be explicitly trampolined.
TrampolineSpec.groovy

package timezra.groovy.trampoline_memoize

class TrampolineSpec extends spock.lang.Specification {

  int count

  def fib = { n, a = 0, b = 1 ->
    count++
    if(n == 0) a
    else fib.trampoline n - 1, b, a + b
  }.trampoline()

  def "tail calls chould be optimized"() {
    when:
    def actual = fib 1000

    then:
    actual == 1556111435
    count == 1001
  }
}

In this example, the trampolined closure is called 1001 times. If a method in Java were to call itself 1001 times, the stack would be overwhelmed. By trampolining, Groovy turns this recursion into iteration.

Memoizing a Trampolined Closure

Suppose we want to cache the results of a computationally expensive tail-recursive closure with no side effects. A trampolined closure that calls itself can easily be memoized in a separate closure declaration.

OneTimeTrampolineMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class OneTimeTrampolineMemoizationSpec extends spock.lang.Specification {

  int count

  def fib_aux = { n, a = 0, b = 1 ->
    count++
    if(n == 0) a
    else fib_aux.trampoline n - 1, b, a + b
  }.trampoline()

  def fib = fib_aux.memoize()

  def "top-level trampolined calls should be cached"() {
    when:
    def first = fib 1000
    def second = fib 1000

    then:
    count == 1001
    first == second
    second == 1556111435
  }
}

NB: This solution caches the top-level trampolined closure, not the results of the intermediate calls.

Trampolining a Memoized Closure

Suppose we want to cache the intermediate results of a trampolined function call. For example, in our trace above, suppose we want to cache the results of fib 4 and fib 3, 1, 1 and fib 2, 1, 2 and fib 1, 2, 3 and fib 0, 3, 5.
Again, this situation is not straightforward because no closure can call a memoized closure directly, so in this case, the trampolined closure cannot call a memoized version of itself directly. Again, we can use the call function on the memoized closure.

FullTrampolineMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class FullTrampolineMemoizationSpec extends spock.lang.Specification {

  int count

  def fib_aux = { n, a, b ->
    count++
    if(n == 0) a
    else fib.trampoline n - 1, b, a + b
  }.memoize()

  def fib = { n, a = 0, b = 1 ->
    fib_aux.call n, a, b
  }.trampoline()

  def "all trampolined calls should be cached"() {
    when:
    def first = fib 1000
    def second = fib 500, 315178285, -1898383934

    then:
    count == 1001
    first == second
    second == 1556111435
  }
}

Conclusion

Trampolining and memoization are two powerful new features in Groovy 1.8, but the combination of the two is not always straightforward. This tutorial has presented strategies for combining the two and for working around some of the limitations of their combination.

Goal

While I generally try to configure, rather than script, Maven builds, sometimes the publicly-available plug-ins do not provide enough flexibility to work-around limitations in third-party libraries through configuration alone. Fortunately, GMaven exposes the flexibility of Groovy in a Maven plug-in. This post demonstrates how to use a Groovy script to transform a Maven project property. Such a transformation is sometimes necessary, for example, for transforming a Windows-style path to Unix. For myself, I formalized this solution while trying to install a jar in PostgreSQL automatically during the artifact deployment phase.

Display a Maven Project Property

We can start by creating a simple m2eclipse project, by adding a single property with a default value and by configuring the pom.xml to display this property on the console during a lifecycle event (here, during compilation).

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>timezra.blog.maven.groovy.properties</groupId>
  <artifactId>timezra.blog.maven.groovy.properties</artifactId>
  <name>timezra.blog.maven.groovy.properties</name>
  <version>0.0.1-SNAPSHOT</version>
  <description>An example of setting Maven properties using Groovy.</description>
  <properties>
    <unixy_build_directory>${project.build.directory}</unixy_build_directory>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <executions>
          <execution>
            <id>show-unixy_build_directory!</id>
            <phase>compile</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <tasks>
                <echo>unixy_build_directory: ${unixy_build_directory}</echo>
              </tasks>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

If we run the compile goal, we will see the bound Maven property in the build output.

  > mvn compile
  ....
  [INFO] [antrun:run {execution: show-unixy_build_directory!}]
  [INFO] Executing tasks
       [echo] unixy_build_directory: C:\programming\workspaces\blog\timezra.blog.maven.groovy.properties\target
  [INFO] Executed tasks
  ....

Setting up the gmaven-plugin is straightforward, as is re-binding the property with Groovy in the pom.xml.

<project ....>
  ....
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.groovy.maven</groupId>
        <artifactId>gmaven-plugin</artifactId>
        <executions>
          <execution>
            <id>set-unixy_build_directory!</id>
            <phase>compile</phase>
            <goals>
              <goal>execute</goal>
            </goals>
            <configuration>
              <classpath>
                <element>
                  <groupId>commons-lang</groupId>
                  <artifactId>commons-lang</artifactId>
                  <version>2.4</version>
                 </element>
              </classpath>
              <source>
                if (org.apache.commons.lang.SystemUtils.IS_OS_WINDOWS) {
                  project.properties.unixy_build_directory =
                  project.build.directory.replace("\\", "/");
                }
              </source>
            </configuration>
          </execution>
        </executions>
      </plugin>
      ....
    </plugins>
  </build>
</project>

NB: This example takes a Windows file path and transforms the backslashes to forward slashes. If you are not running Windows, then this example is moot. Please, experiment with the bindings in the pom.xml to demonstrate clearly to yourself that the property has really, really, really been re-bound.

When compiling, we will now see output indicating that the Groovy script has mutated the Maven property.

  > mvn compile
  ....
  [INFO] [antrun:run {execution: show-unixy_build_directory!}]
  [INFO] Executing tasks
       [echo] unixy_build_directory: C:/programming/workspaces/blog/timezra.blog.maven.groovy.properties/target
  [INFO] Executed tasks
  ....

Conclusion

With a few lines of configuration and a simple Groovy script, we are able to modify Maven properties as part of a lifecycle event. This tool adds even more power to your Maven builds, but, as stated above, it should be used cautiously and only when absolutely necessary. If there is a solution already in the Maven toolkit, then that is generally better. Not all project requirements fit in the box, however, and this example exposes a simple way to handle the non-ideal.

Overcoming Fear

About Me

Blog Archive

Other XP Bloggers

Thursday, November 10, 2011

Trampoline and Memoize

Goal

tl;dr

Simple Memoization

Recursive Memoization

Trampoline

Memoizing a Trampolined Closure

Trampolining a Memoized Closure

Conclusion

Tuesday, June 30, 2009

Setting Maven Properties With Groovy

Goal

Display a Maven Project Property

Conclusion