Overcoming Fear: November 2011

Trampoline and Memoize

Goal

Groovy 1.8 introduces two new closure functions. Memoization is the automatic caching of closure results. Trampolining permits a form of declarative tail-call optimization. This article introduces the two concepts and demonstrates how to combine them in order to create cached, tail-recursive closures.

tl;dr

The example code from this article is available on github.

Simple Memoization

Creating a closure that caches the result of some calculation is as easy as appending .memoize() or one of the alternate memoize...(...) methods that allows more fine-grained control over cache sizes to a closure declaration. One benefit of memoization includes the caching of results of long-running calculations that have no side effects. Memory leaks are a potential pitfall, which is why a maximum cache size should generally be prefered.
The specification below contains a closure with a side-effect. This side effect happens just once, despite the closure being invoked twice.

SimpleMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class SimpleMemoizationSpec extends spock.lang.Specification {

  int count

  def identity = {
    count++
    it
  }.memoize()

  def "each call should be cached"() {
    when:
    def first = identity 0
    def second = identity 0

    then:
    count == 1
    first == second
    second == 0
  }
}

Recursive Memoization

Suppose we want to memoize the results of a recursive closure call. For example, we can unroll the call tree of this naive implementation of the fibonacci function.

  def fib = { n ->
    if(n == 0) 0
    else if(n == 1) 1
    else fib(n-1) + fib(n-2)
  }

The call trace for the fourth fibonacci number looks like this.

                       ___________fib 4___________
                      /                           \
               fib 3 + fib 2                 fib 2 + fib 1
             /               \                 /
      fib 2 + fib 1     fib 2 + fib 1     fib 1 + fib 0
       /
fib 1 + fib 0

NB: The closure here is entered nine times but we can see that it only needs to be entered 5 times because 4 calls are repeated. If we cache the results of the fibonacci calls, the complexity of even a naive implementation such as this will increase linearly, rather than exponentially.

Unfortunately, because, in Groovy 1.8, a closure cannot invoke another closure directly, memoizing this closure is not entirely straightforward. The example directly below does not work.

RecursiveMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class RecursiveMemoizationSpec extends spock.lang.Specification {

  int count

  def fib = { n ->
    count++
    if(n == 0) 0
    else if(n == 1) 1
    else fib(n-1) + fib(n-2)
  }.memoize()

  def "calls should be cached"() {
    when:
    def actual = fib 10

    then:
    actual == 55
    count == 11
  }
}

The stack trace when a closure calls itself.

groovy.lang.MissingMethodException: No signature of method: org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.doCall() is applicable for argument types: (java.lang.Integer) values: [9]
Possible solutions: call(), call([Ljava.lang.Object;), call(java.lang.Object), call([Ljava.lang.Object;), findAll(), equals(java.lang.Object)
  at timezra.groovy.trampoline_memoize.RecursiveMemoizationSpec.$spock_initializeFields_closure1(RecursiveMemoizationSpec.groovy:11)
  at groovy.lang.Closure.call(Closure.java:410)
  at groovy.lang.Closure.call(Closure.java:423)
  at timezra.groovy.trampoline_memoize.RecursiveMemoizationSpec.calls should be cached(RecursiveMemoizationSpec.groovy:16)

Since methods can invoke memoized closures, the solution is to invoke the call method on the closure.

RecursiveMemoizationSpec.groovy

class RecursiveMemoizationSpec extends spock.lang.Specification {
....
  def fib = { n ->
    count++
    if(n == 0) 0
    else if(n == 1) 1
    else fib.call(n-1) + fib.call(n-2)
  }.memoize()
....
}

NB: The un-memoized version enters the closure 177 times, but the memoized version enters just 11.

Trampoline

Declarative tail-call optimization is as simple as adding .trampoline() to a closure declaration and ensuring that recursive calls to the closure invoke the trampoline method on the closure instead of the closure itself. Some problems are more clearly solved with recursive solutions, but without automatic tail-call optimization in the JVM, recursion can quickly lead to an explosion in the size of the call stack. Trampolining is one work-around for this design tradeoff (or defect).
A tail-recursive fibonacci closure:

  def fib = { n, a = 0, b = 1 ->
    if(n == 0) a
    else fib n - 1, b, a + b
  }

By tracing the call stack, we can see its linear growth without memoization.

fib 4
  |
fib 3, 1, 1
  |
fib 2, 1, 2
  |
fib 1, 2, 3
  |
fib 0, 3, 5

In order to avoid a java.lang.StackOverflowError for sufficiently large inputs, the tail-recursive closure must be explicitly trampolined.
TrampolineSpec.groovy

package timezra.groovy.trampoline_memoize

class TrampolineSpec extends spock.lang.Specification {

  int count

  def fib = { n, a = 0, b = 1 ->
    count++
    if(n == 0) a
    else fib.trampoline n - 1, b, a + b
  }.trampoline()

  def "tail calls chould be optimized"() {
    when:
    def actual = fib 1000

    then:
    actual == 1556111435
    count == 1001
  }
}

In this example, the trampolined closure is called 1001 times. If a method in Java were to call itself 1001 times, the stack would be overwhelmed. By trampolining, Groovy turns this recursion into iteration.

Memoizing a Trampolined Closure

Suppose we want to cache the results of a computationally expensive tail-recursive closure with no side effects. A trampolined closure that calls itself can easily be memoized in a separate closure declaration.

OneTimeTrampolineMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class OneTimeTrampolineMemoizationSpec extends spock.lang.Specification {

  int count

  def fib_aux = { n, a = 0, b = 1 ->
    count++
    if(n == 0) a
    else fib_aux.trampoline n - 1, b, a + b
  }.trampoline()

  def fib = fib_aux.memoize()

  def "top-level trampolined calls should be cached"() {
    when:
    def first = fib 1000
    def second = fib 1000

    then:
    count == 1001
    first == second
    second == 1556111435
  }
}

NB: This solution caches the top-level trampolined closure, not the results of the intermediate calls.

Trampolining a Memoized Closure

Suppose we want to cache the intermediate results of a trampolined function call. For example, in our trace above, suppose we want to cache the results of fib 4 and fib 3, 1, 1 and fib 2, 1, 2 and fib 1, 2, 3 and fib 0, 3, 5.
Again, this situation is not straightforward because no closure can call a memoized closure directly, so in this case, the trampolined closure cannot call a memoized version of itself directly. Again, we can use the call function on the memoized closure.

FullTrampolineMemoizationSpec.groovy

package timezra.groovy.trampoline_memoize

class FullTrampolineMemoizationSpec extends spock.lang.Specification {

  int count

  def fib_aux = { n, a, b ->
    count++
    if(n == 0) a
    else fib.trampoline n - 1, b, a + b
  }.memoize()

  def fib = { n, a = 0, b = 1 ->
    fib_aux.call n, a, b
  }.trampoline()

  def "all trampolined calls should be cached"() {
    when:
    def first = fib 1000
    def second = fib 500, 315178285, -1898383934

    then:
    count == 1001
    first == second
    second == 1556111435
  }
}

Conclusion

Trampolining and memoization are two powerful new features in Groovy 1.8, but the combination of the two is not always straightforward. This tutorial has presented strategies for combining the two and for working around some of the limitations of their combination.

Overcoming Fear

About Me

Blog Archive

Other XP Bloggers

Thursday, November 10, 2011