Thursday, November 10, 2011

Trampoline and Memoize

Trampoline and Memoize

Goal


Groovy 1.8 introduces two new closure functions. Memoization is the automatic caching of closure results. Trampolining permits a form of declarative tail-call optimization. This article introduces the two concepts and demonstrates how to combine them in order to create cached, tail-recursive closures.

tl;dr


The example code from this article is available on github.

Simple Memoization


Creating a closure that caches the result of some calculation is as easy as appending .memoize() or one of the alternate memoize...(...) methods that allows more fine-grained control over cache sizes to a closure declaration. One benefit of memoization includes the caching of results of long-running calculations that have no side effects. Memory leaks are a potential pitfall, which is why a maximum cache size should generally be prefered.
The specification below contains a closure with a side-effect. This side effect happens just once, despite the closure being invoked twice.

SimpleMemoizationSpec.groovy
package timezra.groovy.trampoline_memoize

class SimpleMemoizationSpec extends spock.lang.Specification {

  int count

  def identity = {
    count++
    it
  }.memoize()

  def "each call should be cached"() {
    when:
    def first = identity 0
    def second = identity 0

    then:
    count == 1
    first == second
    second == 0
  }
}



Recursive Memoization


Suppose we want to memoize the results of a recursive closure call. For example, we can unroll the call tree of this naive implementation of the fibonacci function.
  def fib = { n ->
    if(n == 0) 0
    else if(n == 1) 1
    else fib(n-1) + fib(n-2)
  }


The call trace for the fourth fibonacci number looks like this.

                       ___________fib 4___________
                      /                           \
               fib 3 + fib 2                 fib 2 + fib 1
             /               \                 /
      fib 2 + fib 1     fib 2 + fib 1     fib 1 + fib 0
       /
fib 1 + fib 0


NB: The closure here is entered nine times but we can see that it only needs to be entered 5 times because 4 calls are repeated. If we cache the results of the fibonacci calls, the complexity of even a naive implementation such as this will increase linearly, rather than exponentially.

Unfortunately, because, in Groovy 1.8, a closure cannot invoke another closure directly, memoizing this closure is not entirely straightforward. The example directly below does not work.

RecursiveMemoizationSpec.groovy
package timezra.groovy.trampoline_memoize

class RecursiveMemoizationSpec extends spock.lang.Specification {

  int count

  def fib = { n ->
    count++
    if(n == 0) 0
    else if(n == 1) 1
    else fib(n-1) + fib(n-2)
  }.memoize()

  def "calls should be cached"() {
    when:
    def actual = fib 10

    then:
    actual == 55
    count == 11
  }
}



The stack trace when a closure calls itself.

groovy.lang.MissingMethodException: No signature of method: org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.doCall() is applicable for argument types: (java.lang.Integer) values: [9]
Possible solutions: call(), call([Ljava.lang.Object;), call(java.lang.Object), call([Ljava.lang.Object;), findAll(), equals(java.lang.Object)
  at timezra.groovy.trampoline_memoize.RecursiveMemoizationSpec.$spock_initializeFields_closure1(RecursiveMemoizationSpec.groovy:11)
  at groovy.lang.Closure.call(Closure.java:410)
  at groovy.lang.Closure.call(Closure.java:423)
  at timezra.groovy.trampoline_memoize.RecursiveMemoizationSpec.calls should be cached(RecursiveMemoizationSpec.groovy:16)



Since methods can invoke memoized closures, the solution is to invoke the call method on the closure.

RecursiveMemoizationSpec.groovy
class RecursiveMemoizationSpec extends spock.lang.Specification {
....
  def fib = { n ->
    count++
    if(n == 0) 0
    else if(n == 1) 1
    else fib.call(n-1) + fib.call(n-2)
  }.memoize()
....
}


NB: The un-memoized version enters the closure 177 times, but the memoized version enters just 11.


Trampoline


Declarative tail-call optimization is as simple as adding .trampoline() to a closure declaration and ensuring that recursive calls to the closure invoke the trampoline method on the closure instead of the closure itself. Some problems are more clearly solved with recursive solutions, but without automatic tail-call optimization in the JVM, recursion can quickly lead to an explosion in the size of the call stack. Trampolining is one work-around for this design tradeoff (or defect).
A tail-recursive fibonacci closure:
  def fib = { n, a = 0, b = 1 ->
    if(n == 0) a
    else fib n - 1, b, a + b
  }


By tracing the call stack, we can see its linear growth without memoization.
fib 4
  |
fib 3, 1, 1
  |
fib 2, 1, 2
  |
fib 1, 2, 3
  |
fib 0, 3, 5



In order to avoid a java.lang.StackOverflowError for sufficiently large inputs, the tail-recursive closure must be explicitly trampolined.
TrampolineSpec.groovy
package timezra.groovy.trampoline_memoize

class TrampolineSpec extends spock.lang.Specification {

  int count

  def fib = { n, a = 0, b = 1 ->
    count++
    if(n == 0) a
    else fib.trampoline n - 1, b, a + b
  }.trampoline()

  def "tail calls chould be optimized"() {
    when:
    def actual = fib 1000

    then:
    actual == 1556111435
    count == 1001
  }
}


In this example, the trampolined closure is called 1001 times. If a method in Java were to call itself 1001 times, the stack would be overwhelmed. By trampolining, Groovy turns this recursion into iteration.

Memoizing a Trampolined Closure


Suppose we want to cache the results of a computationally expensive tail-recursive closure with no side effects. A trampolined closure that calls itself can easily be memoized in a separate closure declaration.

OneTimeTrampolineMemoizationSpec.groovy
package timezra.groovy.trampoline_memoize

class OneTimeTrampolineMemoizationSpec extends spock.lang.Specification {

  int count

  def fib_aux = { n, a = 0, b = 1 ->
    count++
    if(n == 0) a
    else fib_aux.trampoline n - 1, b, a + b
  }.trampoline()

  def fib = fib_aux.memoize()

  def "top-level trampolined calls should be cached"() {
    when:
    def first = fib 1000
    def second = fib 1000

    then:
    count == 1001
    first == second
    second == 1556111435
  }
}


NB: This solution caches the top-level trampolined closure, not the results of the intermediate calls.

Trampolining a Memoized Closure


Suppose we want to cache the intermediate results of a trampolined function call. For example, in our trace above, suppose we want to cache the results of fib 4 and fib 3, 1, 1 and fib 2, 1, 2 and fib 1, 2, 3 and fib 0, 3, 5.
Again, this situation is not straightforward because no closure can call a memoized closure directly, so in this case, the trampolined closure cannot call a memoized version of itself directly. Again, we can use the call function on the memoized closure.

FullTrampolineMemoizationSpec.groovy
package timezra.groovy.trampoline_memoize

class FullTrampolineMemoizationSpec extends spock.lang.Specification {

  int count

  def fib_aux = { n, a, b ->
    count++
    if(n == 0) a
    else fib.trampoline n - 1, b, a + b
  }.memoize()

  def fib = { n, a = 0, b = 1 ->
    fib_aux.call n, a, b
  }.trampoline()

  def "all trampolined calls should be cached"() {
    when:
    def first = fib 1000
    def second = fib 500, 315178285, -1898383934

    then:
    count == 1001
    first == second
    second == 1556111435
  }
}



Conclusion


Trampolining and memoization are two powerful new features in Groovy 1.8, but the combination of the two is not always straightforward. This tutorial has presented strategies for combining the two and for working around some of the limitations of their combination.

No comments: