Samstag, 7. September 2013

Inner Pattern to mimic Method Overwriting in Go

Note: I wrote a similar blog post "Is Google's Go object-oriented now or not?" that is not specifically about inheritance and Go, but about Go and object-orientedness in general.

Go [1, 2] is a relatively new statically typed programming language developed at Google that compiles to the metal. It feels like a modernized C in the way that is resembles C a lot, but has closures, garbage collection, communicating sequential processes (CSP) [3, 4, 5] and indeed very fast build times (and no exception handling, no templates, but also no macros). Build times in Go are that fast that it is almost instant and feels like working with a dynamically typed scripting language with almost no turnaround times. Performance in Go 1.2 is somewhat in the league with Java [6, 7, 8, 9]. Could be faster, but it is already a lot faster than scripting languages like Ruby, Python, or Lua. Go is also underpowered at the moment as there is room for optimization. From looking at job ads Go seems to appeal most to people in the Python, PHP or Ruby camp. Contrary to Python, Ruby, and Lua, Go does multi-threading well and makes thread synchronization easier through the use of CSP [10].

Motivation

Go relies on delegation, which by Go aficionados is called embedding (see chapter "Inheritance" in [2]). There is no inheritance and hence no method overriding. This means that "f.MoreMagic()" in the last line in the code snippet below (sample shamelessly stolen from [2]) does not print "foo magic" to the console as expected but "base magic":

package main

import "fmt"

type Base struct {}
func (Base) Magic() { fmt.Print("base magic") }
func (self Base) MoreMagic() {
  self.Magic()
}

type Foo struct {
  Base
}
func (Foo) Magic() { fmt.Print("foo magic") }

f := new(Foo)
f.Magic() //=> foo magic
f.MoreMagic() //=> base magic

So is there a way to mimic method overriding in Go at a reasonable cost? Many Go developers would consider only the idea of mimicking it in Go as not in line with the language and beside the point. But method overriding offers a great deal of flexibility being able to redefine default inherited behavior and make an object or algorithm behave as appropriate for its kind in a transparent way. 

Inner Pattern

What first stroke my mind when looking at the Magic/MoreMagic sample was the inner construct [12] in the Beta programming language, which is some kind of opposite super. Applying that idea in this case I came up with a solution, which I named the "inner pattern":

package main

import "fmt"

type Animal interface {   
    Act()

    // Definition of ActInner tells developer to define a function Act with the 
    // body calling ActInner as in Dog.Act() or Cat.Act()
    ActInner(inner Animal)

    makeNoise()
}

type Mamal struct {
    Animal
}

func (self Mamal) makeNoise() {
    fmt.Println("default noise")
}

func (self Mamal) ActMamal() {
    self.makeNoise()
}

func (self Mamal) ActInner(inner Animal) {
    inner.makeNoise()
}

type Dog struct {
    Mamal
}

func (self Dog) makeNoise() {
    fmt.Println("woof! woof!")
}

func (self Dog) Act() {
    self.ActInner(self)
}

type Cat struct {
    Mamal
}

func (self Cat) makeNoise() {
    fmt.Println("meow! meow!")
}

func (self Cat) Act() {
    self.ActInner(self)
}


func main() {
    dog := new(Dog)
    dog.ActMamal() // prints "default noise" but not "woof! woof!"
    dog.Act() // prints "woof! woof!" as expected

    cat := new(Cat)   
    cat.ActMamal() // prints "default noise" but not "meow! meow!"
    cat.Act() // prints "meow! meow!" as expected
}

Note that the function makeNoise has to be public (e.g. MakeNoise) in case structs Cat and Dog are placed in separate packages which for simplicity reasons is not the case in the sample code above. Otherwise, the code would still compile but at runtime always Mamal.makeNoise would be called instead of Cat.makeNoise or Dog.makeNoise depending on the type of the receiver object.

So we get "method overriding" this way at the cost of having to stick to some kind of convention: If there is a method in a struct delegated to that has a parameter named inner like ActInner(inner Animal), we need to add a method Act() in our "subclass" calling ActInner:

func (self Dog) Act() {
    self.ActInner(self)
}

func (self Cat) Act() {
    self.ActInner(self)
}

This solution is not nicely transparent as f.ex. in Java where you would just add a method act() to your subclass that overrides the inherited method act() and that's it. Coming to think of it in C++ you can only override an inherited method if the inherited one is marked as virtual. So in C++ and other languages like Kotlin [13] or Ceylon [14] you also need to "design ahead" and think of whether a method is intended to be overridable. And this solution with actInner(inner Animal) in Go does not even carry the runtime overhead of dynamically dispatched virtual functions.

Also, in case struct Dog or Cat does not implement the function makeNoise along with function ActInner, the function Mamal.makeNoise() will be called at runtime. The Go compiler won't complain about some "subclass" Dog or Cat not implementing the "abstract" method makeNoise as for instance in Java or other OO languages that support abstract classes. There is no way round that as in the end a price has to be paid for not having a performance penalty in Go due to dynamically dispatched method calls compared to OO languages that support method overwriting.

Conclusion

My preliminary conclusion of all this is to use the "inner pattern" in Go as an ex post refactoring measure when code starts to lack "too much" in transparency. To apply it to begin with is too much pain where the gain in the end is uncertain. Otherwise, only apply it ex ante when it is clear from the beginning that the flexibility will be needed anyway as f.ex. with templated algorithms.

By the way, I think Rob Pike is right with what he says about CSP [10]. So I had a look how it can be done in Java. Groovy has it already [15] and for Java there is [16]. When casting CSP into an object-oriented mold you end up with something like actors. The best actor implementation for Java/Scala is probably Akka.


Update: Discussion on Reddit

There has been a discussion going on on Reddit concerning the proposed approach in this blog post. Below my remarks to the various comments made in that discussion.

« The link within the post for "Inner Pattern to mimic Method Overwriting in Go" is a bit odd. The Mamal struct contains an Animal interface but it is never assigned to or used, instead he's passing around an Animal interface explicitly and having to redeclare the Act func for both Dog and Cat. This is a bit cleaner, it uses the previously unused member interface Animal, and only has one Act func at the Mamal level which can dispatch generically to either Dog or Cat. »
 
The main function in the suggested solution ("This is a bit cleaner") looks like this:

func main() {
        dog := new(Dog)
        dog.Animal = dog
        dog.Mamal.makeNoise() // prints "default noise"
        dog.makeNoise()       // prints "woof! woof!"
        dog.Act()
}

The problem with the solution above is that it is not transparent. The developer needs to do some explicit variable switching as in "dog.Animal = dog" which breaks encapsulation. In the approach proposed in this blog post a "subclass" only needs to implement a method Act() that calls ActInner(...) and that's it. Some variable switching from the outside (which breaks encapsulation) is not required and the developer does not need to know that it has to be done (applying information hiding). The is conceptually the idea of method overwriting in OOP. I'm not proficient in C, but I would guess that such a solution with field assignment in records is what has to be resorted to when using procedural languages.

« You are correct about the interfaces, here's an example based on the code in the article: http://play.golang.org/p/de5-18d6aP »

The main function in the suggested solution ("http://play.golang.org/p/de5-18d6aP") looks like this:

          func main() {
                  noise(Dog{})
                  noise(Cat{})
          }

The solution above applies a procedural approach where compared to object-oriented message passing receiver object and method argument are swapped. The receiver object (the object, the message is being sent to) is passed as the parameter to a procedure (aka function), which for OOP would be the wrong way round. The solution in my proposed approach defines an additional indirection to "restore" message passing as in OOP with Dog{} or Cat{} being the receiver objects (e.g. dog.Act(), cat.Act()).

« Yes, he is just showing that two independent structs can have a method with the same name, which isn't really surprising. »

Yes, I agree with that. My intention was only to show that this is supported compared to languages that are modular and not class based such as Modula II. I might be missing further criteria that need to be fulfilled.


References

[1] Effective Go
[2] Google Go Primer
[3] Communicating Sequential Processes by Hoare
[4] Goroutines
[5] Race Detector
[6] Benchmarks Game
[7] Performance of Rust and Dart in Sudoku Solving
[8] Benchmarks Round Two: Parallel Go, Rust, D, Scala
[9] Benchmarking Level Generation in Go, Rust, Haskell, and D
[10] Robert Pike in "Origins of Go Concurrency" (Youtube video, from about position 29:00):

"The thing that is hard to understand until you've tried it is that the whole business about finding deadlocks and things like this doesn't come up very much. If you use mutexes and locks and shared memory it comes up about all the time. And that's because it is just too low level. If you program like this (using channels) deadlocks don't happen very much. And if they do it is very clear why, because you have got this high-level state of your program 'expressing this guy is trying to send here and he can't, because he is here'. It is very very clear as opposed to being down on the mutex and shared memory level where you don't know who owns what and why. So I'm not saying it is not a problem, but it is not harder than any other class of bugs you would have."
[11] Go FAQ Overloading
[12] Super and Inner — Together at Last! (PDF)
[13] Inheritance in Kotlin
[14] Inheritance in Ceylon
[15] CSP in Groovy
[16] Go-style Goroutines in Java and Scala using HawtDispatch

Freitag, 30. August 2013

Implicits in Scala: Conversion on Steroids

Also published on java.dzone.com, see link

With the use of implicits in Scala you can define custom conversions that are applied implicitly by the Scala compiler. Other languages also provide support for conversion, e.g. C++ provides the conversion operator (). Implicits in Scala go beyond what the C++ conversion operator makes possible. At least I don't know of any other language where implicit conversion goes that far as in Scala. Let's have a look at some sample Scala code to demonstrate this:

class Foo { 
    def foo {
        print("foo")
    }
}

class Bar {
    def bar {
        print("bar")
    }
}

object Test
{
    implicit def fooToBarConverter(foo: Foo) = {
        print("before ")
        foo.foo
        print(" after ")
        new Bar
    }       
    
    def main(args: Array[String]) {
        val foo = new Foo
        foo.bar 
    }
}

Running Test.main will print to the console: "before foo after bar". What is happening here? When Test.main is run the method bar is invoked on foo, which is an instance of class Foo. However, there is no such method bar defined in class Foo (and in no superclass). So the compiler looks for any implicit conversion where Foo is converted to some other type. It finds the implicit fooToBarConverter and applies it. Then it tries again to invoke bar, but this time on an instance of class Bar. As class Bar defines some method named bar the problem is resolved and compilation continues. For a more detailed description about implicits compilation rules see this article by Martin Odersky, Lex Spoon, and Bill Venners. Note that no part of the conversion code from Foo to Bar is defined neither in class Foo nor in class Bar. This is what makes Scala implicits so powerful in the given sample (and also a bit dangerous as we shall see in the following).

If we tried to get something similar accomplished in C++ we would end up with something like this (C++ code courtesy to Paavo Helde, some helpful soul on comp.lang.c++):

#include <iostream>

class Foo {
public:
    void foo() {
        std::cout << "foo\n";
    }
};

class Bar {
public:
        Bar(Foo foo) {
                std::cout << "before\n";
                foo.foo();
                std::cout << "after\n";
        }
    void bar() {
        std::cout << "bar\n";
    }
};

void bar(Bar bar) {
    bar.bar();
}

int main() {
      Foo foo;
      bar(foo);
}

There are also "to" conversion operators defined by the syntax like:

class Foo {
            operator Bar() const {return Bar(...);}
};

Note that such conversions are often considered "too automatic" for robust C++ code, and thus commonly the "operator Bar()" style conversion operators are just avoided and the single-argument constructors like Bar(Foo foo) are marked with the 'explicit' keyword so the code must explicitly mention Bar in its invocation, e.g. bar(Bar(foo)).

The C++ code including the comment in the paragraph above is courtesy to Paavo Helde. As it can be seen it is not possible in C++ to achieve the same result as with implicits in Scala: There is no way to move the conversion code completely out of both class Foo and Bar and getting things to compile. So conversion in C++ is less powerful than in Scala on one hand. On the other hand it is also less scary than implicits in Scala where it might get difficult to maintain a large Scala code base over time if implicits are not handled with care.

Looking for a matching implicit to resolve some compilation error can keep the compiler busy if it repeatedly has to look through a large code base. This is also why the compiler only tries the first matching implicit conversion it can find and aborts compilation if applying the found implicit won't resolve the issue. Also, if implicits are overused you can run into a situation where you need to step through your code with the debugger to figure out the conversion that results in a different output being created than expected. This is an issue that made the guys developing Kotlin drop implicits from their Scala-like language (see reference). The problem that you can shoot yourself into your foot when overusing implicits is well known in the Scala community, for instance in "Programming in Scala" [1] it says on page 189: "Implicits can be perilously close to "magic". When used excessively, they obfuscate the code's behavior. (...) In general, implicits can cause mysterious behavior that is hard to debug! (...) Use implicits sparingly and cautiously.".

What remains on the positive side is a powerful language feature that has often proven to be very useful if applied with care. For instance Scala implicits do a great job in glueing together disparate APIs transparently or in achieving genericity. This article only dealt with one specific aspect of implicits. Scala implicits have many other applications, see f.ex. this article by Martin Odersky, Lex Spoon, and Bill Venners.

[1] "Programming in Scala", Dean Wampler & Alex Payne, O'Reilly, September 2009, 1st Edition.

Sonntag, 9. Juni 2013

Go-style Goroutines in Java and Scala using HawtDispatch

Also published on java.dzone.com, see link

In Google Go any method or closure can be run asynchronously when prefixed with the keyword go. According to the documentation "(...) they're called goroutines because the existing terms—threads, coroutines, processes, and so on—convey inaccurate connotations" (see reference). For that reason I also stick to the term goroutine. Goroutines are the recommended method for concurrent programming in Go. They are lightweight and you can easily create thousands of them. To make this efficient goroutines are multiplexed onto multiple OS threads. Networking and concurrency is really what Go is about.

This blog describes how to make use of HawtDispatch to achieve a very similar result in Java. HawtDispatch is a thread pooling and NIO event notification framework, which does the thread multiplexing that in Go is built into the language. There is also a Scala version of HawtDispatch. So the approach described here for Java can be applied in the same way in Scala. The code shown in this blog can be downloaded here from GitHub (includes a maven pom.xml to get HawtDispatch installed). Go provides channels as a means for goroutines to exchange information. We can model channels in Java through JDK5 BlockingQueues.

Let's have a look at some Go code that makes use of goroutines and channels (the sample code is shamelessly stolen from this article, see the chapter named "Channels"):

ch := make(chan int)

go func() {
  result := 0
  for i := 0; i < 100000000; i++ {
    result = result + i
  }
  ch <- result
}()

/* Do something for a while */

sum := <-ch // This will block if the calculation is not done yet
fmt.Println("The sum is: ", sum)
Making use of JDK8 default methods we can define in our Java world something like a keyword go. For that purpose I created one named async (using pre-JDK8 we would have to stick to little less elegant static methods):

public interface AsyncUtils {
    default public void async(Runnable runnable) {
        Dispatch.getGlobalQueue().execute(runnable);
    }
}
The async method will execute Runnables on a random thread of a fixed size thread pool. If you wanted to implement something like actors using HawtDispatch you would use serial dispatch queues. Here is a simplistic actor implemented using HawtDispatch (with queueing being serial through the use of the queue class DispatchQueue):

public class HelloWorldActor {
    private DispatchQueue queue = Dispatch.createQueue()

    public void sayHello() {
        queue.execute(()->{ System.out.println("hello world!"); });
    }
    public static void main(String[] args) {
        HelloWorldActor actor = new HelloWorldActor();
        actor.sayHello(); // asynchronously prints "hello world" 
    }
}
To be precise the HelloWorldActor in the snippet above is more of an active object as functions are scheduled rather than messages as with actors. This little actor sample was shown to demonstrate that you can do much more with HawtDispatch than just running methods asynchronously. Now it is getting time to implement the sample in Go in Java with what we have built up so far. Here we go:


public class GoroutineTest implements AsyncUtils {  
 
    @Test
    public void sumAsync() throws InterruptedException
    {
        BlockingQueue<Integer> channel = new LinkedBlockingQueue<>();

        async(()->
        {
            int result = 0;
            for(int i = 0; i < 100000000; i++) {
                result = result + i;
            }
            channel.add(result);
        });

        /* Do something for a while */
        int sum = channel.take();
        System.out.println("The sum is: " + sum);
    }
    

    @After
    public void tearDown() throws InterruptedException {
        DispatcherConfig.getDefaultDispatcher().shutdown();
    }

}

The code presented here would also work with pre-JDK8 since JDK8 is not a requirement for HawtDispatch. I just preferred to make use of JDK8 lambdas and defender methods to get the sample code more compact. 

Mittwoch, 2. Januar 2013

JDK8 lambdas and anonymous classes

Preview releases of the upcoming JDK8 including the long-awaited lambdas are available for several months meanwhile. Time to have a look at lambdas to see what they are and what you can expect from them.

So today, I downloaded the latest preview release of the JDK8 from jdk8.java.net/lambda to have a look at the upcoming lambdas in JDK8. To my despair, this code snippet did not compile: 

        List<Integer> ints = new ArrayList<>();
        ints.add(1);
        ints.add(2);
        ints.add(3);

        int sum = 0;
        ints.forEach(i -> { sum += i; });
 

The compiler error was: "value used in lambda expression should be effectively final". The compiler complains here that the variable sum had not been declared final. Also see this blog post, that is part of the JDK8 lambda FAQ, which explains the matter (I perpetually insist on having found the issue independently from this post ;-)). So lambdas in JDK8 carry exactly the same restriction as anonymous classes and you have to resort to the same kind of workaround:


int sumArray[] = new int[] { 0 };
ints.forEach(i -> {sumArray[0] += i;}); 

println(sumArray[0]);


This works and prints 6 as expected. Note, that the compiler did not complain here about sumArray not being declared final as it is effectively final: "A variable is effectively final if it is never assigned to after its initialization" (see link). This is a new feature in JDK8 as the code below does not compile with a pre-JDK8 if value is not declared final:


final long[] value = new long[] { 0 };
Runnable runnable = new Runnable() {          
    public void run() {
        value[0] = System.currentTimeMillis();
        System.out.println(value[0]);
    }
};


However, this means that JDK8 lambdas are not true closures since they cannot refer to free variables, which is a requirement for an expression to be a closure:

"When a function refers to a variable defined outside it, it's called a free variable. A function that refers to a free lexical variable is called a closure.". Paul Graham, ANSI Common Lisp, Prentice Hall, 1996, p.107.

The free variable gives the closure expression access to its environment:


"A closure is a combination of a function and an environment.". Paul Graham, ANSI Common Lisp, Prentice Hall, 1996, p.108.


In the end we can conclude that JDK8 lambdas are less verbose than anonymous classes (and there is no instantiation overhead as with anonymous classes as lambdas compile to method handles), but they carry the same restrictions as they do. The lambda specification (JSR 335) also says so explicitly: "For both lambda bodies and inner classes, local variables in the enclosing context can only be referenced if they are final or effectively final. A variable is effectively final if it is never assigned to after its initialization.". Here is also a link to an article where Neal Gafter himself (who was a member of the BGGA team) tried to explain why inner classes are no closures (read the comments section). However, all this is only a little tear drop as the usefulness of closures is preserved to a large extend. An imense amount of pre-JDK8 boilerplate code can be replaced with much more concise expressions now. And in the end, you can anyway still do this:
       
        int sum = ints.stream().reduce(0, (x, y) -> x + y);
       

Nevertheless, the difference between JDK8 lambdas and closures is worth a note as it is good to know. There is a nice write-up about many the things you can do with JDK8 lambdas in this blog post. Here is some sample code from it: 


List<String> names = Arrays.asList("Alice", "Bob", "Charlie", "Dave");
names

   .mapped(e -> { return e.length(); })
   .asIterable()
   .filter(e -> e.getValue() >= 4)
   .sorted((a, b) -> a.getValue() - b.getValue())
   .forEach(e -> { System.out.println(e.getKey() + '\t' + e.getValue()); });


We can also reference a static method:


executorService.submit(MethodReference::sayHello);
private static void sayHello() {
        System.out.println("hello");
}


The lambda FAQ says about the restriction on local variable capture explained in this article: "The restriction on local variables helps to direct developers using lambdas aways from idioms involving mutation; it does not prevent them. Mutable fields are always a potential source of concurrency problems if sharing is not properly managed; disallowing field capture by lambda expressions would reduce their usefulness without doing anything to solve this general problem.". 

The author is making the point here that immutable variables, like those declared final, cannot be changed inadvertently by some other thread. A free variable referenced from within a closure expression (but declared outside the closure) is allocated somewhere on the heap, which means that it is not local to some specific stack (hence it is free). Being allocated on the heap a free variable can be seen by all other threads as well. This way a free variable can effectively become a variable shared between threads for which access needs to be synchronized to prevent stale data from happening. So the finalness restriction for JDK8 lambdas helps in avoiding that kind of trouble. Note, however, that this is only true for simple types or objects that are not nested as can be seen in the sample code in the beginning of this text with the single-element-array sumArray: the final variable sumArray cannot be re-assigned, but the single element it holds can be changed any time.