Monday, November 14, 2011

Go Generics, in 3 Parts

For fun, I've been considering designs for generics in Go, and request comments on the feasibility of the following design. The design has 3 simple parts: generic operations, generic types, and generic operators. I've tweaked the gc compiler to accept the grammatical changes, and I can even run the 'generic types' code, but it's time to share the design before coding.

Overview

The 'generic operations' proposal would allow '_' to be used in interfaces to mean 'the receiver base type' This would allow generic operations on operands of a single data type. (This matching is possible because Go dispatches based on receiver type.) For example:

package generic
type Lesser interface { Less (b _) bool }
// Elsewhere, perhaps in sort.go, dijkstra.go, rbtree.go...
if (a.Less(b))

The 'generic types' proposal would add new default type declarations to be used subsequently as default types during compilation, but which may be overridden by package clients with more specific types for static type checking and optimization. Notably, this approach requires no change to client code or to the compiler outside parsing. A simple example from pkg/container/list:

// list.go
default T interface{}
type Element~T struct {
    next, prev *Element~T
    list *List~T
    Value T
}
// list_test.go
func checkList(t *testing.T, l *List~int, es []int) {

Finally, the 'generic operators' proposal would have the compiler use standard interfaces when there is no builtin support for the operator. For example:

var x, y big.Int
...
z := x + y  // z := x.Plus(y)
if x < y    // if x.Less(y)

Personally, I find the 'generic operations' to be compelling language enhancement. The 'generic types' and 'generic operators' are trade-offs.

1: Generic Operations (or 'Receiver Type Matching')

It would also be nice to address the problem of generic operations, especially comparison. This has previously been described as the a.Less(b) problem. It would be solved if _ means "the receiver base type" in the following:

// in package generic
type Lesser interface { Less(b _) bool }
// in package ???
func (a int) Less(b int) bool { return a < b }
func (a float64) Less(b float64) bool { return a < b }
// in package big
func (a big.Int) Less(b big.Int) bool { return a < b }
// in your own source
type T { ... }
func (a T) Less (b T) bool { return ... }

In this example, each of the Less methods implements the generic.Lesser interface.

The payoff is huge: generic binary operations become possible. For example, generic comparison makes generic container classes and sorting possible using interfaces like these:

// Lesser enables generic sorting, sorted trees, binary search...
type Lesser interface { Less(b _) bool }

// Hashable enables generic HashMap and Set implementations.
type Hashable interface {
    Equals (b _) bool
    HashCode () uint
}

Furthermore, the use of the _ type is not limited to the first parameter:

// `_` used for return values
interface {
    Plus (b _) _
    Minus (b _) _
    ...
}

// `_` used for variadic parameters
interface {
    PlusMultiple (b ..._) _
    MinusMultiple (b ..._) _
}

Skeptics might think this 'receiver base type' matching is not compatible with Go's dispatch, but it is: Go already dispatches based on the receiver type. The compiler needs only minor changes to match these new interfaces: it must replace the '_' with the actual receiver type when testing if a type implements the interface, and it must determine the implicit method type when the method called, at which point it knows the receiver type.

2: Generic Types

Obviously, we don't want to break existing code. A prototype compiler implementing the grammar below has confirmed that no change to .go files in the Go distribution is required.

Obviously, we want to be able to define things like "list of integers" and "map of keys to values". Java and C++ uses List<int> and List<<int>> syntax for "list of integers", but the semicolon insertion in the Go lexer causes problems with that approach because it does not emit semicolons after > and >> at the end of lines. One concise approach that does work is to use List~Int for "list of integers" and Map~(Key,Value) for "map of keys to values".

When defining the generic List~T, there should be a specified default type for T because,

  • We want List to act as it does today, even if list.go becomes generic, so we don't break existing code.
  • We don't want to have to recompile list.go to determine if List~int is a compatible type, as is required with C++ templates.

So, a package should be able to declare special "default types", which will naturally be interfaces since that is Go's type abstraction mechanism. These default types are declared in default declarations, which are just like current type declarations, except default types will need restricted assignment rules to enforce type contracts, below:

default T interface{}

We don't want C++ template code bloat, caused by having to recompile List<<T>> for each T. In Go, the code currently generated for List will suffice for any List~T, making recompilation entirely optional. Allowing a single generic implementation is naturally a big win for space efficiency. (It also won't hurt performance once Go runs on a VM with a JIT that inlines hotspot code. :-)

We want type contracts so the compiler can safely infer that only int is retrieved from a List~Int, obviating type assertions. For this to be safe, we need a new assignability rule for default types: Inside the package declaring default type D, x is assignable to D if and only if x is nil or x's type is identical to D. This is the type preservation contract.

We want to leverage type information to support compile-time type checking, such as catching attempts to put a Sprocket into a List~Widget. So, List and List~Widget must be distinct types, despite both being derived from List~T, and despite possibly executing the same code behind the scenes. Specifically, a List is a List~T where T is replaced by T's default type, which is interface{}. In the List~Widget type, T is replaced with Widget.

Trivially applying this approach to src/pkg/list/list.go, we get code like this:

default T interface{}
type Element~T struct {
    next, prev *Element~T
    list *List~T
    Value T
}
type List~T struct {
    front, back *Element~T
    len int
}
func (l *List~T) Init() *List~T {
    l.front = nil
    l.back = nil
    l.len = 0
    return l
}
func New() *List~T {
    return new(List~T)
}
...
func (l *List~T) PushFront(value T) *Element~T {
    e := &Element~T{nil, nil, l, value}
    l.insertFront(e)
    return e
}
...

Similarly, list_test.go code looks like this when modified to use List~int:

func checkList(t *testing.T, l *List~int, es []int) {
    if l.Len() != len(es) {
        t.Errorf("list has len=%v, want %v", l.Len(), len(es))
        return
    }
    i := 0
    for e := l.Front(); e != nil; e = e.Next() {
        le := e.Value.(int)
        if le != es[i] {
            t.Errorf("elt #%d has value=%v, want %v", i, le, es[i])
        }
        i++
    }
}

(One would expect a mature compiler to optimize away the .(int) type assertion and allow it to be omitted.)

To validate the grammar, I have minimally patched the go parser and lexer to support this syntax, to treat default declarations exactly like type declarations, and to parse but otherwise ignore the ~* type qualifiers. I've patched list.go and list_test.go to use the generic syntax. The modified compiler can compile the entire Go tree and pass the builtin test suite, so the grammar extensions do not break the extant code.

3: Generic Operators

If one supports generic operations as described above, it is natural to ask, why not add generic operator (operator overloading) support? For example, why not allow users of Int (big numbers) do if a < b with bignums by translating the currently unsupported < binary oparation on structs to execute if a.Less(b)? If generic operations are already supported, generic operators are only a step away.

Admittedly, operator overloading has a checkered past in C++, largely because the standard libraries overloaded new meanings on familiar operators, setting a bad precedent and causing an abundance of inefficient string building unless the compiler does extra work. (Oops, Go has + for strings, too.)

Go's static type system would naturally restrict binary operators to use on objects of the same type, which naturally prevents operator repurposing like "Hello" >> file, but is consistent with binary oparations on consistent types. So, Go naturally discourages some abuse.

Go could also strongly discourage operator overloading abuse by allowing the overloading of arithmetic operators only when a full standard Arithmetic interface is supported, logical operators only when the full Logical interface is supported, etc. Requiring these full interfaces is a deliberate inconvenience to those wanting to repurpose operators for non-intunitive purposes, but requires no effort for legitimate use.

package generic
type (
        // Required interface for `-`, `+`, `*`, `/`
    // Intuitive for big.Int, big.Nat, hypothetical.Matrix...
    Arithmetic interface {
        Negative () _       // `-` (unary)
        Plus (b _) bool     // `+`
        Minus (b _) _       // `-` (binary)
        Times (b _) _       // `*`
        DividedBy (b _) _   // `/`
        // Prototypes conflict with big.Int.  Resolve...
    }
    // Required interface for `&`, `|`, `^`, `&^`
    // Intuitive for sets and other containers
    Logical interface  {
        And (b _) _     // `&` (intersection)
        Or (b _) _      // `|` (union)
        Xor (b _) _     // `^` (difference)
        AndNot (b _) _      // `&^` 
    }
    // Required interface for `==`.
    Equalser interface {        
        Equals (b _) _      // ==
    }
    // Required interface for `<`, `<=`, `>`, `>=`
    Comparable interface {
        Equalser
        Lesser          // `<`
        LessEquals (b _) _  // `<=`
        Greater    (b _) _  // `>`
        GreaterEquals(b _) _    // `>=`
    }
    // More?
)

Would you like some syntactic sugar?