For fun, I've been considering designs for generics in Go, and request comments on the feasibility of the following design. The design has 3 simple parts: generic operations, generic types, and generic operators. I've tweaked the gc compiler to accept the grammatical changes, and I can even run the 'generic types' code, but it's time to share the design before coding.
Overview
The 'generic operations' proposal would allow '_' to be used in
interfaces to mean 'the receiver base type' This would allow generic
operations on operands of a single data type. (This matching is
possible because Go dispatches based on receiver type.) For example:
package generic
type Lesser interface { Less (b _) bool }
// Elsewhere, perhaps in sort.go, dijkstra.go, rbtree.go...
if (a.Less(b))
The 'generic types' proposal would add new default type
declarations to be used subsequently as default types during
compilation, but which may be overridden by package clients with more
specific types for static type checking and optimization. Notably,
this approach requires no change to client code or to the compiler
outside parsing. A simple example from pkg/container/list:
// list.go
default T interface{}
type Element~T struct {
next, prev *Element~T
list *List~T
Value T
}
// list_test.go
func checkList(t *testing.T, l *List~int, es []int) {
Finally, the 'generic operators' proposal would have the compiler use standard interfaces when there is no builtin support for the operator. For example:
var x, y big.Int
...
z := x + y // z := x.Plus(y)
if x < y // if x.Less(y)
Personally, I find the 'generic operations' to be compelling language enhancement. The 'generic types' and 'generic operators' are trade-offs.
1: Generic Operations (or 'Receiver Type Matching')
It would also be nice to address the problem of generic operations,
especially comparison. This has previously been described as the
a.Less(b) problem. It would be solved if _ means "the receiver
base type" in the following:
// in package generic
type Lesser interface { Less(b _) bool }
// in package ???
func (a int) Less(b int) bool { return a < b }
func (a float64) Less(b float64) bool { return a < b }
// in package big
func (a big.Int) Less(b big.Int) bool { return a < b }
// in your own source
type T { ... }
func (a T) Less (b T) bool { return ... }
In this example, each of the Less methods implements the generic.Lesser interface.
The payoff is huge: generic binary operations become possible. For example, generic comparison makes generic container classes and sorting possible using interfaces like these:
// Lesser enables generic sorting, sorted trees, binary search...
type Lesser interface { Less(b _) bool }
// Hashable enables generic HashMap and Set implementations.
type Hashable interface {
Equals (b _) bool
HashCode () uint
}
Furthermore, the use of the _ type is not limited to the first parameter:
// `_` used for return values
interface {
Plus (b _) _
Minus (b _) _
...
}
// `_` used for variadic parameters
interface {
PlusMultiple (b ..._) _
MinusMultiple (b ..._) _
}
Skeptics might think this 'receiver base type' matching is not compatible with Go's dispatch, but it is: Go already dispatches based on the receiver type. The compiler needs only minor changes to match these new interfaces: it must replace the '_' with the actual receiver type when testing if a type implements the interface, and it must determine the implicit method type when the method called, at which point it knows the receiver type.
2: Generic Types
Obviously, we don't want to break existing code. A prototype compiler implementing the grammar below has confirmed that no change to .go files in the Go distribution is required.
Obviously, we want to be able to define things like "list of integers"
and "map of keys to values". Java and C++ uses List<int> and
List<<int>> syntax for "list of integers", but the semicolon
insertion in the Go lexer causes problems with that approach because
it does not emit semicolons after > and >> at the end of lines.
One concise approach that does work is to use List~Int for "list of
integers" and Map~(Key,Value) for "map of keys to values".
When defining the generic List~T, there should be a specified default
type for T because,
- We want
Listto act as it does today, even if list.go becomes generic, so we don't break existing code. - We don't want to have to recompile list.go to determine if
List~intis a compatible type, as is required with C++ templates.
So, a package should be able to declare special "default types", which
will naturally be interfaces since that is Go's type abstraction
mechanism. These default types are declared in default
declarations, which are just like current type declarations, except
default types will need restricted assignment rules to enforce type
contracts, below:
default T interface{}
We don't want C++ template code bloat, caused by having to recompile
List<<T>> for each T. In Go, the code currently generated for
List will suffice for any List~T, making recompilation entirely
optional. Allowing a single generic implementation is naturally a big
win for space efficiency. (It also won't hurt performance once Go runs
on a VM with a JIT that inlines hotspot code. :-)
We want type contracts so the compiler can safely infer that only
int is retrieved from a List~Int, obviating type assertions. For
this to be safe, we need a new assignability rule for default types:
Inside the package declaring default type D, x is assignable to D if
and only if x is nil or x's type is identical to D. This is the type
preservation contract.
We want to leverage type information to support compile-time type
checking, such as catching attempts to put a Sprocket into a
List~Widget. So, List and List~Widget must be distinct types,
despite both being derived from List~T, and despite possibly
executing the same code behind the scenes. Specifically, a List is
a List~T where T is replaced by T's default type, which is
interface{}. In the List~Widget type, T is replaced with
Widget.
Trivially applying this approach to src/pkg/list/list.go, we get code like this:
default T interface{}
type Element~T struct {
next, prev *Element~T
list *List~T
Value T
}
type List~T struct {
front, back *Element~T
len int
}
func (l *List~T) Init() *List~T {
l.front = nil
l.back = nil
l.len = 0
return l
}
func New() *List~T {
return new(List~T)
}
...
func (l *List~T) PushFront(value T) *Element~T {
e := &Element~T{nil, nil, l, value}
l.insertFront(e)
return e
}
...
Similarly, list_test.go code looks like this when modified to use
List~int:
func checkList(t *testing.T, l *List~int, es []int) {
if l.Len() != len(es) {
t.Errorf("list has len=%v, want %v", l.Len(), len(es))
return
}
i := 0
for e := l.Front(); e != nil; e = e.Next() {
le := e.Value.(int)
if le != es[i] {
t.Errorf("elt #%d has value=%v, want %v", i, le, es[i])
}
i++
}
}
(One would expect a mature compiler to optimize away the .(int)
type assertion and allow it to be omitted.)
To validate the grammar, I have minimally patched the go parser and
lexer to support this syntax, to treat default declarations exactly
like type declarations, and to parse but otherwise ignore the ~*
type qualifiers. I've patched list.go and list_test.go to use the
generic syntax. The modified compiler can compile the entire Go tree
and pass the builtin test suite, so the grammar extensions do not
break the extant code.
3: Generic Operators
If one supports generic operations as described above, it is natural
to ask, why not add generic operator (operator overloading) support?
For example, why not allow users of Int (big numbers) do if a < b
with bignums by translating the currently unsupported < binary
oparation on structs to execute if a.Less(b)? If generic operations
are already supported, generic operators are only a step away.
Admittedly, operator overloading has a checkered past in C++, largely
because the standard libraries overloaded new meanings on familiar
operators, setting a bad precedent and causing an abundance of
inefficient string building unless the compiler does extra work.
(Oops, Go has + for strings, too.)
Go's static type system would naturally restrict binary operators to
use on objects of the same type, which naturally prevents operator
repurposing like "Hello" >> file, but is consistent with binary oparations
on consistent types. So, Go naturally discourages some abuse.
Go could also strongly discourage operator overloading abuse by allowing the overloading of arithmetic operators only when a full standard Arithmetic interface is supported, logical operators only when the full Logical interface is supported, etc. Requiring these full interfaces is a deliberate inconvenience to those wanting to repurpose operators for non-intunitive purposes, but requires no effort for legitimate use.
package generic
type (
// Required interface for `-`, `+`, `*`, `/`
// Intuitive for big.Int, big.Nat, hypothetical.Matrix...
Arithmetic interface {
Negative () _ // `-` (unary)
Plus (b _) bool // `+`
Minus (b _) _ // `-` (binary)
Times (b _) _ // `*`
DividedBy (b _) _ // `/`
// Prototypes conflict with big.Int. Resolve...
}
// Required interface for `&`, `|`, `^`, `&^`
// Intuitive for sets and other containers
Logical interface {
And (b _) _ // `&` (intersection)
Or (b _) _ // `|` (union)
Xor (b _) _ // `^` (difference)
AndNot (b _) _ // `&^`
}
// Required interface for `==`.
Equalser interface {
Equals (b _) _ // ==
}
// Required interface for `<`, `<=`, `>`, `>=`
Comparable interface {
Equalser
Lesser // `<`
LessEquals (b _) _ // `<=`
Greater (b _) _ // `>`
GreaterEquals(b _) _ // `>=`
}
// More?
)
Would you like some syntactic sugar?
Hi Gelnn.
ReplyDeleteWhy did you choose '_'?
It may get confusing as it is already used with the '_' meaning "i don't care about this result"
Also, did you discuss this with Russ Cox at http://research.swtch.com/generic ?
I meant to suggest ? instead of _ to mean 'the receiver base type'
ReplyDeleteor perhaps $.
ReplyDeleteI think "?" makes more "sense" here as is implies "whatever/I don't know that"
ReplyDelete$ could be then used for substituting ~, which may be HARD or different to get in non-English/Mac keyboards.
In fact I suggest to choose both from the symbols that are:
- Easy/standard to get in any keyboard: ! $ % ? _ @ #
- AND are not already used within Go 1 (that leaves $ ? @)
I vote for:
? means "the receiver base type"
AND
To use List$Int for "list of integers" and Map$(Key,Value) for "map of keys to values".
$ is easier to get (SHIFT+4) than even @ (ALTGR+2 on my PC, but different on a Mac and different on US keyboard...)
$ is horrible... List$Int is at least unreadable! List~List is better, in my opinion... And any programmer which has ever done C++ programming has access, in one way or other, to '~' symbol.
ReplyDelete? is excellent choice for "receiver base type", as for me ;)
My "problem" with '?' is that at least *I* would be very much confused with stuff like:
ReplyDeletelst := List?List?int{}
(my C/C++-formated brain would try to interpret it as somekind of ternary operator.)
perhaps less so for:
lst := List?(List?int){}
but it still feels like "ternaries."
I remember being confused reading templated D code:
lst = List!(int)()
the unary !-operator couldn't be parsed as unary by my limited brain (and it always falling back on binary !-operator...)
List$int isn't nice on the eyes, List${int} List$(int) or Map${K,V} are marginally better.
m := Map@(K,V){}
or
m := Map#(K,V){}
or
m := @Map(K,V){}
m := #Map(K,V){}
having @,$ or # in front could help clearly label and spot a template...
First, I don't claim the syntax is the best possible, but it's the most aesthetically pleasing I could quickly come up with that didn't produce an ambiguous grammar. It took several iterations to create a prototype parser+lexer.
ReplyDeleteI chose '_' for receiver base type because:
'_' is concise.
'_' is easily recognized by humans as unusual, thanks to '_' in assignments.
'_' is already a lexer token (TBLANK), so no lexer change is required.
'_' is a valid char in type names, so it's not likely to cause trouble in tools that serialize type names.
'_' means "I don't care which type... as long as it matches the receiver type. :-)
I chose '~' largely for aesthetics, and because it was unused. As I recall, '@' in type names confused some Go serialization library, and any character (or token) that acts as a binary operator would not work. Using an unused character trivially avoided conflicts. I suspect other options might possible without introducing a new character, but I failed to find one. I'm not sure I tried List(int), though...
I haven't posted to Russ's generics page, but I may after the current go-nuts discussion settles. Thanks for the pointer.
Oh, and I didn't use '?' because I miss the C's ternary operator. Some C simulation code I write would require 5 times as many lines in Go because this operator is missing, and would be far less readable.
ReplyDeleteThanks for your replies Glenn,
ReplyDeleteDo you have your prototype parser+lexer somewhere public to play with, like github or bitbucket or something like that?
I would try to play around with them if I could?
I am afraid I am not an expert in language design and compilers but I would like to learn and this generics topic is quite interesting.
I was thinking that for Go 1 (as it is quite sure that this version will NEVER have any generics solution or alternative) I could write a generic code exchange app for go. I could call it gogex (Go Generics Exchange)
It would sit as a go test app on google app engine and will use David Roundy go templates project as a base:
https://github.com/droundy/gotgo
The idea is that this app will store generic code, functions, types, interfaces, that you can then download at anytime for the command line:
gogetgen get.list int
for a list~(int)
or
gogetgen gen.list gen.list int
for a list~(list~(int))
And it will download the source code already expanded for the required type to be used within the current package.
On a second phase, it could even help you generalize some concrete type based code to a generic solution online, substituting the concrete type with the generic receiver _, default ~T or whatever is appropiate.
Do you think that kind of app will be useful to Go programmers while a proper generics solution is to be found for Go 2 or later?
(Some final word of advice, I would also discuss the notation for _ and ~ with Russ and other Go core team members, and probably make a survey to see if ~ is an easy key to type in any international keyboard)
Just for the record. On the Spanish PC keyboard ~ is AltGr+4 while on a Mac is not labeled anywhere, but it can be obtained with Alt+Ñ (tested on Mac Book Air).
ReplyDelete