Go's zero values and (de)serialising

As you might’ve noticed from other blog post entries I’m suddenly all into directory services. This happens b/c that’s what I’m currently working on. As such I find myself needing to manipulate data in a DIT quite a bit and writing ldif’s by hand is not my idea of fun. Instead I set out to create a small library that would essentially allow me to parse the result of LDAP search result entries into a Go struct and transform those back into add or modify operations.

The LDAP library I use is go-ldap and it provides most of the basic building blocks. Search() takes a SearchRequest which returns Entries, a list/slice of Entry structs. However, all these Entry objects’ values are strings (even though they’re not necessarily so) and manipulating Entry itself is a bit annoying. It’s also entirely possible that I need to create a new Entry based on getting data from somewhere, like a user provisioning system.

I wrote some code that basically allows me to do this instead:

type User struct {
  DN string
  CN string `ldap:"cn"`
  Username string `ldap:"uid"`
  Mail []string `ldap:"mail"`
  ...
}

for _, entry := range searchRequest.Entries {
  err := ldap.Unmarshal(entry, &User{})
  // handle err
}

I can now create regular Go structs and using struct tags specify which attribute maps to what field and what the type is (multivalued fields are stored in a slice). This should be very familiar to anyone that’s ever done some JSON encoding/decoding into structs in Go. Like the standard library’s encoding/json I rely heavily on reflect to transform everything into the right types, which in itself took me a few hours and lots of frustration to get right.

Similarly, I can encode my user back into an Entry along these lines:

u := User{
  DN: "uid=daenney,ou=people,dc=bubblegbum,dc=com",
  CN: "Daenney",
  Username: "daenney",
  Mail: []string{"daenney@bubblegum.com"}
}

entry, err := ldap.Marshal(u)
// handle err

It also allows me to build helpers like ToAddReq that will take my struct and generate an AddRequest which I can then Add() to a server.

This becomes very useful when you want to do ETL (extract, transform, load) type of actions between directory services, b/c you’re migrating environments or changing schemas. For example, I can do this:

type User struct {
  DN string
  CN string `ldap:"cn"`
  ...
}

type LegacyUser {
  User
  CostCenter int `ldap:"postalCode"`
}

type NewUser {
  User
  CostCenter int `ldap:"constcenterId"`
}

for _, entry := range sr.Entries {
  l := &LegacyUser{}
  err := ldap.Marshal(entry, l)
  // handle err
  u := &NewUser{}
  copier.Copy(l, u) // Copy the LegacyUser into a NewUser, essentially setting all the same fields
  u.ToAddReq()
}

This is a pretty neat trick. Since the field on both LegacyUser and NewUser is called CostCenter the copier.Copy will make NewUser have that field set too. But when calling ToAddReq() or Marshal() on it, it’ll get serialised based on the struct tag, so costcenterId and not postalCode.

Zero values

However, a problem now arises. When doing Add or Modify requests, you’re supposed to only set fields for a DN that are actually set or modified. So what we don’t want to do is have a add or modify operation that happens to set a field to an empty string (the zero value of a string), it should just not set that field as part of the add or modify. We want to omit “empty” fields.

Lets look at our user struct again:

type User struct {
  DN string
  CN string
  Username string
}

When you create a new object all three fields will be initialised to their zero value, the empty string. Here the job is easy enough, if you get an empty string you omit the field when Marshaling it or when generating the AddRequest or ModifyRequest. The empty slice is similarly easy to deal with. Essentially they imply omitempty when serialising.

However, integer is a problem. Lets say we have a field that keeps track of failed authentication attemps too:

type User struct {
  FailedAttempts int `ldap:"failedAuthenticationAttempts"`
}

Now once we Unmarshal() we will no longer be able to distinguish between having had 0 failed attempts and the field not having been set in the first place, as the zero value of int is 0. This might not seem super critical, this specific field not being set kind of implies you haven’t had any authentication failures but it might be relevant in other places. It’s also annoying when you’re trying to diff two structs where in one place the value could have been explicitly set to 0 but in the other you’re looking at the zero value.

This becomes really annoying in a number of places and the common way to solve it is to use a *int instead. Then when it was unset it’ll be nil instead of 0. You can do the same thing with *string and *bool for example. Sounds easy enough but needing to create pointers to integers all the time is a bit annoying for your end users. So you end up adding something like ldap.Int() instead which handles it for you:

type User struct {
  FailedAttempts *int `ldap:"failedAuthenticationAttempts"`
}

u := &User{}
// somewhere during Unmarshal
// check if the failedAuthenticationAttempts field was returned with a
// non-empty value and then set it to value
u.FailedAttempts = ldap.Int(value)

// Or when creating a new user:
u = User{UID: ldap.Int(1337)}

Besides the fact that this is not super elegant we now have pointers to slices, strings, ints etc. hanging around that’s bound to put more pressure on the garbage collector. It probably doesn’t matter but once you’re dealing with thousands of entries it feels iffy. The other option is to use some library that provides an Optional version of the types you need. But it suffers from similar API ickyness. It essentially forces the consumer of your library to be or become aware of an implementation detail of the language. You’ll have to ensure in your documentation that this is explained to them and why it’s important that they now use your custom ldap.Int, ldap.String and ldap.Bool types. It’s not an awesome experience and for anyone relatively new to Go it’s pretty confusing.

Another option, equally icky, is to define a value for the field that is invalid. For example the gidNumber and uidNumber can be 0 (hello root) but never negative. So when it’s unset you could set it to -1 during the Unmarshal and then explicitly deal with it in the different methods that transform your structs. This is mostly surprising to consumers and they’ll have to know that when they’re creating an object for which they want that integer field to not show up in the Entry, AddRequest or ModifyRequest that they’ll have to set it to this special value. Because of this your consumers will now need to do “magical value” checks which is not an improvement.

If you’ve done much with JSON encoding/decoding you’ve probably ran into this issue too. The GitHub API client and the protobuf implementation for Go all use the pointer trick to work around this issue, but it comes with a cost. It also means that anyone consuming your deserialised object will now have to do explicit nil checks before doing anything with the value of a field, or suffer runtime panics.

I would really like for Go to have a built-in solution to this issue that doesn’t require pointer juggling. In 9 out of 10 cases the zero value is exactly what I want but when (de)serialising things sometimes being able to distinguish between the zero value and unset is important.

There is a proposal in the form of sum/discriminated union types that, as far as I can understand, could potentially solve this in the future.