Working Review of "Practical ML Programming with SML#" (Ohori, Ueno), CHAPTER 5

Modules and partial compilation
2023-08-20

This chapter spans pages 67 - 75, including almost 2 full pages of source code, so the reading was quick work. The copying-over of the source code into my editor was quick work, too.

Only when I sat down to consider what this chapter was about and write a short review, I realized that I’d just gone through a beautiful demonstration of one of SML#’s innovative features.

What we’re building

In this chapter, we build two dictionary-like structures (modules), which allow us to store and retrieve values associated with keys. The type of this dictionary is expressed as follows:

type ('k, 'v) dict

We’ll want to be able to construct dict instances with keys 'k of various types.

Now, notice that these variables tell us nothing about what we can do with values of type 'k or 'v. In SML#, it’s not even given that we can compare two 'ks for equality.

If we can’t compare for equality, we can’t reasonably write a get function, since there’s no way for us to check that the key supplied by the user is equal to any of the keys in the dictionary.

# type ('k, 'v) dict = ('k * 'v) list;
type ('k, 'v) dict = ('k * 'v) list

# fun get (key: 'k) (dict: ('k, 'v) dict) = case dict of
   nil => NONE
 | ((aKey, aValue)::rest) => if aKey = key
                             then SOME aValue
                             else get key rest

(...)

(interactive):7.0-9.73(177) Error:
  (type inference 069) definition and occurrence of "get" don't agree.
  definition: 'k -> ('k * 'v) list -> _
  occurrence: 'k -> ('IOG#eq * 'IOM) list -> 'IOM option

There’s a bunch of errors produced, but perhaps the one above is the most informative: we defined get as working on any key type 'k and any dictionary keyed by 'k, but the compiler has determined that the function we wrote expects the 'k type to be something marked 'k#eq. (The compiler used an autogenerated type variable ‘IOG, but this is an implemenation detail).

This is because our equality function, used in the expression if aKey = key, demands that both its operands can be tested for equality. An arbitrary type 'k does not make this guarantee.

Three ways out

Based on my patchy knowledge of Standard ML, I could think of three ways to implement our dictionary get function so that it satisfies the needs of the type system.

1. Force `'k` to be an equality type when calling `get`:

Standard ML has two flavors of type variable: single-quoted ('k) and double-quoted (''k). The double-quoted flavor signifies that the type 'k has a special property, namely that the Standard ML system automatically knows how to compare two values of this type for equality.

SML# supports this as well, being a superset of Standard ML. We can leverage this feature by forcing our key type — in dictionaries used with get — to support equality testing.

fun get (key: ''k) (dict: (''k, 'v) dict) =
  case dict of nil =>
    NONE
  | (aKey, aValue)::rest => if aKey = key
                            then SOME aValue
                            else get key rest
;

> val get = fn : ['a#eq, 'b. 'a -> ('a * 'b) list -> 'b option]

Now, the mysterious #eq has shown up again in the signature for our new get function. This is the same #eq that showed up in the inferred type for our previous attempt. It tells us that all 'as involved must support equality comparisons. More succintly: 'a#eq is equivalent to ''a.

Also worth noting is that we didn’t have to extend the definition of our dict type to account for equality testing. This is nice because we can create dict values which are not compatible with our get function above, and still have them type-check correctly. I.e: only the get function needs an equality check, so it’s good to place that restriction on the function type in particular, instead of the datatype in general.

Here’s how we can demonstrate. First, a get on a dict where the keys do support equality comparisons:

# get 2 [(1, "a"), (2, "b")];
val it = SOME "b" : string option

# get 3 [(1, "a"), (2, "b")];
val it = NONE : string option

And now, a dict where the key type (functions) does not allow for equality testing:

# get (fn x => x) [ (fn y => y, "a"), (fn x => x, "b")];
(interactive):53.0-53.13(13) Error:
  (type inference 016) operator and operand don't agree
  operator domain: 'TUG#eq
          operand: 'TUF -> 'TUF

This is one way to go about it, but it ties our dictionary interface to built-in types that have built-in equality support, provided by the language implemenation. If we’d like a dictionary that’s keyed with some more exotic structure which we’ve built ourselves, we’re out of luck, as there’s no way to “register” a custom equality check that the compiler will use to fulfill the #eq requirement.

2. Always ask for a comparison function when calling `get`

We can also extend the signature of get, shifting the burden on the caller to provide a function eqTest, which will determine when keys are equal.

fun get (eqTest: ('k * 'k) -> bool) (key: 'k) (dict: ('k, 'v) dict) =
  case dict of nil =>
    NONE
  | (aKey, aValue)::rest => if eqTest(aKey, key)
                            then SOME aValue
                            else get eqTest key rest

> val get = fn : ['a, 'b.  ('a * 'a -> bool) -> 'a -> ('a * 'b) list -> 'b option]

Now, we’ve managed to hide the requirement that the key type needs to be an #eq type. What’s more, not even our eqTest function has this requirement! We’ve achieved much looser coupling at the cost of making our interface more unwieldy.

Let’s do the same exercice as above: once with integer-keys dicts, and once with function-keyed dicts.

# get (op=) 2 [(1, "a"), (2, "b")];
val it = SOME "b" : string option

# get (op=) 3 [(1, "a"), (2, "b")];
val it = NONE : string option

Now, with function-keyed dicts:

# get (fn (f1, f2) => f1 1 = f2 1) (fn x => x)  [(fn x => x + 1, "a"), (fn x => x, "b")];
val it = SOME "b" : string option

# get (fn (f1, f2) => f1 1 = f2 1) (fn x => x)  [(fn x => x + 1, "a"), (fn x => x + 2 , "b")];
val it = NONE : string option

Notice that our eqTest function makes some assumptions about the functions-as-keys. For example, that they can take an integer argument 1, and that whatever value is returned from the function can be compared for equality. This is okay, becase we, as the user of this dictionary, can make both the eqTest function, as well as the dictionary key types, more specific when we need to. The important thing is that this specificity is not required of all users of the dictionary API.

Just to demonstrate how specific our get function is in the above case, let’s partially apply it and see the resulting type:

# val getFunKey = get (fn (f1, f2) => f1 1 = f2 1) ;

val getFunKey = fn : (int -> ?X33) -> ((int -> ?X33) * ?X32) list -> ?X32 option

This specialized get function expects a function from int to any type ?X33, and a dictionary where the keys are of the same function type. We can’t apply it to a dictionary where the keys are of a slighlty different function type:

# getFunKey (fn x => x) [(fn (x,y) => x, "a"), (fn (x,y) => y, "b")];
(interactive):86.0-86.19(571) Error:
  (type inference 016) operator and operand don't agree
  operator domain: int -> ?X33
          operand: int -> int

Now, admittedly, keying dictionaries with functions and then evaulating them as part of an equality check is not the wisest move, but the above is meant as a demonstration of what happens when we require an eqTest function to be provided for get to work. Doing so lets us remove tight coupling from the data represenation out to the layer of function calls, forcing the user of the representation to provide adequatly-typed ‘glue’.

3. Functors

The idea of binding a get-function with a particular eqTest function to obtain a specialized get is taken to fuller expresivness with the idea of functors.

A functor is a function that takes a structure and returns a structure, so it’s kind of like a factory pattern for modules. So instead of dict being implemented as a struct which forces its users to supply specialization parameters with every function it exposes, we can implement dict as a functor, which takes a structure that provides all the relevant specialization parameters, and returns a structure with a simple API. This output structure will nicely encapsulate the details, so subsequent users won’t have to be aware of them.

Dict as a functor

First, we need to specify a signature which will serve as proof that our key type in question has an equality check:

signature HASEQTEST = sig
  type t
  val eqTest : (t * t) -> bool
end

Now, we need to provide the output DICT signature, which all implemenations will fulfill, regardless of their particular key type:

signature DICT = sig
  structure Key : HASEQTEST
  type 'v dict
  val get : Key.t -> 'v dict -> 'v option
end

With the input and output signatures prepared, we can take a stab at writing the functor which will generate our DICT instances when provided the structure K which fulfils HASEQTEST.

In short, when we provide the eqTest method for our key type, we get a module that encapsulates that method, giving us a nice API that hides the details of equality testing.

Note that the output signature of the functor is extended using a where clause to bind together two types: the type which HAS EQ TEST, K.t, and the output dictionary type Key.t.

functor EqDictFun
  (structure K : HASEQTEST) : DICT where type Key.t = K.t =
struct
  structure Key : HASEQTEST = K
  type 'v dict = (Key.t * 'v) list;

  fun get (key: Key.t) (dict: 'v dict) =
    case dict of nil =>
     NONE
  | (aKey, aValue)::rest => if Key.eqTest(key, aKey)
                            then SOME aValue
                            else get key rest

end

Instead of providing a specialization function everywhere where it’s needed, we can now instantiate new structures by applying our functor to structures that fulfill HASEQTEST.
Let’s create one for integer-keyed dicts of strings:

# structure IntDict = EqDictFun(structure K =
    struct
        type t = int
        fun eqTest(a,b) = (a = b) end);

structure IntDict =
  struct
    type 'v dict = (Key.t * 'v) list
    val get = fn : ['a. Key.t -> (Key.t * 'a) list -> 'a option]
    structure Key =
      struct
        type t = int32
        val eqTest = fn : t * t -> bool
      end
  end

# IntDict.get 2 [(1, "a"), (2,"b"), (3,"c")];
val it = SOME "b" : string option

# IntDict.get 22 [(1, "a"), (2,"b"), (3,"c")];
val it = NONE : string option

And of course, we can make a very silly function-keyed dictionary where functions are checked for equality at their zero point.

# structure FunDict = EqDictFun(structure K =
    struct
        type t = (int -> int)
        fun eqTest(f, g) = (f 0 = g 0) end);

structure FunDict =
  struct
    type 'v dict = ((int -> int) * 'v) list
    val get = fn : ['a. (int -> int) -> ((int -> int) * 'a) list -> 'a option]
    structure Key =
      struct
        type t = int -> int
        val eqTest = fn : (int -> int) * (int -> int) -> bool
      end
  end

Now, we can query our dictionary for the key that is specified by the identity function:

# val x : string FunDict.dict = [(fn _ => 22, "a"),
                                 (fn x => x*100, "b"),
                                 (fn x => x, "c")];

# FunDict.get (fn x => x) x;
val it = SOME "b" : string option

You were probably expecting to get back SOME "c", but under our own self-inflicted notion of function equality (same result at zero point), (fn x => x*100)(0) = (fn x => x)(0) is true.

While we learned a lesson in (not) checking functions for equality, the point of this exercise was to show that functors allow for parametrizing structures to accomodate unexpected (and even unwise) usage.

SML#’s simple answer

To recap the above: to use functors to their full effect, one has to put in place quite a lot of support machinery:

We need to have a signature outsig that the functor’s return struct will fulfill
outsig needs to have a nested structure n
We need an input signature N which must be referenced in n
We must construct instruct, an appropriate instantiation of N for each type we’d like to parametrize over
We must apply the functor to instruct, and specify which types are to be linked between instruct and outsig.

The authors of SML# have a more simplistic solution, which relies on the fact that structures are created imperatively in Standard ML, therefore we can generate functions that fulfill our type requirements from a template of sorts.

First, we define a generator structure, which will play the role of functor.

structure MakeDict =
struct
  type ('k,'v) dict = ('k * 'v) list;
  fun makeDict comp =
      let fun get _ nil = NONE
            | get key ((k,v)::rest) =
               (case comp(key,k) of
                   EQUAL => SOME v
                  | _ => get key rest)
      in {get = get}
      end
end

As you can see, the structure declares a polymorphic type, and one function which returns a record. This record has one field get in our example, but in the book it includes the insertion function and the empty dictionary.

To create specialized modules that use this API, we need 2 lines of code:

structure IntDict =
struct
  type 'a dict = (int, 'a) MakeDict.dict; (* making the type concrete *)
  val {get} = MakeDict.makeDict Int.compare
end

Now, we have obtained a structure fulfilling this (non-obligatory) interface file:

structure IntDict =
struct
  type 'a dict (= MakeDict.dict); (*  made concrete in the implementation *)
  val get : int -> 'a dict -> 'a
end

Compared to the heavyweight machinery needed to get the benefit of functors, this approach is both aesthetically pleasing and much more easily understandable. It’s not all that different from the dependency injection patterns you can find in object-oriented languages.

Let’s see if it will tolerate our cursed FunDict.

structure FunDict = struct
  type 'a dict = ((int -> int), 'a) MakeDict.dict;
  val {get} = MakeDict.makeDict (fn (f,g)=> Int.compare(f 0, g 0))
end

# val x : string FunDict.dict = [(fn _ => 22, "a"), (fn x => x*100, "b"), (fn x => x, "c")];
val x = [(fn, "a"), (fn, "b"), (fn, "c")] : ((int -> int) * string) list

# FunDict.get (fn x => x)  x;
val it = SOME "b" : string option

Sure enough! We have all the practical benefits of functors without the bureaucracy. The authors of the language definitely prefer the more straightforward mechanisms of interface files and separate compilation over functors.

Summing up

This chapter gave me a lot of food for thought, and forced me to go back and read the great section on modular programming in Robert Harper’s online book on Standard ML.

I’m still not sure about the viability of the structure-generating approach in SML# when the output struct we wish to create is ascribed opaquely (i.e. its internal types are considered inaccessbile to the outside world). Maybe subsequent chapters will demonstrate how—although based on my skimming of the book, they skew heavily towards the practical side.