
Working Review of "Practical ML Programming with SML#" (Ohori, Ueno), CHAPTER 8

Accessing External Data

Before the year ends, let’s take a look at Chapter 8, titled “Accessing External Data”. This chapter teaches us how to read and interpret files on disk, as well as how to parse JSON data. Additionally, we are introduced to ML-style error handling via explicit handle expressions and pattern matching on exn constructors.

File I/O

This section gives us a very quick rundown of the TextIO structure from the Standard ML Basis Library, and has us implement a very simple file-copy procedure.
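The book’s own code isn’t reproduced here, but a file copy in this spirit might look like the following minimal sketch. It uses only TextIO from the Basis Library; the name copyFile and the chunk-by-chunk loop are my own illustrative choices, not necessarily the authors’.

```sml
(* Copy the text file at src to dst, chunk by chunk, using TextIO.
   Illustrative sketch: copyFile is a hypothetical name. *)
fun copyFile (src : string) (dst : string) : unit =
  let
    val ins = TextIO.openIn src
    val outs = TextIO.openOut dst
    (* TextIO.input returns the empty string at end of file *)
    fun loop () =
      case TextIO.input ins of
          "" => ()
        | chunk => (TextIO.output (outs, chunk); loop ())
  in
    loop ();
    TextIO.closeIn ins;
    TextIO.closeOut outs
  end
```

Note that if opening the destination file fails, the input stream is never closed; that is exactly the kind of gap the exception-handling material later in the chapter addresses.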

Handling errors using the Exception mechanism

This section is quite long, but it contains a lot of information about Standard ML’s exception mechanism. For those of you who don’t know what it looks like: SML uses a single type, exn, for all exceptions, and this type can be extended by the user via exception declarations, such as the following:

exception FailedToDownload of string;

Now we can raise this exception anywhere in our code. Raising it stops the normal flow of control, and the exception propagates up the call stack to the nearest enclosing handle expression with a matching pattern:

(raise FailedToDownload "howto.txt")
handle
  (FailedToDownload filename) => print ("Failed to download: " ^ filename ^ "\n");
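One detail worth spelling out is what “nearest” means: a handle expression only catches exceptions that match one of its patterns, and anything else keeps propagating outward. A small sketch, where download and fetch are hypothetical functions and Timeout is an extra exception added purely for illustration:

```sml
exception FailedToDownload of string
exception Timeout   (* hypothetical second exception *)

(* Hypothetical download step that always fails. *)
fun download (name : string) : string = raise FailedToDownload name

(* The inner handler only matches Timeout, so FailedToDownload passes
   straight through it and is caught by the outer handler. *)
fun fetch (name : string) : string =
  (download name handle Timeout => "timed out")
  handle FailedToDownload f => "failed: " ^ f

val msg = fetch "howto.txt"   (* "failed: howto.txt" *)
```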

Exceptions are also a convenient way to do non-local control flow, such as returning early from a map or fold once the result is known. This meshes well with an imperative, effectful style of programming:

# fun find (predicate: 'a -> bool) (list: 'a list): 'a option =
>   let
>     exception Found of 'a
>   in
>     (app (fn el => if predicate el then raise Found el else ()) list; NONE)
>     handle Found x => SOME x
>   end;
val find = fn : ['a. ('a -> bool) -> 'a list -> 'a option]

# find (fn x => x > 10) [1,2,3,4,5,6,7,8,9,10,11,12];
val it = SOME 11 : int option

# find (fn x => x > 10) [1,2,3];
val it = NONE : int option
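The same trick works for the fold case. As an illustrative sketch (the function product and the local exception Zero are my own, not from the book), here is a list product that stops folding as soon as it sees a zero, since the result is then known to be 0:

```sml
(* Product of a list that stops folding at the first zero.
   product and Zero are illustrative names. *)
fun product (xs : int list) : int =
  let
    exception Zero
  in
    foldl (fn (x, acc) => if x = 0 then raise Zero else x * acc) 1 xs
    handle Zero => 0
  end

val a = product [1, 2, 3, 4]   (* 24 *)
val b = product [5, 0, 7]      (* 0, without looking at 7 *)
```

Declaring Zero locally, as find does with Found, keeps the control-flow exception private to the function, so no caller can accidentally catch or raise it.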

Exceptions are also the standard way of signaling exceptional circumstances, such as I/O failures. For example, trying to open a non-existent file raises an IO.Io exception carrying some information about the fault.

# TextIO.openIn "missing.txt"; ();
uncaught exception IO.Io: openIn at src/smlnj/Basis/IO/text-io.sml:807.24(25074)

# (TextIO.openIn "missing.txt"; ())
> handle
>  (IO.Io{name, function, cause}) => print (concat ["IO error ",
>                                                    name, " ",
>                                                    function, "\n"]);
IO error missing.txt openIn

To round out the intro to SML-style error handling, the authors demonstrate how to implement a generalized IO-error handler. This handler is then extended with ‘finalizer’ functions, which play the role of a finally clause, something SML’s syntax does not provide.
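I won’t reproduce the book’s version, but here is a minimal sketch of the idea in plain Standard ML. The names finally, withIOHandler and countLines are hypothetical: finally runs its finalizer whether or not the body raises (re-raising the original exception), and withIOHandler turns an IO.Io into a printed message and NONE.

```sml
(* Run body, then always run finalizer; if body raised, re-raise the
   same exception after finalizing. Emulates a missing `finally`. *)
fun finally (body : unit -> 'a) (finalizer : unit -> unit) : 'a =
  let
    val result = body () handle e => (finalizer (); raise e)
  in
    finalizer ();
    result
  end

(* Generalized handler: run f, and if it raises IO.Io, report the
   failing function and file name, then return NONE. *)
fun withIOHandler (f : unit -> 'a) : 'a option =
  SOME (f ())
  handle IO.Io {name, function, ...} =>
    (print (concat ["IO error in ", function, ": ", name, "\n"]); NONE)

(* Example: count the lines of a file, closing the stream even if
   reading fails part-way. *)
fun countLines (path : string) : int option =
  withIOHandler (fn () =>
    let
      val ins = TextIO.openIn path
      fun loop n =
        case TextIO.inputLine ins of
            NONE => n
          | SOME _ => loop (n + 1)
    in
      finally (fn () => loop 0) (fn () => TextIO.closeIn ins)
    end)
```

One design caveat: if the finalizer itself raises while cleaning up after a failure, the original exception is lost. For a sketch like this, that seems an acceptable trade-off.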

Overall, the first part of this chapter is a rehash of exceptions and error handling in Standard ML. The next part is a radical departure from the standard.

Reading JSON Data

This section introduces SML#’s support for dynamic typing at runtime. Yes! Even though SML# is compatible with Standard ML, it supports runtime typing via the built-in Dynamic structure, a very interesting extension to the language.

One thing to note is that, as far as I’m aware, the scope of dynamic typing is restricted to:

  1. existing types
  2. views into record types (i.e. particular fields)
  3. JSON (this chapter)
  4. SQL

That is to say: on the one hand, we can’t get in on the “magic” and implement our own Dynamic converters (say, from a binary format like Protobuf), because it is contained within the language runtime. On the other hand, we can be sure that no third-party SML# code will spring some outrageous dynamic typing scheme on us. To me, this is a reasonable point in the design space, although it definitely whets the appetite for what could be done if Dynamic were more open to user-level extension.

Here’s how the dynamic typing works, in a nutshell. Firstly, we can dynamically recover the types of existing values:

# open Dynamic
# val one_int = dynamic 1;
val one_int = _ : void dyn
# val one_real = dynamic 1.0;
val one_real = _ : void dyn

The type returned by dynamic is void dyn, which means it’s a dynamic value without a particular type-level interpretation attached. We have to provide the interpretation ourselves, using the built-in _dynamic EXP as τ expression:

# _dynamic one_int as int;
val it = 1 : int
# _dynamic one_real as real;
val it = 1.0 : real

However, type coercion is not possible: like every failed _dynamic expression, it raises a RuntimeTypeError.

# _dynamic one_real as int;
uncaught exception PartialDynamic.RuntimeTypeError at (interactive):7.0

Next up, we have views into record types:

# val a_car = dynamic { make = "Ford", model = "T"};
val a_car = _ : void dyn
# val a_make = _dynamic a_car as {make: string} dyn;
val a_make = _ : {make: string} dyn
# view a_make;
val it = {make = "Ford"} : {make: string}

As you can see, we have dynamically constrained the record type to just the fields that interest us, and subsequently materialized the data with view.

And finally, let’s take a look at ‘parsing’ JSON, where the target type we ask for drives the runtime type assignment.

# val json_car = "{\"make\":\"Ford\",\"model\":\"T\"}";
val json_car = "{\"make\":\"Ford\",\"model\":\"T\"}" : string
# val dyn_car = Dynamic.fromJson json_car;
val dyn_car = _ : void dyn
# val dyn_model = _dynamic dyn_car as {model: string} dyn;
val dyn_model = _ : {model: string} dyn
# view dyn_model;
val it = {model = "T"} : {model: string}

This mode of operation means that we can’t use SML#’s runtime typing as a silver bullet for ingesting external JSON data. We have to tell the runtime the type of what we’re trying to extract, which means that only regular, normalized JSON data can make it into the system.

The rest of the chapter has us implement a couple of practical programs, such as:

  1. fetching JSON-formatted COVID-19 statistics from Japanese government sites, and displaying the results
  2. parsing JSON on the command-line to get runtime program arguments (and displaying a nice message when the runtime JSON parsing fails)

Some thoughts

It took me a long time to publish this post: I started implementing the code in this chapter on October 28th, and finished the exercises over the Christmas holiday break.

Between when I started and now, the Japanese COVID statistics website https://data.corona.go.jp has gone down. While the COVID pandemic was happening, it seemed to be the most important thing in the world, so much so that a university textbook used a government statistics website as a permanent reference. Now, the data has been moved to the website of the Japanese Health Ministry, and is only available as CSV.

As a result, subsequent chapters (SQL integration, etc.) will need some tweaking, as they all rely on the JSON data source being up. In any case, stay tuned for more in the coming year.

2024-01-02 Update

Luckily, the authors of the book have preserved a copy of the reference data in a GitHub repository with example code.

Finally, thank you for reading this blog. May you have a Happy New Year 2024!
