Working Review of "Practical ML Programming with SML#" (Ohori, Ueno), CHAPTER 8
Accessing External Data
2023-12-31
Before the year ends, let’s take a look at Chapter 8, titled “Accessing
External Data”. This chapter teaches us how to read and interpret files on
disk, as well as how to parse JSON data. Additionally, we are introduced to
ML-style error handling, via explicit handle expressions and pattern-matching
on exn constructors.
File I/O
This section gives us a very quick run-down of the TextIO structure, a
Standard ML Basis module. We implement a very simple file-copy procedure.
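For a rough idea of what such a procedure can look like, here is a minimal sketch of my own, not the book's code. The name copyFile is hypothetical; the TextIO calls are the standard Basis ones. It copies in chunks and does not try to clean up if an error happens halfway through.

```sml
(* copyFile is a hypothetical helper, not the version from the book. *)
fun copyFile (src : string, dst : string) : unit =
  let
    val ins = TextIO.openIn src
    (* If the destination cannot be opened, close the source before re-raising. *)
    val outs = TextIO.openOut dst
      handle e => (TextIO.closeIn ins; raise e)
    (* TextIO.input returns the empty string at end of file. *)
    fun loop () =
      case TextIO.input ins of
          "" => ()
        | chunk => (TextIO.output (outs, chunk); loop ())
  in
    loop ();
    TextIO.closeIn ins;
    TextIO.closeOut outs
  end
```

Calling copyFile ("a.txt", "b.txt") raises IO.Io if either file cannot be opened, which leads nicely into the next section.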
Handling errors using the Exception mechanism
This section is quite long, but it contains a lot of information about Standard
ML’s exception mechanism. For those of you who don’t know what it looks like:
SML uses a single extensible datatype, exn, for all exceptions, and the user
can extend it via exception declarations, such as the following:
exception FailedToDownload of string;
Now, we can raise instances of these exceptions anywhere in our code, and
they will stop the flow of control and bubble up to the nearest enclosing handle
expression in the call-stack:
(raise FailedToDownload "howto.txt")
handle
(FailedToDownload filename) => print ("Failed to download: " ^ filename ^ "\n");
This is a nice way to do non-local control flow, such as early-returning from a
map or fold once the result is known. This meshes nicely with an
imperative, effectful style of programming:
# fun find (predicate: 'a -> bool) (list: 'a list): 'a option =
> let
> exception Found of 'a
> in
> (app (fn el => if predicate el then raise Found el else ()) list; NONE)
> handle Found x => SOME x
> end;
val find = fn : ['a. ('a -> bool) -> 'a list -> 'a option]
# find (fn x => x > 10) [1,2,3,4,5,6,7,8,9,10,11,12];
val it = SOME 11 : int option
# find (fn x => x > 10) [1,2,3];
val it = NONE : int option
It is also the preferred way of signaling exceptional circumstances, such as I/O failures. For example, trying to open a non-existent file raises an IO.Io exception, with some attached information about the fault.
# TextIO.openIn "missing.txt"; ();
uncaught exception IO.Io: openIn at src/smlnj/Basis/IO/text-io.sml:807.24(25074)
# (TextIO.openIn "missing.txt"; ())
> handle
> (IO.Io{name, function, cause}) => print (concat ["IO error ",
> name, " ",
> function, "\n"]);
IO error missing.txt openIn
To round out the intro to SML-style error handling, the authors go on to
demonstrate how to implement a generalized IO-error handler. This handler is
then extended with ‘finalizer’ functions, which play the role of a
(syntactically unavailable) finally clause.
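To make the finalizer idea concrete, here is a small sketch under my own naming; protect, withInputFile and readAll are hypothetical and not the authors' definitions. The finalizer runs whether the body returns normally or raises, and the exception is re-raised afterwards so that an outer handle can still report it.

```sml
(* protect: run body (), then always run finalizer (), even on an exception. *)
fun protect (finalizer : unit -> unit) (body : unit -> 'a) : 'a =
  let
    val result = body () handle e => (finalizer (); raise e)
  in
    finalizer ();
    result
  end

(* withInputFile: open a file, hand the stream to action, always close it. *)
fun withInputFile (path : string) (action : TextIO.instream -> 'a) : 'a =
  let
    val ins = TextIO.openIn path
  in
    protect (fn () => TextIO.closeIn ins) (fn () => action ins)
  end

(* readAll: report IO.Io failures on stdout and return NONE instead. *)
fun readAll (path : string) : string option =
  SOME (withInputFile path TextIO.inputAll)
  handle IO.Io {name, function, ...} =>
    (print (concat ["IO error ", name, " ", function, "\n"]); NONE)
```

The design choice here is to keep cleanup (protect) separate from reporting (the handle in readAll), so the same finalizer wrapper can be reused with a different error policy.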
Overall, the first part of this chapter is a rehash of exceptions and error handling in Standard ML. The next part is a radical departure from the standard.
Reading JSON Data
This section introduces how SML# implements dynamic typing at runtime. Yes!
Even though SML# is compatible with Standard ML, it is capable of runtime
typing, via the built-in Dynamic module, which is a very interesting
extension to the language.
One thing to note: as far as I’m aware, the scope of dynamic typing is restricted to:
- existing types
- views into record types (i.e. particular fields)
- JSON (this chapter)
- SQL
That is to say: on one hand we can’t get in on the “magic”, as it is contained
within the language runtime, and implement our own Dynamic converters, say,
from a binary format like Protobufs. On the other hand, we can be sure that no
3rd-party SML# code will spring some outrageous dynamic typing scheme on us. To
me, this is a reasonable point in the design space, although it definitely
whets the appetite for what could be done if Dynamic were more open to
user-level tweaks.
Here’s how the dynamic typing works, in a nutshell. Firstly, we can dynamically recover the types of existing values:
# open Dynamic
# val one_int = dynamic 1;
val one_int = _ : void dyn
# val one_real = dynamic 1.0;
val one_real = _ : void dyn
The type returned by dynamic is void dyn, which means it’s a dynamic value
without a particular type-level interpretation attached. We have to provide the
interpretation, using the built-in expression form _dynamic EXP as τ:
# _dynamic one_int as int;
val it = 1 : int
# _dynamic one_real as real;
val it = 1.0 : real
However, type coercion is not possible: like every failed _dynamic
invocation, the attempt raises a RuntimeTypeError.
# _dynamic one_real as int;
uncaught exception PartialDynamic.RuntimeTypeError at (interactive):7.0
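Since a failed cast is just an exception, it can be turned into an option with handle. The following is a hedged sketch for SML# only; asInt is a hypothetical name, and I use a wildcard handler because I have not confirmed the exact structure path under which RuntimeTypeError is exported (the trace above names PartialDynamic.RuntimeTypeError). A narrower handler would be preferable once that path is known.

```sml
(* asInt: hypothetical helper, SML#-specific because of _dynamic. *)
(* The wildcard handler is an assumption-avoiding shortcut: it also *)
(* swallows exceptions other than the runtime type error. *)
fun asInt d =
  SOME (_dynamic d as int)
  handle _ => NONE
```

With the values from the session above, asInt one_int should give SOME 1, while asInt one_real should give NONE instead of an uncaught exception.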
Next up, we have views into record types:
# val a_car = dynamic { make = "Ford", model = "T"};
val a_car = _ : void dyn
# val a_make = _dynamic a_car as {make: string} dyn;
val a_make = _ : {make: string} dyn
# view a_make;
val it = {make = "Ford"} : {make: string}
As you can see, we have dynamically constrained the record type to just the
fields that interest us, and subsequently materialized the data with view.
And finally, let’s take a look at ‘parsing’ JSON, using example data to inform the runtime type assignment.
# val json_car = "{\"make\":\"Ford\",\"model\":\"T\"}";
val json_car = "{\"make\":\"Ford\",\"model\":\"T\"}" : string
# val dyn_car = Dynamic.fromJson json_car;
val dyn_car = _ : void dyn
# val dyn_model = _dynamic dyn_car as {model: string} dyn;
val dyn_model = _ : {model: string} dyn
# view dyn_model;
val it = {model = "T"} : {model: string}
This mode of operation means that we can’t use SML#’s runtime typing as a silver bullet for ingesting external JSON data. We have to actually provide the runtime with an example of what we’re trying to extract, which means that only regular, normalized JSON data can make it into the system.
The rest of the chapter has us implement a couple of practical programs, such as:
- fetching JSON-formatted COVID-19 statistics from Japanese government sites, and displaying the results
- parsing JSON on the command-line to get runtime program arguments (and displaying a nice message when the runtime JSON parsing fails)
Some thoughts
It took a long time for me to publish this post — I started implementing the code in this chapter on October 28th, and finished the exercises over the Christmas holiday break.
In between when I started and now, the Japanese COVID statistics website
https://data.corona.go.jp has gone down. When the
COVID pandemic was happening, it seemed to be the most important thing in the
world, so much so that a university textbook used a government stats website as a
permanent reference. Now, the data has been moved to the website of the
Japanese Health Ministry, and is only available as CSV.
2024-01-02 Update
Luckily, the authors of the book have preserved a copy of the reference data in a GitHub repo with example code.
Finally, thank you for reading this blog. May you have a Happy New Year 2024!