parsing aemo data with clojure

2015-10-25

intro

We're going to use Clojure to quickly create a utility for parsing an RM16 file from AEMO and displaying the volume in MWh per profile per state. If you are reading this post I'm going to assume you are familiar with both AEMO and the RM16 file. The solution will make use of Prismatic's schema library as it adds a little more documentation and formality to the code base. All resulting figures will be obfuscated as they are commercially sensitive.

the data types

One of the biggest issues I find with Clojure is that it's hard to revisit code and work out what it is doing. The main reasons I find for this are:

The lack of type annotations and the inherent documentation they provide.
Deep levels of function calls meaning you need to read the code right to left.
It's a type of Lisp.

The last one I can't change but the first two we will look to address.

Prismatic's Schema library is not a type system but it does allow you to define the shape of a data structure and then validate data against that shape.
There has been a bit written in the Clojure community lately about writing readable Clojure and it has reinforced my practice of writing short functions using the threading macros.
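To make the right-to-left problem concrete, here is a small sketch of the same calculation written nested and then threaded. The row maps and their keys (:jurisdiction, :kwh) are made up for illustration and are not the real RM16 layout.

```clojure
;; Made-up sample rows; the keys are assumptions, not the RM16 column names.
(def rows
  [{:jurisdiction "NSW" :kwh 1500.0}
   {:jurisdiction "VIC" :kwh 250.0}
   {:jurisdiction "NSW" :kwh 500.0}])

;; Nested calls: you have to start at the innermost form and read outwards.
(def nested-total
  (reduce + (map :kwh (filter #(= "NSW" (:jurisdiction %)) rows))))

;; Threaded with ->>: the same steps, read top to bottom.
(def threaded-total
  (->> rows
       (filter #(= "NSW" (:jurisdiction %)))
       (map :kwh)
       (reduce +)))
```

Both forms compute the same total; the threaded version just lists the steps in the order they happen.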

For this parser we define two record types using schema. The first, RM16Row, represents a typed row of the CSV data from the RM16 file. The second, RM16Summary, stores the results per jurisdiction and profile.

In addition to the normal behaviour of defrecord, the schema.core version creates a schema that can be used to validate entities at any point in the program. We can see the validation being called in line 43 of the above snippet; validate will throw an exception if the provided data doesn't adhere to the schema.
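As a rough sketch of what schema-backed records like these could look like, assuming the schema library is on the classpath: the field names and types below are my guesses for illustration, not the real RM16 columns, and the invalid-row? helper is hypothetical.

```clojure
(require '[schema.core :as s])

;; Field names and types are illustrative guesses, not the real RM16 columns.
(s/defrecord RM16Row
  [jurisdiction :- s/Str
   profile      :- s/Str
   kwh          :- s/Num])

(s/defrecord RM16Summary
  [jurisdiction :- s/Str
   profile      :- s/Str
   mwh          :- s/Num])

;; Every field matches its declared schema, so validate returns the record.
(def valid-row
  (s/validate RM16Row (->RM16Row "NSW" "NSLP" 1500.0)))

;; Hypothetical helper: kwh is a string here, so validate throws and we
;; report that the row was rejected.
(defn invalid-row? []
  (try
    (s/validate RM16Row (->RM16Row "NSW" "NSLP" "1500"))
    false
    (catch clojure.lang.ExceptionInfo _ true)))
```

Nothing is checked when the record is constructed; the check only happens where validate is called, which is why it can be placed at whichever point in the program suits.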

So far I have found the little bit of additional formality provided by schema greatly enhances the readability of my code base.

extracting the demand data

The file we are looking to parse is an XML file containing a CSV block of data. Each line of the CSV block will be mapped into an RM16Row record. These will act as the base data for further transformations and calculations. To help with the XML parsing we are using the clj-xpath library. It has some helpful overloaded conversion functions like xml->doc and jQuery-style selectors like $x:text that make it easy to extract the CSV block from the file.
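A minimal sketch of that extraction step, assuming clj-xpath is on the classpath. The XML element names, the three CSV columns and the helper names (csv-block, line->row, parse-rows) are all invented for illustration; a real RM16 file has a different structure and more columns. A plain defrecord stands in for the schema version so the block runs on its own.

```clojure
(require '[clj-xpath.core :refer [xml->doc $x:text]]
         '[clojure.string :as str])

;; Invented stand-in for an RM16 file: element names and columns are assumptions.
(def sample-xml
  "<Report><CSVData>
NSW,NSLP,1500.0
VIC,NSLP,250.0
</CSVData></Report>")

;; Plain record used here only to keep the sketch self-contained.
(defrecord RM16Row [jurisdiction profile kwh])

(defn csv-block
  "Returns the text of the single CSV element in the XML string."
  [xml]
  (let [doc (xml->doc xml)]
    ($x:text "/Report/CSVData" doc)))

(defn line->row
  "Maps one CSV line onto an RM16Row, parsing the volume as a double."
  [line]
  (let [[jurisdiction profile kwh] (str/split line #",")]
    (->RM16Row jurisdiction profile (Double/parseDouble kwh))))

(defn parse-rows
  "Extracts the CSV block and returns a sequence of RM16Row records."
  [xml]
  (->> (csv-block xml)
       str/split-lines
       (map str/trim)
       (remove str/blank?)
       (map line->row)))
```

Splitting on commas is enough for this made-up sample; data with quoted fields would need a proper CSV reader.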

transforming the data

Now that we have the data as a sequence of RM16Row records we can transform it into RM16Summary records using Clojure's standard functions. The block below shows the functions used to perform the transformations. It also represents a nice example of my preferred style of Clojure at the moment.

Specifically:

Short functions made up of let bindings to provide meaningful parameters to a result expression. Even short functions in Clojure are dense so I try and minimize the magic. Each function does a single task. It could be argued that some of the reduce operations in this snippet should themselves be extracted into functions with more descriptive names.
Thread these functions together in a pipeline that gives a descriptive step by step outline of what you are trying to achieve. I have been finding the some->> threading macro useful in these scenarios. It will short circuit the operation if any function within the pipeline returns a nil value.
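The two points above can be sketched roughly as follows. This is not the original code: it uses plain maps instead of the schema records, the keys (:jurisdiction, :profile, :kwh, :mwh) and function names are assumptions, and it assumes the input volumes are in kWh.

```clojure
;; Assumed row shape: {:jurisdiction "NSW" :profile "NSLP" :kwh 1500.0}

(defn kwh->mwh [kwh]
  (/ kwh 1000.0))

(defn group-rows
  "Groups rows by [jurisdiction profile]. Returns nil for empty input so
  that a some->> pipeline stops early."
  [rows]
  (when (seq rows)
    (group-by (juxt :jurisdiction :profile) rows)))

(defn summarise-group
  "Turns one [[jurisdiction profile] rows] entry into a summary map."
  [[[jurisdiction profile] rows]]
  (let [total-kwh (reduce + (map :kwh rows))]
    {:jurisdiction jurisdiction
     :profile      profile
     :mwh          (kwh->mwh total-kwh)}))

(defn summarise
  "Pipeline from rows to sorted summaries; nil if any step returns nil."
  [rows]
  (some->> rows
           group-rows
           (map summarise-group)
           (sort-by (juxt :jurisdiction :profile))))
```

Calling (summarise nil) or (summarise []) returns nil rather than throwing, because some->> stops as soon as a step yields nil.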

the output

Below is the obfuscated output of the code after the jar has been created using lein uberjar. It was reasonably quick to get to a working solution. It took me longer to write than it would have in Groovy, but I think the solution in Clojure lends itself to better abstractions.

I do like the interactive workflow in Clojure; it helps me get to the essence of my problem quicker. Like any coding session, you need to spend some time building momentum when working with the REPL, but once you do it really does provide an immersive development experience.
