parsing aemo data with clojure

2015-10-25

intro

We're going to use Clojure to quickly create a utility that parses an RM16 file from AEMO and displays the volume in MWh per profile per state. If you are reading this post, I'm going to assume you are familiar with both AEMO and the RM16 file. The solution makes use of Prismatic's schema library, as it adds a little more documentation and formality to the code base. All resulting figures are obfuscated because they are commercially sensitive.

the data types

One of the biggest issues I find with Clojure is that it's hard to revisit code and work out what it is doing. The main reasons I find for this are:

The last one I can't change but the first two we will look to address.

For this parser we define two record types using schema. The first, RM16Row, represents a typed row of the CSV data from the RM16 file. The second, RM16Summary, is our result type, which stores the per-jurisdiction, per-profile results.

In addition to the normal behaviour of defrecord, the schema.core version creates a schema that can be used to validate entities at any point in the program. We can see the validation being called on line 43 of the above snippet; validate will throw an exception if the provided data doesn't adhere to the schema.
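As a minimal sketch of the idea (not the code from this project), here is roughly what a pair of schema-backed records and a validate call can look like. The field names and types are assumptions made for illustration and do not reflect the real RM16 column layout; the only library calls used are s/defrecord and s/validate from schema.core.

```clojure
(ns rm16.types
  (:require [schema.core :as s]))

;; Hypothetical fields: the real RM16 layout has more columns than this.
(s/defrecord RM16Row
  [jurisdiction :- s/Str
   profile      :- s/Str
   volume       :- s/Num])

(s/defrecord RM16Summary
  [jurisdiction :- s/Str
   profile      :- s/Str
   mwh          :- s/Num])

;; A well-formed record passes validation and is returned unchanged.
(def good-row
  (s/validate RM16Row (->RM16Row "NSW" "NSLP" 12.5)))

;; A string where a number is expected makes validate throw.
(def bad-row-rejected?
  (try
    (s/validate RM16Row (->RM16Row "NSW" "NSLP" "12.5"))
    false
    (catch Exception _ true)))
```

Because the schema is attached to the record type itself, the same check can be dropped in wherever data crosses a boundary, for example straight after parsing a CSV line.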

So far I have found the little bit of additional formality provided by schema greatly enhances the readability of my code base.

extracting the demand data

The file we are looking to parse is an XML file containing a CSV block of data. Each line of the CSV block will be mapped into an RM16Row record. These will act as the base data for further transformations and calculations. To help with the XML parsing we are using the clj-xpath library. It has some helpful overloaded conversion functions like xml->doc and jQuery-style selectors like $x:text that make it easy to extract the CSV block from the file.
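A rough sketch of that extraction step, under some loud assumptions: the CSVData element name, the three-column layout and the sample document are all made up for illustration, and a real RM16 file will carry more columns and most likely a header row. Plain maps are used instead of RM16Row records to keep the example standalone.

```clojure
(ns rm16.extract
  (:require [clj-xpath.core :refer [xml->doc $x:text]]
            [clojure.string :as str]))

;; Hypothetical, heavily simplified stand-in for an RM16 file.
(def sample-xml
  "<Report><CSVData>NSW,NSLP,1500.0\nVIC,NSLP,250.0</CSVData></Report>")

(defn parse-row
  "Turns one CSV line into a map with a numeric :volume.
   The column order is an assumption."
  [line]
  (let [[jurisdiction profile volume] (map str/trim (str/split line #","))]
    {:jurisdiction jurisdiction
     :profile      profile
     :volume       (Double/parseDouble volume)}))

(defn extract-rows
  "Pulls the text of the single CSVData element out of the XML string
   and parses each non-blank line."
  [xml]
  (->> ($x:text "//CSVData" (xml->doc xml))
       str/split-lines
       (remove str/blank?)
       (map parse-row)))
```

Calling (extract-rows sample-xml) yields one map per CSV line, ready to be turned into typed records and validated.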

transforming the data

Now that we have the data as a sequence of RM16Row records, we can transform it into RM16Summary records using Clojure's standard functions. The block below shows the functions used to perform the transformations. It also represents a nice example of my preferred style of Clojure at the moment.
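For readers who want a feel for this kind of transformation, the following is a small illustrative sketch using only clojure.core, not the code from this project. The keys, the sample rows and the assumption that volume arrives in kWh (hence the division by 1000 to get MWh) are all hypothetical.

```clojure
(ns rm16.summary)

;; Hypothetical rows; :volume is assumed to be in kWh.
(def rows
  [{:jurisdiction "NSW" :profile "NSLP" :volume 1500.0}
   {:jurisdiction "NSW" :profile "NSLP" :volume 500.0}
   {:jurisdiction "VIC" :profile "NSLP" :volume 250.0}])

(defn summarise
  "Sums volume per [jurisdiction profile] pair and converts kWh to MWh."
  [rows]
  (->> rows
       (group-by (juxt :jurisdiction :profile))
       (map (fn [[[jurisdiction profile] group]]
              {:jurisdiction jurisdiction
               :profile      profile
               :mwh          (/ (reduce + (map :volume group)) 1000.0)}))
       (sort-by (juxt :jurisdiction :profile))))

(def summary (summarise rows))
```

Grouping on (juxt :jurisdiction :profile) keeps the whole pipeline inside a single threading form, which is one way to keep each step readable when revisiting the code later.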

Specifically:

the output

Below is the obfuscated output of the code after the jar has been created using lein uberjar. It was reasonably quick to get to a working solution. It took me longer to write than it would have in Groovy, but I think the solution in Clojure lends itself to better abstractions.

I do like the interactive workflow in Clojure; it helps me get to the essence of my problem quicker. Like any coding session, you need to spend some time building momentum when working with the REPL, but once you do, it really does provide an immersive development experience.

complete code