Wednesday, January 19, 2011

Parsing Shapefile format files using the State monad

I assume that for many people who learn Haskell, there comes the moment that they are faced with this mysterious entity called "monad". Although the definition of the Monad is rather short, it's abstract nature make it somewhat difficult to fully comprehend. There are many Monad tutorials out there on the web, so I won't try to give another one.

Instead, my humble contribution for helping making Monad a bit clearer to newcomers, would be to present a real world example of one the most popular monad types - the State monad. Again, I assume that the reader is already familiar with the general definition of monads (this link provides a good starting point). Before using the State monad, it's important to examine how it is defined:

If the definition above is not clear, this link provides a thorough explanation. I will be using the State monad to parse a Shapefile. The Shapefile format is the standard file format for representing GIS data. In my parser implementation, I was only interested in parsing a subset of the shapes that can be represented in the Shapefile, namely: polygons, polylines and points. I used the following PDF as guideline on how to parse the file. We start by defining the geometric shapes that we would construct:

Notice,that the Shape type has three different constructors for the different types of shapes. Next, we define the Parser type:

The parser is defined to be a State monad, whose state is a lazy ByteString. It means that we would read the shapefile as a stream of bytes, and feed it to the parser. We start parsing the file, by skipping (and reading) the header (we are only interested in the field that tells the type of the shapes):

The above function uses two helper functions, skip and parseLittle32. The skip function simply skips the given amount of bytes. The parseLitte32 function, reads a little endian 32 bit word:

The parseAllShapes function, passes on the rest of byte stream and "decompose" it to a list of shapes. It uses a helper function, parseShape, whose definition is beyond the scope of this post:

I hope that the general idea of using the State monad in this context is clear. Here is an output example of the parser - a map of Mexico, consisting three layers (states, roads and cities). The rendering was done in the method described in my previous post:

Again, special thanks go to Zohar Kelrich for helping me figuring out the topic of monads.

No comments:

Post a Comment