»
S
I
D
E
B
A
R
«
Chunked XML parsing is the latest thing, you know
May 15th, 2010 by blackh

Uhh, hello.  Welcome to my first blog post ever – and thanks Axman6 for letting me be a “guest blogger”.

It’s rather unfashionable on #haskell, but I like XML.  So, 18 months ago, I took over the hexpat package from Evan Martin.  It was going to be a small project – a simple XML parser binding to Expat.  The fastest Haskell XML parser alive.  Or so I thought.

It’s become a passion, a way of life.  It’s XML parsing in Haskell the way I think it should be done.  The best as well as the fastest.  (I like to think big.)

I’ve finally finished adding all the features that I and a number of contributors wanted, and I would now like to announce that hexpat is going beta.  I want to make this package really, really good, so please help by testing and critiquing.  I want to stabilize hexpat, but hexpat-iteratee will be unstable for a while yet.

The future is chunky

The cherry on top of the hexpat galaxy is the still experimental hexpat-iteratee based on Oleg Kiselyov’s iteratee, which is a bit of a hot ticket these days.  It provides lazy XML parsing without the practical issues and philosophical dodginess inherent in Haskell’s lazy I/O through functions like hGetContents.

hexpat-iteratee allows for effectful XML processing done in a functional way, and the magic behind this is Yair Chuchem’s humbly named List package.  It is “merely” a generalization of lists, and I think it deserves to be a common piece of infrastructure.

The example project is a chunked XML-over-TCP movie database lookup server.  Every home should have one.  So, let’s start like all good blogs do, with imports:

{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent
import Control.Exception
import Control.Monad
import Control.Monad.IO.Class
import Control.Monad.ListT
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B (unsafeUseAsCStringLen)
import Data.Iteratee
import Data.Iteratee.IO.Fd
import Data.Iteratee.WrappedByteString
import Data.List.Class as List
import Data.Maybe
import Data.Text (Text)
import qualified Data.Text as T
import Network
import System.IO
import System.Posix.IO (handleToFd, fdWriteBuf, closeFd)
import System.Posix.Types (Fd)
import Text.XML.Expat.Chunked
import qualified Text.XML.Expat.Chunked as Tree
import Text.XML.Expat.Format
import Foreign.Ptr
The first thing we want to do is listen on a socket.  I could use handles, sockets, or file descriptors. With handles, this code does not work interactively. Disabling the buffering does not seem to work at all in GHC 6.10 or 6.12. Sockets would be ideal, but to save me writing an iteratee driver, I’m left with file descriptors which unfortunately means this code only works on GHC 6.12 on a POSIX system.  fdPutStrBS is the only glue I need then – it writes a ByteString to a Fd.   Here’s the code:
main :: IO ()
main = do
    let port = 6333
    putStrLn $ "listening on port "++show port
    ls <- listenOn $ PortNumber port
    forever $ do
        (h, _, _) <- accept ls
        forkIO $ handleToFd h >>= \fd -> do
            iter <- parse defaultParserOptions (session (fdPutStrBS fd))
            result <- enumFd fd iter >>= run
            print result
          `finally`
            closeFd fd

fdPutStrBS :: Fd -> B.ByteString -> IO () fdPutStrBS fd bs = B.unsafeUseAsCStringLen bs $ \(buf, len) -> writeFully (castPtr buf) (fromIntegral len) where writeFully _ len | len == 0 = return () writeFully buf len = do written <- fdWriteBuf fd buf len if written < 0 then fail "write failed" else writeFully (buf `plusPtr` fromIntegral written) (len - written)

Once we’ve accepted the connection, we get parse (from hexpat-iteratee) to make us an iteratee.  The second argument, “session (fdPutStrBS fd)” is the handler for processing the document.  We then pass this iteratee to iteratee’s enumFd, whose job it is to pull the input data out of the Fd and feed it into the parser. parse is monadic in order that it can start the handler before it receives the first data block through the iteratee. This is necessary in case the handler wants to generate output before it gets any input, which we want to do here.

The handler is a co-routine.  When it runs out of input data, it gets suspended, and control returns to enumFd.

session :: (B.ByteString -> IO ())  -- ^ Write output data to socket
        -> ListOf (UNode IO Text)   -- ^ Input XML document
        -> XMLT IO ()
session writeOut inputXML = do
    let outputXML = formatG $ indent 2 $ Element "server" [] (processRoot inputXML)
    execute $ liftIO . writeOut =<< outputXML
    return ()
formatG is a hexpat function to take a tree node and format it as XML, returning one of Yair’s Lists of ByteStrings.  indent is a filter that adds pretty indenting.  The Element is the top level tag of our output XML tree, and its third argument “processRoot inputXML” evaluates the child nodes of the output document.  The entire processing of the document is in a functional style.

execute here makes all the IO actually happen. It iterates over a List of monadic actions and sequences them. This translates into a sequence of writes of data blocks to the socket.  The elements in the list are monadic, so execute also must execute those in order to extract each output ByteString.

In this way, even though processRoot is pure at the top level, it can contain effectful computations.

processRoot :: ListOf (UNode IO Text) -> ListOf (UNode IO Text)
processRoot root = do
    Element _ _ children <- genericTake 1 root
    child <- children
    extractElements child
  where
    extractElements :: UNode IO Text -> ListOf (UNode IO Text)
    extractElements elt | isElement elt = processCommand elt `cons` mzero
    extractElements _                   = mzero
ListOf is a type function that conceals a long-winded type name.  This function maps the input document to a list of output nodes.

The root of the input document is actually given as a List containing one item – the top-level XML tag.  The reason why we do this is so that we have to ask for it to be pulled.  If it were just passed as a UNode IO Text type, we would have to calculate it before the handler was called, and the handler wouldn’t get a chance to do output before it requests input.

The function is implemented using List’s Monad instance, which behaves exactly like a list monad.  The reason for genericTake 1 root is so we stop processing after the root node and don’t wait for a node that will never come.  I need to fix this in hexpat-iteratee.

`cons` is the generalized list cons operator like : and  `mzero` corresponds to [].

processCommand :: UNode IO Text -> UNode IO Text
processCommand elt@(Element "title" _ _) = Element "title" [] $ joinL $ do
    txt <- textContentM elt
    return $ search txt
processCommand (Element cmd _ _) = Element "unknown" [("command", cmd)] mzero
Here is our command processor.  We have one command <title>foo</title> that finds all movies whose titles contain foo.

joinL is a bit of List magic that lets you drop down into the underlying monad, which in this case is XMLT IO a.  joinL’s type is :: ItemM l (l a) -> l a where ItemM l is a type function giving the list’s monad.  So, the stuff after joinL resolves to a type of :: XMLT IO (ListOf (UNode IO Text)).

search :: Text -> ListOf (UNode IO Text)
search key = joinL $ do
    iter <- liftIO $ parse defaultParserOptions $ \root -> do
        let l = do
                elt@(Element _ _ children) <- genericTake 1 root
                movie <- List.filter isElement children
                return movie
        execute l
        return l
    eMovies <- liftIO $ fileDriver iter "movies.xml"
    case eMovies of
        Left err -> fail $ "failed to read 'movies.xml': "++show err
        Right movies -> return $ List.filter matches movies
  where
    matches elt = key `T.isInfixOf` fromMaybe "" (getAttribute elt "title")
Here’s where our handler does some real I/O.  We read our database from a flat file using the same method of parsing.  Passing possibly unexecuted nodes outside the XMLT monad is a bit wrong, and needs to be addressed in the design, but here it works as long as I execute them.  Alternatively a pure XML parse would work.  hexpat has functions to convert between pure and monadic node types.

So, I build and run the server, and here is the result, using Unix’s nc command as my client.  I typed this:

<a>
<title>of the</title>
The output is:
<?xml version="1.0" encoding="UTF-8"?>
<server>
 <title>
   <movie id="dvzrwfvryd" disc="41" title="War of the Worlds (2005)"
        director="Steven Spielberg" genre="Sci Fi Thriller" rating="6"
        description="Tom Cruise alert" imdbID="tt0407304"/>
   <movie id="xxvjgxpokp" disc="44" title="Shaun of the Dead"
        director="Edgar Wright" genre="Comedy Horror" rating="8"
        description="British send-up zombie movie" imdbID="tt0365748"/>
   <movie id="duvcjsygqi" disc="104" title="March of the Penguins (La Marche de l&apos;empereur)"
        director="Luc Jacquet" genre="Documentary" description="" imdbID="tt0428803"/>
   <movie id="dawcezoiro" disc="109" title="Pirates of the Caribbean: Dead Man&apos;s Chest"
        director="Gore Verbinski" genre="Action/Comedy" rating="7" description="" imdbID="tt0383574"/>
 </title>
(New lines added for readability)

And the session can process more commands interactively.

And pickled

I should also mention my related hexpat-pickle package which is a shameless rip-off of the picklers from Uwe Schmidt’s excellent hxt package.  I find it a very practical and quick way to bang out XML picklers.  (It doesn’t work with hexpat-iteratee yet.)

Bye bye

Here’s the code in downloadable form.  Make sure you use the monads-fd and transformers packages instead of mtl.  Also hexpat-iteratee and text.

I hope you found this interesting.  I hope the XML haters of #haskell will be miraculously transformed into XML tolerators, and I hope you’ll help me improve hexpat. – Stephen Blackheath, Manawatu, New Zealand

A small follow up
Jan 8th, 2010 by Axman6

In my previous post about why I love the cereal package, I went through the development of a bencoding parser and encoder. Brian was kind enough to point out some of the flaws I’d made in this code (which I should add had been caused from me not actually checking the spec while writing the code, obviously a bad idea), and from these comments, I think I’ve managed to fix most of the problems:

Hi, thanks for writing this stuff. I think it could be pretty cool, but it could benefit from more precise reading and implementation of the spec.

For example, bencoded integers can be negative.

Also, my alarms go off whenever I see ‘read’. In ‘getBString’, you pass ‘read count’ to ‘getByteString’, which expects Int. But check, e.g., ‘read (show $ 2^64-1) :: Int’ in ghci. So if the torrent data is malformed, you could end up passing a negative length to ‘getByteString’. Maybe it knows how to deal with that, but it’s not something you should rely on.

You also have to decide what to do about dictionaries you read whose keys aren’t in order, etc.

Basically, please be more precise, especially if you put this on Hackage. This stuff is supposed to be industrial strength. Thanks.

The first problem, not handling negative integers was pretty trivial to fix, all I needed to do was check to see if there was a ‘-’ char out the front, and if not, just get all the digits, and then read them:

-- | Parses a BInt
getBInt :: Get BCode
getBInt = BInt . read <$> getWrapped 'i' 'e' intP
    where intP = ((:) <$> char '-' <*> getDigits) <|> getDigits

Brian also pointed out something I also wasn’t particularly happy with, the use of read to read in an Int64. This should under normal circumstances be more than large enough to read any bytestring that should be in any bencoded data (.torrent files are usually less than 1-200KB), so we should never have run into a problem here, but it’s still good to make sure we can be ‘industrial strength’:

-- | Parses a BString
getBString :: Get BCode
getBString = do
    count <- getDigits
    BString <$> ( char ':' *> getStr (read count :: Integer))
    where maxInt = fromIntegral (maxBound :: Int) :: Integer
          getStr n | n >= 0 = B.concat <$> (sequence $ getStr' n)
                   | otherwise = fail $ "read a negative length string, length: " ++ show n
          getStr' n | n > maxInt = getByteString maxBound : getStr' (n-maxInt)
                    | otherwise = [getByteString . fromIntegral $ n]

Here you can see we’re now using an Integer as the read value, and taking chunks of maxBound :: Int bytes, until there are less than that many bytes left to fetch.

I’ve decided to ignore the problem with dictionaries with out of order elements, I can see this being something others may have overlooked in their implementations, and it’s entirely possible that other implementations do not put the keys in the right order. Our implementation does, but can easily handle malformed implementations. I see this is a bonus, and I hope others do too (I feel the code is more robust, and that’s always good).

I hope this has made some difference to the code, and what people think of it.

Until next time,

– Axman

Why I love Cereal
Jan 5th, 2010 by Axman6

Cereal, as you may know from my previous posts is a library for parsing binary data from strict ByteStrings. It is very similar to the binary package, but importantly, provides both an Alternative instance, and an Either String a return type for the decode function, which tells you where the parse failed.

I’ve been playing around with cereal lately in jlouis’ haskell-torrent project, rewriting the various binary parsing and producing parts of the program (the torrent file parser, and the wire protocol parser). I though it would be nice to share some of the code used for these, to demonstrate how easy cereal makes it to do such things.

To begin with, I’ll show you the part that decodes and encodes torrent files (if needed in the future). Torrent files are encoded using a very simple encoding, known as bencoding, which consists of four major primitives: Integral numbers, Strings of bytes, Arrays of bencoded objects, and Dictionaries of String, bencoded object pairs. This is very nicely represented using this datatype:

-- | BCode represents the structure of a bencoded file
data BCode = BInt Integer                       -- ^ An integer
           | BString B.ByteString               -- ^ A string of bytes
           | BArray [BCode]                     -- ^ An array
           | BDict (M.Map B.ByteString BCode)   -- ^ A key, value map
  deriving Show

the specification for bencoded data goes something like this:

Integers are encoded as the ASCII character for ‘i’ as a byte, followed by the digits of the integral value, terminated by the ASCII byte for ‘e’.

Eg: the number ‘42’ would be encoded as “i42e”

Strings are encoded as the digits of their length, followed by a colon (‘:’), then the bytes of the string. these strings are really just byte sequences, and probably shouldn’t be treated as having an encoding (as jlouis and I found out when I tried to test the current code on GHC 6.12.1, with the BString type using Strings, instead of ByteStrings, and finding out that the simple test contained byte sequences that could not be represented as Strings).

Eg: the string “hello” would become “5:hello”, “hello world” would become “11:hello world”

Arrays are encoded as ASCII ‘l’ (for list I believe), followed by any number of bencoded objects, terminated by an ASCII ‘e’. (This is where using binary became difficult, as you had to explicitly check whether you had reached the terminating ‘e’ using lookAhead when parsing before attempting to parse another bencoded object, du the the lack of actual failure handling)

Eg: ["Hello", 123] would become “l5:helloi123ee”. Notice how we’ve used the previous definitions for integral numbers, and strings.

Dictionaries are encoded as an ASCII ‘d’, followed by the String, object pairs, followed by an ASCII ‘e’.

Eg: fromList [("test",123),("arr",[1,2,"hello"])] would become “d4:testi123e3:arrli1ei2e5:helloee”.

It looks a bit of a mess, but it is quite efficient.

Encoding

When writing my Serialize instance (Cereal’s version of the Binary class) for the BCode type, I decided it would be much easier to write the put methods first. This turned out to be rather straight forward, once I’d written a few helper functions.

toW8 :: Char -> Word8
toW8 = fromIntegral . ord

fromW8 :: Word8 -> Char fromW8 = chr . fromIntegral

toBS :: String -> B.ByteString toBS = B.pack . map toW8

fromBS :: B.ByteString -> String fromBS = map fromW8 . B.unpack

-- | Put an element, wrapped by two characters wrap :: Char -> Char -> Put -> Put wrap a b m = do putWord8 (toW8 a) m putWord8 (toW8 b)

-- | Put something as it is shown using @show@ putShow :: Show a => a -> Put putShow x = mapM_ put (show x)

With these in hand, I set to work implementing the put function. The Integer and Array functions were straight forward:

instance Serialize BCode where
     put (BInt i)     = wrap 'i' 'e' $ putShow i
     put (BArray arr) = wrap 'l' 'e' . mapM_ put $ arr
 

The Dictionary and String implementations weren’t too bad either:

    put (BDict mp)   = wrap 'd' 'e' dict
                      where dict = mapM_ encPair . M.toList $ mp
                            encPair (k, v) = put (BString k) >> put v
     put (BString s)  = do
                          putShow (B.length s)
                          putWord8 (toW8 ':')
                          putByteString s

As you can see, the code is quite clear, and matches the specification quite well.

Decoding

Parsing the data was the next step. this proved a little more difficult, but with my recent (shallow) experience with Parsec, I knew what was needed.

I decided to start by writing some useful combinators (this is a lie, I wrote them when needed, but lying makes the post flow better >_>). These included the following:

-- | Get a Char. Only works with single byte characters
 getCharG :: Get Char
 getCharG = fromW8 <$> getWord8

-- | Parse a given character char :: Char -> Get () char c = do x <- getCharG if x == c then return () else fail $ "Expected char: '" ++ c:"' got: '" ++ [fromW8 x,'\'']

-- | Get something wrapped in two Chars getWrapped :: Char -> Char -> Get a -> Get a getWrapped a b p = char a > p < char b -- The same as char a >> p >>= \x -> char b >> return x

-- | Parse zero or items using a given parser many :: Get a -> Get [a] many p = many1 p mplus return []

-- | Parse one or more items using a given parser many1 :: Get a -> Get [a] many1 p = (:) <$> p <*> many p

-- | Returns a character if it is a digit, fails otherwise. uses isDigit. digit :: Get Char digit = do x <- getCharG if isDigit x then return x else fail $ "Expected digit, got: " ++ show x

-- | Get one or more digit characters getDigits :: Get String getDigits = many1 digit

My favourite two definitions here are many and many1, which nicely show the use of Alternative: they are mutually recursive, with many1 being the only one of the two to actually do and parsing, while many checks to see if many1 failed to parse one object using the parser p. It’s really quite beautiful, and makes the code that follows a hell of a lot nicer to write. This is where the love mentioned in the title comes in by the way.

With these in hand, I could now go ahead and write the actual parsers for various BCode types. Parsing BInts and BArrays is dead simple now:

-- | Parses a BInt
 getBInt :: Get BCode
 getBInt = BInt . read <$> getWrapped 'i' 'e' getDigits

-- | Parses a BArray getBArray :: Get BCode getBArray = BArray <$> getWrapped 'l' 'e' (many get)

As as side note, I’ve now come to see just what the folks on #haskell were on about when they said Applicative is nice. I think I’ve fallen in love (yet again!).

BStrings were a little more difficult, but not hard, given what I’ve just written:

-- | Parses a BString
 getBString :: Get BCode
 getBString = do
     count <- getDigits
     BString <$> ( char ':' *> getByteString (read count))

Here we get as many digits as we can, followed by a colon, and then take the number of bytes the digits specified. Finally, we have the BDict definition, which also is quite nice, if slightly annoying with its use of pattern matching (don’t get me wrong, i love pattern matching, but it’s the only place it’s used in the parser :( )

-- | Parses a BDict
 getBDict :: Get BCode
 getBDict = BDict . M.fromList <$> getWrapped 'd' 'e' (many getPairs)
     where getPairs = do
             (BString s) <- getBString
             x <- get
             return (s,x)

Putting it all together, we finally have a definition for the get function in the Serialize class.

    get = getBInt <|> getBArray <|> getBDict <|> getBString
 
A rather clean, elegant, and hopefully correct serialiser and deserialiser for the bencoded format used in torrent files. I’m considering releasing this code as a separate package on hackage, but I’m still not sure how widely it might be used. I have a string feeling that that would not be very wide at all, but that a library of more advanced combinators for cereal would make life a lot easier for others like me who have some strange binary formats that need to be parsed in an efficient manner.

Please, I implore you, do let me know what you think of this all, I’m always interested in seeing what others think of my code, and ways to improve it.

Until next time,

— Axman

Playing with Haskell-Torrent
Jan 2nd, 2010 by Axman6

Over the last week or so, I’ve been playing around with some of the minor details of jlouis’ Haskell-Torrent code. The main pieces I’ve been playing with have been the more binary centric pieces: the torrent file en/decoding and since Wednesday, the wire protocol.

After trying to compile the code using GHC 6.12.1, I noticed that a rather strange dependency: HCodecs. Investigating further, I realised that all that was being used was its fairly nice binary builder and parser, and was being used in the parsing and construction of bencoded files. This seemed to me to be the domain of one of my favourite packages, binary. I also noticed that when trying to parse the sample torrent file jlouis had provided, there was binary data that could not be represented in a String type, because there were byte sequences that could not be parses into Char’s.

So I went off to try and implement the BCode module using Data.Binary. The encoding was really simple and straight forward, but parsing was another matter. Binary has no real error handeling, which meant that decoding things like lists where you keep adding elements to the list until you reach an ‘e’, was made quite verbose due to this fact. After implementing a (semi) working module, I went in search of a more parsec like interface to binary parsing. This is when someone on #Haskell informed me of cereal, a package based on binary, which added an Alternative instance, and an Either String a return type for better error reporting.

I fell in love. This was exactly what I was after, I could create my own combinators like many and many1, with the same ease I could with parsec. Using cereal, I was able to make a really clear, concise parser, which should also be quite efficient.

I will post some of this code at a later time, when i’m not writing on my iPhone, and show you just what I mean. I’m also considering releasing the BCode module as a package of it’s own, if anyone else feels it might me useful to them. Either that, or I might write up a module of useful combinators for use with cereal.

Until next time, happy new year and I hope this year treats you all better than the last one did.

– Al

If you need speed, don’t talk to main!
Mar 4th, 2009 by axman6

Yesterday, I submitted my first program modification to the computer language shootout game, which is a change to the thread-ring program. If you’ve followed the shootout at all, you will probably know that haskell dominates at this benchmark, with the next contender (Mozart/OZ) being 50% slower than haskell already.

While i was having a look at the code, I noticed that the haskell entry was forkIO’ing a lot of threads, which is exactly what we want. But we weren’t quite forking enough; one of the 503 threads was the main thread. Bad idea!

I learnt a few weeks ago, while working on my AVar package, that communicating between bound and non bound threads (forkIO’d vs. main) when using things like MVars can be really slow. And this is exactly what was happening:

main = do
a <- newMVar . read . head =<< getArgs
z <- foldM new a [2..ring]
thread 1 z a

as you can see, the last/first thread was being run in main, so once per loop around the ring, we were talking to main via an MVar, and then talking to another forkIO’d thread via another MVar. So here’s what I did:

main = do
a <- newMVar . read . head =<< getArgs
z <- foldM new a [2..ring]
ret <- newEmptyMVar
forkIO (thread 1 z a >>= putMVar ret)
takeMVar ret

Two and a half lines of code. What were the results? Well, when the program was run with n = 50,000,000, the original ran in ~14.2s, with -N{1,2,3,4}, my modified one took on average ~9.3s, which is about a 33% improvement. It should be noted that when both programs are compiled without -threaded, there is almost no difference in speed at all (and they both run considerably faster than the non threaded version). Need to talk to the shootout peeps and find out if it’s against the rules to compile without -threaded, and get an even more massive speed boost.

– Axman

Back at uni
Feb 27th, 2009 by axman6

It’s Friday night, and I have just finished my first week of my second year at the Australian National University. This is the reason I haven’t written anything here for the last week, and my blog rate will probably remain low for the next 10 weeks… but I’ll do my best! This semester I’m doing some rather interesting courses (and some not so much…)

  • COMP2300Introduction to Computer Systems Which is about.. well computer systems. It covers topics like binary representations of numbers, C, computer architecture, etc. The thing I am however most looking forward to the most will be the third assignment, which will be based around the uni’s new T2 (the one Ben Lippmeier is using for most of his GHC on SPARC work)

  • COMP2100Software Construction This seems to be more about how to write larger software, than actual programming (though it does emphasise programming weekly to get into good habits). I’m interested to see for the Personal Software Process (PSP) stuff goes, it seems to me that it can make you a more productive programmer quite easily. After only an hour of using the time keeping for it, i’ve noticed i don’t get distracted as much, because it’s not programming time, so all i do is program in this time. Means i distract myself less, which is very helpful.

  • ENGN2211 – Electronic Circuits This is a second year electronics course, and should be fairly interesting too. Not much to say about this one…

  • ENGN2226 – Engineering Systems Analysis This course is taught by a pretty cool German maths PhD, and focuses mostly on modelling systems, which is obviously an integral part of engineering. We’ll have to use MATLAB for this, which from what I’ve heard from others may not be very fun at all. Could be a good place to use some of the nice statistical packages for haskell…

Anyway, I’ll try and keep up with the haskelling, but I’ll be doing a lot of non haskell this semester (C, Java, ASM, MATLAB), so it may have to fall by the way side for now :(

– Axman

AVar changes
Feb 22nd, 2009 by axman6

Yesterday, i stuck AVar 0.0.4 on hackage. Again, not thinking things through as well ad i should have, i realised later it should have been the 0.1.0 release, since it was the first major change to the package since i first wrote it.

So what’s new? Well, i split it up a bit, into three modules: Data.AVar, the classic interface, Data.AVar.Unsafe, the same as Data.AVar, but it throws any exceptions encountered by the variable, instead of passing them back to the user, and Data.AVar.Internal, which contains the code for the actual for AVars, and the datatypes used by them.

I think this is a nice way of doing things, and hopefully will make using the system a little easier if you don’t expect to cause exceptions. So as always, if you have any suggestions, please let me know.

– Axman

AVar released (three times)
Feb 20th, 2009 by axman6

The other day, I put my AVar package on hackage. Doing so taught me more than i expected it would, mainly that i need to do more testing of cabal packages before submitting them, and also to make sure you’ve exported all the functions you need to actually use the package (I forgot to put putAVar in the module exports -_-). So i’m up to release 0.0.3, without much work being done at all (although, I can’t think of much more to do really).

With the current release, i think that all exceptions that may occur from functions passed by the user to the variable should be caught by and handled correctly by the variable. I guess in a sense, AVars are smart variables that know what is and isn’t good for them, and will refuse to hurt themselves (hopefully!).

If you’re interested in seeing what AVar can do, check out the Hackage page. I really have no idea how it might be useful, but I’d love to hear others thoughts on it. If you have any requests, please let me know, and I’ll see what I can do (proper transactional … transactions are not something I want to tackle, especially with uni starting on monday)

– Axman

ASTM updates
Feb 18th, 2009 by axman6

Today, I’ve been talking with some people in #haskell about this ASTM thing, and I’ve had some great input, mainly from Stefan Ljungstrand (ski on #haskell), along with quicksilver and Simon Marlow, and I thought I’d share the changes I’ve put in with their help. I’m still not sure how useful this stuff is, but it’s fun and I’m learning. If you’re wondering what i’m on about, see this post.

So, the new features I’ve added are mainly to do with exception handling, so that variables don’t ‘die’ if you call tail on [], they report the error, and don’t change the ’state’. I’ve also deleted the Swap constructor, and replaced it with a more general Mod’ constructor:

data Transaction a =
Put a
| Get (MVar a)
| Mod (a -> a) (MVar (Maybe E.SomeException))
| forall b. Mod' (a -> (a,b)) (MVar (Either E.SomeException b))
| Atom (a -> Bool) (a -> a) (a -> a) (MVar Bool)

This you provide a function which takes the current ’state’, possibly modifies it, and returns something else, and return that something else back to you. This works a lot like the State monad, but allows concurrency.

I’ve changed handler as well to look like this:

handler :: Chan (Transaction a) -> a -> IO b
handler chan !x = do
req <- readChan chan
case req of
Put a         -> handler chan a
Get mvar      -> do
putMVar mvar x
handler chan x
Mod f mvar    -> do
let x' = f x
p <- E.catch (E.evaluate x' >> return Nothing)
(\e -> return (Just e))
putMVar mvar p
case p of
Nothing -> handler chan x'
_       -> handler chan x
Mod' f mvar    -> do
let y@(a,b) = f x
p <- E.try (E.evaluate a >> E.evaluate b)
case p of
Right _  -> do
putMVar mvar (Right b)
handler chan a
(Left e) -> do
putMVar mvar (Left e)
handler chan x
Atom test y n res ->
if test x
then do
putMVar res True
handler chan (y x)
else do
putMVar res False
handler chan (n x)

As you can see, I’ve added some exception handling, but I still need to add some more to the Atom/condModAVar case. What I want is something that can keep running even after there’s been an exception. I mean, who wants a mutable variable to just stop working huh?. I think once I get that done, along with haddock docs, I’ll stick it on hackage.

– Axman

ASTM: redundant STMish fun
Feb 17th, 2009 by axman6

So, tonight, I had an idea for something that I’m not sure if it’s either useful, efficient, interesting or anything other than a waste of time. It was an idea for the emulation of mutable variables, using functions as a way of storing the values in an accumulating parameter of sorts. They could safely be shared between threads, and could (and sort of do) support atomic actions.

It turned out to be really simple to implement, and I think just showing you the code would explain what I’m on about more than anything else. I don’t know how else this could be extended (I’m sure it could be though, for some fairly interesting ideas), and since it has been written in the wee hours of the morning, I may be doing some pretty stupid stuff…

So, I’d love to hear some discussion (though yes, I do already know this could probably be done with MVars or TVars or what have you) and ideas, and here’s the code:

module ASTM where
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Concurrent.Chan
data Transaction a =
Put a
| Get (MVar a)
| Mod (a -> a)
| Atom (a -> Bool) (a -> a) (a -> a) (MVar Bool)
data AVar a = AVar (Chan (Transaction a))
newAVar :: a -> IO (AVar a)
newAVar x = do
chan <- newChan :: IO (Chan (Transaction a))
forkIO (handler chan x)
return (AVar chan)
where
handler chan x = do
req <- readChan chan
case req of
Put a         -> handler chan a
Get mvar      -> do
putMVar mvar x
handler chan x
Mod f         -> handler chan (f x)
Atom test y n res ->
if test x
then do
putMVar res True
handler chan (y x)
else do
putMVar res False
handler chan (n x)
putAVar (AVar chan) x = writeChan chan (Put x)
modAVar (AVar chan) f = writeChan chan (Mod f)
getAVar (AVar chan)   = do
res <- newEmptyMVar
writeChan chan (Get res)
takeMVar res
condModAVar (AVar chan) p t f = do
res <- newEmptyMVar
writeChan chan (Atom p t f res)
takeMVar res

»  Substance: WordPress   »  Style: Ahren Ahimsa
© Alex Mason (Axman6) 2009