Over the last week or so, Ivan Miljenovic and I have been busy organising AusHac2010. We’ve made a lot of progress, and are announcing the dates as the 16th-18th of July. If you’d like to come along and work on projects like:
then please put your name down on the sign up page.
This should be a great opportunity for Aussie (and non aussie!) haskell hackers to come and meet all those people you know from Planet Haskell and #haskell, and give something back to the community, while having a great time.
Hope to see you there, – Alex Mason
In my previous post about why I love the cereal package, I went through the development of a bencoding parser and encoder. Brian was kind enough to point out some of the flaws I’d made in this code (which I should add had been caused from me not actually checking the spec while writing the code, obviously a bad idea), and from these comments, I think I’ve managed to fix most of the problems:
Hi, thanks for writing this stuff. I think it could be pretty cool, but it could benefit from more precise reading and implementation of the spec. For example, bencoded integers can be negative. Also, my alarms go off whenever I see ‘read’. In ‘getBString’, you pass ‘read count’ to ‘getByteString’, which expects Int. But check, e.g., ‘read (show $ 2^64-1) :: Int’ in ghci. So if the torrent data is malformed, you could end up passing a negative length to ‘getByteString’. Maybe it knows how to deal with that, but it’s not something you should rely on. You also have to decide what to do about dictionaries you read whose keys aren’t in order, etc. Basically, please be more precise, especially if you put this on Hackage. This stuff is supposed to be industrial strength. Thanks.
The first problem, not handling negative integers was pretty trivial to fix, all I needed to do was check to see if there was a ‘-’ char out the front, and if not, just get all the digits, and then read them:
-- | Parses a BInt getBInt :: Get BCode getBInt = BInt . read <$> getWrapped 'i' 'e' intP where intP = ((:) <$> char '-' <*> getDigits) <|> getDigits
Brian also pointed out something I also wasn’t particularly happy with, the use of read to read in an Int64. This should under normal circumstances be more than large enough to read any bytestring that should be in any bencoded data (.torrent files are usually less than 1-200KB), so we should never have run into a problem here, but it’s still good to make sure we can be ‘industrial strength’:
read
-- | Parses a BString getBString :: Get BCode getBString = do count <- getDigits BString <$> ( char ':' *> getStr (read count :: Integer)) where maxInt = fromIntegral (maxBound :: Int) :: Integer getStr n | n >= 0 = B.concat <$> (sequence $ getStr' n) | otherwise = fail $ "read a negative length string, length: " ++ show n getStr' n | n > maxInt = getByteString maxBound : getStr' (n-maxInt) | otherwise = [getByteString . fromIntegral $ n]
Here you can see we’re now using an Integer as the read value, and taking chunks of maxBound :: Int bytes, until there are less than that many bytes left to fetch.
Integer
maxBound :: Int
I’ve decided to ignore the problem with dictionaries with out of order elements, I can see this being something others may have overlooked in their implementations, and it’s entirely possible that other implementations do not put the keys in the right order. Our implementation does, but can easily handle malformed implementations. I see this is a bonus, and I hope others do too (I feel the code is more robust, and that’s always good).
I hope this has made some difference to the code, and what people think of it.
Until next time,
– Axman
Cereal, as you may know from my previous posts is a library for parsing binary data from strict ByteStrings. It is very similar to the binary package, but importantly, provides both an Alternative instance, and an Either String a return type for the decode function, which tells you where the parse failed.
Either String a
I’ve been playing around with cereal lately in jlouis’ haskell-torrent project, rewriting the various binary parsing and producing parts of the program (the torrent file parser, and the wire protocol parser). I though it would be nice to share some of the code used for these, to demonstrate how easy cereal makes it to do such things.
To begin with, I’ll show you the part that decodes and encodes torrent files (if needed in the future). Torrent files are encoded using a very simple encoding, known as bencoding, which consists of four major primitives: Integral numbers, Strings of bytes, Arrays of bencoded objects, and Dictionaries of String, bencoded object pairs. This is very nicely represented using this datatype:
-- | BCode represents the structure of a bencoded file data BCode = BInt Integer -- ^ An integer | BString B.ByteString -- ^ A string of bytes | BArray [BCode] -- ^ An array | BDict (M.Map B.ByteString BCode) -- ^ A key, value map deriving Show
the specification for bencoded data goes something like this:
Integers are encoded as the ASCII character for ‘i’ as a byte, followed by the digits of the integral value, terminated by the ASCII byte for ‘e’. Eg: the number ‘42’ would be encoded as “i42e” Strings are encoded as the digits of their length, followed by a colon (‘:’), then the bytes of the string. these strings are really just byte sequences, and probably shouldn’t be treated as having an encoding (as jlouis and I found out when I tried to test the current code on GHC 6.12.1, with the BString type using Strings, instead of ByteStrings, and finding out that the simple test contained byte sequences that could not be represented as Strings). Eg: the string “hello” would become “5:hello”, “hello world” would become “11:hello world” Arrays are encoded as ASCII ‘l’ (for list I believe), followed by any number of bencoded objects, terminated by an ASCII ‘e’. (This is where using binary became difficult, as you had to explicitly check whether you had reached the terminating ‘e’ using lookAhead when parsing before attempting to parse another bencoded object, du the the lack of actual failure handling) Eg: ["Hello", 123] would become “l5:helloi123ee”. Notice how we’ve used the previous definitions for integral numbers, and strings. Dictionaries are encoded as an ASCII ‘d’, followed by the String, object pairs, followed by an ASCII ‘e’. Eg: fromList [("test",123),("arr",[1,2,"hello"])] would become “d4:testi123e3:arrli1ei2e5:helloee”. It looks a bit of a mess, but it is quite efficient.
Integers are encoded as the ASCII character for ‘i’ as a byte, followed by the digits of the integral value, terminated by the ASCII byte for ‘e’.
Eg: the number ‘42’ would be encoded as “i42e”
Strings are encoded as the digits of their length, followed by a colon (‘:’), then the bytes of the string. these strings are really just byte sequences, and probably shouldn’t be treated as having an encoding (as jlouis and I found out when I tried to test the current code on GHC 6.12.1, with the BString type using Strings, instead of ByteStrings, and finding out that the simple test contained byte sequences that could not be represented as Strings).
Eg: the string “hello” would become “5:hello”, “hello world” would become “11:hello world”
Arrays are encoded as ASCII ‘l’ (for list I believe), followed by any number of bencoded objects, terminated by an ASCII ‘e’. (This is where using binary became difficult, as you had to explicitly check whether you had reached the terminating ‘e’ using lookAhead when parsing before attempting to parse another bencoded object, du the the lack of actual failure handling)
lookAhead
Eg: ["Hello", 123] would become “l5:helloi123ee”. Notice how we’ve used the previous definitions for integral numbers, and strings.
Dictionaries are encoded as an ASCII ‘d’, followed by the String, object pairs, followed by an ASCII ‘e’.
Eg: fromList [("test",123),("arr",[1,2,"hello"])] would become “d4:testi123e3:arrli1ei2e5:helloee”.
It looks a bit of a mess, but it is quite efficient.
When writing my Serialize instance (Cereal’s version of the Binary class) for the BCode type, I decided it would be much easier to write the put methods first. This turned out to be rather straight forward, once I’d written a few helper functions.
toW8 :: Char -> Word8 toW8 = fromIntegral . ord fromW8 :: Word8 -> Char fromW8 = chr . fromIntegral toBS :: String -> B.ByteString toBS = B.pack . map toW8 fromBS :: B.ByteString -> String fromBS = map fromW8 . B.unpack -- | Put an element, wrapped by two characters wrap :: Char -> Char -> Put -> Put wrap a b m = do putWord8 (toW8 a) m putWord8 (toW8 b) -- | Put something as it is shown using @show@ putShow :: Show a => a -> Put putShow x = mapM_ put (show x)
fromW8 :: Word8 -> Char fromW8 = chr . fromIntegral
toBS :: String -> B.ByteString toBS = B.pack . map toW8
fromBS :: B.ByteString -> String fromBS = map fromW8 . B.unpack
-- | Put an element, wrapped by two characters wrap :: Char -> Char -> Put -> Put wrap a b m = do putWord8 (toW8 a) m putWord8 (toW8 b)
-- | Put something as it is shown using @show@ putShow :: Show a => a -> Put putShow x = mapM_ put (show x)
With these in hand, I set to work implementing the put function. The Integer and Array functions were straight forward:
instance Serialize BCode where put (BInt i) = wrap 'i' 'e' $ putShow i put (BArray arr) = wrap 'l' 'e' . mapM_ put $ arr
The Dictionary and String implementations weren’t too bad either:
put (BDict mp) = wrap 'd' 'e' dict where dict = mapM_ encPair . M.toList $ mp encPair (k, v) = put (BString k) >> put v put (BString s) = do putShow (B.length s) putWord8 (toW8 ':') putByteString s
As you can see, the code is quite clear, and matches the specification quite well.
Parsing the data was the next step. this proved a little more difficult, but with my recent (shallow) experience with Parsec, I knew what was needed.
I decided to start by writing some useful combinators (this is a lie, I wrote them when needed, but lying makes the post flow better >_>). These included the following:
-- | Get a Char. Only works with single byte characters getCharG :: Get Char getCharG = fromW8 <$> getWord8 -- | Parse a given character char :: Char -> Get () char c = do x <- getCharG if x == c then return () else fail $ "Expected char: '" ++ c:"' got: '" ++ [fromW8 x,'\''] -- | Get something wrapped in two Chars getWrapped :: Char -> Char -> Get a -> Get a getWrapped a b p = char a > p < char b -- The same as char a >> p >>= \x -> char b >> return x -- | Parse zero or items using a given parser many :: Get a -> Get [a] many p = many1 p mplus return [] -- | Parse one or more items using a given parser many1 :: Get a -> Get [a] many1 p = (:) <$> p <*> many p -- | Returns a character if it is a digit, fails otherwise. uses isDigit. digit :: Get Char digit = do x <- getCharG if isDigit x then return x else fail $ "Expected digit, got: " ++ show x -- | Get one or more digit characters getDigits :: Get String getDigits = many1 digit
-- | Parse a given character char :: Char -> Get () char c = do x <- getCharG if x == c then return () else fail $ "Expected char: '" ++ c:"' got: '" ++ [fromW8 x,'\'']
-- | Get something wrapped in two Chars getWrapped :: Char -> Char -> Get a -> Get a getWrapped a b p = char a > p < char b -- The same as char a >> p >>= \x -> char b >> return x
-- | Parse zero or items using a given parser many :: Get a -> Get [a] many p = many1 p mplus return []
mplus
-- | Parse one or more items using a given parser many1 :: Get a -> Get [a] many1 p = (:) <$> p <*> many p
-- | Returns a character if it is a digit, fails otherwise. uses isDigit. digit :: Get Char digit = do x <- getCharG if isDigit x then return x else fail $ "Expected digit, got: " ++ show x
-- | Get one or more digit characters getDigits :: Get String getDigits = many1 digit
My favourite two definitions here are many and many1, which nicely show the use of Alternative: they are mutually recursive, with many1 being the only one of the two to actually do and parsing, while many checks to see if many1 failed to parse one object using the parser p. It’s really quite beautiful, and makes the code that follows a hell of a lot nicer to write. This is where the love mentioned in the title comes in by the way.
many
many1
With these in hand, I could now go ahead and write the actual parsers for various BCode types. Parsing BInts and BArrays is dead simple now:
-- | Parses a BInt getBInt :: Get BCode getBInt = BInt . read <$> getWrapped 'i' 'e' getDigits -- | Parses a BArray getBArray :: Get BCode getBArray = BArray <$> getWrapped 'l' 'e' (many get)
-- | Parses a BArray getBArray :: Get BCode getBArray = BArray <$> getWrapped 'l' 'e' (many get)
As as side note, I’ve now come to see just what the folks on #haskell were on about when they said Applicative is nice. I think I’ve fallen in love (yet again!).
BStrings were a little more difficult, but not hard, given what I’ve just written:
-- | Parses a BString getBString :: Get BCode getBString = do count <- getDigits BString <$> ( char ':' *> getByteString (read count))
Here we get as many digits as we can, followed by a colon, and then take the number of bytes the digits specified. Finally, we have the BDict definition, which also is quite nice, if slightly annoying with its use of pattern matching (don’t get me wrong, i love pattern matching, but it’s the only place it’s used in the parser )
-- | Parses a BDict getBDict :: Get BCode getBDict = BDict . M.fromList <$> getWrapped 'd' 'e' (many getPairs) where getPairs = do (BString s) <- getBString x <- get return (s,x)
Putting it all together, we finally have a definition for the get function in the Serialize class.
get = getBInt <|> getBArray <|> getBDict <|> getBString
Please, I implore you, do let me know what you think of this all, I’m always interested in seeing what others think of my code, and ways to improve it.
— Axman
Over the last week or so, I’ve been playing around with some of the minor details of jlouis’ Haskell-Torrent code. The main pieces I’ve been playing with have been the more binary centric pieces: the torrent file en/decoding and since Wednesday, the wire protocol.
After trying to compile the code using GHC 6.12.1, I noticed that a rather strange dependency: HCodecs. Investigating further, I realised that all that was being used was its fairly nice binary builder and parser, and was being used in the parsing and construction of bencoded files. This seemed to me to be the domain of one of my favourite packages, binary. I also noticed that when trying to parse the sample torrent file jlouis had provided, there was binary data that could not be represented in a String type, because there were byte sequences that could not be parses into Char’s.
So I went off to try and implement the BCode module using Data.Binary. The encoding was really simple and straight forward, but parsing was another matter. Binary has no real error handeling, which meant that decoding things like lists where you keep adding elements to the list until you reach an ‘e’, was made quite verbose due to this fact. After implementing a (semi) working module, I went in search of a more parsec like interface to binary parsing. This is when someone on #Haskell informed me of cereal, a package based on binary, which added an Alternative instance, and an Either String a return type for better error reporting.
I fell in love. This was exactly what I was after, I could create my own combinators like many and many1, with the same ease I could with parsec. Using cereal, I was able to make a really clear, concise parser, which should also be quite efficient.
I will post some of this code at a later time, when i’m not writing on my iPhone, and show you just what I mean. I’m also considering releasing the BCode module as a package of it’s own, if anyone else feels it might me useful to them. Either that, or I might write up a module of useful combinators for use with cereal.
Until next time, happy new year and I hope this year treats you all better than the last one did.
– Al
Robert Greayer has just released his BlogLiterally program on hackage, so I thought I’d try it out. I’ve made some small changes to the HTML DSL I’ve been working on since i last posted, which I’ll outline here:
The HTML type is now
data HTML a = Doctype String String String (HTML a) | Content a | Text String | Node Tag Attrs [HTML a] | NoContent Tag Attrs | Comment String deriving Show
instead of
data HTML a = Doctype String String String [HTML a] | Content a | Text String | Node Tag Attrs [HTML a] | NoContent Tag Attrs | Comment String deriving Show
I’ve also added some more helper functions:
addAttrs :: Attrs -> HTML a -> HTML a addAttrs attrs (Node tag attrs' rest) = Node tag (attrs' ++ attrs) rest addAttrs attrs (NoContent tag attrs') = NoContent tag (attrs'++attrs) addAttrs _ x = x toTable :: [[String]] -> HTML a toTable xss = table . map (tr . map (\x -> td [text x])) $ xss toTable' :: Show a => [[a]] -> HTML b toTable' = toTable . map (map show)
toTable :: [[String]] -> HTML a toTable xss = table . map (tr . map (\x -> td [text x])) $ xss
toTable' :: Show a => [[a]] -> HTML b toTable' = toTable . map (map show)
And updated my example:
example :: HTML () example = doctype "html PUBLIC" "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" $ xhtml [ header [ title "My random page", meta "name" "Alex" ], body [ comment "This is a comment", para [ text "The hr tag defines a horizontal rule:" ], hr, para [ text "This is a paragraph" ], hr, para [ text "This is a paragraph" ], addAttrs [("border","1px")] . toTable' $ [[0..n] | n <- [0..3]] ] ]
Which produces this html:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head > <title > My random page </title> <meta name="name" content="Alex"/> </head> <body > <!-- This is a comment --> <p > The hr tag defines a horizontal rule: </p> <hr /> <p > This is a paragraph </p> <hr /> <p > This is a paragraph </p> <table border="1px"> <tr > <td > 0 </td> </tr> <tr > <td > 0 </td> <td > 1 </td> </tr> <tr > <td > 0 </td> <td > 1 </td> <td > 2 </td> </tr> <tr > <td > 0 </td> <td > 1 </td> <td > 2 </td> <td > 3 </td> </tr> </table> </body> </html>
So, after a discussion on #haskell, I decided to try writing a simple HTML DSL in Haskell. In about 15 minutes, I had a rather rudimentary implementation, and after a day or so of working, I’ve got a somewhat useful DSL for writing HTML, and pretty printing it.
The main type behind the package is:
where
type Attr = (String, String) type Attrs = [Attr] data Tag = Html | ARef | BlockQuote | Body ... | Title | TR | UL
I’ve also provided helper functions to make this an actual DSL (I guess?): example :: HTML () example = doctype "html PUBLIC" "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [ xhtml [ header [ title "My random page", meta "name" "Alex" ], body [ comment "This is a comment", para [ text "The hr tag defines a horizontal rule:" ], hr, para [ text "This is a paragraph" ], hr, para [ text "This is a paragraph" ], table [ tr [ td [ text "data" ], td [ text "more data" ] ], tr [ td [ text "more data" ] ] ] ] ] ]
example :: HTML () example = doctype "html PUBLIC" "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [ xhtml [ header [ title "My random page", meta "name" "Alex" ], body [ comment "This is a comment", para [ text "The hr tag defines a horizontal rule:" ], hr, para [ text "This is a paragraph" ], hr, para [ text "This is a paragraph" ], table [ tr [ td [ text "data" ], td [ text "more data" ] ], tr [ td [ text "more data" ] ] ] ] ] ]
produces:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head > <title > My random page </title> <meta name="name" content="Alex"/> </head> <body > <!-- This is a comment --> <p > The hr tag defines a horizontal rule: </p> <hr /> <p > This is a paragraph </p> <hr /> <p > This is a paragraph </p> <table > <tr > <td > data </td> <td > more data </td> </tr> <tr > <td > more data </td> </tr> </table> </body> </html>
I don’t claim that anyone should use this (there’s a good chance I won’t be releasing it, unless someone really wants me to), but I’m still quite proud of it.
A while ago I found Björn Edström’s quick intro to the Haskell FFI , and I decided to follow along and find out what it was to interface Haskell with C. I found it quite interesting, noticed the C was faster than the haskell version I wrote, and then sort of forgot about it all.
Then Snow Leopard came out. I read the Ars Technica review of it, and realised that Grand Central Dispatch was frigging awesome, because it makes doing things concurrently in C damn easy. I went off and implemented a few programs using it, and had a lot of fun. It was especially interesting because I knew we were coming up to learning about concurrency in C in our Concurrent and Distributed Systems course. I’d heard about pthreads, and could see they were not going to be pleasant to use, at least not without some higher level interface. Blocks along with GCD are obviously an improvement, though certainly not an alternative.
It was about then that I realised two things: Firstly, I could parallelise the dctiv function from from Björn’s article using GCD, and secondly, that I was taking a Signals Processing course, and I actually understood what the code was doing. So off I went, turning this code:
void dctiv(int points, float *in, float *out){ int k, n; float sum, cosConst, kk; for (k = 0;k < points; k++ ){ sum = 0; kk = (k + 0.5f); cosConst = (PI/points) * kk; for (n = 0; n < points; n++){ sum += in[n] * cos(cosConst * (n+0.5f)); } out[k] = sum; } }
#include <dispatch/dispatch.h> ... void dctivGCD(int points, float *in, float *out){ dispatch_apply(points, dispatch_get_global_queue(0,0), ^(size_t k){ float sum, cosConst, kk; int n; sum = 0; kk = (k + 0.5f); cosConst = (PI/points) * kk; for (n = 0; n < points; n++){ sum += in[n] * cos(cosConst * (n+0.5f)); } out[k] = sum; }); }
So I went off and benchmarked this new code, and found that on my Core 2 Duo, it was about 40% faster. Hoorah! (for some reason, I can now not get this code to execute on more than one thread, so I’m no longer getting any speedups But that’s another story). That was all well and good, but where to go next?
While I think GCD is by far the most important feature of Snow Leopard, it also has another related feature that has stirred a lot of attention in the in the lead up to Snow Leopard’s release: OpenCL, the Open Computing Language. It is supposed to provide a common interface allowing a programmer to take advantage of parallelism on their CPU, GPU, Cell or what have you. I’d gotten somewhat interested by this, and decided to take a look at the Apple OpenCL examples. I’d thought that OpenCL would be hard to use, and I was right. Luckily, the OpenCL Hello World example was an example of something that was very close to what I was doing in the dctiv function. I decided to give OpenCL a whirl, and copied and pasted the most of the hello.c file into a new function, modified it… and it didn’t work. I’m not going to go into why it didn’t work, because it was mainly my stupidity, but I will say that if you if you’re working with floats in C/OpenCL, make sure your constants are in the form of 0.5f.
After some work, I finally got my kernel to compile:
const char KernelSource = "\n" \ "__kernel void DCT( \n" \ " __global const float input, \n" \ " __global float* output, \n" \ " const unsigned int count) \n" \ "{ \n" \ " int n, i = get_global_id(0); \n" \ " float sum, cosConst, kk; \n" \ " \n" \ " sum = 0; \n" \ " kk = (i + 0.5f); \n" \ " cosConst = (3.141592653589793f/count) * kk; \n" \ " for(n = 0; n < count; ++n) \n" \ " { \n" \ " sum += input[n] * cos(cosConst * (n+0.5f)); \n" \ " } \n" \ " output[i] = sum; \n" \ "} \n" \ "\n";
I had also made a FFI binding to the function that ran this (left out, because it’s 149 lines long). I started benchmarking, and couldn’t quite believe the speed: on 215 elements, the C takes 44.43s, the OpenCL version on the same input takes 7.11s. I’m a little disappointed with my Haskell versions, running about half as fast as the C version. For some reason I can’t seem to get them to actually run in parallel, it’s rather frustrating. For those interested, here are the two haskell versions:
dctivHask :: Int -> [Float] -> [Float] dctivHask points ps = parMap rnf forK [0..fromIntegral (points-1)] where !piOn = myPi / fromIntegral points forK k = foldl' (f $ (k+0.5) piOn) 0 $ zipWith V [0.0..] ps f !kp !sum' (V !n !p) = sum' + p cos (kp * (n+0.5))
Which takes about 71.22s, and using unboxed arrays:
dctivHaskArr :: Int -> [Float] -> [Float] dctivHaskArr points ps = pmap f . range . bounds $ input where input = listArray (0,points-1) ps :: UArray Int Float f :: Int -> Float f !k = g 0.0 0 where cosConst = (myPi / fromIntegral points) (0.5 + fromIntegral k) :: Float g :: Float -> Int -> Float g !num !n = {-# SCC "dctivHaskArr-worker" #-} let nn = 0.5 + fromIntegral n :: Float in if n < points then g (num + ((input ! n) cos (cosConst * nn))) (n+1) else num
Which takes about 66.96s (these are just one off test numbers because I get bored waiting for the haskell versions to finish).
I should probably follow up this post with… more stuff… so stay tuned (and please leave comments, because i don’t know what that stuff should be yet!)
So, I posted a link to reddit yesterday, thought it was funny, and thought it might get a few votes… I was wrong. It got a lot of votes, and is now the 11th top link submitted to reddit of all time. If you’re curious what all the fuss is about, check it out here.
So, thanks to all of you that voted for it, you’ve made me feel quite loved (in a you liked something I posted but didn’t have anything to do with kind of way).
This isn’t an announcement, but a discussion of the next project I’d like to try writing in haskell. I’ve been thinking of ways in which I can use my TernaryTrees package, and I realised that a proxy server could be a nice way to do that.
I’m still trying to think through the details of what needs to be implemented, and how it should all work, but I’d like to it be something concurrent, with one thread handling each request to fetch objects if needed, and another thread keeping track of a TernaryMap (Or maybe I’ll release the StringMap in the TernaryTrees package that’s in the repo), which can be used to keep track of which elements have been cached, and other info about the objects. It occurs to me that this job seems ideally suited to my AVar package, which is a concurrent ‘variable’, that can be used by many concurrent threads, and will do as much as it can to stop itself from being destroyed by exceptions.
I think there’s a few things I’ll need to add to TernaryTrees before I can get a nice proxy working, mainly a delete function to remove elements from the trees. I’ll probably have to do some more work on AVars before i can use them too. But my main problem is not knowing how to do anything to do with HTTP, both fetching and serving. I’ll be taking a look at the HTTP and HTTP-server libraries to see what i can figure out, probably by starting with seeing if i can fetch some data, save it to a file, and send it on to somewhere else.
I’d really live to hear from others about this if they read it, and give me any ideas you may have had about the subject.
After years of wanting to, I finally bought axman6.com. I’m using the quite awesome nearlyfreespeech.net for my hosting, because they’re damn cheap, and seem to runs things very nicely.
Now for the interesting stuff. I’ve been working over the last few days on implementing a new Set and Map library for haskell, which I released yesterday on Hackage, called imaginatively TernaryTrees. I got the idea from a PC Plus article [Via: Reddit.com], and couldn’t quite get the whole idea into my head at once, so decided to try implementing it in the best language I knew, haskell.
It turns out this was a good idea; without much work at all, I’ve ended up with a rather efficient implementation, which can be used for either efficient Sets of Strings, or Sets or Maps of any Ord a => [a]. The thing I’m most happy about is how nice I managed to make the Data.Binary implementation. By using a little bit of binary number trickery, I’ve managed to save an awful lot of space (The {T,S}End notes only requite 1 bit of information saying whether they’re there or not and these bits are stored in the same byte that marked whether we has a {T,S}Node or {T,S}End). When using this to write the StringSet out to a binary file, I’ve been able to get a the output to be compressed by over 40% in one case (though this doesn’t seem to be typical sadly )
There’s not much more I want to say about this for now, maybe later, but I suggest you go and take a look at the source and see how easy this all was.