Yesterday, I submitted my first program modification to the computer language shootout game, which is a change to the thread-ring program. If you’ve followed the shootout at all, you will probably know that haskell dominates at this benchmark, with the next contender (Mozart/OZ) being 50% slower than haskell already.
While i was having a look at the code, I noticed that the haskell entry was forkIO’ing a lot of threads, which is exactly what we want. But we weren’t quite forking enough; one of the 503 threads was the main thread. Bad idea!
I learnt a few weeks ago, while working on my AVar package, that communicating between bound and non bound threads (forkIO’d vs. main) when using things like MVars can be really slow. And this is exactly what was happening:
main = do
a <- newMVar . read . head =<< getArgs
z <- foldM new a [2..ring]
thread 1 z a
as you can see, the last/first thread was being run in main, so once per loop around the ring, we were talking to main via an MVar, and then talking to another forkIO’d thread via another MVar. So here’s what I did:
main = do
a <- newMVar . read . head =<< getArgs
z <- foldM new a [2..ring]
ret <- newEmptyMVar
forkIO (thread 1 z a >>= putMVar ret)
takeMVar ret
Two and a half lines of code. What were the results? Well, when the program was run with n = 50,000,000, the original ran in ~14.2s, with -N{1,2,3,4}, my modified one took on average ~9.3s, which is about a 33% improvement. It should be noted that when both programs are compiled without -threaded, there is almost no difference in speed at all (and they both run considerably faster than the non threaded version). Need to talk to the shootout peeps and find out if it’s against the rules to compile without -threaded, and get an even more massive speed boost.
– Axman