Currently, my SHA1 implementation in Haskell is about 100 times slower than the version that ships with my Linux installation.
dom@heisenberg:~/sha1> time ./perfTest eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3
Success
real 0m5.165s
user 0m4.796s
sys 0m0.060s
dom@heisenberg:~/sha1> time sha1sum eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3 eg
real 0m0.061s
user 0m0.016s
sys 0m0.004s
Profiling reveals where the time is being spent.
Sat Mar 10 07:27 2007 Time and Allocation Profiling Report (Final)
perfTest +RTS -p -RTS eg
total time = 11.95 secs (239 ticks @ 50 ms)
total alloc = 2,112,329,096 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
oneBlock Data.Digest.SHA1 23.0 26.9
getWord32s Data.Digest.SHA1 19.7 17.5
rotL Data.Digest.SHA1 14.2 12.6
f Data.Digest.SHA1 13.4 10.2
$& Data.Digest.SHA1 9.2 15.2
k Data.Digest.SHA1 7.5 5.9
test2 Main 5.4 6.1
pad Data.Digest.SHA1 3.8 2.5
blockWord8sIn512 Data.Digest.SHA1 2.9 3.1
getWord32s looks like a good candidate for improvement.
getWord32s :: [Word8] -> [Word32]
getWord32s s =
map f [0..15]
where
f i = foldl (+) 0 $ map (\n -> toEnum (fromEnum (s!!(i*4+n))) `shiftL` (fromIntegral (8 * (3-n)))) [0..3]
Here's a more efficient version.
fromBytes :: (Bits a) => [a] -> a
fromBytes input =
let dofb accum [] = accum
dofb accum (x:xs) = dofb ((shiftL accum 8) .|. x) xs
in
dofb 0 input
blockWord8sIn32 :: [Word8] -> [[Word8]]
blockWord8sIn32 =
unfoldr g
where
g [] = Nothing
g xs = Just (splitAt 4 xs)
getWord32s' :: [Word8] -> [Word32]
getWord32s' =
map fromBytes . map (map fromIntegral) . blockWord8sIn32
Profiling again shows that this function now uses 3.3% of the total time as opposed to 19.7% of the total time.
Sat Mar 10 08:59 2007 Time and Allocation Profiling Report (Final)
perfTest +RTS -p -RTS eg
total time = 10.50 secs (210 ticks @ 50 ms)
total alloc = 1,972,570,248 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
oneBlock Data.Digest.SHA1 20.5 28.8
rotL Data.Digest.SHA1 16.7 13.5
f Data.Digest.SHA1 16.7 11.0
$& Data.Digest.SHA1 13.8 16.2
k Data.Digest.SHA1 12.9 6.3
test2 Main 5.7 6.5
blockWord8sIn512 Data.Digest.SHA1 4.8 3.3
getWord32s' Data.Digest.SHA1 3.3 5.0
fromBytes Data.Digest.SHA1 1.9 2.6
blockWord8sIn32 Data.Digest.SHA1 1.9 4.0
pad Data.Digest.SHA1 1.4 2.6
This gives a small improvement but we are still tracking at about 80 times worse performance.
dom@heisenberg:~/sha1> time ./perfTest eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3
Success
real 0m4.194s
user 0m3.952s
sys 0m0.036s