Saturday, March 10, 2007

A Small Start on SHA1 Performance

Currently, my SHA1 implementation in Haskell is about 100 times slower than the version that ships with my Linux installation.


dom@heisenberg:~/sha1> time ./perfTest eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3
Success

real 0m5.165s
user 0m4.796s
sys 0m0.060s
dom@heisenberg:~/sha1> time sha1sum eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3 eg

real 0m0.061s
user 0m0.016s
sys 0m0.004s


Profiling reveals where the time is being spent.


Sat Mar 10 07:27 2007 Time and Allocation Profiling Report (Final)

perfTest +RTS -p -RTS eg

total time = 11.95 secs (239 ticks @ 50 ms)
total alloc = 2,112,329,096 bytes (excludes profiling overheads)

COST CENTRE MODULE %time %alloc

oneBlock Data.Digest.SHA1 23.0 26.9
getWord32s Data.Digest.SHA1 19.7 17.5
rotL Data.Digest.SHA1 14.2 12.6
f Data.Digest.SHA1 13.4 10.2
$& Data.Digest.SHA1 9.2 15.2
k Data.Digest.SHA1 7.5 5.9
test2 Main 5.4 6.1
pad Data.Digest.SHA1 3.8 2.5
blockWord8sIn512 Data.Digest.SHA1 2.9 3.1


getWord32s looks like a good candidate for improvement.


getWord32s :: [Word8] -> [Word32]
getWord32s s =
map f [0..15]
where
f i = foldl (+) 0 $ map (\n -> toEnum (fromEnum (s!!(i*4+n))) `shiftL` (fromIntegral (8 * (3-n)))) [0..3]


Here's a more efficient version.


fromBytes :: (Bits a) => [a] -> a
fromBytes input =
let dofb accum [] = accum
dofb accum (x:xs) = dofb ((shiftL accum 8) .|. x) xs
in
dofb 0 input

blockWord8sIn32 :: [Word8] -> [[Word8]]
blockWord8sIn32 =
unfoldr g
where
g [] = Nothing
g xs = Just (splitAt 4 xs)

getWord32s' :: [Word8] -> [Word32]
getWord32s' =
map fromBytes . map (map fromIntegral) . blockWord8sIn32


Profiling again shows that this function now uses 3.3% of the total time as opposed to 19.7% of the total time.


Sat Mar 10 08:59 2007 Time and Allocation Profiling Report (Final)

perfTest +RTS -p -RTS eg

total time = 10.50 secs (210 ticks @ 50 ms)
total alloc = 1,972,570,248 bytes (excludes profiling overheads)

COST CENTRE MODULE %time %alloc

oneBlock Data.Digest.SHA1 20.5 28.8
rotL Data.Digest.SHA1 16.7 13.5
f Data.Digest.SHA1 16.7 11.0
$& Data.Digest.SHA1 13.8 16.2
k Data.Digest.SHA1 12.9 6.3
test2 Main 5.7 6.5
blockWord8sIn512 Data.Digest.SHA1 4.8 3.3
getWord32s' Data.Digest.SHA1 3.3 5.0
fromBytes Data.Digest.SHA1 1.9 2.6
blockWord8sIn32 Data.Digest.SHA1 1.9 4.0
pad Data.Digest.SHA1 1.4 2.6


This gives a small improvement but we are still tracking at about 80 times worse performance.


dom@heisenberg:~/sha1> time ./perfTest eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3
Success

real 0m4.194s
user 0m3.952s
sys 0m0.036s

No comments: