-----module QuickTest>-----
74% expressions used (610/821)
0% boolean coverage (0/6)
0% guards (0/5), 4 always True, 1 always False
0% 'if' conditions (0/1), 1 always False
100% qualifiers (0/0)
64% alternatives used (42/65)
93% local declarations used (15/16)
76% top-level declarations used (33/43)
Saturday, December 8, 2007
Better Coverage in QuickTest IV
Better Coverage in QuickTest III
-----module QuickTest>-----
57% expressions used (548/952)
0% boolean coverage (0/6)
0% guards (0/5), 4 always True, 1 always False
0% 'if' conditions (0/1), 1 always False
100% qualifiers (0/0)
58% alternatives used (38/65)
50% local declarations used (15/30)
58% top-level declarations used (31/53)
Friday, December 7, 2007
Better Coverage in QuickTest II
-----module QuickTest>-----
69% expressions used (752/1084)
0% boolean coverage (0/6)
0% guards (0/5), 4 always True, 1 always False
0% 'if' conditions (0/1), 1 always False
100% qualifiers (0/0)
58% alternatives used (58/99)
65% local declarations used (15/23)
70% top-level declarations used (46/65)
Better Coverage in QuickTest
-----module QuickTest>-----
64% expressions used (768/1182)
0% boolean coverage (0/6)
0% guards (0/5), 4 always True, 1 always False
0% 'if' conditions (0/1), 1 always False
100% qualifiers (0/0)
57% alternatives used (60/104)
75% local declarations used (15/20)
66% top-level declarations used (46/69)
Tuesday, August 28, 2007
Initial Coverage
dom@heisenberg:~/asn15/asn1> hpc report Main --per-module
-----module ConstrainedType>-----
52% expressions used (1318/2496)
25% boolean coverage (13/51)
30% guards (4/13), 5 always True, 1 always False, 3 unevaluated
23% 'if' conditions (9/38), 3 always True, 10 always False, 16 unevaluated
100% qualifiers (0/0)
46% alternatives used (151/326)
63% local declarations used (67/105)
57% top-level declarations used (74/129)
-----module Language.ASN1>-----
0% expressions used (0/296)
100% boolean coverage (0/0)
100% guards (0/0)
100% 'if' conditions (0/0)
100% qualifiers (0/0)
0% alternatives used (0/27)
0% local declarations used (0/8)
0% top-level declarations used (0/36)
-----module Main>-----
100% expressions used (3/3)
100% boolean coverage (0/0)
100% guards (0/0)
100% 'if' conditions (0/0)
100% qualifiers (0/0)
100% alternatives used (0/0)
100% local declarations used (0/0)
100% top-level declarations used (1/1)
-----module Pretty>-----
0% expressions used (0/942)
0% boolean coverage (0/8)
0% guards (0/5), 5 unevaluated
0% 'if' conditions (0/3), 3 unevaluated
100% qualifiers (0/0)
0% alternatives used (0/111)
0% local declarations used (0/19)
0% top-level declarations used (0/51)
-----module QuickTest>-----
37% expressions used (488/1286)
25% boolean coverage (2/8)
20% guards (1/5), 4 always True
33% 'if' conditions (1/3), 1 always False, 1 unevaluated
100% qualifiers (0/0)
48% alternatives used (62/129)
78% local declarations used (18/23)
33% top-level declarations used (24/71)
-----module UnitTest>-----
35% expressions used (918/2590)
100% boolean coverage (0/0)
100% guards (0/0)
100% 'if' conditions (0/0)
100% qualifiers (0/0)
33% alternatives used (2/6)
70% local declarations used (17/24)
39% top-level declarations used (105/266)
Monday, March 19, 2007
A Small Start on SHA1 Performance Part IV
Using apfelmus' suggestions saves a bit more time but the improvement isn't dramatic. The code is here.
The function which consumes most of the time is shown below but it's difficult to see how it can be improved.
If you are going to build the code then here is the appropriate command line.
dom@heisenberg:~/sha11/sha1> time ./perfTest ../../sha1/eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3
Success
real 0m2.225s
user 0m2.180s
sys 0m0.036s
The function which consumes most of the time is shown below but it's difficult to see how it can be improved.
oneBlock ss xs = tt
where
us :: Array Int Word32
us =
accumArray (curry snd) 0 (0,79) (zip [0..15] xs ++ map (\(x,y) -> (x, rotL 1 y))[(i, xxor i) | i<-[16..79]])
where
xxor i = us ! (i-16) `xor` us ! (i-3) `xor` us ! (i-8) `xor` us ! (i-14)
g (AccAndWord160 n (Word160 a b c d e)) w =
AccAndWord160 (n+1) (Word160 ((rotL 5 a) + (f n b c d) + e + w + (k n)) a (rotL 30 b) c d)
(AccAndWord160 _ tt) = foldl' g (AccAndWord160 0 ss) (elems us)
If you are going to build the code then here is the appropriate command line.
ghc -fglasgow-exts --make perfTest.hs -o perfTest -O -fvia-C -funbox-strict-fields
Saturday, March 10, 2007
A Small Start on SHA1 Performance Part III
Since we first wrote SHA1 (years ago), it looks like rotate is now in Data.Bits so we can use that rather than rolling our own.
type Rotation = Int
rotL :: Rotation -> Word32 -> Word32
rotL s a = shiftL a s .|. shiftL a (s-32)
rotL' = flip rotateL
This saves some more time.
type Rotation = Int
rotL :: Rotation -> Word32 -> Word32
rotL s a = shiftL a s .|. shiftL a (s-32)
rotL' = flip rotateL
This saves some more time.
Sat Mar 10 14:28 2007 Time and Allocation Profiling Report (Final)
perfTest +RTS -p -RTS eg
total time = 7.15 secs (143 ticks @ 50 ms)
total alloc = 1,484,351,328 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
oneBlock Data.Digest.SHA1 35.7 40.1
$& Data.Digest.SHA1 16.1 21.6
f' Data.Digest.SHA1 10.5 6.1
test2 Main 8.4 8.7
getWord32s' Data.Digest.SHA1 7.7 6.6
blockWord8sIn512 Data.Digest.SHA1 5.6 4.4
k' Data.Digest.SHA1 4.9 0.0
blockWord8sIn32 Data.Digest.SHA1 4.9 5.3
pad Data.Digest.SHA1 3.5 3.5
fromBytes Data.Digest.SHA1 2.1 3.5
A Small Start on SHA1 Performance Part II
The function k is a bit inefficient.
Since we know that n is always greater than or equal 0 and less than 80, we can use this.
This gives a bit more of a saving.
We can do the same with f.
k n
| (0 <= n) &&amp; (n <= 19) = 0x5a827999
| (20 <= n) && (n <= 39) = 0x6ed9eba1
| (40 <= n) && (n <= 59) = 0x8f1bbcdc
| (60 <= n) && (n <= 79) = 0xca62c1d6
| otherwise = error "invalid index for k"
Since we know that n is always greater than or equal 0 and less than 80, we can use this.
k' n
| n <= 19 = 0x5a827999
| n <= 39 = 0x6ed9eba1
| n <= 59 = 0x8f1bbcdc
| n <= 79 = 0xca62c1d6
This gives a bit more of a saving.
Sat Mar 10 13:49 2007 Time and Allocation Profiling Report (Final)
perfTest +RTS -p -RTS eg
total time = 9.55 secs (191 ticks @ 50 ms)
total alloc = 1,847,562,152 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
oneBlock Data.Digest.SHA1 25.1 30.7
f Data.Digest.SHA1 16.2 11.7
$& Data.Digest.SHA1 14.7 17.3
rotL Data.Digest.SHA1 14.1 14.4
blockWord8sIn512 Data.Digest.SHA1 8.9 3.5
k' Data.Digest.SHA1 4.7 0.0
pad Data.Digest.SHA1 4.2 2.8
test2 Main 4.2 7.0
getWord32s' Data.Digest.SHA1 2.6 5.3
blockWord8sIn32 Data.Digest.SHA1 2.6 4.3
hash Data.Digest.SHA1 1.0 0.0
fromBytes Data.Digest.SHA1 1.0 2.8
We can do the same with f.
Sat Mar 10 14:03 2007 Time and Allocation Profiling Report (Final)
perfTest +RTS -p -RTS eg
total time = 8.70 secs (174 ticks @ 50 ms)
total alloc = 1,722,554,056 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
oneBlock Data.Digest.SHA1 29.3 32.9
rotL Data.Digest.SHA1 19.0 15.4
$& Data.Digest.SHA1 13.8 18.6
f' Data.Digest.SHA1 11.5 5.3
getWord32s' Data.Digest.SHA1 8.0 5.7
test2 Main 5.7 7.5
blockWord8sIn512 Data.Digest.SHA1 4.0 3.8
fromBytes Data.Digest.SHA1 2.9 3.0
k' Data.Digest.SHA1 2.3 0.0
pad Data.Digest.SHA1 1.7 3.0
blockWord8sIn32 Data.Digest.SHA1 1.7 4.6
A Small Start on SHA1 Performance
Currently, my SHA1 implementation in Haskell is about 100 times slower than the version that ships with my Linux installation.
Profiling reveals where the time is being spent.
getWord32s looks like a good candidate for improvement.
Here's a more efficient version.
Profiling again shows that this function now uses 3.3% of the total time as opposed to 19.7% of the total time.
This gives a small improvement but we are still tracking at about 80 times worse performance.
dom@heisenberg:~/sha1> time ./perfTest eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3
Success
real 0m5.165s
user 0m4.796s
sys 0m0.060s
dom@heisenberg:~/sha1> time sha1sum eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3 eg
real 0m0.061s
user 0m0.016s
sys 0m0.004s
Profiling reveals where the time is being spent.
Sat Mar 10 07:27 2007 Time and Allocation Profiling Report (Final)
perfTest +RTS -p -RTS eg
total time = 11.95 secs (239 ticks @ 50 ms)
total alloc = 2,112,329,096 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
oneBlock Data.Digest.SHA1 23.0 26.9
getWord32s Data.Digest.SHA1 19.7 17.5
rotL Data.Digest.SHA1 14.2 12.6
f Data.Digest.SHA1 13.4 10.2
$& Data.Digest.SHA1 9.2 15.2
k Data.Digest.SHA1 7.5 5.9
test2 Main 5.4 6.1
pad Data.Digest.SHA1 3.8 2.5
blockWord8sIn512 Data.Digest.SHA1 2.9 3.1
getWord32s looks like a good candidate for improvement.
getWord32s :: [Word8] -> [Word32]
getWord32s s =
map f [0..15]
where
f i = foldl (+) 0 $ map (\n -> toEnum (fromEnum (s!!(i*4+n))) `shiftL` (fromIntegral (8 * (3-n)))) [0..3]
Here's a more efficient version.
fromBytes :: (Bits a) => [a] -> a
fromBytes input =
let dofb accum [] = accum
dofb accum (x:xs) = dofb ((shiftL accum 8) .|. x) xs
in
dofb 0 input
blockWord8sIn32 :: [Word8] -> [[Word8]]
blockWord8sIn32 =
unfoldr g
where
g [] = Nothing
g xs = Just (splitAt 4 xs)
getWord32s' :: [Word8] -> [Word32]
getWord32s' =
map fromBytes . map (map fromIntegral) . blockWord8sIn32
Profiling again shows that this function now uses 3.3% of the total time as opposed to 19.7% of the total time.
Sat Mar 10 08:59 2007 Time and Allocation Profiling Report (Final)
perfTest +RTS -p -RTS eg
total time = 10.50 secs (210 ticks @ 50 ms)
total alloc = 1,972,570,248 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
oneBlock Data.Digest.SHA1 20.5 28.8
rotL Data.Digest.SHA1 16.7 13.5
f Data.Digest.SHA1 16.7 11.0
$& Data.Digest.SHA1 13.8 16.2
k Data.Digest.SHA1 12.9 6.3
test2 Main 5.7 6.5
blockWord8sIn512 Data.Digest.SHA1 4.8 3.3
getWord32s' Data.Digest.SHA1 3.3 5.0
fromBytes Data.Digest.SHA1 1.9 2.6
blockWord8sIn32 Data.Digest.SHA1 1.9 4.0
pad Data.Digest.SHA1 1.4 2.6
This gives a small improvement but we are still tracking at about 80 times worse performance.
dom@heisenberg:~/sha1> time ./perfTest eg
737be686da2aac4bbfb4cee3fc7dfa825f1d48a3
Success
real 0m4.194s
user 0m3.952s
sys 0m0.036s
Subscribe to:
Posts (Atom)