]>
Commit | Line | Data |
---|---|---|
e07f3d5a JAK |
1 | # Order-preserving minimal perfect hash function generator |
2 | ||
3 | Build order-preserving minimal perfect hash functions. | |
4 | ||
5 | [![codecov](https://codecov.io/gh/julian-klode/triehash/branch/master/graph/badge.svg)](https://codecov.io/gh/julian-klode/triehash) | |
6 | [![Build Status](https://travis-ci.org/julian-klode/triehash.svg?branch=master)](https://travis-ci.org/julian-klode/triehash) | |
7 | ||
8 | ## Performance | |
9 | ||
10 | Performance was evaluated against other hash functions. As an input set, the | |
11 | fields of Debian Packages and Sources files was used, and each hash function | |
12 | was run 1,000,000 times for each word. The byte count of the words were then | |
13 | summed up and divided by the total number of nanoseconds each function ran, so | |
14 | all speeds below are given in bytes per nanosecond, AKA gigabyte per second. | |
15 | ||
16 | arch/function|jak-x230 (amd64)|backup (amd64)|asachi.d.o (arm64)|asachi.d.o (armel)|asachi.d.o (armhf)|plummer.d.o (ppc64el)|eller.d.o (mipsel) | |
17 | -------------|----------------|--------------|------------------|------------------|------------------|---------------------|------------------ | |
18 | Trie | 2.4 | 1.9 | 1.2 | 0.9 | 0.8 | 2.0 | 0.2 | |
19 | Trie (*) | 2.2 | 1.7 | 0.8 | 0.7 | 0.7 | 1.8 | 0.2 | |
20 | re2c | 1.7 | 1.3 | 0.9 | 0.9 | 0.7 | 1.6 | 0.2 | |
21 | re2c (*) | 1.2 | 0.9 | 0.6 | 0.6 | 0.5 | 1.1 | 0.1 | |
22 | gperf (*) | 0.7 | 0.5 | 0.2 | 0.2 | 0.2 | 0.5 | 0.1 | |
23 | gperf | 1.3 | 0.9 | 0.3 | 0.3 | 0.2 | 0.4 | 0.1 | |
24 | djb (*) | 0.7 | 0.5 | 0.3 | 0.3 | 0.3 | 0.5 | 0.1 | |
25 | djb (**) | 1.0 | 0.7 | 0.4 | 0.5 | 0.5 | 0.6 | 0.2 | |
26 | djb | 0.9 | 0.7 | 0.5 | 0.5 | 0.5 | 0.7 | 0.2 | |
27 | apt (*) | 1.2 | 0.9 | 0.7 | 0.7 | 0.7 | 1.1 | 0.2 | |
28 | apt (**) | 2.3 | 1.7 | 0.7 | 0.9 | 0.8 | 1.9 | 0.2 | |
29 | ||
30 | And transposed: | |
31 | ||
32 | function/arch |Trie |Trie (*) |re2c |re2c (*) |gperf (*)|gperf |djb (*) |djb (**) |djb |apt (*) |apt (**) | |
33 | ---------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|--------- | |
34 | jak-x230 (amd64) | 2.4| 2.2| 1.7| 1.2| 0.7| 1.3| 0.7| 1.0| 0.9| 1.2| 2.3 | |
35 | backup (amd64) | 1.9| 1.7| 1.3| 0.9| 0.5| 0.9| 0.5| 0.7| 0.7| 0.9| 1.7 | |
36 | asachi.d.o (arm64) | 1.2| 0.8| 0.9| 0.6| 0.2| 0.3| 0.3| 0.4| 0.5| 0.7| 0.7 | |
37 | asachi.d.o (armel) | 0.9| 0.7| 0.9| 0.6| 0.2| 0.3| 0.3| 0.5| 0.5| 0.7| 0.9 | |
38 | asachi.d.o (armhf) | 0.8| 0.7| 0.7| 0.5| 0.2| 0.2| 0.3| 0.5| 0.5| 0.7| 0.8 | |
39 | plummer.d.o (ppc64el)| 2.0| 1.8| 1.6| 1.1| 0.5| 0.4| 0.5| 0.6| 0.7| 1.1| 1.9 | |
40 | eller.d.o (mipsel) | 0.2| 0.2| 0.2| 0.1| 0.1| 0.1| 0.1| 0.2| 0.2| 0.2| 0.2 | |
41 | ||
42 | ||
43 | Legend: | |
44 | ||
45 | * The (*) variants are case-insensitive, (**) are more optimised versions | |
46 | of the (*) versions. | |
47 | * DJB (*) is a DJB Hash with naive lowercase conversion, DJB (**) just ORs one | |
48 | bit into each value to get alphabetical characters to be lowercase | |
49 | * APT (*) is the AlphaHash function from APT which hashes the last 8 bytes in a | |
50 | word in a case-insensitive manner. APT (**) is the same function unrolled. | |
51 | * All hosts except the x230 are Debian porterboxes. The x230 has a Core i5-3320M, | |
52 | barriere has an Opteron 23xx. | |
53 | ||
54 | Notes: | |
55 | ||
56 | * The overhead is larger than needed on some platforms due to gcc inserting | |
57 | unneeded zero extend instructions, see: | |
58 | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729 |