Minimal perfect hash functions.

Package Specification

This package provides a number of state-of-the-art implementations of static (i.e., immutable) minimal perfect hash functions, and, more generally, of static functions from objects to integers. The classes can be gathered in three broad groups:

{@link it.unimi.dsi.sux4j.mph.LcpMonotoneMinimalPerfectHashFunction} and {@link it.unimi.dsi.sux4j.mph.ZFastTrieDistributorMonotoneMinimalPerfectHashFunction} were introduced by Djamal Belazzougui, Paolo Boldi, Rasmus Pagh and Sebastiano Vigna in “Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses”, Proc. of the 20th Annual ACM–SIAM Symposium on Discrete Algorithms (SODA), ACM Press, 2009. {@link it.unimi.dsi.sux4j.mph.TwoStepsLcpMonotoneMinimalPerfectHashFunction}, {@link it.unimi.dsi.sux4j.mph.PaCoTrieDistributorMonotoneMinimalPerfectHashFunction}, {@link it.unimi.dsi.sux4j.mph.HollowTrieMonotoneMinimalPerfectHashFunction} and {@link it.unimi.dsi.sux4j.mph.HollowTrieDistributorMonotoneMinimalPerfectHashFunction} were introduced by the same authors in “Theory and Practise of Monotone Minimal Perfect Hashing” (the class {@link it.unimi.dsi.sux4j.mph.MWHCFunction} implements a compacted version of the classical {@linkplain it.unimi.dsi.sux4j.mph.HypergraphSorter 3-hypergraph-based structure} introduced therein).

Usage

Functions in this package implement the {@link it.unimi.dsi.fastutil.objects.Object2LongFunction} interface. However, the underlying machinery manipulates {@linkplain it.unimi.dsi.bits.BitVector bit vectors} only. To bring your own data into the bit vector world, each constructor requires you to specify a {@linkplain it.unimi.dsi.bits.TransformationStrategy transformation strategy} that maps your objects into bit vectors. For instance, {@link it.unimi.dsi.bits.TransformationStrategies#utf16()}, {@link it.unimi.dsi.bits.TransformationStrategies#prefixFreeUtf16()}, {@link it.unimi.dsi.bits.TransformationStrategies#iso()}, and {@link it.unimi.dsi.bits.TransformationStrategies#prefixFreeIso()} are ready-made strategies that can be used with character sequences.

Note that if you plan to use monotone hashing, you must provide objects in an order such that the corresponding bit vectors are lexicographically ordered. For instance, {@link it.unimi.dsi.bits.TransformationStrategies#utf16()} obtains this result by concatenating the reversed 16-bit representation of each character.
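The ordering requirement above can be checked with a small, self-contained sketch. It does not use the Sux4J or DSI utilities API: the class BitOrderSketch and its methods are hypothetical names introduced only for illustration, and bit vectors are modelled as strings of '0' and '1' characters. The sketch assumes that each character contributes its 16 bits starting from the most significant one, which is the property that makes the natural order of strings agree with the lexicographic order of the resulting bit sequences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BitOrderSketch {
    /** Illustrative stand-in for a transformation strategy: each char becomes
     *  16 bits, most significant bit first, and the groups are concatenated. */
    static String toBits(CharSequence s) {
        StringBuilder bits = new StringBuilder(s.length() * 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            for (int b = 15; b >= 0; b--) bits.append((c >>> b) & 1);
        }
        return bits.toString();
    }

    /** Returns true if the bit sequences of the keys are strictly increasing
     *  in lexicographic order, as monotone hashing would require. */
    static boolean isBitwiseSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++)
            if (toBits(keys.get(i - 1)).compareTo(toBits(keys.get(i))) >= 0) return false;
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(Arrays.asList("delta", "alpha", "charlie", "bravo"));
        System.out.println(isBitwiseSorted(keys)); // keys not yet in order
        Collections.sort(keys);                    // natural (UTF-16 code unit) order
        System.out.println(isBitwiseSorted(keys)); // now acceptable for monotone hashing
    }
}
```

Under these assumptions, sorting the keys in their natural order before building a monotone function is enough; with a different transformation strategy the required order may differ, so a check of this kind on the actual bit vectors is the safer test.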

Signing functions

All functions in this package will return a value in their range for most of the keys that are not in their domain. In other words, they will produce false positives; in the few cases in which it is possible to detect a negative, you will get the default return value.

If you need more precise behaviour (e.g., you are migrating from the deprecated SignedMinimalPerfectHash class that was distributed with MG4J), you can sign a function, that is, you can record a signature for each key and use it to filter false positives. A signing class for character sequences is provided by the DSI utilities class ShiftAddXorSignedStringMap. By creating a function using one of the implementations provided with Sux4J and signing it using the above class, you obtain the same functionality as the old signed classes, but you can choose the size of the signature, whether to require monotonicity, and the space/time tradeoff of your function. Alternatively, by signing with LiterallySignedStringMap you will get a two-way function (i.e., a full StringMap implementation).
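The idea of signing can be sketched in a few lines of plain Java. This is not the implementation of ShiftAddXorSignedStringMap and it uses no Sux4J or DSI utilities classes: SigningSketch, unsignedFunction and sign are hypothetical names, the unsigned function is a toy stand-in that stores its keys (a real minimal perfect hash function does not), the signature is simply the low bits of String.hashCode(), and -1 is assumed as the default return value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

public class SigningSketch {
    /** Toy stand-in for an unsigned function: keys map to their rank, while
     *  any other string maps to an arbitrary value in range (a false positive). */
    static ToLongFunction<String> unsignedFunction(List<String> keys) {
        Map<String, Long> rank = new HashMap<>();
        for (String k : keys) rank.put(k, (long) rank.size());
        int n = keys.size();
        return s -> {
            Long r = rank.get(s);
            return r != null ? r.longValue() : (long) Math.floorMod(s.hashCode(), n);
        };
    }

    /** Signs f: stores a width-bit signature per key (width between 1 and 32)
     *  and returns -1 whenever the signature of the query does not match. */
    static ToLongFunction<String> sign(ToLongFunction<String> f, List<String> keys, int width) {
        long mask = (1L << width) - 1;
        long[] signature = new long[keys.size()];
        for (String k : keys) signature[(int) f.applyAsLong(k)] = k.hashCode() & mask;
        return s -> {
            long v = f.applyAsLong(s);
            return signature[(int) v] == (s.hashCode() & mask) ? v : -1L;
        };
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("alpha", "bravo", "delta");
        ToLongFunction<String> f = unsignedFunction(keys);
        ToLongFunction<String> g = sign(f, keys, 32);
        System.out.println(f.applyAsLong("zulu"));  // a value in [0, 3): false positive
        System.out.println(g.applyAsLong("zulu"));  // filtered by the signature
        System.out.println(g.applyAsLong("bravo")); // keys keep their value
    }
}
```

The width parameter mirrors the choice of signature size mentioned above: with a well-mixing signature hash, a w-bit signature lets a key outside the domain slip through with probability about 2^-w, at a cost of w additional bits per key.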