Authenticated deterministic encryption for 64-bit integers based on the AES-CMAC-SIV construction.
Security Warning
Both the design and Rust implementation of this scheme have received no external review. There are properties / limits of this scheme that need to be mathematically quantified which are presently undescribed.
This scheme uses deterministic encryption which, if applied improperly (e.g. in a naive inverted search index), can lead to catastrophic failures including full plaintext recovery.
Before attempting to experiment with this scheme, please make sure to read the full threat model section and make sure the cryptographic properties of this scheme actually apply to your intended threat model.
DO NOT USE THIS CODE IN PRODUCTION!
Threat Model
AES-SID encrypts 64-bit values as 128-bit values. However, if naively (mis)used as a general-purpose construction for encrypting 64-bit values, it can fail catastrophically (and similar constructions have in practice, as described in the “Security Warning” section above).
The auto-incrementing primary key model utilized by many databases is extremely convenient for developers for many reasons. However, it comes at a cost: primary keys are easily guessable by attackers, who are able to enumerate and explore the entire key space.
A notable recent example is so-called “Zoom Bombing”, where attackers are able to guess the identifiers of valid Zoom channels and thus gain access to them.
These identifiers are a standard feature of all SQL databases, easily remembered, easy-to-communicate (in text or spoken form), and generally ubiquitous in many applications.
AES-SID is designed to allow developers to retrofit applications which use low-entropy auto-incrementing primary keys in such a way that they can be deterministically and reversibly mapped to 128-bit external / “masked” values (that can be serialized as e.g. a UUID), while ensuring that the “masked” values are randomly distributed and unguessable by an attacker (with greater-than-chance success in the 128-bit integer space, which is widely regarded as the baseline for symmetric cryptography).
One way to solve this problem is to use a (cryptographically) random UUID as a primary key instead of an auto-incrementing one. This is a perfectly valid approach, and one worth considering, but it comes at a price: UUIDs are long and high-entropy, which means they aren’t easily spoken, or even remembered or manually typed by someone who has read them.
However, if applications are already leveraging auto-incrementing integer identifiers, a migration to randomized UUIDs is potentially complex. That said, even for greenfield applications, low-cardinality auto-incrementing IDs starting at 0 or 1 are extremely convenient from a developer experience perspective: they're easy to remember, to type, and to speak.
For this reason, schemes for “masking” / encrypting low-entropy numerical identifiers have been developed. Historically, these schemes have had at least one of these two problems:
Identifiers are malleable, providing an advantage to attackers who are interested in guessing any valid encrypted identifier
Identifiers are long, e.g. exceeding 128 bits, and therefore cannot be serialized as e.g. a UUID
AES-SID attempts to create a space-optimal authenticated identifier which includes a cryptographic MAC. While other schemes providing the same properties exist, this scheme is notable as being based on the SIV Mode of Operation as described by cryptographer Phil Rogaway.
SIV was originally designed for the purposes of "key wrapping" (i.e. encryption of cryptographic keys). AES-SID is a specialization of that notion intended for "primary key wrapping".
In addition to being guessable, primary keys leak information about the records they identify. They often expose the total cardinality of the record type they identify as well as a lexicographic ordering, almost certainly by insertion order, which often also exposes a creation date. This allows an honest-but-curious attacker to potentially scrape and compute a complete graph of the record type of interest.
Apps which both utilize auto-incrementing low-entropy IDs and expose a creation timestamp are leaking valuable competitive intelligence to potential competitors / attackers.
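To make the leak concrete, here is a minimal sketch of the arithmetic an observer could perform with just two scraped records. The function name `records_per_day` and the `(id, unix_timestamp_secs)` observation format are hypothetical, chosen only for illustration.

```rust
/// Estimate how many records were created per day between two observations.
///
/// Each observation is a hypothetical `(id, unix_timestamp_secs)` pair taken
/// from a page that exposes both a sequential ID and a creation time.
pub fn records_per_day(earlier: (u64, u64), later: (u64, u64)) -> Option<f64> {
    let (id_a, t_a) = earlier;
    let (id_b, t_b) = later;
    // Reject observations that are out of order or share a timestamp.
    if id_b < id_a || t_b <= t_a {
        return None;
    }
    // With auto-incrementing IDs, the ID difference is the number of records
    // inserted between the two observations.
    let created = (id_b - id_a) as f64;
    let days = (t_b - t_a) as f64 / 86_400.0;
    Some(created / days)
}
```

Two records with IDs 1000 and 1700 created a week apart would suggest roughly 100 new records per day, without any access beyond the public pages.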
An encrypted primary key may seem like a powerful building block for encrypted databases. For example, 64 bits is enough to store the UTF-8 encodings of most "short words" in most languages which can be represented in Unicode and / or the primary keys of encrypted documents. Naively, it might seem that deterministic ciphertexts of keywords could be used to build an encrypted "inverted index" providing full-text search; however, such systems are unsound and fail catastrophically in practice.
As noted in the "Security Warning" section above, attempting to abuse deterministic encryption for these purposes can have disastrous results, including full plaintext recovery, and for this reason many cryptography professionals may react adversely to the phrase "deterministic encryption" (and rightfully so!).
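The failure mode can be illustrated with a toy frequency analysis. This is only a sketch: `toy_deterministic_token` is a hypothetical stand-in built on the standard library's `DefaultHasher`, not AES-SID and not encryption at all. It models the single property that matters here, namely that equal inputs always produce equal outputs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy stand-in for deterministic encryption: equal inputs always map to the
/// same token. This is NOT AES-SID and NOT secure; it only models determinism.
pub fn toy_deterministic_token(keyword: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    keyword.hash(&mut hasher);
    hasher.finish()
}

/// Return the token that occurs most often in an "encrypted" keyword index.
/// No key is needed: the counts survive deterministic encryption unchanged.
pub fn most_frequent_token(tokens: &[u64]) -> Option<u64> {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    for &token in tokens {
        *counts.entry(token).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by_key(|&(_, count)| count)
        .map(|(token, _)| token)
}
```

An attacker who knows which keyword is most common in the underlying language can label the most frequent token with that keyword, then repeat the process down the frequency table. Real attacks on searchable encryption are more sophisticated, but they build on the same leak.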
Problems like searchable symmetric encryption (SSE) and private information retrieval (PIR) are ongoing research areas which are fraught with peril and have an ongoing history of broken schemes, implementations, and compromises.
The most promising solutions in these spaces involve much more complex schemes than AES-SID, such as oblivious RAM.
DO NOT USE AES-SID TO BUILD AN ENCRYPTED DATABASE!
Construction
AES-SID is a simplification of the AES-CMAC-SIV scheme as described in the paper "The SIV Mode of Operation for Deterministic Authenticated-Encryption (Key Wrap) and Misuse-Resistant Nonce-Based Authenticated-Encryption" by Phil Rogaway and later specified in an RFC.
Below is a pseudocode description of AES-CMAC-SIV encryption:
    enc_key = key[0..Kenclen]
    prf_key = key[Kenclen..Ktotal]
    siv = vPRF(prf_key, header0, header1, ... headerN, plaintext)
    ciphertext = siv || AES-CTR(enc_key, siv, plaintext)
Where the terms are as follows:
key: input key, the combined encryption and PRF key
Kenclen: size of the encryption key in bytes
Ktotal: size of the combined encryption and PRF key in bytes
enc_key: encryption key
prf_key: (v)PRF key
vPRF: vectorized pseudorandom function: a "keyed hash" which operates over an arbitrary-sized vector of input messages. The AES-CMAC-SIV construction specifies a vPRF called "S2V" which is based on CMAC.
header0 ... headerN: an arbitrary number of "additional associated data" (AAD) messages to authenticate along with the plaintext, typically a single AAD string and a nonce
siv: synthetic initialization vector (SIV): a dual-purpose IV and message authentication tag
AES-CTR: the AES block cipher (with a Kenclen key size) instantiated as a stream cipher in counter mode (CTR)
ciphertext: authenticated ciphertext which is a concatenation of siv and the AES-CTR encryption of the plaintext
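The data flow of the pseudocode can be sketched in Rust as follows. This is a minimal sketch rather than the crate's implementation: the name `siv_encrypt` and its parameters are hypothetical, and the vPRF and AES-CTR primitives are passed in as closures instead of being implemented, since the standard library provides neither.

```rust
/// Sketch of the SIV data flow from the pseudocode.
///
/// `vprf` and `ctr_keystream` are caller-supplied stand-ins for S2V and
/// AES-CTR; this block does not implement either primitive.
pub fn siv_encrypt<P, C>(
    key: &[u8],
    k_enc_len: usize,
    headers: &[&[u8]],
    plaintext: &[u8],
    vprf: P,
    ctr_keystream: C,
) -> Vec<u8>
where
    P: Fn(&[u8], &[&[u8]], &[u8]) -> Vec<u8>,
    C: Fn(&[u8], &[u8], usize) -> Vec<u8>,
{
    // enc_key = key[0..Kenclen], prf_key = key[Kenclen..Ktotal]
    let (enc_key, prf_key) = key.split_at(k_enc_len);

    // siv = vPRF(prf_key, header0, ..., headerN, plaintext)
    let siv = vprf(prf_key, headers, plaintext);

    // AES-CTR(enc_key, siv, plaintext): XOR the plaintext with a keystream
    // derived from the encryption key, using the SIV as the IV.
    let keystream = ctr_keystream(enc_key, &siv, plaintext.len());
    let body = plaintext.iter().zip(keystream.iter()).map(|(p, k)| p ^ k);

    // ciphertext = siv || AES-CTR(enc_key, siv, plaintext)
    let mut ciphertext = siv.clone();
    ciphertext.extend(body);
    ciphertext
}
```

Because the SIV is computed over the plaintext before encryption and then reused as the IV, the same key and inputs always yield the same ciphertext, and decryption can recompute the SIV to authenticate the result.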
While naively this might appear to be a "MAC-then-encrypt" scheme, a class which is classically vulnerable to things like padding oracle attacks (e.g. BEAST, Lucky 13), SIV modes are provably secure against these attacks for two reasons:
The "Synthetic IV" (SIV) provides a cryptographic binding / linkage between the plaintext and its encryption not present in naive "MAC-then-encrypt" constructions
By using AES as a stream cipher (i.e. AES-CTR) rather than a block cipher mode with padding (e.g. CBC), there is no padding and therefore no padding oracle. As an added benefit when short ciphertexts are desirable, the length of the ciphertext is the same as the length of the plaintext, i.e. stream ciphers like AES-CTR provide zero-overhead encryption for any length message.
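The length difference is easy to work out. The sketch below assumes PKCS#7 padding for the block-mode case, which the text above does not specify; both function names are hypothetical.

```rust
/// Length of a message after PKCS#7 padding, as used with block modes such
/// as CBC. PKCS#7 always adds between 1 and `block_len` bytes.
pub fn pkcs7_padded_len(msg_len: usize, block_len: usize) -> usize {
    (msg_len / block_len + 1) * block_len
}

/// Length of a CTR-mode ciphertext body: identical to the plaintext length,
/// because the keystream is simply truncated to fit the message.
pub fn ctr_len(msg_len: usize) -> usize {
    msg_len
}
```

For an 8-byte message and AES's 16-byte block, the padded form occupies 16 bytes while the CTR form occupies 8.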
AES-SID specializes these terms as follows:
siv: the PRF output truncated to 8 bytes (64 bits)
plaintext: the little-endian encoding of an unsigned 64-bit integer
ciphertext: a 128-bit uniformly random deterministic encryption of the plaintext value, comprising a 64-bit dual-purpose IV / message authenticator and the 64-bit AES-CTR encryption of the plaintext
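The byte layout can be sketched as below, assuming an 8-byte SIV and a 64-bit plaintext so that the result is 16 bytes, the size of a UUID. The function names are hypothetical, no cryptography is performed here, and `to_uuid_string` only reproduces the textual 8-4-4-4-12 grouping of a UUID without setting any version or variant bits.

```rust
/// Little-endian plaintext encoding of a 64-bit primary key.
pub fn encode_plaintext(id: u64) -> [u8; 8] {
    id.to_le_bytes()
}

/// Concatenate a 64-bit SIV and a 64-bit encrypted body into 16 bytes.
pub fn assemble_masked_id(siv: [u8; 8], encrypted_body: [u8; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&siv);
    out[8..].copy_from_slice(&encrypted_body);
    out
}

/// Render 16 bytes using the 8-4-4-4-12 hexadecimal grouping of UUID strings.
pub fn to_uuid_string(bytes: [u8; 16]) -> String {
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}
```

With this layout, the first half of the string is the authenticator and the second half is the encrypted integer, so a receiver can split the 16 bytes at the midpoint before verifying and decrypting.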
Frequently Asked Questions (FAQ)
... making it deterministic. AES-SID as instantiated with CMAC can be more specifically described as AES-CMAC-SID. It could potentially be instantiated with another secure PRF (e.g. HMAC-SHA-256).