quis custodiet ipsos custodes –
Solving the dreaded sysadmin problem, one protected field at a time.
One of MongoDB’s first customers to use the new technology is Apervita, a vendor which handles confidential data for well over 2, 06 hospitals and nearly 2 million individual patients. Apervita worked side-by-side with MongoDB during development and refinement of the technology. Since reaching general availability in December, the technology has also been adopted by several government agencies and Fortune 600 companies, including some of the largest pharmacies and insurance providers.
Field-level encryption in a nutshell
MongoDB’s field-level encryption (FLE) offers the ability to store certain parts of the data in its document store encrypted. The community (free) version of MongoDB allows for explicit encryption of fields in client-side applications.
Enterprise versions of MongoDB, along with Atlas, Mongo’s cloud-based Database-as-a-Service, also support automatic encryption. MongoDB Enterprise and Atlas can also enforce encryption on protected fields on the server side, preventing a terminally clueless application developer from accidentally storing sensitive data in clear text. Encrypted fields can be automatically decrypted upon read in either free or enterprise versions, presuming the application has the key.
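To make the server-side enforcement a little more concrete, here is a minimal sketch of the kind of schema document that marks a field as "must arrive encrypted." It only builds a plain Python dictionary using MongoDB's JSON Schema `encrypt` keyword; the `ssn` field name, the helper function, and the all-zero key ID are our own placeholders, not anything taken from Apervita's or MongoDB's setup.

```python
import uuid

# Hypothetical data-key ID; a real one would be the UUID of a key in the key vault.
DATA_KEY_ID = uuid.UUID("00000000-0000-0000-0000-000000000000")


def encrypted_field_schema(field_name, key_id):
    """Build a JSON Schema fragment declaring one string field as encrypted."""
    return {
        "bsonType": "object",
        "properties": {
            field_name: {
                "encrypt": {
                    "bsonType": "string",
                    "algorithm": "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
                    "keyId": [key_id],
                }
            }
        },
    }


# Wrapped in $jsonSchema, this is the shape a collection validator takes.
validator = {"$jsonSchema": encrypted_field_schema("ssn", DATA_KEY_ID)}
```

With a real deployment, a dictionary like `validator` would be handed to the server when the collection is created; that step is left out here because it needs a running database and a real key.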
Setting up an encrypted database is a little too chewy to poke through in code here. But to understand how and when the encryption occurs, it may help to take a quick look at the Python code to do a single, explicitly encrypted MongoDB insertion:

    from pymongo.encryption import Algorithm

    # client_encryption, data_key_id, and coll come from the setup skipped above.
    # Explicitly encrypt a field:
    encrypted_field = client_encryption.encrypt(
        "123456789",  # placeholder plaintext value
        Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
        key_id=data_key_id)
    coll.insert_one({"encryptedField": encrypted_field})
The explicit call here makes it pretty clear what’s going on: the data is encrypted on the client application side, then sent to and stored by the MongoDB server instance. This obviously gives us most of the benefit of both in-flight and at-rest encryption, but there’s another layer of defense offered here that might not be as immediately obvious.
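The `Deterministic` suffix in the algorithm name hints at a useful property: equal plaintexts produce equal ciphertexts, so a server can match on an encrypted field without ever seeing what is inside it. The sketch below illustrates that idea with a toy cipher built only from Python's standard library. It is emphatically not MongoDB's algorithm and not something to use in production; the key material, function names, and sample values are all assumptions made for the demonstration.

```python
import hashlib
import hmac


def _keystream(key, iv, length):
    """Derive `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(enc_key, mac_key, plaintext):
    """Toy deterministic cipher: the IV is derived from the plaintext itself."""
    iv = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:16]
    stream = _keystream(enc_key, iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(enc_key, mac_key, ciphertext):
    """Reverse toy_encrypt and reject data that fails the integrity check."""
    iv, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(enc_key, iv, len(body))
    plaintext = bytes(c ^ s for c, s in zip(body, stream))
    expected_iv = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expected_iv):
        raise ValueError("authentication failed")
    return plaintext


# Demo keys that live only in the client application.
ENC_KEY = hashlib.sha256(b"demo encryption key").digest()
MAC_KEY = hashlib.sha256(b"demo mac key").digest()

# Stand-in for the database: it only ever holds ciphertext for "ssn".
server_store = [
    {"name": "Alice", "ssn": toy_encrypt(ENC_KEY, MAC_KEY, b"123-45-6789")},
    {"name": "Bob", "ssn": toy_encrypt(ENC_KEY, MAC_KEY, b"987-65-4321")},
]

# The client encrypts the search value; the "server" compares opaque bytes.
query_value = toy_encrypt(ENC_KEY, MAC_KEY, b"123-45-6789")
matches = [doc for doc in server_store if doc["ssn"] == query_value]
recovered = toy_decrypt(ENC_KEY, MAC_KEY, matches[0]["ssn"])
```

The trade-off is that anyone who can read the stored values can also tell when two records share the same value, even without knowing what that value is.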
Those hot spots might call for restructuring or indexing to alleviate performance problems as they arise. Troubleshooting them properly will also frequently mean the need for a DBA to be able to replay troublesome queries, to see if the DBA’s changes have made a positive or negative impact on performance.
At-rest encryption does very little to solve either the sysadmin problem or the DBA problem. Although sysadmins can’t get meaningful data by cloning the raw disks of the system, they can easily copy the unencrypted data from the running system once its storage has been unlocked. If the storage encryption key is present in hardware (for example, built into a Trusted Platform Module, or TPM), it does little or nothing to mitigate the sysadmin problem, since the sysadmin has access to the running system. As Apervita CTO Michael Oltman told us, “[we’re] not worried about someone walking out of an AWS data center with our server.”
An at-rest encryption system which requires a remote operator to unlock storage with a key provided at boot mitigates this problem somewhat. But a local system administrator will likely still have opportunities to compromise the running machine — and availability may be impacted, since unavailability of the remote key operator means services won’t come back up automatically after a maintenance window involving a reboot.
With data securely encrypted before ever hitting the database, and never decrypted until it comes back from the database, the sysadmin problem is largely solved, whether discussing sysadmins or DBAs. A system administrator with local root access can stop, start, and upgrade services without ever getting access to the data, and a database administrator can view and replay running queries without seeing the private contents either.
The segmentation is still meaningful, however, since it enables the use of automatically provisioned and third-party-monitored services like MongoDB’s Atlas. Without Field-Level Encryption, HIPAA would have a field day with any vendor who tried to store protected health information in a third-party-managed cloud service.
With FLE, however, the database side of the application can be considered non-confidential. This in turn enables the vendor who is responsible for the data to leverage the concentrated, high-level expertise of a database as a service provider. The vendor also reduces the scope of systems and equipment subject to expensive HIPAA (or other regulatory statutes) physical and network security rules.
Equally importantly, applications which did not use encrypted fields did not take a hit. Applications which only encrypt sensitive data — for example, encrypting social security numbers while leaving names in cleartext — in turn see less impact than those which encrypt entire documents as a whole.
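The "encrypt only what is sensitive" pattern can be sketched in a few lines. In the example below, `encrypt_selected` and `placeholder_encrypt` are hypothetical names of our own, and the placeholder does no real cryptography; it merely stands in for whatever client-side encryption call an application would actually use.

```python
def encrypt_selected(document, sensitive_fields, encrypt):
    """Return a copy of `document` with only the named fields passed through `encrypt`."""
    return {
        key: encrypt(value) if key in sensitive_fields else value
        for key, value in document.items()
    }


def placeholder_encrypt(value):
    """Stand-in for a real encryption call: it only tags the value, nothing more."""
    return ("ENCRYPTED", len(str(value)))


record = {"name": "Alice Example", "ssn": "123-45-6789"}

# Only the social security number is transformed; the name stays in cleartext.
stored = encrypt_selected(record, {"ssn"}, placeholder_encrypt)
```

Because the name is left alone, the encryption work scales with the number of sensitive fields rather than with the size of the whole document.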
When we interviewed MongoDB’s Kenn White, he also stressed that the crypto itself wasn’t something just cooked up on the fly and in-house. The company hired several teams of well-respected cryptography experts, drawn from academic and industry backgrounds. It also commissioned a third-party audit of encryption and application security from the well-known security firm Teserakt, which received attention recently for its own ambitious E4 protocol, designed to provide in-flight encryption to embedded devices.
Beyond getting the crypto and the performance right, one of the most important goals MongoDB had for FLE was to make certain everyone could use it, with minimal barriers to adoption. This meant designing custom APIs for seven of the most-popular application-development platforms used with MongoDB — including Node.js, Python, Java, .NET, and Go.
Conclusions
Although we focused heavily on MongoDB here, it’s not the only (or even the first) database technology providing FLE. A competing NoSQL database platform named Couchbase implemented FLE a year earlier, and Amazon introduced FLE in its CloudFront DBaaS in late . Just as salted one-way hashing rapidly became the (mandatory) standard for password storage, we expect that field-level encryption will become a mandatory feature for databases which handle sensitive or confidential information, and for the same reasons: protecting it not only from outside attackers, but from legitimate system and infrastructure administrators as well.