Searchable Encryption, is it secure?

IQrypt uses two main types of encryption schemes:

  1. Probabilistic
  2. Deterministic

An accurate definition of probabilistic encryption can be found on Wikipedia, basically it means that if we encrypt a message with a certain key it will yield, in general, to a different cipher text at every call of the encryption function. Such scheme it is considered to be semantically secure .

In contrast, if we encrypt a message M with a deterministic encryption scheme with a key K it will always yield to the same cipher text C. Deterministic encryption schemes are vulnerable to statistical frequency analysis attacks.  Basically the attacker may learn information by checking the frequency of a certain cipher text in the database. However if deterministic schemes are used properly, it may offer the same level of security as probabilistic ones. Let’s analyze it more deeply.

As probabilistic schemes, IQrypt SDK provides :

  • RND – Randomized encryption
  • ROPE – Randomized order preserving encryption
  • BITMAP – Client side bitmap indexes

As deterministic encryption schemes, IQrypt SDK provides:

  • DET – Deterministic encryption
  • HMAC – Keyed-hash scheme
  • OPE – Order preserving encryption

Let’s see how IQrypt SDK uses those schemes to provide maximum security.

IQrypt works best with Document Oriented Databases like MongoDB or Couchbase, so the goal is to encrypt JSON documents. Usually JSON Documents are stored in Collections, a collection can be associated with Tables in relational databases. Every document consists in a list of key-value pairs, the values can be also another list of key-value pairs, arrays or simple types. Within every document it is embedded a special field called “Id” – unique identifier within a collection.

To obtain the desired security level, we will modify this data model, so our Document will have the following model:

  • Id – unique identifier
  • Content – document content as described above and where resides all data.
  • Tags – a simple key-value pair list

Our process flow will be as following: before encryption, we will extract the keywords from the Document content that we want to search by and put them as Tags. So in fact, we will store some data redundantly, once inside Document content and once within the Tags. The encryption will be done as follow:

  • Id – will be let plain text because usually should not reveal anything sensitive.
  • Content will be encrypted RND.
  • Each tag value will be encrypted with different encryption scheme based on the search needs:
    • HMAC – keyed-hash message authentication code used for equality/non-equality queries
    • DET – deterministic encryption used for equality /non-equality queries
    • OPE – deterministic order preserving encryption used for equality/non-equality queries and also for range queries
    • ROPE – randomized order preserving encryption used for equality/non-equality queries and also for range queries
    • BITMAP – client side encrypted bitmap indexes used for low cardinality tags capable to sustain equality/non-equality queries and also range queries
    • PLAINTEXT – plain text, no encryption, all types of queries

RND- Randomized encryption scheme

Document content is encrypted with this scheme.This scheme uses AES and/or Camellia as block ciphers with a key size of 128 and/or 256 bits and as mode of operation we use CBC with a random initial vector (IV). Such scheme is semantically-secure providing indistinguishability under chosen plaintext attack (IND-CPA).

HMAC – Keyed-hash scheme

HMAC – keyed-hash authentication it is one-way function, it means once you apply this function over a plain text you cannot reverse it. For this scheme we use SHA-256 as hashing algorithm. This construction allows us to make equality/non-equality queries for Tag values. Another important feature of this function is the fact that for any input size, the output has always the same size (256 bits in our case), so you can combine, for example, multiple fields values from the Document and put it as one Tag. Decryption of this tag value is not needed in our case, because we have the actual value within the document anyway, so using such a query, the server will return us the matched documents and we’ll get the values after we decrypt the document; the Tag being used only for the search purpose. Being a deterministic scheme, an adversary might perform a statistical analysis. To avoid that, we recommend to use this scheme when you want to search by multiple fields, example: FirstName, LastName and ZipCode and when those fields/tags combined has very high cardinality, ideally unique per collection. In this way it will provide the same security level as RND.

DET – Deterministic encryption scheme

Deterministic encryption scheme is used to encrypt Tag values and letting the cloud server to run equality or non-equality queries. For this scheme we use AES and/or Camellia block ciphers in CBC mode with synthetic IV (initial vector). Being a deterministic scheme, an adversary might perform a statistical analysis. To avoid that, we recommend to use this scheme when Tags values have high cardinality, ideally unique per collection. Example of such values: SSN(Social Security Number), Email, etc; such values are usually unique per collection. In this way it will provide the same security level as RND.

OPE – Order Preserving Encryption

This is a deterministic encryption scheme and it is used to encrypt Tag values and letting the cloud server to run equality or non-equality queries but also range queries. This scheme is very efficient because it let the server create indexes and make queries as efficiently as for unencrypted data. Besides the equality, the encrypted values leaks also the order which can be insecure if it is encrypted values with low cardinality. Therefor, we recommend to use this scheme to Tags values with high cardinality, example: DateTime and when not all (possible) values of plain text can be found as cipher text in the database. In this way it will provide a strong security level.

ROPE – Randomized Order Preserving encryption scheme

This is a probabilistic encryption scheme and it is used to encrypt Tag values and letting the cloud server to run equality or non-equality queries but also range queries. This scheme is very efficient because it let the server create indexes and make queries as efficiently as for unencrypted data. It is probabilistic because the cipher text is randomized, so for the same plain text and same encryption key, the cipher text will be different (with a certain probability) for consecutive calls of the algorithm. In this way there will be no frequency information leaked, however the order is still preserved, so if we have 2 messages a and b if a<b then Enc(a)<Enc(b). As noticed, the server cannot execute directly equality queries since Enc(a), most likely, is different on each run; however the equality queries are still possible by transforming the client equality query in a range query. From security perspective, this scheme is recommended to encrypt Tags with medium to high cardinality.

BITMAP – Client side bitmap indexes

Bitmap indexes work well for low-cardinality tags, which have a modest number of distinct values, either absolutely, or relative to the number of records/documents that contain the data. The extreme case of low cardinality is Boolean data (True, False). Bitmap indexes(compared with B-Tree structures) are very efficient about storage; for example if we have 1 million JSON documents, the total index size for holding a tag value is about 100KB. If we have a Boolean tag it will be needed to store 2 indexes, one for True values and one for False values so in total we will have 200KB. Usually bitmap indexes are kept in RAM; for a server, nowadays, this amount of data means nothing, but if we keep it on the client and if cardinality is higher, then storage may be problematic. Fortunately the bitmap indexes can be compressed and reduce drastically its size. IQrypt uses EWAH described here and thanks to it, we can use bitmap indexes only on the client side. So basically this type of index is fully encrypted on the client side with RND encryption scheme and stored encrypted on the server. When we run a query for a Tag value encrypted with this scheme, we will execute in fact 2 queries:one that download the entire (compressed and encrypted) bitmap index and one to get the documents that matches the query by providing to the server the identifiers got from bitmap indexes.( As an idea, we tested on many real world scenarios and the entire index compressed with EWAH for low cardinality tag values usually remain under 10KB). So we recommend to use this scheme if you have low cardinality tag values like Boolean, Sex, Statuses or even medium cardinality like for example Salary or Country if there is a decent total amount of documents stored in a collection. Since it is encrypted with RND it offers same degree of security as RND scheme.

To summarize: entire JSON document is always encrypted with most secure encryption scheme RND. The keywords (Tags) can be encrypted with different encryption scheme, BITMAP and ROPE for low to medium cardinality; DET, HMAC and OPE for high cardinality and PLAINTEXT for non-sensitive values.

Using combinations of these encryption schemes, IQrypt will allow you to store data highly secure and at the same time allow server to execute queries without decrypting the data.

The SDK is open source and it can be found on Github.

Encrypt and stay safe!

Leave a Reply