Eu-Jin Goh March 16, 2004
Introduction
- Using hash tables are unsuitable for indexing encrypted (and presumably sensitive) documents because they leak information about the document contents (hence break semantic security).
- Informally, a secure index allows users with a "trapdoor" for a word x to test the index only for x; the trapdoors can only be generated with a secret key.
- The secure indexes do not hide information such as doucment size that can be obtained by simply examining the encrypted documents.
contribution:
(1) defining a secure index and formulating a security model for indexes known as semantic security against adaptive chosen keyword attack(IND-CKA). consider a document D containing n words, of which an adversary already knows m words and wants information about the set W of n − m unknown words. Even when the adversary A has full access to other index-document pairs, and can adaptively obtain trapdoors for any word except those in W, A still cannot deduce any information about any word in W from D’s index.
(2) an efficient IND-CKA secure index construction called Z-IDX using pseudo-random functions and Bloom filters. Search scheme requires only O(1) search time per document, and can handle variable length words, and boolean and certain regular expression queries.
applications: searching on encrypted data; accumulated hashing and testing set mmbership securely.
IND-CKA Secure Indexes
Semantic Security Against Adaptive Chosen Keyword Attack(IND-CKA)
Objective: Capture the notion that an Adversary A cannot deduce a document's contents from its index, other than what it already knows from previous query results or from other channels.
Suppose the challenger C gives the adversary A two equal length documents V0 and V1, each containing some (possibly unequal) number of words, together with an index. Here, A’s challenge is to determine which document is encoded in the
index. If the problem of distinguishing between the index for V0 and V1 is hard, then deducing at least one of the words that V0 and V1 do not have in common from the index must also be hard. If A cannot determine which document is encoded in the index with probability non-negligibly different from 1/2, then the index reveals nothing about its contents. We use this formulation of index indistinguishability (ind) to prove the semantic security of indexes.
Constructing IND-CKA Secure Indexes
review Bloom Filters, pseudo-random functions, pseudo-random generators
-
pseudo-random functions: PRF is computationally indistringuishable from a random function -- given pairs (x1,f(x1,k)), ... (xm,f(xm,k)), an adversary can not predict f(xm+1,k) for any xm+1.
Pseudo-Random Generators. Intuitively, a pseudo-random generator outputs strings that are computationally indistinguishable from random strings.
More precisely, we say that a function
G : {0, 1}n -> {0, 1}m where m > n is a (t, e)-pseudo-random generator if
secure index construction uses pseudo-random functions as shown in the following lemma
Construction: