Introduction to Private Information Retrieval

(2024-07-17)

Go to: Home · Blog

In Spring 2024, I spent my semester learning the basics of cryptography through an independent research study on the Private Information Retrieval (PIR) problem and space surrounding it. Through this deep dive into PIR, I learned a lot about the cryptography research area. My final paper can be found here.

What is PIR?

The PIR problem explores how can a client query a public database such that the index of the query remained unknown.

For example, suppose a database (server) had 100 indices of 0's and 1's and a client wants to know whether index 20 has a 0 or 1. A solution to PIR is a way in which the client can ask for indices such that the server never learns which index the client wants to know nor gains an advantage at guessing this index. To showcase the latter part, consider the case where a client asks for index 20 and index 25, the adversary than can guess which index the client wants with a $\frac{1}{2}$ chance rather than the $\frac{1}{100}$ chance it originally had, thereby gaining a non-negligible advantage.

Trivial Solution

The trivial solution to this problem would be for the user/client to ask the database for every single index or i.e. the whole server at once. The user can then reindex the database and find the index they care about. Since the server is asked for every index by the client it gains no more information than before the interaction. This solution meets the problem requirement since the client gets the index it cares about and the server does not learn which index the client wants.

Formalization of PIR

Formalization of problems in cryptography including PIR allows for proofs to be easily written and understood. It forms a guideline for understanding both the problem statement and the proposed solution.

Syntax

The first part involves formalizing the problem and interactions in the system. To do so the syntax is introduced (a way of breaking down the problem in a more formal way). In coding, it is similar to an interface, where actions are listed as well as their inputs and outputs.

So for basic PIR we have the syntax:

\text{query}(i) \rightarrow x_i

where $i \in [n]$ and $x_i \in \{0, 1\}$ . Note that $[n]$ is the series $0, 1, ..., n - 1$ for $n \in \mathbb{N}$ where $n$ is the size of the database. Informally this means the only action we have is querying the server for the bit $x_i$ corresponding to index $i$ for a database of size $n$ .

For our trivial solution we will exapand this syntax to include

\text{queryAll()} \rightarrow \{0, 1\}^{n}

where queryAll() returns the entire database.

Correctness and Security

Source [1]

There are two concepts that are to be proved with proposed solutions: correctness and security. Correctness verifies that the series of actions does indeed get you the correct answer. With our syntax we can say that our trivial solution is correct if

Pr[\text{queryAll}()[i] = x_i ] = 1

That is, that when indexing the database returned by $\text{queryAll}$ , the bit at index $i$ is $x_i$ .

Most formal definitions in modern cryptography analyze the change in probability distributions from random interactions and the proposed solution to prove security. Depending on the games chosen and the probability 'leeway' alotted, there are stricter and thus stronger notions of security.

The strongest type of security is known as perfect security (also referred to as statistical security). In this notion, the attacker has the same probability across the distributions of data of guessing the index that the user wants. That is that no matter the distribution trying to figure out which index the user wants is no difference between randomly guessing. Another way of describing this is that no matter what the index is asked for under any distributions of the data, there is no higher probability of guessing the index.

Formally, we let $D_{n, i}$ be the distribution of asking for index $i$ in a database of size $n$ . We then can define the advantage an adversary $\mathcal{A}$ has in guessing the index to be

max_{i, j \in [n]} \{ Pr[\mathcal{A}(D_{n, i}) = 1] - Pr[\mathcal{A}(D_{n, j}) = 1] \}

where $\mathcal{A}(D_{n, i}) = 1$ is the event the adversary correctly guesses the index in the distribution $D_{n, i}$ . If this advantage is $0$ , then we have achieved perfect security. The trivial case can be proved through a reduction by utilizing a pseudorandom function. For the case of simplicity in this introduction we will leave this post to state the security rather than proving it. I've also omitted security parameters.

Note though, that if the advantage of the adversary is not $0$ but still negligible, then we say that the security is computationally secure. For computationally secure constructions/protocols, we assume our adversary is bounded to time and complexity polynomial to the omitted security parameter. For perfect security, we assume our adversary has an infinite amount of time and resources.