# Performance of CPI-Based Algorithms on Set Reconciliation

## Introduction to Algorithms

### CPIsync

In order to synchronize two sets via a communication channel, Our Basic CPI sync protocol requires a number of "samples" of one set (this number must exceed the number of symmetric differences between the two sets) to be sent to the other side, through a computational procedure known as interpolation using both received and local samples, all the differences can be then determined and sent back if needed.

Detailed description of Basic CPI algorithm can be found at the project page of CPIsync.

### ProbCPI

Recall that the Basic CPI requires knowledge of an upper bound on the number of symmetric differences in order to know how many samples must be communicated. The main computational bottleneck of the basic CPIsync is its computation complexity, which is cubic in the upper bound of symmetric differences. Thus the computation cost would be unnecessarily expensive when the upper bound guess is conservative. If no bound is known on the number of symmetric difference, instead of running a CPIsync with conservativly guessed upper bound, one can starts a CPIsync with relatively small upper bound, whose computation complexity is acceptable, and repeatly try CPIsync with doubled upper bound if the previous one fails. We call this probabilistic approach ProbCPI.

## Performance

### Interactive CPI (I-CPI)

The computational complexity of CPIsync can be further reduced by using interaction. Interactive Set Reconciliation (I-CPI) utilizes a "divide-and-conquer" approach that recursively partitions the space of all bit-strings into pre-determined number of partitions. The partitioning continues recursively until the number of differences between the sets, restricted to each subspace, is at most some constant, in which using one CPIsync call (with the constant served as the upper bound) can determine all the corresponding differences on that subspace.

### Priority CPI (P-CPI)

In some applications, data may have priorities determined by some attribute (e.g. status, source, timestamp). In these cases, hosts may need to conduct the synchronization in a prioritized fashion, for example, synchronizing data objects with higher priority first. In the sense of partitioning, I-CPI partitions sets by space ranges, and P-CPI partitions sets by priorities of elements, both of them contribute to reducing the size of synchronization problem to solvable, and can be applied together.

-- JiaxiJin - 21 Jun 2011