# String Reconciliation based on Characteristic Polynomial Interpolation Synchronization and Puzzle Pieces

## What is CPISync?

### Characteristic polynomial interpolation for fast data synchronization

Available in postscript(5.7M) or pdf(372k)

Characteristic Polynomial Interpolation-based Synchronization or CPISync, is a synchronization scheme that relies on information-theoretic research results. Many currently used synchronization schemes, such as the slow sync used in PDA - PC synchronization, are particularly inefficient, in terms of bandwidth usage and latency, as they choose to transfer full sets of data between two hosts whereas the actual number of differing entries is typically much smaller that the total number of records stored. The most salient property of CPISync is that the amount of communication needed for synchronizing the databases of two hosts relates only to their mutual differences.

The basic approach taken by CPIsync is a hashing of the data into a certain type of polynomial known as characteristic polynomial. Both the hosts maintain their own polynomial. When synchronizing, one of the hosts sends to the other evaluations of its polynomial at a number of random points. This number must exceed the number of differing entries. These evaluations suffice for the host to determine all the differences, through a computational procedure known as interpolation. Our implementation shows that the computational complexity of CPIsync is affordable in most reasonable scenarios, and the time taken to perform synchronization is generally much smaller than with slow sync. The CPIsync scheme thus carries the potential for a tremendous performance improvement over current synchronization protocols.

**Read a related paper**

Y. Minsky, and A. Trachtenberg, Scalable Set Reconciliation. Available in ps (375k) or pdf (277k).

## String Reconciliation with CPISync

### Bandwidth Efficient String Reconciliation using Puzzles

Available in pdf(216k)

The problem of string reconciliation appears in a number of applications, most prominently in ﬁle synchronization. In network applications where frequent updates are made to copies of the same ﬁle, string reconciliation can be used to share the updates with one another. It can be used to reconcile document replicas in replicated ﬁle systems.

Data distribution systems can leverage the similarity between the current and earlier versions of data to transmit updates efﬁciently. Internet web server mirroring is a good example of an application where current and previous versions of ﬁles need to be reconciled repeatedly and efﬁciently. Image reconciliation can be thought of as a two dimensional generalization of string reconciliation, and, in general, string reconciliation algorithms can be used as a basis for reconciling various types of structured data.

The idea behind our approach to string reconciliation is to divide each string into a multi-set of “puzzle pieces”. These multi-sets are reconciled using CPISync a distributed algorithm for set and multi-set reconciliation. In the ﬁnal step these “puzzle pieces” comprising the multi-set are put together, like a puzzle, in order to form the original string data.

**Read a related paper**

V. Chauhan, and A. Trachtenberg, Reconciliation Puzzles. Available in ps (138k) or pdf (102k).

## Downloads

User's interested in downloading the CPISync source and the StringRecon source should be aware of two things:

1) The source was built and tested on a Fedora Core 3 System with gcc 3.3.6. The source includes the GPL'd GMP and NTL packages. To properly build GMP, gcc version 3.3.x is REQUIRED. Gcc 3.4.x causes a compile error. This is a known problem in the GMP code.

2) The purpose of this project was to produce a prototype system. While this code has been tested, there still may exist some bugs in the code, and users should be cautioned before committing their data to our system.

Download the CPISync Source Here. (9.8M)

Download the String Reconciliation Source Here. (9.8M)