Implementation, Results and Conclusions

BCH Codes:
Implementation
Two approaches were taken towards implementing the BCH decoder. In the first approach, a look-up table was created that stored all the different exponents of the primitive element α. In the second approach, the required exponents of α were calculated at run time. All coding was done in the 'C' programming language on a Linux platform.
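To make the difference between the two approaches concrete, the sketch below (not the original project code; the field GF(2^4) and the primitive polynomial x^4 + x + 1 are illustrative assumptions, since the project used larger fields) builds the exponent look-up table once and, alternatively, recomputes powers of α on demand:

/*
 * Minimal sketch contrasting the two approaches to obtaining powers of the
 * primitive element alpha in GF(2^m).  Field size m = 4 and the primitive
 * polynomial x^4 + x + 1 are assumptions chosen for illustration.
 */
#include <stdio.h>

#define M          4
#define FIELD_SIZE (1 << M)            /* 2^m field elements           */
#define PRIM_POLY  0x13                /* x^4 + x + 1                  */

static int alpha_to[FIELD_SIZE];       /* approach 1: table of alpha^i */

/* Approach 1: fill the look-up table once.  Costs Theta(2^m) memory,
 * and the table size is limited by available memory and by the range
 * of an integer, since the array index must be an integer.            */
void build_table(void)
{
    int i, x = 1;
    for (i = 0; i < FIELD_SIZE - 1; i++) {
        alpha_to[i] = x;
        x <<= 1;                       /* multiply by alpha            */
        if (x & FIELD_SIZE)            /* reduce modulo PRIM_POLY      */
            x ^= PRIM_POLY;
    }
}

/* Approach 2: compute alpha^i at run time.  No table is needed, but
 * the same multiplications are repeated on every call.                */
int alpha_pow(int i)
{
    int k, x = 1;
    for (k = 0; k < i; k++) {
        x <<= 1;
        if (x & FIELD_SIZE)
            x ^= PRIM_POLY;
    }
    return x;
}

int main(void)
{
    int i;
    build_table();
    /* Both approaches must agree on every exponent of alpha. */
    for (i = 0; i < FIELD_SIZE - 1; i++)
        printf("alpha^%-2d = %2d (table)  %2d (run time)\n",
               i, alpha_to[i], alpha_pow(i));
    return 0;
}

The table turns each exponentiation into a single array access at the cost of memory exponential in the element size, which is exactly the trade-off examined in the measurements and conclusions below.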
 

Results of the two approaches:
The graphs below show the performance of both approaches for various parameters.


[Graph 1]
Graph 1: Variation of the reconciliation time with the number of differences between the sets. The dotted line is a linear fit to the values obtained with the implementation that used look-up tables. The axes are colored according to the respective plots they refer to.

[Graph 2]
Graph 2: Variation of the reconciliation time with the length of the codewords.



Comparison between the performance of the Characteristic Polynomial Interpolation (CPI) scheme and the Error-Correcting-Code (BCH code) scheme:

The table below lists the time taken, in milliseconds, to synchronize two sets of 16-bit elements for various numbers of differences between them.



Time in ms for element size = 16 bits

Differences   BCH (with look-up tables)   BCH (without look-up tables)   CPI Sync
     40                  270                        17000                     50
     50                  340                        22000                     90
     60                  410                        27000                    110
     70                  480                        32000                    160
     80                  540                        38000                    220
     90                  610                        43000                    290
    100                  690                        49000                    340
    110                  750                        55000                    430
    120                  820                        61000                    520
    130                  900                        67000                    640
    140                  970                        73000                    770
    150                 1040                        79000                    910
    160                 1110                        85000                   1080
    170                 1180                        93000                   1220

Table 1


The graph below shows a plot of the data from the table:

[Graph 3]
Graph 3: Comparison of the BCH scheme and the CPI scheme for 16-bit vectors.

Note: All data were collected by running the code on a Pentium III with 196 MB of RAM, running Linux (Red Hat 7.0).
 
Conclusions drawn from the BCH Error-Correcting-Code Set-Reconciliation scheme:

From Graph 1, it appears that the reconciliation time of the BCH scheme, both with and without look-up tables, grows linearly as the number of differences increases. The amount of memory required by the BCH scheme when implemented with a look-up table is, however, an exponential function of the length of the code (Θ(2^n)). As is evident from Graph 1, there is a significant difference in the time taken by the two approaches: almost 69 seconds for 130 differences between the sets. From Graph 3, it is also clear that for a small number of differences between the sets, CPI synchronization consumes less time; as the number of differences increases, however, the running time of the BCH scheme grows almost linearly while that of the CPI synchronization scheme grows quadratically. The only drawback of the BCH scheme is that the constants involved in its running time are relatively large. Even between the two approaches to the BCH scheme, the approach with look-up tables has a much smaller constant than the approach without them.
From Graph 2, it is clear that the running time of the BCH scheme, in both approaches, is exponential in the length of the codeword. This fact accounts for the constant term involved in the running time for a given codeword length.
The maximum size of the look-up table is bounded by the amount of memory available and by the maximum value an integer variable can hold, since the array index has to be an integer.
The large constants involved in the scheme without look-up tables make it impractical for set reconciliation. Likewise, the amount of memory required by the scheme with look-up tables is not practical when synchronizing large databases whose contents may require representation with up to 512 bits.
A simple analysis of the BCH decoding scheme reveals that the bulk of the time (more than 99% of the total running time) is taken up by the final stage of the decoding process, the Chien search, in which every bit of the characteristic vector has to be examined to check whether it is in error. If the data is represented by 16 bits, the characteristic vector is 2^16 bits long, and every one of those bits has to be checked in the Chien search.
This suggests using the scheme in scenarios where either only deletions or only insertions take place. If we knew beforehand that a particular server will make only deletions in its data set, we would only need to examine the bits of the client's characteristic vector that are "set" to see whether they are in error. Similarly, if we know beforehand that the server will make only insertions, we can model these insertions as deletions and check only the "unset" bits of the client's characteristic vector. The running time of such an operation depends only on the number of entries in the set. This optimization is useful only if the server's data set starts with a small number of entries on which insertions are performed, or with a very large number of entries on which deletions are performed.
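The sketch below illustrates the deletions-only case described above. It is not the project code: the field is shrunk to GF(2^4) for brevity, the error-locator polynomial lambda[] is assumed to have already been produced by the earlier decoding stages, and one byte per position is used for the characteristic vector for clarity.

/*
 * Sketch of a Chien search restricted to the "set" positions of the
 * client's characteristic vector (deletions-only case).  The field,
 * primitive polynomial, and data layout are illustrative assumptions.
 */
#define M          4
#define N          ((1 << M) - 1)       /* codeword length 2^m - 1     */
#define PRIM_POLY  0x13                 /* x^4 + x + 1                 */

static int alpha_to[N + 1];             /* antilog table               */
static int log_of[N + 1];               /* log table                   */

void build_tables(void)
{
    int i, x = 1;
    for (i = 0; i < N; i++) {
        alpha_to[i] = x;
        log_of[x]   = i;
        x <<= 1;
        if (x & (N + 1))
            x ^= PRIM_POLY;
    }
}

/* Multiplication in GF(2^m) via the log/antilog tables. */
int gf_mul(int a, int b)
{
    if (a == 0 || b == 0)
        return 0;
    return alpha_to[(log_of[a] + log_of[b]) % N];
}

/*
 * Chien search over set positions only: indices i with char_vec[i] == 0
 * are skipped, because with deletions only, an "error" can occur only
 * where the client still has an element.  lambda[] holds the locator
 * polynomial of degree t; positions found to be in error (deleted) are
 * cleared in char_vec.
 */
void chien_search_set_bits(unsigned char *char_vec, const int *lambda, int t)
{
    int i, j;
    for (i = 0; i < N; i++) {
        if (!char_vec[i])
            continue;                    /* skip unset positions        */

        /* Evaluate the locator polynomial at alpha^i; a zero result    */
        /* flags this position (under the convention that the locator's */
        /* roots give the error positions directly).                    */
        int sum = lambda[0];
        for (j = 1; j <= t; j++)
            sum ^= gf_mul(lambda[j], alpha_to[(i * j) % N]);

        if (sum == 0)
            char_vec[i] = 0;             /* remove the deleted element  */
    }
}

Because the outer loop visits only set positions, the cost is proportional to the number of elements the client holds rather than to the full 2^16 positions, which is the saving described in the paragraph above.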


Tornado Codes:

Implementation issues and conclusions:
Tornado codes are difficult to implement, as there is not much documentation available on them. Designing good graphs for the codes involves solving complex differential equations. The code used in the project had a very low rate (less than 1/2). The basic idea behind using error-correcting codes for the Set Reconciliation problem is to use the redundant bits of the code to synchronize two hosts; in this implementation the number of redundant bits exceeded the number of message bits, so it was not really practical to apply it to the Set Reconciliation problem.



LDPC Codes:

Implementation issues and conclusions:
An implementation of LDPC codes developed by Radford Neal [2] and David MacKay [3] was used in the study. The software is capable of simulating memoryless channels, including Binary Symmetric Channels and Additive White Gaussian Noise channels.
Results:
The code was not able to decode some arbitrary random vectors. When synchronizing databases that contain critical information, it is absolutely necessary that the two hosts reflect the same data. The fact that LDPC codes cannot correct all possible errors therefore makes them a bad choice for the Set Reconciliation problem.



Although the results of this project clearly indicate that error-correcting codes can be used for the Set Reconciliation problem, most of the codes available right now are not well suited for this purpose.