SC700 - Modern Information
Protocols

Research Projects

You will be responsible for completing two research projects in SC700, a paper presentation an research on an open problem. Each prescreened project has been assigned a difficulty rating:

**1 point** = *straightforward* **2
points** = *moderate* **3
points** = *challenging*

For presentations,
the difficulty reflects the readability and background needed for the
corresponding paper. For open problems,
the difficulty reflects how well the problem is known and how much related
literature is available.

The project grade will also be curved according to
difficulty, so that an average *challenging*
project might be assigned the same grade as a good *straightforward* project.
Finally, the difficulty of both projects must add up to __at least 4
points__ and you can team up in groups of two on *challenging* projects if you wish.
In all cases, projects will be assigned on a "first-come
first-serve" basis, and no projects may be repeated by more than one
group.

** Paper presentation **(30
points):

This involves reading and fully comprehending one paper from the recent research on a topic relevant to the course. Note that proper comprehension typically entails perusing several other background references, trying several small examples, or looking on the web for resources. You will present the paper during a scheduled fifty minute talk in class.

Your presentation will be graded by me and by your classmates, who will also ask you probing questions about the paper. The primary criteria for grading will be clarity, completeness, and expertise. Please hold to the following timeline for submitting your presentation:

**[1 point]** *September
17*: Paper selection and
presentation date due

**[5 points]** *October
15:* One page abstract
of paper is due *in your own words*

**[8 points]** *1
week before pres.:* Draft copy of
slides due

**[16 points]** *Date
of pres.:* Present your
paper

The following papers have been pre-screened; you are welcome to find your own presentation material, though you will have to solicit my approval for relevance. Most of these are avialable on the World Wide Web (do a "Google" search on the title or go to the author’s web site):

**Data compression**:

__Difficulty 2:__

Amiya Bhattacharya and Sajal K. Das. Leziupdate: An information-theoretic approach to track
mobile users in PCS networks. In Mobile Computing and Networking, pages 1--12,
1999.

__Difficulty 3:__

Joscha Bach, Ian H, "Lexical Attraction for Text
Compression.", Data Compression Conference, DCC
1999, 29 - 31 March, 1999,

J.C. Kieffer and E-H.
Yang, "Grammar based codes: A new class of universal lossless source
codes," IEEE Trans. Inform. Theory, Vol IT-46,
No. 3, pp. 737--754, May 2000.

M. Ajtai,
R. Burns, R. Fagin, D.D.E. Long and L. Stockmeyer, “Compactly
Encoding Unstructured Inputs with Differential Compression”, Journal of the
ACM, to appear.

**Error correction**:

__Difficulty 1:__

__Difficulty __2__:__

R.J. McEliece, "On the BCJR trellis for linear block
codes", IEEE Transactions on Information Theory, vol. 42, pp. 1072-1092,
1996.

J.J. Metzner,
and E.J. Kapturowski, "A general decoding
technique applicable to replicated file disagreement location and concatenated
code decoding", IEEE Trans. Information Theory, Vol. 36, No. 4, July 1990.

__Difficulty 3:__

V.I. Levenshtein,
"Efficient Reconstruction of Sequences", IEEE Transactions on
Information Theory, vol. 47, no. 1, p. 2-23, January 2001.

J.W. Byers, M. Luby, M. Mitzenmacher, A. Rege, "A digital fountain approach to reliable
distribution of bulk data", Proceedings of ACM SIGCOMM '98,

M.G. Karpovsky, K. Chakrabarty, L.B. Levitin, "A new class of codes for identification of vertices
in graphs", IEEE Transactions on Information Theory, March 1998.

M. Sudan, “Decoding Reed
Solomon codes beyond the error-correction diameter”, Proc. 35^{th}
Annual Allerton Conference on Communication, Control,
and Computing, 1997.

Y. Ofek,
“The conservative code for bit synchronization”, IEEE Transactions on
Communications, vol. 38, no. 7, July 1990.

**Cryptography**:

__Difficulty 1:__

J.J. Quisquater and L. Guillou. "How to explain zero-knowledge protocols to your
children", Lecture Notes in Computer Science, 435 (1990), 628-631.

Lidong
Zhou and Zygmunt J. Haas, "Securing Ad Hoc
Networks". IEEE Networks Special Issue on

Network Security. November/December, 1999.

__Difficulty 2:__

__
__A. Juels
and M. Sudan, "A Fuzzy Vault Scheme". Proceedings
of IEEE Internation Symposium on Information Theory,
p.408, IEEE Press,

__Difficulty 3:
__D. Wong and A. Chan, “Efficient and
Mutually Authenticated Key Exchange for Low Power Computing Devices”, Proc.
ASIACRYPT 2001. LNCS
2248 (

R. Gennaro,
Y. Gertner and J. Katz, “Bounds on the Efficiency of
Encryption and Digital Signatures”, DIMACS Technical Report: 2002-22.

L. Knudesn
and B. Preneel, “Construction of secure and fast hash
functions using nonbinary error-correcting codes”,
IEEE Transactions on Information Theory, vol. 48, September 2002.

**Information dispersion:**

__Difficulty 1:__

R. H. Frenkiel, B.R. Badrinath, J. Borras and R. Yates. “The Infostations Challenge: Balancing Cost
and Ubiquity in Delivering Wireless Data”. IEEE Personal Communications
7(2):66-71. April, 2000

__Difficulty 2:__

K.A.S. Abdel-Ghaffar
and A.E. Abbadi, "An optimal strategy for
comparing file copies", IEEE Trans. on Parallel and Distributed Systems,
vol. 5, no. 1, pp. 87-93, January 1994.

J. Byers, J. Considine, M. Mitzenmacher, and

__Difficulty 3:__

L. Kleinrock,
"Information Flow in Large Communication Nets", RLE Quarterly
Progress Report, July 1961.

G. Cormode,
M. Paterson, S.C. Sahinalp, U. Vishkin,
"Communication complexity of document exchange", Proc. ACM-SIAM Symp. on Discrete Algorithms,
2000.

T. Schwarz, R.W. Bowdidge, W.A. Bukhard, "Low
cost comparisons of file copies", Proc. IEEE Conference on Distributed
Computing Systems, p. 196-202, 1990.

L. Fan, P.Cao,
J. Almeida, and A.Z. Broder, "Summary
cache: a scalable wide-area web cache
sharing protocol", IEEE/ACM Transactions on Networking, vol. 8, No. 3,
June 2000.

A. C. Yao,
“Some complexity questions related to distributive computing”, Proceedings of
the 11th Annual ACM Symposium on Theory of Computing, 1979, pages 209—213.

** Open problems **(30 points):

This project involves trying to solve a problem whose solution has not yet been discovered. Clearly, I do not expect you to actually solve these problems, but rather to take the first steps in addressing a real research problem and make a sincere effort to solve it. Students who do completely and clearly solve one of these problems will, after I have certified that the solution is indeed novel and problem yet unsolved, get an automatic A for the course. As with the paper presentation, you may find an open problem by yourself, if you wish, as long as I approve it.

Please hold to the following timeline for submitting your presentation:

**[1 point]** *September
19*: Problem selection due

**-----------** *week** of Sept 24:* Individual meetings to discuss project

**[3 points]** *October
3:* Five relevant references due

**[10 points]** *October
31:* Provide solution to
the simple instance of problem

**[10 points]** *December
3:* Final report due

**[6 points]** *last
2 weeks of class: *Presentation of
results (30 minutes)

Your final report must include a survey of existing literature relating to your problem and a review of three different approaches you attempted to solve the problem.

**Data compression**:

__Difficulty 3:__

Problem: Compression
with error-correcting codes.

Question: Is it
possible to improve data compression in general using an error correcting
code? Specifically, one can take a
message *m* and encode it with an error correcting code *C* into an
encoding *c = Cm*. We may now
introduce “errors” *e* into *c*, up to the error correcting
capability of *C*, and the question is whether it is generally possible to
design *c* and *e* in such a way that *c+e*
can be Lempel-Ziv compressed to a shorter string than *m*.

Instance: Give an
example message of at least 20 bits demonstrating improved compression.

Problem: Entropy of
WWW.

Question: Compute the
entropy of text on the WWW, together with good error bars.

Instance: Compute the
entropy of all text on the official BU web site.

**Error correction**:

__Difficulty 2:__

Problem: Minimum
distance approximation.

Question: Prove or
disprove that no polynomial time algorithm can approximate the minimum distance
of an arbitrary linear code within a multiplicative constant.

Instance: Find the
minimum distance of the linear code with the following generator matrix:

01111111100000000000000000000000000000000000

00000111111110000000000000000000000000000000

00011001100111100000000000000000000000000000

00101010101010110000000000000000000000000000

00000000011111111000000000000000000000000000

00000001101010101110000000000000000000000000

00000010101100011011000000000000000000000000

00000000000001111111100000000000000000000000

00000000000000000111111110000000000000000000

00000000000000011001100111100000000000000000

00000000000000101010101010110000000000000000

00000000000000000000011111111000000000000000

00000000000000000001101010101110000000000000

00000000000000000010101100011011000000000000

00000000000000000000000001111111100000000000

00000000000000000000000000000111111110000000

00000000000000000000000000011001100111100000

00000000000000000000000000101010101010110000

00000000000000000000000000000000011111111000

00000000000000000000000000000001101010101110

00000000000000000000000000000010101100011011

Problem:

Question: Construct a
family of error-correcting codes whose rate attains Shannon’s capacity with a
bit error rate of 10^{-5} over an additive white noise Gaussian channel,
and which can be coded and decoded in polynomial time and space.

Instance: Construct an
LDPC that approaches within 1dB of

__Difficulty 3:__

Problem: Parameters of
lexicodes.

Question: Compute the
length, dimension, and minimum distance of an arbitrary binary lexicographic
code (lexicode).

Instance: Compute the
length of the binary lexicode with minimum distance
12 and dimension 4.

**Cryptography**:

__Difficulty 1:__

Problem: Integer
factorization.

Question: Provide (or
prove non-existent) a polynomial time algorithm for determining some factor of
an arbitrary integer *n*.

Instance: Find the
smallest integer between 10^{7} and 10^{8}, inclusive, whose
prime factors have the smallest sum. If
a factor appears more than once in the factorization, it should be added more
than once to the sum.

__Difficulty 2:__

Problem: Discrete
logarithm.

Question: Provide (or
prove non-existent) a polynomial time algorithm for computing an exponent *e*
such that *b ^{e} = m (mod p)* for arbitrary integers

Instance: Solve for *e*: 5^{e} = 948603 (mod 8187734305878054004900051).

__Difficulty 3:__

Problem: Hash
inversion.

Question: Construct a
family of hash functions that can be computed with a number of (AND, OR, and
NOT) gates linear in the size of their inputs, but whose inverse cannot be
computed in a polynomial number of gates.

Instance: Provide a
hash function that requires fewer gates to compute than its inverse.

**Information dispersion:**

__Difficulty 3:__

Problem: Network
synchronization.

Question: Provide (or
prove non-existent) a polynomial time algorithm for synchronizing an arbitrary
network of *n* hosts, each with a set
of *b*-bit integers. Assume that the amount of communication
needed to synchronize two hosts is proportional to the number of differences
between the two hosts.

Instance: Give an
optimal synchronization for 10 hosts where the *i*-th host has a set of 2^{i} integers disjoint from
the other hosts.

Problem: Comparison
checking.

Question: Provide an
algorithm that uses (provedly) minimum communication
to determine whether two physically separate sets are more than 90% the same or
less than 10% the same (or neither)*.*

Instance: Provide an
algorithm that uses a (provedly) minimum
communication to determine whether two physically separate sets are exactly the
same or not.