The structure of cycle-free codes

We start with a simple theorem, which gives a tight upper bound on the minimum distance of high-rate cycle-free linear codes.

Theorem 3..3   Let $\ensuremath{\mathbb{C}}$ be an (n, k, d ) cycle-free linear code of rate k/n$ \ge$0.5. Then d$ \le$2.

Proof.  Let H be the r×n cycle-free parity-check matrix for $\ensuremath{\mathbb{C}}$. We assume without loss of generality that H has full row-rank and that r = n - k, since otherwise we can remove the linearly dependent rows of H while preserving the cycle-free property. Let $ \eta_{i}^{}$ denote the number of columns of weight i in H. If $ \eta_{0}^{}$$ \ne$ 0 then d = 1, and we are done. Otherwise, we have at least

$\displaystyle \eta_{1}^{}$ + 2(n - $\displaystyle \eta_{1}^{}$)  =  $\displaystyle \eta_{1}^{}$ + 2($\displaystyle \eta_{2}^{}$ + $\displaystyle \eta_{3}^{}$ + ... + $\displaystyle \eta_{r}^{}$$\displaystyle \le$ wt(H) (9)

ones in H. Moreover,

 $\displaystyle \le$ wt(H$\displaystyle \le$ n + r - 1 (9)

in view of (3.3). Substituting r = n - k into (3.5), this inequality readily reduces to $ \eta_{1}^{}$$ \ge$k + 1. Since k/n$ \ge$0.5, it follows that k$ \ge$r and $ \eta_{1}^{}$$ \ge$r + 1. This means that the number of weight-one columns in H is greater than the number of rows in H. Hence H contains at least two columns of weight one that are scalar multiples of each other, and d = 2.

width4pt depth2pt height6pt

Theorem 3.3 implies that the (n, n-1, 2) code $ \cal {E}$n, whose parity-check matrix consists of a single all-1 vector, is, in a sense, the optimal cycle-free code of rate $ \ge$0.5. Since all such codes have distance d$ \le$2 and $ \cal {E}$n has the highest rate. The cycle-free Tanner graph for $ \cal {E}$n is depicted in Figure 3.2a.

Figure 3.2: Tanner graphs for $ \cal {E}$n and for a general low-rate cycle-free code.

To show that the bound of Theorem 3.3 is tight for all n and k, with n/2$ \le$k$ \le$n - 1, we may start with the single-parity-check code $ \cal {E}$k+1 and repeat any symbol (or symbols) in  $ \cal {E}$k+1 until a code of length n is obtained. The following lemma shows that this always produces an (n, k, 2) cycle-free code for k$ \ge$n/2.

Lemma 3..4   Let $\ensuremath{\mathbb{C}}$ be a cycle-free code of length n and dimension k. Fix a positive integer i, with i$ \le$n, and let $\ensuremath{\mathbb{C}}'$ be the code obtained from $\ensuremath{\mathbb{C}}$ by repeating the i-th symbol in each codeword. Then $\ensuremath{\mathbb{C}}'$ is a cycle-free code of length n + 1 and dimension k.

Proof.  The length and dimension of $\ensuremath{\mathbb{C}}'$ are obvious. To see that $\ensuremath{\mathbb{C}}'$ is cycle-free, observe that a Tanner graph T' for $\ensuremath{\mathbb{C}}'$ can be obtained from the cycle-free Tanner graph T for $\ensuremath{\mathbb{C}}$ by introducing two new vertices x' and y' and two new edges: (x', y') and (xi, y'). Clearly this procedure does not create new cycles.

width4pt depth2pt height6pt

Let $\ensuremath{\mathbb{C}}$ be a cycle-free code of length n, and let $\ensuremath{\mathbb{C}}^*$ be the code of length n + $ \delta$ obtained from $\ensuremath{\mathbb{C}}$ by iteratively applying $ \delta$ times the procedure of Lemma 3.4, while possibly choosing a different value of i at different iterations. We then say that $\ensuremath{\mathbb{C}}^*$ is a code obtained by repeating symbols in  $\ensuremath{\mathbb{C}}$. To make our terminology precise, we further extend the notion of codes obtained by ``repeating symbols in $\ensuremath{\mathbb{C}}$'' to also include the codes obtained from  $\ensuremath{\mathbb{C}}^*$ by appending all-zero coordinates. The following proposition shows that every low-rate cycle-free linear code has this structure.

Proposition 3..5   Let $\ensuremath{\mathbb{C}}$ be an (n, k, d ) cycle-free linear code over IFq of rate k/n$ \le$0.5. Then, up to scaling by constants in IFq at certain positions, $\ensuremath{\mathbb{C}}$ is obtained by repeating symbols in a cycle-free code of rate > 0.5.

Proof.  Let H be the r×n cycle-free parity-check matrix for $\ensuremath{\mathbb{C}}$, with r = n - k. Then by Lemma 3.2 and (3.2), we have $
\ensuremath{\operatorname{wt}}(H) \le n+r-1 \le 3r - 1
$, where the second inequality follows from the fact that k/n$ \le$0.5. This implies that H contains at least one row of weight $ \le$2. If this row is of weight one, then the corresponding coordinate of  $\ensuremath{\mathbb{C}}$, say the n-th coordinate, is all-zero. Otherwise, assume without loss of generality that this row is of the form h = (0, 0,..., 0,*,*). Then, up to scaling the last two columns of H by constants in  IFq, we may further assume that h = (0, 0,..., 0, 1, - 1). This would mean that the n-th symbol in  $\ensuremath{\mathbb{C}}$ is a repetition of the preceding symbol. In both cases, we can puncture-out the n-th coordinate of  $\ensuremath{\mathbb{C}}$, and recursively repeat the argument until a cycle-free code of rate > 0.5 is obtained. 

width4pt depth2pt height6pt

Loosely speaking, Proposition4 implies that every cycle-free code $\ensuremath{\mathbb{C}}$ of rate $ \le$0.5 can be represented by a Tanner graph whose structure is shown in Figure 3.2b. The dashed line in Figure 3.2b encloses a cycle-free Tanner graph for a code $\ensuremath{\mathbb{C}}'$ of rate > 0.5 and distance $ \le$2. It follows that to establish a bound on the minimum distance of low-rate cycle-free codes, we need to determine an optimal choice for $\ensuremath{\mathbb{C}}'$ in Figure 3.2b and an optimal sequence of symbol repetitions. This problem is considered in detail in the next two sections.

Specifically, we will show in Section 3.4 and Section 3.5 that the single-parity-check code constitutes an optimal choice for $\ensuremath{\mathbb{C}}'$, and every symbol should be repeated equally often.