The Lexicographic Construction

A q-ary, length n, minimum distance d lexicode is traditionally defined constructively based on a lexicographic (i.e., dictionary) ordering of vectors in which, for example, 01111 comes before 10000. The construction starts with the set $\mathcal {S}$ = {0} and greedily adds, until exhaustion, the lexicographically earliest vector whose Hamming distance from $\mathcal {S}$ is at least d.

For example, the codewords of the binary lexicode (i.e., q = 2) of length n = 3 and minimum distance d = 2 are marked by a $\bullet$ in Figure

and would be computed left-to-right across the figure.

**Figure 1:** A simple n = 3, d = 2 binary lexicode.
$\begin{figure}\begin{center} \begin{tabular}{\vert c\vert c\vert c\vert c\vert c... ... & $\bullet$ & $\bullet$ & \ \hline \end{tabular}\par\end{center}\end{figure}$

This greedy construction always generates a linear code [3,5,9]. Thus, we may completely describe a dimension k lexicode by finding k basis vectors (known as ``generators'') using what we call the lexicographic construction. This construction starts with the zero code $\ensuremath{{\mathcal{L}}_{0}^{d}}={\mathbf{0}}$ and greedily adds the lexicographically earliest vector of distance d from the linear space spanned by the previously added vectors; k such iterations form the dimension k code $\ensuremath{{\mathcal{L}}_{k}^{d}}$ . Table 1 demonstrates this construction for d = 3; the resulting code is a (7, 4, 3) binary code, meaning that it has length 7, dimension 4, and minimum distance 3.

Table 1: Generator matrix for the dimension 4, minimum distance 3 binary code $\mathcal {L}$ ₄³. The generator paddings are in bold.

0000111

0011001

0101010

1001011

We may understand the lexicographic construction more analytically by making use of the covering radius of each intermediate code $\ensuremath{{\mathcal{L}}_{i}^{d}}$ in the iteration. The covering radius of a length n code is the smallest integer $\rho$ with the property that Hamming balls of radius $\rho$ centered at codewords of the code will cover every vector of $\mathbb {F}$ _qⁿ. In other words, $\rho$ is the maximum distance of a vector in $\mathbb {F}$ _qⁿ from the code. As an example, one can readily see that the binary lexicode in Figure 1 has a covering radius of 1 because every vector in $\mathbb {F}$ ₂³ is at most of Hamming distance 1 from a code vector.

An iteration of the lexicographic construction on an intermediate code $\ensuremath{{\mathbb{C}}}$ whose covering radius is $\rho$ and minimum distance is d can thus be understood as the addition of a generator vector:

$\begin{displaymath} \langle 1^{d-\rho}\; \vert\; \ensuremath{f_{\mbox{\small lexi}}}(\ensuremath{{\mathbb{C}}}) \rangle \end{displaymath}$

$\begin{multline} \ensuremath{f_{\mbox{\small lexi}}}(\ensuremath{{\mathbb{C}}}) ... ... d(w,\ensuremath{{\mathbb{C}}}) < d(v,\ensuremath{{\mathbb{C}}}), \end{multline}$

$\displaystyle \mathbb {F}$ _q^* = $\displaystyle \bigcup_{i=1}^{\infty}$ $\displaystyle \mathbb {F}$ _qⁱ.

The linear codes generated by the lexicographic construction described above are precisely the lexicodes.

Theorem 1 Over a finite field $\mathbb {F}$ _q, a dimension k, minimum distance d code $\ensuremath{{\mathbb{C}}}$ is a lexicode if and only if it is produced by the lexicographic construction, namely $\ensuremath{{\mathbb{C}}}=\ensuremath{{\mathcal{L}}_{k}^{d}}$ .

Proof: We first prove that $\mathcal {L}$ _k^d is always a lexicode, by induction on k for an arbitrary, fixed d. For the base case it is clear that $\ensuremath{{\mathcal{L}}_{1}^{d}}= \left\{0^d,1^d\right\}$ is a (d, 1, d ) lexicode. Now assume as an inductive hypothesis that $\mathcal {L}$ _k^d is a dimension k, minimum distance d lexicode. From the definition of the lexicographic construction, $\mathcal {L}$ _{k + 1}^d has parameters (n + d - $\rho$ , k + 1, d ), where $\rho$ is the covering radius of $\mathcal {L}$ _k^d. Consider the (n + d - $\rho$ , k', d ) lexicode $\ensuremath{{\mathbb{C}}}_{k'}$ constructed by repeatedly choosing the appropriate lexicographically-earliest vectors in $\mathbb {F}$ _q^{n + d -}. Clearly, we generate $\ensuremath{{\mathbb{C}}}_k$ in the process of this construction, so that $\ensuremath{{\mathbb{C}}}_k \subseteq \ensuremath{{\mathbb{C}}}_{k'}$ and, by the inductive hypothesis, $\ensuremath{{\mathcal{L}}_{k}^{d}}\subseteq \ensuremath{{\mathbb{C}}}_{k'}$ . Moreover, the vector

$\begin{displaymath} v = \left\langle 1^{d-\rho} \:\vert\: \ensuremath{f_{\mbox{\small lexi}}}( \ensuremath{{\mathbb{C}}}_k )\right\rangle \end{displaymath}$

In addition, any vector v $\mbox{$ \inn $}$ $\mathbb {F}$ _q^{n + d -} that is in $\ensuremath{{\mathbb{C}}}_{k'}$ but not in $\mathcal {L}$ _{k + 1}^d would necessarily have its n right-most bits at distance $\leq$ $\rho$ from $\mathcal {L}$ _{k + 1}^d, and its other bits at distance $\leq \ensuremath {\left\lceil (d-\rho)/2 \right\rceil}$ from $\mathcal {L}$ _{k + 1}^d. Noting that $\rho$ < d, we may now use the triangle inequality to see that the distance from v to $\mathcal {L}$ _{k + 1}^d (and consequently also $\ensuremath{{\mathbb{C}}}_{k'}$ ) must be at most

$\begin{displaymath} \rho + \ensuremath {\left\lceil (d-\rho)/2 \right\rceil}\ = \ \ensuremath {\left\lceil (d+\rho)/2 \right\rceil}\ < \ d. \end{displaymath}$

For the converse assertion of the theorem, we need to show that a non-trivial (n, k, d ) lexicode $\ensuremath{{\mathbb{C}}}$ can be constructed by the lexicographic construction. The code $\mathcal {L}$ _k^d produced by k iterations of the minimum-distance d lexicographic construction is clearly a lexicode, from the first part of the proof. Since all distance d lexicodes are ordered by inclusion, as we saw in the first part of this proof, we may conclude that either $\ensuremath{{\mathbb{C}}}\subseteq \ensuremath{{\mathcal{L}}_{k}^{d}}$ or $\ensuremath{{\mathcal{L}}_{k}^{d}}\subseteq \ensuremath{{\mathbb{C}}}$ . However $\ensuremath{{\mathcal{L}}_{k}^{d}}$ and $\ensuremath{{\mathbb{C}}}$ are both non-trivial⁵ dimension k codes, so it follows that they must be equal.

width4pt depth2pt height6pt

Theorem 1 allows us to bypass the codeword-by-codeword lexicode construction, and instead directly compute the generator matrix of a desired lexicode.