
The share format

A chela share is nothing but a short header and a list of ordinary BIP-39 words - the same words a wallet seed uses. Everything needed to recover lives in those words; the header just restates, for humans, what the words already say. This page walks the format field by field, using one running example throughout: the text secret "42" split 2-of-3. The page sits between the theory and the exact, byte-level SPEC.md, which is precise enough to write a compatible implementation in another language.

The header line

Each share is two lines of text: a dashed code, then the words. Our example's x = 5 share looks like this:

CHELA-02C9-5-2-3-6
cactus float ghost shine baby talk

The code is a convenience, not a secret, and it only echoes what the words encode:

CHELA - 02C9 - 5 - 2 - 3 - 6
        │      │   │   │   │
        │      │   │   │   └ word count
        │      │   │   └ total shares made (N)
        │      │   └ shares needed to recover (M)
        │      └ this share's coordinate (x)
        └ recovery set id (which split this share belongs to)

Because all of that is inside the words too, a share still recovers if the header is smudged or torn off - chela reads the words and re-derives it.
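The header can also be read mechanically. The following is a minimal sketch in Python, not chela's own parser: `parse_header` is a hypothetical helper name, and it assumes the set id is written in hex and the other fields in decimal, which matches the example but is ultimately SPEC.md's call.

```python
def parse_header(code: str) -> dict:
    """Split a header such as 'CHELA-02C9-5-2-3-6' into its five fields."""
    parts = code.strip().split("-")
    if len(parts) != 6 or parts[0] != "CHELA":
        raise ValueError("not a chela header")
    set_id = int(parts[1], 16)                     # assumption: hex, as in the example
    x, m, n, words = (int(p) for p in parts[2:])   # assumption: decimal
    if not (set_id < 2048 and 1 <= x <= 32 and 2 <= m <= 32):
        raise ValueError("field out of range")
    return {"set_id": set_id, "x": x, "m": m, "n": n, "words": words}


print(parse_header("CHELA-02C9-5-2-3-6"))
# {'set_id': 713, 'x': 5, 'm': 2, 'n': 3, 'words': 6}
```

Since the header is only an echo, a real decoder would treat these values as a cross-check against what the words say, never as the source of truth.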

A share, word by word

Each word is one of 2048 BIP-39 entries, so it stands for an 11-bit number (0-2047). chela packs values into those words most-significant-bit first, in four sections, and no byte ever straddles a section boundary - which keeps the layout checkable by hand. A share is W words (W ≥ 4):

word 0          [ X:5 | M:5 | reserved:1 ]   which share this is, and how many are needed
word 1          [ recovery set id:11 ]       the recovery set id (same on every share)
words 2 … W-2   [ Y values ]                 this share's piece of the secret
word W-1        [ CRC-11 ]                   a checksum that catches transcription errors

Figure: The six words of the example's x = 5 share, grouped into the four sections: word 0 (x and M), word 1 (recovery set id), three body words, and the CRC word. The words alone carry x, M and the recovery set id; any printed label only repeats them.

Word 0: x and M

The first word carries the two numbers recovery cannot run without: this share's own number x, and the threshold M. Both are stored as offsets (x − 1 and M − 2), so the illegal low values (x = 0, M = 0 or 1) cannot be written down at all. Bits 10..6 are the x field, bits 5..1 the M field, and bit 0 is reserved and must be 0.

x_field = x - 1     # x in 1..32  → 0..31   (x = 0 is the secret itself, never a share)
m_field = M - 2     # M in 2..32  → 0..30
word0   = (x_field << 6) | (m_field << 1)   # bit 0 (reserved) stays 0

For the example's x = 5, M = 2 share that is word0 = 4 << 6 = 0x100 - the word cactus. A decoder rejects the share if the reserved bit is set, or if the M field would decode above the 32-share cap.

Figure: Word 0 of the x = 5, M = 2 share, split into a 5-bit X field, a 5-bit M field, and one reserved bit. Storing X and M as offsets means the illegal low values (x = 0, M < 2) can't be encoded at all.
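The word 0 layout and its two rejection rules fit in a few lines. This is a sketch under the bit positions given above; `pack_word0` and `unpack_word0` are illustrative names, not chela's API.

```python
def pack_word0(x: int, m: int) -> int:
    """Encode share number x and threshold M into the 11-bit word 0."""
    if not (1 <= x <= 32 and 2 <= m <= 32):
        raise ValueError("x must be 1..32 and M must be 2..32")
    return ((x - 1) << 6) | ((m - 2) << 1)   # bit 0 (reserved) stays 0


def unpack_word0(word0: int) -> tuple[int, int]:
    """Decode word 0 back into (x, M), rejecting malformed values."""
    if word0 & 1:
        raise ValueError("reserved bit is set")
    x = ((word0 >> 6) & 0x1F) + 1
    m = ((word0 >> 1) & 0x1F) + 2
    if m > 32:                               # M field 31 would decode to M = 33
        raise ValueError("M exceeds the 32-share cap")
    return x, m


assert pack_word0(5, 2) == 0x100             # the example's word 0
assert unpack_word0(0x100) == (5, 2)
```

The one encodable-but-illegal case is an M field of 31, which is why the decoder still needs an explicit upper-bound check.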

Word 1: the recovery set id

Word 1 is the recovery set id: a random 11-bit value drawn once per split and written identically into every share of that split - a batch stamp (the cryptographic name for such a one-shot random tag is a nonce). It lets recovery confirm the shares in front of it belong together and refuse a mix of two unrelated splits. It is not derived from the secret, so it leaks nothing about a low-entropy payload, and re-splitting the same secret draws a fresh id. In the example the draw was 0x2C9 - the word float - identical on all three shares.

The body words: the secret, split

The middle words carry this share's Y values - its points on the curves from the theory. chela uses one curve per byte of the body, all sharing the same x coordinates, with arithmetic over GF(2⁸) (a 256-element field, one byte at a time, no carries between bytes).

Figure: One polynomial (curve) per body byte. A share is the column of their values at a single x; one share alone reveals nothing about the values at x = 0.

What actually gets split is not the bare secret but a small body:

body = payload ‖ integrity-tag (1 byte) ‖ kind-byte (1 byte)

The payload is the raw secret bytes. For our example the secret "42" is 0x34 0x32, the tag is 0x43, and the kind byte is 0x0B (Text), so the body is 34 32 43 0B. SSS turns those four body bytes into four Y bytes per share (the x = 5 share's are 61 98 BC 44), which pack MSB-first, 11 bits at a time, into the body words. Because 8-bit bytes and 11-bit words do not line up, the last word is zero-padded on the right.

Figure: The x = 5 share's Y = 61 98 BC 44 packed MSB-first into three 11-bit words: four bytes fill two whole words plus ten bits of a third, and the one leftover bit is zero padding.
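The packing step is easy to reproduce. A minimal sketch, assuming only the rule stated above (MSB-first, 11 bits per word, zero padding on the right); `pack_bytes_to_words` is a hypothetical name.

```python
def pack_bytes_to_words(data: bytes) -> list[int]:
    """Pack bytes MSB-first into 11-bit values, zero-padding the last one."""
    nbits = 8 * len(data)
    pad = -nbits % 11                        # bits needed to reach a multiple of 11
    bits = int.from_bytes(data, "big") << pad
    nwords = (nbits + pad) // 11
    return [(bits >> (11 * (nwords - 1 - i))) & 0x7FF for i in range(nwords)]


words = pack_bytes_to_words(bytes.fromhex("6198BC44"))
print([hex(w) for w in words])               # ['0x30c', '0x62f', '0x88']
```

Those three values are the wordlist indices the rule yields for the example's Y bytes; mapping them to words needs the BIP-39 English list, which is left out here.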

The kind byte: what the secret is, and where it ends

The body's last byte names the payload type. Because it is split inside the body, a single share never reveals what kind of secret it is. It is always non-zero, and the packing pads with zero bits, so once a body is reconstructed the last non-zero byte is the kind byte - which is exactly how recovery pins down the true length despite the 8-vs-11-bit misalignment.

kind byte     meaning
0x01-0x05     BIP-39 12 / 15 / 18 / 21 / 24 words (16-32 B entropy), no passphrase
0x06-0x0A     BIP-39 12 / 15 / 18 / 21 / 24 words, with passphrase
0x0B          Text (1-255 bytes of UTF-8)

Any value outside this table is rejected. A BIP-39 mnemonic is interchangeable with its underlying entropy, so chela splits the compact entropy and re-derives the words on recovery; a passphrase, if present, is appended to the entropy as UTF-8.
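The terminator rule can be sketched as follows. The code assumes the reconstructed bytes may carry trailing zero padding and that the body layout is exactly payload, tag, kind as described; `split_body` and `VALID_KINDS` are illustrative names.

```python
VALID_KINDS = set(range(0x01, 0x0C))         # 0x01..0x0B, per the table above


def split_body(raw: bytes) -> tuple[bytes, int, int]:
    """Return (payload, tag, kind) from reconstructed body bytes."""
    body = raw.rstrip(b"\x00")               # the last non-zero byte is the kind byte
    if len(body) < 3:                        # at least 1 payload byte + tag + kind
        raise ValueError("body too short")
    payload, tag, kind = body[:-2], body[-2], body[-1]
    if kind not in VALID_KINDS:
        raise ValueError("unknown kind byte")
    return payload, tag, kind


# The example body, with one spurious padding byte appended:
print(split_body(bytes.fromhex("3432430B00")))   # (b'42', 67, 11)
```

Stripping trailing zeros is safe only because the kind byte is never zero; a zero tag or zero payload byte sits before it and is untouched.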

The integrity tag: catching the wrong shares

The tag is one byte: the first byte of SHA-256(payload ‖ kind-byte). It binds the whole secret. Combine shares from two different splits whose recovery set ids happened to collide (a 1-in-2048 chance), or otherwise reconstruct a body that isn't the original, and the recovered tag almost certainly won't match (a one-byte tag can match a wrong body by chance at most about 1 time in 256) - so recovery fails rather than handing back a plausible-looking wrong secret. The comparison is done in constant time.
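Computing and checking the tag needs nothing beyond the standard library. This is a sketch of the rule as stated, with hypothetical function names; `hmac.compare_digest` stands in for whatever constant-time comparison chela actually uses.

```python
import hashlib
import hmac


def integrity_tag(payload: bytes, kind: int) -> int:
    """First byte of SHA-256(payload || kind-byte)."""
    return hashlib.sha256(payload + bytes([kind])).digest()[0]


def tag_matches(payload: bytes, kind: int, recovered_tag: int) -> bool:
    """Compare the recomputed tag to the recovered one in constant time."""
    expected = bytes([integrity_tag(payload, kind)])
    return hmac.compare_digest(expected, bytes([recovered_tag]))


tag = integrity_tag(b"42", 0x0B)             # the running example: Text "42"
print(f"{tag:#04x}", tag_matches(b"42", 0x0B, tag))
```

For the running example the text above gives a tag of 0x43; the sketch recomputes it rather than hard-coding it.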

The last word: the checksum

The final word is an 11-bit CRC-11/UMTS checksum (generator polynomial 0x307, no reflection, no final XOR) computed over the share's decoded meaning - its x, M, the recovery set id, and its body bytes. A mistyped word changes at most 11 adjacent bits, and an 11-bit CRC is guaranteed to catch any error burst of up to 11 bits, so a transcription slip confined to a single word is rejected up front instead of feeding bad bytes into the math. For the example's x = 5 share the CRC is 0x6EC - the word talk, the sixth and last word.
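The checksum itself is a plain bit-serial CRC. The sketch below implements CRC-11/UMTS over a byte string, using a zero initial value as in the published parameters for that CRC. How chela serializes x, M, the set id and the body into the input bytes is defined in SPEC.md and is not reproduced here, so this does not by itself recompute 0x6EC.

```python
def crc11_umts(data: bytes) -> int:
    """CRC-11/UMTS: poly 0x307, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):           # feed bits MSB-first
            bit = (byte >> i) & 1
            top = (crc >> 10) & 1
            crc = (crc << 1) & 0x7FF
            if top ^ bit:
                crc ^= 0x307
    return crc


# Standard check input for CRC catalogues:
print(hex(crc11_umts(b"123456789")))         # 0x61
```

Flipping any run of up to 11 adjacent input bits changes the result, which is the burst-detection property the paragraph above relies on.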

Why the word count is slightly ambiguous

Because 8 (bits per byte) and 11 (bits per word) only realign every 88 bits, two body lengths that differ by one byte can pack into the same number of words. A single share on its own therefore can't be sure of its exact byte length - it is validated (does a candidate length's CRC match?) but not length-pinned. The authoritative length is decided across the whole set at recovery, using the kind-byte terminator.
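The ambiguity is small and easy to enumerate. A short sketch (function names are illustrative) that lists which body lengths in bytes fit a given number of body words:

```python
def body_words_for(n_bytes: int) -> int:
    """Number of 11-bit words needed for n_bytes, i.e. ceil(8n / 11)."""
    return -(-8 * n_bytes // 11)


def candidate_lengths(body_words: int) -> list[int]:
    """All byte lengths that pack into exactly this many body words."""
    longest = 11 * body_words // 8
    return [n for n in range(1, longest + 1) if body_words_for(n) == body_words]


for k in (3, 4, 8):
    print(k, candidate_lengths(k))
# 3 [3, 4]
# 4 [5]
# 8 [10, 11]
```

So the example's three body words could hold a 3-byte or a 4-byte body; only the reconstructed kind byte settles that it is 4.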

How recovery rebuilds the secret

Given any M shares, a decoder accepts the bare word lists and:

1. Per share  - decode word 0 (x, M), word 1 (recovery set id), the CRC word, and the body words;
                reject a bad word, a set reserved bit, or a failing checksum.
2. Agree      - every share must carry the same recovery set id, the same M, and the
                same body-word count; need M shares with distinct x (x = 0 is refused).
3. Rebuild    - Lagrange-interpolate at x = 0, one byte at a time, to recover the body.
4. Terminate  - the last non-zero byte is the kind byte; it fixes the true length.
5. Verify     - recompute the integrity tag over the payload and kind; mismatch → reject.

Figure: Recovery with threshold M = 2. Two shares fix the whole line; extend it back to x = 0 and read off the secret. chela does this once per body byte.
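The rebuild step for a single body byte can be sketched as below. One loud assumption: the reduction polynomial used here is 0x11B (the AES polynomial), chosen only so the code runs; chela's actual polynomial is specified in SPEC.md and may differ, so this sketch will not necessarily reproduce chela's share values. All function names are illustrative.

```python
def gf_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply in GF(2^8); poly is an ASSUMED reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^254 (a must be non-zero)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def interpolate_at_zero(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the (x, y) points and return the value at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = gf_mul(num, xj)        # (0 - xj) equals xj in GF(2^8)
                den = gf_mul(den, xi ^ xj)   # subtraction is XOR
        secret ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return secret


# Self-made 2-of-3 demo for one byte: f(x) = 0x34 + 0x9C * x
shares = [(x, 0x34 ^ gf_mul(0x9C, x)) for x in (1, 5, 17)]
assert interpolate_at_zero(shares[:2]) == 0x34
assert interpolate_at_zero([shares[2], shares[0]]) == 0x34
```

A full decoder would run `interpolate_at_zero` once per body byte, feeding it the same x values each time and that byte's Y value from each share.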

Fewer than M shares leave the body information-theoretically undetermined - there is nothing to brute-force. The Lagrange weights and the field arithmetic are in "Why secret sharing"; the exact bytes, the CRC algorithm, the GF(2⁸) reduction polynomial, and full test vectors are in SPEC.md.

The whole example

The text "42", split 2-of-3, produced the share CHELA-02C9-5-2-3-6 / cactus float ghost shine baby talk: cactus = word 0 (x = 5, M = 2), float = the recovery set id, ghost shine baby = the body (Y = 61 98 BC 44), and talk = the CRC. The other two shares share float and differ everywhere else; any two of the three rebuild "42". This and other vectors are checkable in SPEC.md §8, and the reasoning behind each file is in AUDITORS.md.
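Putting the per-share pieces together, a decoder's first pass over one share might look like this sketch. It works on wordlist indices rather than words (the BIP-39 list is omitted), does not verify the CRC because the checksum's input serialization lives in SPEC.md, and uses a hypothetical `decode_share` name. The indices 0x100, 0x2C9 and 0x6EC come from the text; the three body indices are the ones the MSB-first packing rule yields for Y = 61 98 BC 44.

```python
def decode_share(indices: list[int]) -> dict:
    """Split a share's 11-bit word indices into x, M, set id, Y bytes and CRC."""
    if len(indices) < 4 or any(not 0 <= w < 2048 for w in indices):
        raise ValueError("need at least 4 words, each in 0..2047")
    word0, set_id, *body, crc = indices
    if word0 & 1:
        raise ValueError("reserved bit is set")
    x = (word0 >> 6) + 1
    m = ((word0 >> 1) & 0x1F) + 2
    if m > 32:
        raise ValueError("M exceeds the 32-share cap")
    bits = 0
    for w in body:                           # concatenate body words MSB-first
        bits = (bits << 11) | w
    nbits = 11 * len(body)
    nbytes = nbits // 8                      # longest candidate; true length is fixed at recovery
    y = (bits >> (nbits - 8 * nbytes)).to_bytes(nbytes, "big")
    return {"x": x, "m": m, "set_id": set_id, "y": y, "crc": crc}


share = decode_share([0x100, 0x2C9, 0x30C, 0x62F, 0x088, 0x6EC])
print(share["x"], share["m"], hex(share["set_id"]), share["y"].hex())
# 5 2 0x2c9 6198bc44
```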