Skip to content

Latest commit

 

History

History
365 lines (296 loc) · 14.7 KB

File metadata and controls

365 lines (296 loc) · 14.7 KB

Codelab

Throughout this codelab, you'll create a Tiled tree.

The Tiled tree will be stored on disk using the layout described in the layout directory. Its checkpoint uses the checkpoint format.

Prelimiary setup

The command-line tools we'll use from this repository can generate tile based logs from leaf data stored on your file system. Each file will correspond to a single leaf in the tree.

Before we start, let's define a few environment variables:

export DATA_DIR="/tmp/myfiles"  # where we'll store input data for the tree
export LOG_DIR="/tmp/mylog"  # where the tree will be stored
export LOG_ORIGIN="My Log"  # the origin of the log used by the Checkpoint format

Checkpoints of the log will be signed, and we need a public/private key pair for this.

Use the generate_keys command with --key_name, a name for the signing entity. You can output the public and private keys to files using --out_pub path and filename for the public key, --out_priv path and filename for the private key and stdout, private key, then public key, over 2 lines, using --print

go run ./cmd/generate_keys --key_name=astra --out_pub=key.pub --out_priv=key

Creating a new log

To create a new log state directory, use the integrate command with the --initialise flag, and either passing key files or with environment variables set:

go run ./cmd/integrate --initialise --storage_dir="${LOG_DIR}" --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"

After running this command, the log state directory looks like this:

$ tree /tmp/mylog/
/tmp/mylog/
├── checkpoint
├── leaves
│   └── pending
├── seq
└── tile

5 directories, 1 file
  • checkpoint contains the latest log checkpoint in the format described here.
  • seq/ contains a directory hierarchy containing leaf data for each sequenced entry in the log.
  • leaves/ contains files which map all known leaf hashes to their position in the log.
  • tile/ contains the internal nodes of the log tree.

See the layout documentation for more details about each directory.

Let's look at the checkpoint content:

$ cat /tmp/mylog/checkpoint
My Log
0
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=

— astra PlUh/n54e2dSIKi6kHjea5emrGnmC7lJVDgnIfWGIJmgFqp22k0UlnUk97L2ViqrFm986NwV+wJYGnrtRPJTBV0GrA0=
  • My Log is the origin that we defined above
  • 0 is the number of leaves in the tree, which currently is 0
  • 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU= is the hash of an empty slice of bytes, since the log is empty.
  • The last line is a signature over this data, using the astra private key we've generated above

Creating log content

Now let's add some leaves to the log.

First, we generate the input data with:

$ mkdir $DATA_DIR
$ for i in $(seq 0 3); do x=$(printf "%03d" $i); echo "leaf_data_$x" > $DATA_DIR/leaf_$x; done;

To add the contents of these files to the log, use the sequence command with the --entries flag set to a filename glob of files to add:

$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*"  --public_key=key.pub --origin="${LOG_ORIGIN}"
I1221 13:16:23.940255  923589 main.go:131] 0: /tmp/myfiles/leaf_000
I1221 13:16:23.940806  923589 main.go:131] 1: /tmp/myfiles/leaf_001
I1221 13:16:23.941218  923589 main.go:131] 2: /tmp/myfiles/leaf_002
I1221 13:16:23.941673  923589 main.go:131] 3: /tmp/myfiles/leaf_003

The sequence commands assigns an index to each leaf, and stores data in the log directory using convenient formats.

Here is what the directory looks like:

$ grep -RH '^' /tmp/mylog/
/tmp/mylog/checkpoint:My Log
/tmp/mylog/checkpoint:0
/tmp/mylog/checkpoint:47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
/tmp/mylog/checkpoint:
/tmp/mylog/checkpoint:— astra h5lA3N6MJnmnD1dPLqxeoWbbPAc0XPKuqomvSZPiVNLkdJmPDvF+7BkMIr4KBynVgo/ipGbNijHxdbvTZ4zKVXbyLwU=
/tmp/mylog/leaves/6c/b0/b1/a3c33114cec1d940b9a6c48b55fb2c73f6efcfd53aeef2644681c9b70a:2
/tmp/mylog/leaves/b8/71/4f/045c7d5d0201b06004e6939d944a981605c5fcfa5d3353a3084303d4ad:1
/tmp/mylog/leaves/85/92/d6/f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d:0
/tmp/mylog/leaves/e0/7c/75/881e1ec1bcad5e45c5cc3d8e2c83cda817a48324514309267ee32ef115:3
/tmp/mylog/seq/00/00/00/00/02:leaf_data_002
/tmp/mylog/seq/00/00/00/00/00:leaf_data_000
/tmp/mylog/seq/00/00/00/00/01:leaf_data_001
/tmp/mylog/seq/00/00/00/00/03:leaf_data_003

The seq directory contains the leaves data, in files named after each leaf's index.

The leaves stores the leaf index of each leaf, in a file named after the leaf hash. Let's take the leaf at index 0, which conveniently happens to contain leaf_data_000. This tree uses RFC6962's hashing function, where leaf_hash = sha256(0x + leaf_data).

8592d6f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d, the path for the leaf at index 0 with forward slashes removed, is the hexadecimal representation of this hash.

Note that at this point, no internal node of the tree has been computed, and neither has the checkpoint been updated. Leaves have only been assigned with a position in the log.

Attempting to re-sequence the same file contents will result in the sequence tool telling you that you're trying to add duplicate entries, along with their originally assigned sequence numbers:

$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*"  --public_key=key.pub --origin="${LOG_ORIGIN}"
I1221 13:18:59.735244  924268 main.go:131] 0: /tmp/myfiles/leaf_000 (dupe)
I1221 13:18:59.735362  924268 main.go:131] 1: /tmp/myfiles/leaf_001 (dupe)
I1221 13:18:59.735406  924268 main.go:131] 2: /tmp/myfiles/leaf_002 (dupe)
I1221 13:18:59.735447  924268 main.go:131] 3: /tmp/myfiles/leaf_003 (dupe)

Integrating sequenced entries

We still need to update the rest of the tree structure to integrate these new entries, generate the other nodes of the tree, and compute its new checkpoint. We use the integrate tool for that:

$ go run ./cmd/integrate  --storage_dir="${LOG_DIR}"  --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
I1221 13:19:20.190193  924589 integrate.go:94] Loaded state with roothash
I1221 13:19:20.190432  924589 integrate.go:132] New log state: size 0x4 hash: 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f

This output says that the integration was successful, and we now have a new log tree state which contains 4 entries, and has the printed log root hash.

Let's look at the contents of the tree directory again:

$ grep -RH '^' /tmp/mylog/
/tmp/mylog/tile/00/0000/00/00/00.04:32
/tmp/mylog/tile/00/0000/00/00/00.04:4
/tmp/mylog/tile/00/0000/00/00/00.04:hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0=
/tmp/mylog/tile/00/0000/00/00/00.04:McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0=
/tmp/mylog/tile/00/0000/00/00/00.04:uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0=
/tmp/mylog/tile/00/0000/00/00/00.04:DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8=
/tmp/mylog/tile/00/0000/00/00/00.04:bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo=
/tmp/mylog/tile/00/0000/00/00/00.04:jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4=
/tmp/mylog/tile/00/0000/00/00/00.04:4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU=
/tmp/mylog/checkpoint:My Log
/tmp/mylog/checkpoint:4
/tmp/mylog/checkpoint:DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8=
/tmp/mylog/checkpoint:
/tmp/mylog/checkpoint:— astra h5lA3GOB547TCfoNMEXxENGJVWmpG6Ynk8C6Oaef5gaFotSVLX9isWdvjnhBek94Is9yVPzIvjQTADF/dk2MhHXiCAY=
/tmp/mylog/leaves/6c/b0/b1/a3c33114cec1d940b9a6c48b55fb2c73f6efcfd53aeef2644681c9b70a:2
/tmp/mylog/leaves/b8/71/4f/045c7d5d0201b06004e6939d944a981605c5fcfa5d3353a3084303d4ad:1
/tmp/mylog/leaves/85/92/d6/f366d9d1297f44034d649b68afcee74050aa7a55c769130b2f07ecc65d:0
/tmp/mylog/leaves/e0/7c/75/881e1ec1bcad5e45c5cc3d8e2c83cda817a48324514309267ee32ef115:3
/tmp/mylog/seq/00/00/00/00/02:leaf_data_002
/tmp/mylog/seq/00/00/00/00/00:leaf_data_000
/tmp/mylog/seq/00/00/00/00/01:leaf_data_001
/tmp/mylog/seq/00/00/00/00/03:leaf_data_003

The tile directory has been populated with a file, and the checkpoint has been updated. The leaves/ and seq/ directories have not changed.

Each tile can store a maximum of 256 leaf hashes. Since we only have 4 leaves for now, hashes fit in a single file. Given it is the first tile of the tree, its path is 00/0000/00/00/00. Until the tile is filed with 256 leaves, the tile is "partial", that's what the 00.04 notation means: tile 00/0000/00/00/00.04 is the partial 00/0000/00/00/00 tile with 4 leaf hashes.

Let's look at each line of this tile file:

  • 32 that's the number of bytes used for hashes
  • 4 the number of leaf hashes in this tile
  • the remaining lines are a series of hashes representing the node hashes of the tile: both the leaf hashes, and internal node hashes

Here is what a merkle tree with 4 leaves looks like:

              b
             / \
            /   \
           /     \
          a       c
         / \     / \
       h0  h1   h2  h3
       |   |    |   |
       0   1    2   3

In the tile file, leaves and internal node hashes are stored in the infix tree-traversal order.

$ cat /tmp/mylog/tile/00/0000/00/00/00.04
32
4
hZLW82bZ0Sl/RANNZJtor87nQFCqelXHaRMLLwfsxl0= <-- h0 = sha256(0x0 + leaf_data_000)
McF1R3nScwEJFHQpESACDl9SOdg9uTRLVZaDHzLckI0= <-- a = sha256(0x1 + h0 + h1)
uHFPBFx9XQIBsGAE5pOdlEqYFgXF/PpdM1OjCEMD1K0= <-- h1 = sha256(0x0 + leaf_data_001)
DC5xrAVNktWLDv0wE9DfI1JFMx8MDoKLq2Ko/mJGDH8= <-- b = sha256(0x1 + a + c)
bLCxo8MxFM7B2UC5psSLVfssc/bvz9U67vJkRoHJtwo= <-- h2 = sha256(0x0 + leaf_data_002)
jNfnGF6uHUDupKFIaPW/QjZnPkINVKkVYc7cBakvPy4= <-- c = sha(0x1 + h2 + h3)
4Hx1iB4ewbytXkXFzD2OLIPNqBekgyRRQwkmfuMu8RU= <-- h3 = sha256(0x0 + leaf_data_003)

Adding one more leaf

Let's add one more leaf to our tree.

$ echo "leaf_data_004" > $DATA_DIR/leaf_004

$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/leaf_004"  --public_key=key.pub --origin="${LOG_ORIGIN}"
I1221 13:23:43.956356  926120 main.go:131] 4: /tmp/myfiles/leaf_004

$ go run ./cmd/integrate  --storage_dir="${LOG_DIR}"  --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
I1221 13:24:11.168864  926446 integrate.go:94] Loaded state with roothash 0c2e71ac054d92d58b0efd3013d0df235245331f0c0e828bab62a8fe62460c7f
I1221 13:24:11.169036  926446 integrate.go:132] New log state: size 0x5 hash: 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496

This adds matching files in seq, leaves, and updates the checkpoint, as expected. A new tile is availble under 00/0000/00/00/00/00.05:

$ tree /tmp/mylog/tile
└── 00
    └── 0000
        └── 00
            └── 00
                ├── 00.04
                └── 00.05

5 directories, 2 files

Notice that the old tile file, 00.04 has not been deleted.

Here's the diff between the two tiles:

$ diff /tmp/mylog/tile/00/0000/00/00/00.04 /tmp/mylog/tile/00/0000/00/00/00.05
2c2
< 4
---
> 5
9a10,11
>
> 6KUzDe4gX/0rZTZCgfgBtaIGOBkOQz4duxjTT+NeM5w=

The number of leaves 4 has been updated to 5, and a new leaf node hash has appeared. Note that even though the tree has changed shape to include this new leaf, no internal node was added to the tile. That's because tiles only store non-emphemeral node, and in this case, all the new internal nodes are ephemeral (marked with a prime symbol): they will change when new leaves are added to the tree.

                      f'
                     / \
                    /   \
                   /     \
                  /       \
                 /         \
                /           \
               /             \
              b               e'
             / \             / \
            /   \           /   \
           /     \         /     \
          a       c       d'      X
         / \     / \     / \
       h0  h1   h2  h3  h4  X
       |   |    |   |   |
       0   1    2   3   4

Filling up the tile

Now, let's fill up the tile, with the maximum number of leaves it can hold: 256.

$ for i in $(seq 5 255); do x=$(printf "%03d" $i); echo "leaf_data_$x" > $DATA_DIR/leaf_$x; done;

$ go run ./cmd/sequence --storage_dir="${LOG_DIR}" --entries "${DATA_DIR}/*"  --public_key=key.pub --origin="${LOG_ORIGIN}"
I1221 13:26:19.752225  927458 main.go:131] 0: /tmp/myfiles/leaf_000 (dupe)
I1221 13:26:19.752350  927458 main.go:131] 1: /tmp/myfiles/leaf_001 (dupe)
I1221 13:26:19.752398  927458 main.go:131] 2: /tmp/myfiles/leaf_002 (dupe)
I1221 13:26:19.752442  927458 main.go:131] 3: /tmp/myfiles/leaf_003 (dupe)
I1221 13:26:19.752499  927458 main.go:131] 4: /tmp/myfiles/leaf_004 (dupe)
I1221 13:26:19.752859  927458 main.go:131] 5: /tmp/myfiles/leaf_005
I1221 13:26:19.753301  927458 main.go:131] 6: /tmp/myfiles/leaf_006
...

$ go run ./cmd/integrate  --storage_dir="${LOG_DIR}"  --public_key=key.pub --private_key=key --origin="${LOG_ORIGIN}"
I1221 13:26:22.243568  927696 integrate.go:94] Loaded state with roothash 1b26238e581181883c3f51827c58fe9c9e8a4d39383cbbabaabe0662b3c11496
I1221 13:26:22.250694  927696 integrate.go:132] New log state: size 0x100 hash: dc0d01251026e7138412adf1009ef9ed0fc55e2b9a954438b5762deb8e8519c5

You can check that the seq and leaves have been updated with new entries, and so has the checkpoint.

The tile directory now looks like this:

$ tree /tmp/mylog/tile
/tmp/mylog/tile
├── 00
│   └── 0000
│       └── 00
│           └── 00
│               ├── 00
│               ├── 00.04 -> /tmp/mylog/tile/00/0000/00/00/00
│               └── 00.05 -> /tmp/mylog/tile/00/0000/00/00/00
└── 01
    └── 0000
        └── 00
            └── 00
                └── 00.01

9 directories, 4 files

Since the 00/0000/00/00/00 tile is now full, its partial versions have been deleted, and now point to the full tile.

A new tile has also appeared, one stratum above: 01/0000/00/00/00.01. It contains a single node, which is the current root node of the tree. To avoid storing duplicate hashes, this top level node of the 00/0000/00/00/00 tile has been stripped, and you'll find an empty line in this file:

$ cat /tmp/mylog/tile/00/0000/00/00/00
...
ZkeKg5PJFHO3e+TRuTVf4QL7tk9C9NCBkR82ipcsUxw=
iTG/pTVoZUjBJTfXcdNv2oJjxLQRKUqMOC6zVZoBznk=
R0G/vzOBrC0IdaP092TEzFn4ksrZB77kIlcAK11J7aw=

SIeXDZcyctFVLLjX3BqTs4SirwpzCezE6yZRq9OIKHw=
O876VfSKWrJ5MOQrmnO0jVgqs+vonzE/iC1t681gnAA=
YDrvejyQgwwCB0u+vwiVml4eRbc5CSaJ0rWsieOtRb4=
...