Chapter 11. Simplified Payment Verification

The one block header field that we didn’t investigate much in Chapter 9 was the Merkle root. To understand what makes the Merkle root useful, we first have to learn about Merkle trees and what properties they have. In this chapter, we’re going to learn exactly what a Merkle root is. This will be motivated by something called a proof of inclusion.

Motivation

For a device that doesn’t have much disk space, bandwidth, or computing power, it’s expensive to store, receive, and validate the entire blockchain. As of this writing, the entire Bitcoin blockchain is around 200 GB, which is more than many phones can store; it can be very difficult to download efficiently and will certainly tax the CPU. If the entire blockchain cannot be put on the phone, what else can we do? Is it possible to create a Bitcoin wallet on a phone without having all the data?

For any wallet, there are two scenarios that we’re concerned with:

  1. Paying someone

  2. Getting paid by someone

If you are paying someone with your Bitcoin wallet, it is up to the person receiving your bitcoins to verify that they’ve been paid. Once they’ve verified that the transaction has been included in a block sufficiently deep, the other side of the trade, or the good or service, will be given to you. Once you’ve sent the transaction to the other party, there really isn’t anything for you to do other than wait until you receive whatever it is you’re exchanging the bitcoins for.

When getting paid bitcoins, however, we have a dilemma. If we are connected and have the full blockchain, we can easily see when the transaction is in a sufficiently deep block, at which point we give the other party our goods or services. But if we don’t have the full blockchain, as with a phone, what can we do?

The answer lies in the Merkle root field from the block header that we saw in Chapter 9. As we saw in the last chapter, we can download the block headers and verify that they meet the Bitcoin consensus rules. In this chapter we’re going to work toward getting proof that a particular transaction is in a block that we know about. Since the block header is secured by proof-of-work, a transaction with a proof of inclusion in that block means, at a minimum, there was a good deal of energy spent to produce that block. This means that the cost to deceive you would be at least the cost of the proof-of-work for the block. The rest of this chapter goes into what the proof of inclusion is and how to verify it.

Merkle Tree

A Merkle tree is a computer science structure designed for efficient proofs of inclusion. The prerequisites are an ordered list of items and a cryptographic hash function. In our case, the items in the ordered list are transactions in a block and the hash function is hash256. To construct the Merkle tree, we follow this algorithm:

  1. Hash all the items of the ordered list with the provided hash function.

  2. If there is exactly 1 hash, we are done.

  3. Otherwise, if there is an odd number of hashes, we duplicate the last hash in the list and add it to the end so that we have an even number of hashes.

  4. We pair the hashes in order and hash the concatenation to get the parent level, which should have half the number of hashes.

  5. Go to #2.

The idea is to come to a single hash that “represents” the entire ordered list. Visually, a Merkle tree looks like Figure 11-1.

The bottom row is what we call the leaves of the tree. All other nodes besides the leaves are called internal nodes. The leaves get combined to produce a parent level (HAB and HCD), and when we calculate the parent level of that, we get the Merkle root.

We’ll go through each part of this process in the following sections.

Merkle tree
Figure 11-1. Merkle tree

Be Careful with Merkle Trees!

There was a vulnerability in Bitcoin 0.4–0.6 related to the Merkle root, which is detailed in CVE-2012-2459. There was a denial-of-service vector due to the duplication of the last item in Merkle trees, which caused some nodes to invalidate blocks even if they were valid.

Merkle Parent Level

Given an ordered list of more than two hashes, we can calculate the parents of each pair, or what we call the Merkle parent level. If we have an even number of hashes, this is straightforward, as we can simply pair them up in order. If we have an odd number of hashes, then we need to do something, as we have a lone hash at the end. We can solve this by duplicating the last item.

That is, for a list like [A, B, C] we can add C again to get [A, B, C, C]. Now we can calculate the Merkle parent of A and B and calculate the Merkle parent of C and C to get:

  • [H(A||B), H(C||C)]

Since the Merkle parent always consists of two hashes, the Merkle parent level always has exactly half the number of hashes, rounded up. Here is how we calculate a Merkle parent level:

>>> from helper import merkle_parent
>>> hex_hashes = [
...     'c117ea8ec828342f4dfb0ad6bd140e03a50720ece40169ee38bdc15d9eb64cf5',
...     'c131474164b412e3406696da1ee20ab0fc9bf41c8f05fa8ceea7a08d672d7cc5',
...     'f391da6ecfeed1814efae39e7fcb3838ae0b02c02ae7d0a5848a66947c0727b0',
...     '3d238a92a94532b946c90e19c49351c763696cff3db400485b813aecb8a13181',
...     '10092f2633be5f3ce349bf9ddbde36caa3dd10dfa0ec8106bce23acbff637dae',
... ]
>>> hashes = [bytes.fromhex(x) for x in hex_hashes]
>>> if len(hashes) % 2 == 1:
...     hashes.append(hashes[-1])  1
>>> parent_level = []
>>> for i in range(0, len(hashes), 2):  2
...     parent = merkle_parent(hashes[i], hashes[i+1])
...     parent_level.append(parent)
>>> for item in parent_level:
...     print(item.hex())
8b30c5ba100f6f2e5ad1e2a742e5020491240f8eb514fe97c713c31718ad7ecd
7f4e6f9e224e20fda0ae4c44114237f97cd35aca38d83081c9bfd41feb907800
3ecf6115380c77e8aae56660f5634982ee897351ba906a6837d15ebc3a225df0
1

We add the last hash on the list, hashes[-1], to the end of hashes to make the length of hashes even.

2

This is how we skip by two in Python. i will be 0 the first time through the loop, 2 the second, 4 the third, and so on.

This code results in a new list of hashes that correspond to the Merkle parent level.

Exercise 2

Write the merkle_parent_level function.

Merkle Root

To get the Merkle root we calculate successive Merkle parent levels until we get a single hash. If, for example, we have items A through G (7 items), we calculate the Merkle parent level first as follows:

  • [H(A||B), H(C||D), H(E||F), H(G||G)]

Then we calculate the Merkle parent level again:

  • [H(H(A||B)||H(C||D)), H(H(E||F)||H(G||G))]

We are left with just two items, so we calculate the Merkle parent level one more time:

  • H(H(H(A||B)||H(C||D))||H(H(E||F)||H(G||G)))

Since we are left with exactly one hash, we are done. Each level will halve the number of hashes, so doing this process over and over will eventually result in a final single item called the Merkle root:

>>> from helper import merkle_parent_level
>>> hex_hashes = [
...     'c117ea8ec828342f4dfb0ad6bd140e03a50720ece40169ee38bdc15d9eb64cf5',
...     'c131474164b412e3406696da1ee20ab0fc9bf41c8f05fa8ceea7a08d672d7cc5',
...     'f391da6ecfeed1814efae39e7fcb3838ae0b02c02ae7d0a5848a66947c0727b0',
...     '3d238a92a94532b946c90e19c49351c763696cff3db400485b813aecb8a13181',
...     '10092f2633be5f3ce349bf9ddbde36caa3dd10dfa0ec8106bce23acbff637dae',
...     '7d37b3d54fa6a64869084bfd2e831309118b9e833610e6228adacdbd1b4ba161',
...     '8118a77e542892fe15ae3fc771a4abfd2f5d5d5997544c3487ac36b5c85170fc',
...     'dff6879848c2c9b62fe652720b8df5272093acfaa45a43cdb3696fe2466a3877',
...     'b825c0745f46ac58f7d3759e6dc535a1fec7820377f24d4c2c6ad2cc55c0cb59',
...     '95513952a04bd8992721e9b7e2937f1c04ba31e0469fbe615a78197f68f52b7c',
...     '2e6d722e5e4dbdf2447ddecc9f7dabb8e299bae921c99ad5b0184cd9eb8e5908',
...     'b13a750047bc0bdceb2473e5fe488c2596d7a7124b4e716fdd29b046ef99bbf0',
... ]
>>> hashes = [bytes.fromhex(x) for x in hex_hashes]
>>> current_hashes = hashes
>>> while len(current_hashes) > 1:  1
...     current_hashes = merkle_parent_level(current_hashes)
>>> print(current_hashes[0].hex())  2
acbcab8bcc1af95d8d563b77d24c3d19b18f1486383d75a5085c4e86c86beed6
1

We loop until there’s one hash left.

2

We’ve exited the loop so there should only be one item.

Exercise 3

Write the merkle_root function.

Merkle Root in Blocks

Calculating the Merkle root in blocks should be pretty straightforward, but due to endianness issues, it turns out to be tricky. Specifically, we use little-endian ordering for the leaves of the Merkle tree. After we calculate the Merkle root, we use little-endian ordering again.

In practice, this means reversing the leaves before we start and reversing the root at the end:

>>> from helper import merkle_root
>>> tx_hex_hashes = [
...     '42f6f52f17620653dcc909e58bb352e0bd4bd1381e2955d19c00959a22122b2e',
...     '94c3af34b9667bf787e1c6a0a009201589755d01d02fe2877cc69b929d2418d4',
...     '959428d7c48113cb9149d0566bde3d46e98cf028053c522b8fa8f735241aa953',
...     'a9f27b99d5d108dede755710d4a1ffa2c74af70b4ca71726fa57d68454e609a2',
...     '62af110031e29de1efcad103b3ad4bec7bdcf6cb9c9f4afdd586981795516577',
...     '766900590ece194667e9da2984018057512887110bf54fe0aa800157aec796ba',
...     'e8270fb475763bc8d855cfe45ed98060988c1bdcad2ffc8364f783c98999a208',
... ]
>>> tx_hashes = [bytes.fromhex(x) for x in tx_hex_hashes]
>>> hashes = [h[::-1] for h in tx_hashes]  1
>>> print(merkle_root(hashes)[::-1].hex())  2
654d6181e18e4ac4368383fdc5eead11bf138f9b7ac1e15334e4411b3c4797d9
1

We reverse each hash before we begin using a Python list comprehension.

2

We reverse the root at the end.

We want to calculate Merkle roots for a Block, so we add a tx_hashes parameter:

class Block:

    def __init__(self, version, prev_block, merkle_root,
                 timestamp, bits, nonce, tx_hashes=None):  1
        self.version = version
        self.prev_block = prev_block
        self.merkle_root = merkle_root
        self.timestamp = timestamp
        self.bits = bits
        self.nonce = nonce
        self.tx_hashes = tx_hashes
1

We now allow the transaction hashes to be set as part of the initialization of the block. The transaction hashes have to be ordered.

As a full node, if we are given all of the transactions, we can calculate the Merkle root and check that the Merkle root is what we expect.

Exercise 4

Write the validate_merkle_root method for Block.

Using a Merkle Tree

Now that we know how a Merkle tree is constructed, we can create and verify proofs of inclusion. Light nodes can get proofs that transactions of interest were included in a block without having to know all the transactions of a block (Figure 11-2).

Say that a light client has two transactions that are of interest, which would be the hashes represented by the green boxes, HK and HN in Figure 11-2. A full node can construct a proof of inclusion by sending us all of the hashes marked by blue boxes: HABCDEFGH, HIJ, HL, HM, and HOP. The light client would then perform these calculations:

  • HKL = merkle_parent(HK, HL)

  • HMN = merkle_parent(HM, HN)

  • HIJKL = merkle_parent(HIJ, HKL)

  • HMNOP = merkle_parent(HMN, HOP)

  • HIJKLMNOP = merkle_parent(HIJKL, HMNOP)

  • HABCDEFGHIJKLMNOP = merkle_parent(HABCDEFGH, HIJKLMNOP)

Merkle Proof
Figure 11-2. Merkle proof

You can see that in Figure 11-2, the dotted boxes correspond to the hashes that the light client calculates. In particular, the Merkle root is HABCDEFGHIJKLMNOP, which can then be checked against the block header whose proof-of-work has been validated.

Merkle Block

When a full node sends a proof of inclusion, there are two pieces of information that need to be included. First, the light client needs the Merkle tree structure, and second, the light client needs to know which hash is at which position in the Merkle tree. Once both pieces of information are given, the light client can reconstruct the partial Merkle tree to reconstruct the Merkle root and validate the proof of inclusion. A full node communicates these two pieces of information to a light client using a Merkle block.

To understand what’s in a Merkle block, we need to understand a bit about how a Merkle tree, or more generally, binary trees, can be traversed. In a binary tree, nodes can be traversed breadth-first or depth-first. Breadth-first traversal would go level by level like in Figure 11-3.

Breadth First
Figure 11-3. Breadth-first ordering

The breadth-first ordering starts at the root and goes from root to leaves, level by level, left to right.

Depth-first ordering is a bit different and looks like Figure 11-4.

Depth First
Figure 11-4. Depth-first ordering

The depth-first ordering starts at the root and traverses the left side at each node before the right side.

In a proof of inclusion (see Figure 11-5), the full node sends the green boxes, HK and HN, along with the blue boxes, HABCDEFGH, HIJ, HL, HM and HOP. The location of each hash is reconstructed using depth-first ordering from some flags. The process of reconstructing the tree, namely the dotted-edged boxes in Figure 11-5, is described next.

Merkle proof
Figure 11-5. Merkle proof

Merkle Tree Structure

The first thing a light client does is create the general structure of the Merkle tree. Because Merkle trees are built from the leaves upward, the only thing a light client needs is the number of leaves that exist to know the structure. The tree from Figure 11-5 has 16 leaves. A light client can create the empty Merkle tree like so:

>>> import math
>>> total = 16
>>> max_depth = math.ceil(math.log(total, 2))  1
>>> merkle_tree = []  2
>>> for depth in range(max_depth + 1):  3
...     num_items = math.ceil(total / 2**(max_depth - depth))  4
...     level_hashes = [None] * num_items  5
...     merkle_tree.append(level_hashes)  6
>>> for level in merkle_tree:
...     print(level)
[None]
[None, None]
[None, None, None, None]
[None, None, None, None, None, None, None, None]
[None, None, None, None, None, None, None, None, None, None, None, None, None,\
 None, None, None]
1

Since we halve at every level, log2 of the number of leaves is how many levels there are in the Merkle tree. Note we round up using math.ceil as we round up for halving at each level. We could also be clever and use len(bin(total))-2.

2

The Merkle tree will hold the root level at index 0, the level below at index 1, and so on. In other words, the index is the “depth” from the top.

3

There are levels 0 to max_depth in this Merkle tree.

4

At any particular level, the number of nodes is the number of total leaves divided by 2 for every level above the leaf level.

5

We don’t know yet what any of the hashes are, so we set them to None.

6

Note merkle_tree is a list of lists of hashes, or a two-dimensional array.

Exercise 5

Create an empty Merkle Tree with 27 items and print each level.

Coding a Merkle Tree

We can now create a MerkleTree class:

class MerkleTree:

    def __init__(self, total):
        self.total = total
        self.max_depth = math.ceil(math.log(self.total, 2))
        self.nodes = []
        for depth in range(self.max_depth + 1):
            num_items = math.ceil(self.total / 2**(self.max_depth - depth))
            level_hashes = [None] * num_items
            self.nodes.append(level_hashes)
        self.current_depth = 0  1
        self.current_index = 0

    def __repr__(self):  2
        result = []
        for depth, level in enumerate(self.nodes):
            items = []
            for index, h in enumerate(level):
                if h is None:
                    short = 'None'
                else:
                    short = '{}...'.format(h.hex()[:8])
                if depth == self.current_depth and index == self.current_index:
                    items.append('*{}*'.format(short[:-2]))
                else:
                    items.append('{}'.format(short))
            result.append(', '.join(items))
        return '\n'.join(result)
1

We keep a pointer to a particular node in the tree, which will come in handy later.

2

We print a representation of the tree.

Now that we have an empty tree, we can go about filling it to calculate the Merkle root. If we had every leaf hash, getting the Merkle root would look like this:

>>> from merkleblock import MerkleTree
>>> from helper import merkle_parent_level
>>> hex_hashes = [
...     "9745f7173ef14ee4155722d1cbf13304339fd00d900b759c6f9d58579b5765fb",
...     "5573c8ede34936c29cdfdfe743f7f5fdfbd4f54ba0705259e62f39917065cb9b",
...     "82a02ecbb6623b4274dfcab82b336dc017a27136e08521091e443e62582e8f05",
...     "507ccae5ed9b340363a0e6d765af148be9cb1c8766ccc922f83e4ae681658308",
...     "a7a4aec28e7162e1e9ef33dfa30f0bc0526e6cf4b11a576f6c5de58593898330",
...     "bb6267664bd833fd9fc82582853ab144fece26b7a8a5bf328f8a059445b59add",
...     "ea6d7ac1ee77fbacee58fc717b990c4fcccf1b19af43103c090f601677fd8836",
...     "457743861de496c429912558a106b810b0507975a49773228aa788df40730d41",
...     "7688029288efc9e9a0011c960a6ed9e5466581abf3e3a6c26ee317461add619a",
...     "b1ae7f15836cb2286cdd4e2c37bf9bb7da0a2846d06867a429f654b2e7f383c9",
...     "9b74f89fa3f93e71ff2c241f32945d877281a6a50a6bf94adac002980aafe5ab",
...     "b3a92b5b255019bdaf754875633c2de9fec2ab03e6b8ce669d07cb5b18804638",
...     "b5c0b915312b9bdaedd2b86aa2d0f8feffc73a2d37668fd9010179261e25e263",
...     "c9d52c5cb1e557b92c84c52e7c4bfbce859408bedffc8a5560fd6e35e10b8800",
...     "c555bc5fc3bc096df0a0c9532f07640bfb76bfe4fc1ace214b8b228a1297a4c2",
...     "f9dbfafc3af3400954975da24eb325e326960a25b87fffe23eef3e7ed2fb610e",
... ]
>>> tree = MerkleTree(len(hex_hashes))
>>> tree.nodes[4] = [bytes.fromhex(h) for h in hex_hashes]
>>> tree.nodes[3] = merkle_parent_level(tree.nodes[4])
>>> tree.nodes[2] = merkle_parent_level(tree.nodes[3])
>>> tree.nodes[1] = merkle_parent_level(tree.nodes[2])
>>> tree.nodes[0] = merkle_parent_level(tree.nodes[1])
>>> print(tree)
*597c4baf.*
6382df3f..., 87cf8fa3...
3ba6c080..., 8e894862..., 7ab01bb6..., 3df760ac...
272945ec..., 9a38d037..., 4a64abd9..., ec7c95e1..., 3b67006c..., 850683df..., \
d40d268b..., 8636b7a3...
9745f717..., 5573c8ed..., 82a02ecb..., 507ccae5..., a7a4aec2..., bb626766..., \
ea6d7ac1..., 45774386..., 76880292..., b1ae7f15..., 9b74f89f..., b3a92b5b..., \
b5c0b915..., c9d52c5c..., c555bc5f..., f9dbfafc...

This fills the tree and gets us the Merkle root. However, the message from the network may not be giving us all of the leaves. The message might contain some internal nodes as well. We need a cleverer way to fill the tree.

Tree traversal is going to be the way we do this. We can do a depth-first traversal and only fill in the nodes that we can calculate. To traverse, we need to keep track of where exactly in the tree we are. The properties self.current_depth and self.current_index do this.

We need methods to traverse the Merkle tree. We’ll also include other useful methods:

class MerkleTree:
...
    def up(self):
        self.current_depth -= 1
        self.current_index //= 2

    def left(self):
        self.current_depth += 1
        self.current_index *= 2

    def right(self):
        self.current_depth += 1
        self.current_index = self.current_index * 2 + 1

    def root(self):
        return self.nodes[0][0]

    def set_current_node(self, value):  1
        self.nodes[self.current_depth][self.current_index] = value

    def get_current_node(self):
        return self.nodes[self.current_depth][self.current_index]

    def get_left_node(self):
        return self.nodes[self.current_depth + 1][self.current_index * 2]

    def get_right_node(self):
        return self.nodes[self.current_depth + 1][self.current_index * 2 + 1]

    def is_leaf(self):  2
        return self.current_depth == self.max_depth

    def right_exists(self):  3
        return len(self.nodes[self.current_depth + 1]) > \
            self.current_index * 2 + 1
1

We want the ability to set the current node in the tree to some value.

2

We want to know if we are a leaf node.

3

In certain situations, we won’t have a right child because we may be at the furthest-right node of a level whose child level has an odd number of items.

We have Merkle tree traversal methods left, right, and up. Let’s use these methods to populate the tree via depth-first traversal:

>>> from merkleblock import MerkleTree
>>> from helper import merkle_parent
>>> hex_hashes = [
...     "9745f7173ef14ee4155722d1cbf13304339fd00d900b759c6f9d58579b5765fb",
...     "5573c8ede34936c29cdfdfe743f7f5fdfbd4f54ba0705259e62f39917065cb9b",
...     "82a02ecbb6623b4274dfcab82b336dc017a27136e08521091e443e62582e8f05",
...     "507ccae5ed9b340363a0e6d765af148be9cb1c8766ccc922f83e4ae681658308",
...     "a7a4aec28e7162e1e9ef33dfa30f0bc0526e6cf4b11a576f6c5de58593898330",
...     "bb6267664bd833fd9fc82582853ab144fece26b7a8a5bf328f8a059445b59add",
...     "ea6d7ac1ee77fbacee58fc717b990c4fcccf1b19af43103c090f601677fd8836",
...     "457743861de496c429912558a106b810b0507975a49773228aa788df40730d41",
...     "7688029288efc9e9a0011c960a6ed9e5466581abf3e3a6c26ee317461add619a",
...     "b1ae7f15836cb2286cdd4e2c37bf9bb7da0a2846d06867a429f654b2e7f383c9",
...     "9b74f89fa3f93e71ff2c241f32945d877281a6a50a6bf94adac002980aafe5ab",
...     "b3a92b5b255019bdaf754875633c2de9fec2ab03e6b8ce669d07cb5b18804638",
...     "b5c0b915312b9bdaedd2b86aa2d0f8feffc73a2d37668fd9010179261e25e263",
...     "c9d52c5cb1e557b92c84c52e7c4bfbce859408bedffc8a5560fd6e35e10b8800",
...     "c555bc5fc3bc096df0a0c9532f07640bfb76bfe4fc1ace214b8b228a1297a4c2",
...     "f9dbfafc3af3400954975da24eb325e326960a25b87fffe23eef3e7ed2fb610e",
... ]
>>> tree = MerkleTree(len(hex_hashes))
>>> tree.nodes[4] = [bytes.fromhex(h) for h in hex_hashes]
>>> while tree.root() is None:  1
...     if tree.is_leaf():  2
...         tree.up()
...     else:
...         left_hash = tree.get_left_node()
...         right_hash = tree.get_right_node()
...         if left_hash is None:  3
...             tree.left()
...         elif right_hash is None:  4
...             tree.right()
...         else:  5
...             tree.set_current_node(merkle_parent(left_hash, right_hash))
...             tree.up()
>>> print(tree)
597c4baf...
6382df3f..., 87cf8fa3...
3ba6c080..., 8e894862..., 7ab01bb6..., 3df760ac...
272945ec..., 9a38d037..., 4a64abd9..., ec7c95e1..., 3b67006c..., 850683df..., \
d40d268b..., 8636b7a3...
9745f717..., 5573c8ed..., 82a02ecb..., 507ccae5..., a7a4aec2..., bb626766..., \
ea6d7ac1..., 45774386..., 76880292..., b1ae7f15..., 9b74f89f..., b3a92b5b..., \
b5c0b915..., c9d52c5c..., c555bc5f..., f9dbfafc...
1

We traverse until we calculate the Merkle root. Each time through the loop, we are at a particular node.

2

If we are at a leaf node, we already have that hash, so we don’t need to do anything but go back up.

3

If we don’t have the left hash, then we calculate the value first before calculating the current node’s hash.

4

If we don’t have the right hash, we calculate the value before calculating the current node’s hash. Note that we already have the left one due to the depth-first traversal.

5

We have both the left and the right hash, so we calculate the Merkle parent value and set that to the current node. Once set, we can go back up.

This code will only work when the number of leaves is a power of two, as edge cases where there are an odd number of nodes on a level are not handled.

We handle the case where the parent is the parent of the rightmost node on a level with an odd number of nodes:

>>> from merkleblock import MerkleTree
>>> from helper import merkle_parent
>>> hex_hashes = [
...     "9745f7173ef14ee4155722d1cbf13304339fd00d900b759c6f9d58579b5765fb",
...     "5573c8ede34936c29cdfdfe743f7f5fdfbd4f54ba0705259e62f39917065cb9b",
...     "82a02ecbb6623b4274dfcab82b336dc017a27136e08521091e443e62582e8f05",
...     "507ccae5ed9b340363a0e6d765af148be9cb1c8766ccc922f83e4ae681658308",
...     "a7a4aec28e7162e1e9ef33dfa30f0bc0526e6cf4b11a576f6c5de58593898330",
...     "bb6267664bd833fd9fc82582853ab144fece26b7a8a5bf328f8a059445b59add",
...     "ea6d7ac1ee77fbacee58fc717b990c4fcccf1b19af43103c090f601677fd8836",
...     "457743861de496c429912558a106b810b0507975a49773228aa788df40730d41",
...     "7688029288efc9e9a0011c960a6ed9e5466581abf3e3a6c26ee317461add619a",
...     "b1ae7f15836cb2286cdd4e2c37bf9bb7da0a2846d06867a429f654b2e7f383c9",
...     "9b74f89fa3f93e71ff2c241f32945d877281a6a50a6bf94adac002980aafe5ab",
...     "b3a92b5b255019bdaf754875633c2de9fec2ab03e6b8ce669d07cb5b18804638",
...     "b5c0b915312b9bdaedd2b86aa2d0f8feffc73a2d37668fd9010179261e25e263",
...     "c9d52c5cb1e557b92c84c52e7c4bfbce859408bedffc8a5560fd6e35e10b8800",
...     "c555bc5fc3bc096df0a0c9532f07640bfb76bfe4fc1ace214b8b228a1297a4c2",
...     "f9dbfafc3af3400954975da24eb325e326960a25b87fffe23eef3e7ed2fb610e",
...     "38faf8c811988dff0a7e6080b1771c97bcc0801c64d9068cffb85e6e7aacaf51",
... ]
>>> tree = MerkleTree(len(hex_hashes))
>>> tree.nodes[5] = [bytes.fromhex(h) for h in hex_hashes]
>>> while tree.root() is None:
...     if tree.is_leaf():
...         tree.up()
...     else:
...         left_hash = tree.get_left_node()
...         if left_hash is None:  1
...             tree.left()
...         elif tree.right_exists():  2
...             right_hash = tree.get_right_node()
...             if right_hash is None:  3
...                 tree.right()
...             else:  4
...                 tree.set_current_node(merkle_parent(left_hash, right_hash))
...                 tree.up()
...         else:  5
...             tree.set_current_node(merkle_parent(left_hash, left_hash))
...             tree.up()
>>> print(tree)
0a313864...
597c4baf..., 6f8a8190...
6382df3f..., 87cf8fa3..., 5647f416...
3ba6c080..., 8e894862..., 7ab01bb6..., 3df760ac..., 28e93b98...
272945ec..., 9a38d037..., 4a64abd9..., ec7c95e1..., 3b67006c..., 850683df..., \
d40d268b..., 8636b7a3..., ce26d40b...
9745f717..., 5573c8ed..., 82a02ecb..., 507ccae5..., a7a4aec2..., bb626766..., \
ea6d7ac1..., 45774386..., 76880292..., b1ae7f15..., 9b74f89f..., b3a92b5b..., \
b5c0b915..., c9d52c5c..., c555bc5f..., f9dbfafc..., 38faf8c8...
1

If we don’t have the left node’s value, we traverse to the left node, since all internal nodes are guaranteed a left child.

2

We check first if this node has a right child. This is true unless this node happens to be the rightmost node and the child level has an odd number of nodes.

3

If we don’t have the right node’s value, we traverse to that node.

4

If we have both the left and the right node’s values, we calculate the current node’s value using merkle_parent.

5

We have the left node’s value, but the right child doesn’t exist. This is the rightmost node of this level, so we combine the left value twice.

We can now traverse the tree for the number of leaves that aren’t powers of two.

The merkleblock Command

The full node communicating a Merkle block sends all the information needed to verify that the interesting transaction is in the Merkle tree. The merkleblock network command is what communicates this information; it looks like Figure 11-6.

merkleblock command
Figure 11-6. Parsed merkleblock

The first six fields are exactly the same as the block header from Chapter 9. The last four fields are the proof of inclusion.

The number of transactions field indicates how many leaves this particular Merkle tree will have. This allows a light client to construct an empty Merkle tree. The hashes field holds the blue and green boxes from Figure 11-5. Since the number of hashes in the hashes field is not fixed, it’s prefixed with how many there are. Last, the flags field gives information about where the hashes go within the Merkle tree. The flags are parsed using bytes_to_bits_field to convert them to a list of bits (1’s and 0’s):

def bytes_to_bit_field(some_bytes):
    flag_bits = []
    for byte in some_bytes:
        for _ in range(8):
            flag_bits.append(byte & 1)
            byte >>= 1
    return flag_bits

The ordering for the bytes is a bit strange, but it’s meant to be easy to convert into the flag bits needed to reconstruct the Merkle root.

Exercise 6

Write the parse method for MerkleBlock.

Using Flag Bits and Hashes

The flag bits inform where the hashes go using depth-first ordering.

The rules for the flag bits are:

  1. If the node’s value is given in the hashes field (blue box in Figure 11-7), the flag bit is 0.

  2. If the node is an internal node and the value is to be calculated by the light client (dotted outline in Figure 11-7), the flag bit is 1.

  3. If the node is a leaf node and is a transaction of interest (green box in Figure 11-7), the flag is 1 and the node’s value is also given in the hashes field. These are the items proven to be included in the Merkle tree.

Merkle Blocks and Hashes
Figure 11-7. Processing a Merkle block

Given the tree from Figure 11-7:

  • The flag bit is 1 for the root node (1), since that hash is calculated by the light client.

  • The left child, HABCDEFGH (2), is included in the hashes field, so the flag is 0.

  • From here, we traverse to HIJKLMNOP (3) instead of HABCD or HEFGH since HABCDEFGH represents both those nodes and we don’t need them.

  • The right child, HIJKLMNOP, is also calculated, so it has a flag bit of 1.

  • To calculate HIJKLMNOP, we need the values for HIJKL and HMNOP (9). The next node in depth-first order is the left child, HIJKL (4), which is where we traverse to next.

  • HIJKL is an internal node that’s calculated, so the flag bit is 1.

  • From here, we traverse to its left child, HIJ (5). We will be traversing to HKL (6) when we come back to this node.

  • HIJ is next in depth-first ordering; its hash is included in the hashes list and the flag is 0.

  • HKL is an internal, calculated node, so the flag is 1.

  • HK (7) is a leaf node whose presence in the block is being proved, so the flag is 1.

  • HL (8) is a node whose value is included in the hashes field, so the flag is 0.

  • We traverse up to HKL, whose value can now be calculated since HK and HL are known.

  • We traverse up to HIJKL, whose value can now be calculated since HIJ and HKL are known.

  • We traverse up to HIJKLMNOP, whose value we can’t calculate yet since we haven’t been to HMNOP.

  • We traverse to HMNOP, which is another internal node, so the flag is 1.

  • HMN (10) is another internal node that’s calculated, so the flag is 1.

  • HM (11) is a node whose value is included in the hashes field, so the flag is 0.

  • HN (12) is of interest, so the flag is 1 and its value is in the hashes field.

  • We traverse up to HMN, whose value can now be calculated.

  • We traverse up again to HMNOP, whose value cannot be calculated because we haven’t been to HOP yet.

  • HOP (13) is given, so the flag is 1 and its hash is the final hash in the hashes field.

  • We traverse to HMNOP, which can now be calculated.

  • We traverse to HIJKLMNOP, which can now be calculated.

  • Finally, we traverse to HABCDEFGHIJKLMNOP, which is the Merkle root, and calculate it!

The flag bits for nodes (1) through (13) are:

1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0

There should be seven hashes in the hashes field, in this order:

  • HABCDEFGH

  • HIJ

  • HK

  • HL

  • HM

  • HN

  • HOP

Notice that every letter is represented in the hashes, A to P. This information is sufficient to prove that HK and HN (the green boxes in Figure 11-7) are included in the block.

As you can see from Figure 11-7, the flag bits are given in depth-first order. Anytime we’re given a hash, as with HABCDEFGH, we skip its children and continue. In the case of HABCDEFGH, we traverse to HIJKLMNOP instead of HABCD. Flag bits are a clever mechanism to encode which nodes have which hash value.

We can now populate the Merkle tree and calculate the root, given the appropriate flag bits and hashes:

class MerkleTree:
...
    def populate_tree(self, flag_bits, hashes):
        while self.root() is None:  1
            if self.is_leaf():  2
                flag_bits.pop(0)  3
                self.set_current_node(hashes.pop(0))  4
                self.up()
            else:
                left_hash = self.get_left_node()
                if left_hash is None:  5
                    if flag_bits.pop(0) == 0:  6
                        self.set_current_node(hashes.pop(0))
                        self.up()
                    else:
                        self.left()  7
                elif self.right_exists():  8
                    right_hash = self.get_right_node()
                    if right_hash is None:  9
                        self.right()
                    else:  10
                        self.set_current_node(merkle_parent(left_hash,
                        right_hash))
                        self.up()
                else:  11
                    self.set_current_node(merkle_parent(left_hash, left_hash))
                    self.up()
        if len(hashes) != 0:  12
            raise RuntimeError('hashes not all consumed {}'.format(len(hashes)))
        for flag_bit in flag_bits:  13
            if flag_bit != 0:
                raise RuntimeError('flag bits not all consumed')
1

The point of populating this Merkle tree is to calculate the root. Each loop iteration processes one node until the root is calculated.

2

For leaf nodes, we are always given the hash.

3

flag_bits.pop(0) is a way in Python to dequeue the next flag bit. We may want to keep track of which hashes are of interest to us by looking at the flag bit, but for now, we don’t do this.

4

hashes.pop(0) is how we get the next hash from the hashes field. We need to set the current node to that hash.

5

If we don’t have the left child value, there are two possibilities. This node’s value may be in the hashes field, or it might need calculation.

6

The next flag bit tells us whether we need to calculate this node or not. If the flag bit is 0, the next hash in the hashes field is this node’s value. If the flag bit is 1, we need to calculate the left (and possibly the right) node’s value.

7

We are guaranteed that there’s a left child, so we traverse to that node and get its value.

8

We check that the right node exists.

9

We have the left hash, but not the right. We traverse to the right node to get its value.

10

We have both the left and the right node’s values, so we calculate their Merkle parent to get the current node’s value.

11

We have the left node’s value, but the right does not exist. In this case, according to Merkle tree rules, we calculate the Merkle parent of the left node twice.

12

All hashes must be consumed or we got bad data.

13

All flag bits must be consumed or we got bad data.

Exercise 7

Write the is_valid method for MerkleBlock.