Chapter 9. Blocks

Transactions transfer bitcoins from one party to another and are unlocked, or authorized, by signatures. This ensures that the sender authorized the transaction, but what if the sender sends the same coins to multiple people? The owner of a lockbox may try to spend the same output twice. This is called the double-spending problem. Much like being given a check that has the possibility of bouncing, the receiver needs to be assured that the transaction is valid.

This is where a major innovation of Bitcoin comes in, with blocks. Think of blocks as a way to order transactions. If we order transactions, a double-spend can be prevented by making any later, conflicting transaction invalid. This is the equivalent to accepting the earlier transaction as the valid one.

Implementing this rule would be easy (earliest transaction is valid, subsequent transactions that conflict are invalid) if we could order transactions one at a time. Unfortunately, that would require nodes on the network to agree on which transaction is supposed to be next and would cause a lot of transmission overhead in coming to consensus. We could also order large batches of transactions, maybe once per day, but that wouldn’t be very practical as transactions would settle only once per day and not have finality before then.

Bitcoin finds a middle ground between these extremes by settling every 10 minutes in batches of transactions. These batches of transactions are what we call blocks. In this chapter we’ll review how to parse blocks and how to check the proof-of-work. We’ll start with a special transaction called the coinbase transaction, which is the first transaction of every block.

Coinbase Transactions

Coinbase transactions have nothing to do with the US company of the same name. Coinbase is the required first transaction of every block and is the only transaction allowed to bring bitcoins into existence. The coinbase transaction’s outputs are kept by whomever the mining entity designates and usually include all the transaction fees of the other transactions in the block as well as something called the block reward.

The coinbase transaction is what makes it worthwhile for a miner to mine. Figure 9-1 shows what a coinbase transaction looks like.

Coinbase transaction
Figure 9-1. Coinbase transaction

The transaction structure is no different from that of other transactions on the Bitcoin network, with a few exceptions:

  1. Coinbase transactions must have exactly one input.

  2. The one input must have a previous transaction of 32 bytes of 00.

  3. The one input must have a previous index of ffffffff.

These three conditions determine whether a transaction is a coinbase transaction or not.

Exercise 1

Write the is_coinbase method of the Tx class.

ScriptSig

The coinbase transaction has no previous output that it’s spending, so the input is not unlocking anything. So what’s in the ScriptSig?

The ScriptSig of the coinbase transaction is set by whoever mined the transaction. The main restriction is that the ScriptSig has to be at least 2 bytes and no longer than 100 bytes. Other than those restrictions and BIP0034 (described in the next section), the ScriptSig can be anything the miner wants as long as the evaluation of the ScriptSig by itself, with no corresponding ScriptPubKey, is valid. Here is the ScriptSig for the genesis block’s coinbase transaction:

4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c
6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73

This ScriptSig was composed by Satoshi and contains a message that we can read:

>>> from io import BytesIO
>>> from script import Script
>>> stream = BytesIO(bytes.fromhex('4d04ffff001d0104455468652054696d6573203033\
2f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64\
206261696c6f757420666f722062616e6b73'))
>>> s = Script.parse(stream)
>>> print(s.cmds[2])
b'The Times 03/Jan/2009 Chancellor on brink of second bailout for banks'

This was the headline from the Times of London on January 3, 2009. This proves that the genesis block was created some time at or after that date, and not before. Other coinbase transactions’ ScriptSigs contain similarly arbitrary data.

BIP0034

BIP0034 regulates the first element of the ScriptSig of coinbase transactions. This was due to a network problem where miners were using the same coinbase transaction for different blocks.

The coinbase transaction being the same byte-wise means that the transaction IDs are also the same, since the hash256 of the transaction is deterministic. To prevent transaction ID duplication, Gavin Andresen authored BIP0034, which is a soft-fork rule that adds the height of the block being mined into the first element of the coinbase ScriptSig.

The height is interpreted as a little-endian integer and must equal the height of the block (that is, the number of blocks since the genesis block). Thus, a coinbase transaction cannot be byte-wise the same across different blocks, since the block height will differ. Here’s how we can parse the height from the coinbase transaction in Figure 9-1:

>>> from io import BytesIO
>>> from script import Script
>>> from helper import little_endian_to_int
>>> stream = BytesIO(bytes.fromhex('5e03d71b07254d696e656420627920416e74506f6f\
6c20626a31312f4542312f4144362f43205914293101fabe6d6d678e2c8c34afc36896e7d94028\
24ed38e856676ee94bfdb0c6c4bcd8b2e5666a0400000000000000c7270000a5e00e00'))
>>> script_sig = Script.parse(stream)
>>> print(little_endian_to_int(script_sig.cmds[0]))
465879

A coinbase transaction reveals the block it was in! Coinbase transactions in different blocks are required to have different ScriptSigs and thus different transaction IDs. This rule continues to be needed as it would otherwise allow the duplicate coinbase transaction IDs across multiple blocks.

Exercise 2

Write the coinbase_height method for the Tx class.

Block Headers

Blocks are batches of transactions, and the block header is metadata about the transactions included in a block. The block header as shown in Figure 9-2 consists of:

  • Version

  • Previous block

  • Merkle root

  • Timestamp

  • Bits

  • Nonce

Block parsing
Figure 9-2. Parsed block

The block header is the metadata for the block. Unlike in transactions, each field in a block header is of a fixed length, as listed in Figure 9-2; a block header takes up exactly 80 bytes. As of this writing there are roughly 550,000 blocks, or ~45 MB in block headers. The entire blockchain, on the other hand, is roughly 200 GB, so the headers are roughly .023% of the size. The fact that headers are so much smaller is an important feature, as we’ll see when we look at simplified payment verification in Chapter 11.

Like the transaction ID, the block ID is the hex representation of the hash256 of the header interpreted in little-endian. The block ID is interesting:

>>> from helper import hash256
>>> block_hash = hash256(bytes.fromhex('020000208ec39428b17323fa0ddec8e887b4a7\
c53b8c0a0a220cfd0000000000000000005b0750fce0a889502d40508d39576821155e9c9e3f5c\
3157f961db38fd8b25be1e77a759e93c0118a4ffd71d'))[::-1]
>>> block_id = block_hash.hex()
>>> print(block_id)
0000000000000000007e9e4c586439b0cdbe13b1370bdd9435d76a644d047523

This ID is what gets put into prev_block for a block building on top of this one. For now, notice that the ID starts with a lot of zeros. We’ll come back to this in “Proof-of-Work”, after we take a closer look at the fields in the block header.

We can start coding a Block class based on what we already know:

class Block:

    def __init__(self, version, prev_block, merkle_root, timestamp, bits, nonce):
        self.version = version
        self.prev_block = prev_block
        self.merkle_root = merkle_root
        self.timestamp = timestamp
        self.bits = bits
        self.nonce = nonce

Exercise 3

Write the parse method for Block.

Exercise 4

Write the serialize method for Block.

Exercise 5

Write the hash method for Block.

Version

Version in normal software refers to a particular set of features. For a block, this is similar, in the sense that the version field reflects the capabilities of the software that produced the block. In the past this was used as a way to indicate a single feature that was ready to be deployed by the block’s miner. Version 2 meant that the software was ready for BIP0034, which introduced the coinbase transaction block height feature mentioned earlier in this chapter. Version 3 meant that the software was ready for BIP0066, which enforced strict DER encoding. Version 4 meant that the software was ready for BIP0065, which specified OP_CHECKLOCKTIMEVERIFY.

Unfortunately, the incremental increase in version number meant that only one feature was signaled on the network at a time. To alleviate this, the developers came up with BIP0009, which allows up to 29 different features to be signaled at the same time.

The way BIP0009 works is by fixing the first 3 bits of the 4-byte (32-bit) header to be 001 to indicate that the miner is using BIP0009. The first 3 bits have to be 001, as that forces older clients to interpret the version field as a number greater than or equal to 4, which was the last version number that was used pre-BIP0009.

This means that in hexadecimal, the first character will always be 2 or 3. The other 29 bits can be assigned to different soft-fork features for which miners can signal readiness. For example, bit 0 (the rightmost bit) can be flipped to 1 to signal readiness for one soft fork, bit 1 (the second bit from the right) can be flipped to 1 to signal readiness for another, bit 2 (the third bit from the right) can be flipped to 1 to signal readiness for another, and so on.

BIP0009 requires that 95% of blocks signal readiness in a given 2,016-block epoch (the period for a difficulty adjustment; more on that later in this chapter) before the soft fork feature gets activated on the network. Soft forks that used BIP0009 as of this writing have been BIP0068/BIP0112/BIP0113 (OP_CHECKSEQUENCEVERIFY and related changes) and BIP0141 (Segwit). These BIPs used bits 0 and 1 for signaling, respectively. BIP0091 used something like BIP0009 but with an 80% threshold and a smaller block period, so it wasn’t strictly using BIP0009. Bit 4 was used to signal BIP0091.

Checking for these features is relatively straightforward:

>>> from io import BytesIO
>>> from block import Block
>>> b = Block.parse(BytesIO(bytes.fromhex('020000208ec39428b17323fa0ddec8e887b\
4a7c53b8c0a0a220cfd0000000000000000005b0750fce0a889502d40508d39576821155e9c9e3\
f5c3157f961db38fd8b25be1e77a759e93c0118a4ffd71d')))
>>> print('BIP9: {}'.format(b.version >> 29 == 0b001))  1
BIP9: True
>>> print('BIP91: {}'.format(b.version >> 4 & 1 == 1))  2
BIP91: False
>>> print('BIP141: {}'.format(b.version >> 1 & 1 == 1))  3
BIP141: True
1

The >> operator is the right bit-shift operator, which throws away the rightmost 29 bits, leaving just the top 3 bits. The 0b001 is a way of writing a number in binary in Python.

2

The & operator is the “bitwise and” operator. In our case, we right-shift by 4 bits first and then check that the rightmost bit is 1.

3

We shift 1 to the right because BIP0141 was assigned to bit 1.

Exercise 6

Write the bip9 method for the Block class.

Exercise 7

Write the bip91 method for the Block class.

Exercise 8

Write the bip141 method for the Block class.

Proof-of-Work

Proof-of-work is what secures Bitcoin and, at a deep level, allows the decentralized mining of Bitcoin. Finding a proof-of-work gives a miner the right to put the attached block into the blockchain. As proof-of-work is very rare, this is not an easy task. But because proof-of-work is objective and easy to verify, anyone can be a miner if they so choose.

Proof-of-work is called “mining” for a very good reason. Like with physical mining, there is something that miners are searching for. A typical gold mining operation processes 45 tons of dirt and rock before accumulating 1 oz of gold. This is because gold is very rare. However, once gold is found, it’s very easy to verify that the gold is real. There are chemical tests, touchstones, and many other ways to tell relatively cheaply whether the thing found is gold.

Similarly, proof-of-work is a number that provides a very rare result. To find a proof-of-work, the miners on the Bitcoin network have to churn through the numerical equivalent of dirt and rock. Like with gold, verifying proof-of-work is much cheaper than actually finding it.

So what is proof-of-work? Let’s look at the hash256 of the block header we saw before to find out:

020000208ec39428b17323fa0ddec8e887b4a7c53b8c0a0a220cfd000000000000000000
5b0750fce0a889502d40508d39576821155e9c9e3f5c3157f961db38fd8b25be1e77a759
e93c0118a4ffd71d
>>> from helper import hash256
>>> block_id = hash256(bytes.fromhex('020000208ec39428b17323fa0ddec8e887b4a7c5\
3b8c0a0a220cfd0000000000000000005b0750fce0a889502d40508d39576821155e9c9e3f5c31\
57f961db38fd8b25be1e77a759e93c0118a4ffd71d'))[::-1]
>>> print('{}'.format(block_id.hex()).zfill(64))  1
0000000000000000007e9e4c586439b0cdbe13b1370bdd9435d76a644d047523
1

We are purposefully printing this number as 64 hexadecimal digits to show how small it is in 256-bit terms.

sha256 is known to generate uniformly distributed values. Given this, we can treat two rounds of sha256, or hash256, as a random number. The probability of any random 256-bit number being this small is tiny. The probability of the first bit in a 256-bit number being 0 is 0.5, the first two bits being 00, 0.25, the first three bits being 000, 0.125, and so on. Note that each 0 in the hexadecimal just shown represents four 0- bits. In this case, we have the first 73 bits being 0, which has a probability of 0.573, or about 1 in 1022. This is a really tiny probability. On average, 1022 (or 10 trillion trillion) random 256-bit numbers have to be generated before finding one this small. In other words, we need to calculate 1022 in hashes on average to find one this small. Getting back to the analogy, the process of finding proof-of-work requires us to process around 1022 numerical bits of dirt and rock to find our numerical gold nugget.

The Target

Proof-of-work is the requirement that the hash of every block header in Bitcoin must be below a certain target. The target is a 256-bit number that is computed directly from the bits field (in our example, e93c0118). The target is very small compared to an average 256-bit number.

The bits field is actually two different numbers. The first is the exponent, which is the last byte. The second is the coefficient, which is the other three bytes in little-endian. The formula for calculating the target from these two numbers is:

target = coefficient × 256exponent 3

This is how we calculate the target given the bits field in Python:

>>> from helper import little_endian_to_int
>>> bits = bytes.fromhex('e93c0118')
>>> exponent = bits[-1]
>>> coefficient = little_endian_to_int(bits[:-1])
>>> target = coefficient * 256**(exponent - 3)
>>> print('{:x}'.format(target).zfill(64))  1
0000000000000000013ce9000000000000000000000000000000000000000000
1

We are purposefully printing this number as 64 hexadecimal digits to show how small the number is in 256-bit terms.

A valid proof-of-work is a hash of the block header that, when interpreted as a little-endian integer, is below the target number. Proof-of-work hashes are exceedingly rare, and the process of mining is the process of finding one of these hashes. To find a single proof-of-work with the preceding target, the network as a whole must calculate 3.8 × 1021 hashes, which, when this block was found, could be done roughly every 10 minutes. To give this number some context, the best GPU mining card in the world would need to run for 50,000 years on average to find a single proof-of-work below this target.

We can check that this block header’s hash satisfies the proof-of-work as follows:

>>> from helper import little_endian_to_int
>>> proof = little_endian_to_int(hash256(bytes.fromhex('020000208ec39428b17323\
fa0ddec8e887b4a7c53b8c0a0a220cfd0000000000000000005b0750fce0a889502d40508d3957\
6821155e9c9e3f5c3157f961db38fd8b25be1e77a759e93c0118a4ffd71d')))
>>> print(proof < target)  1
True
1

target is calculated above.

We can see that the proof-of-work is lower by lining up the numbers in 64 hex characters:

TG: 0000000000000000013ce9000000000000000000000000000000000000000000

ID: 0000000000000000007e9e4c586439b0cdbe13b1370bdd9435d76a644d047523

Exercise 9

Write the bits_to_target function in helper.py.

Exercise 10

Write the difficulty method for Block.

Exercise 11

Write the check_pow method for Block.

Difficulty Adjustment

In Bitcoin, each group of 2,016 blocks is called a difficulty adjustment period. At the end of every difficulty adjustment period, the target is adjusted according to this formula:

  • time_differential = (block timestamp of last block in difficulty adjustment period) – (block timestamp of first block in difficulty adjustment period)
  •  
  • new_target = previous_target * time_differential / (2 weeks)

The time_differential is calculated so that if it’s greater than 8 weeks, 8 weeks is used, and if it’s less than 3.5 days, 3.5 days is used. This way, the new target cannot change more than four times in either direction. That is, the target will be reduced or increased by four times at the most.

If each block took on average 10 minutes to create, 2,016 blocks should take 20,160 minutes. There are 1,440 minutes per day, which means that 2,016 blocks will take 20,160 / 1,440 = 14 days to create. The effect of the difficulty adjustment is that block times are regressed toward the mean of 10 minutes per block. This means that long-term, blocks will always go toward 10-minute blocks even if a lot of hashing power has entered or left the network.

The new bits calculation should be using the timestamp field of the last block of each of the current and previous difficulty adjustment periods. Satoshi unfortunately had another off-by-one error here, as the timestamp differential calculation looks at the first and last blocks of the 2,016-block difficulty adjustment period instead. The time differential is therefore the difference of blocks that are 2,015 blocks apart instead of 2,016 blocks apart.

We can code this formula like so:

>>> from block import Block
>>> from helper import TWO_WEEKS  1
>>> last_block = Block.parse(BytesIO(bytes.fromhex('00000020fdf740b0e49cf75bb3\
d5168fb3586f7613dcc5cd89675b0100000000000000002e37b144c0baced07eb7e7b64da916cd\
3121f2427005551aeb0ec6a6402ac7d7f0e4235954d801187f5da9f5')))
>>> first_block = Block.parse(BytesIO(bytes.fromhex('000000201ecd89664fd205a37\
566e694269ed76e425803003628ab010000000000000000bfcade29d080d9aae8fd461254b0418\
05ae442749f2a40100440fc0e3d5868e55019345954d80118a1721b2e')))
>>> time_differential = last_block.timestamp - first_block.timestamp
>>> if time_differential > TWO_WEEKS * 4:  2
...     time_differential = TWO_WEEKS * 4
>>> if time_differential < TWO_WEEKS // 4:  3
...     time_differential = TWO_WEEKS // 4
>>> new_target = last_block.target() * time_differential // TWO_WEEKS
>>> print('{:x}'.format(new_target).zfill(64))
0000000000000000007615000000000000000000000000000000000000000000
1

Note that TWO_WEEKS = 60*60*24*14 is the number of seconds in 2 weeks: 60 seconds × 60 minutes × 24 hours × 14 days.

2

This makes sure that if it took more than 8 weeks to find the last 2,015 blocks, we don’t decrease the difficulty too much.

3

This makes sure that if it took less than 3.5 days to find the last 2,015 blocks, we don’t increase the difficulty too much.

Note that we only need the headers to calculate what the next block’s target should be. Once we have the target, we can convert the target to bits. The inverse operation looks like this:

def target_to_bits(target):
    '''Turns a target integer back into bits'''
    raw_bytes = target.to_bytes(32, 'big')
    raw_bytes = raw_bytes.lstrip(b'\x00')  1
    if raw_bytes[0] > 0x7f:  2
        exponent = len(raw_bytes) + 1
        coefficient = b'\x00' + raw_bytes[:2]
    else:
        exponent = len(raw_bytes)  3
        coefficient = raw_bytes[:3]  4
    new_bits = coefficient[::-1] + bytes([exponent])  5
    return new_bits
1

Get rid of all the leading zeros.

2

The bits format is a way to express really large numbers succinctly and can be used with both negative and positive numbers. If the first bit in the coefficient is a 1, the bits field is interpreted as a negative number. Since the target is always positive for us, we shift everything over by 1 byte if the first bit is 1.

3

The exponent is how long the number is in base 256.

4

The coefficient is the first three digits of the base 256 number.

5

The coefficient is in little-endian and the exponent goes last in the bits format.

If the block doesn’t have the correct bits calculated using the difficulty adjustment formula, then we can safely reject that block.

Exercise 12

Calculate the new bits given the first and last blocks of this 2,016-block difficulty adjustment period:

  • Block 471744:

    000000203471101bbda3fe307664b3283a9ef0e97d9a38a7eacd88000000000000000000
    10c8aba8479bbaa5e0848152fd3c2289ca50e1c3e58c9a4faaafbdf5803c5448ddb84559
    7e8b0118e43a81d3
  • Block 473759:

    02000020f1472d9db4b563c35f97c428ac903f23b7fc055d1cfc26000000000000000000
    b3f449fcbe1bc4cfbcb8283a0d2c037f961a3fdf2b8bedc144973735eea707e126425859
    7e8b0118e5f00474

Exercise 13

Write the calculate_new_bits function in helper.py.

Conclusion

We’ve learned how to calculate proof-of-work, how to calculate the new bits for a block after a difficulty adjustment period, and how to parse coinbase transactions. We’ll now move on to networking in Chapter 10 on our way to the block header field we haven’t covered, the Merkle root, in Chapter 11.