Understanding big scaling a little bit more
Because SPV support is essential for world-scale
Jun 2, 2022
The backbone of modern crypto systems like Bitcoin Cash is made up of so-called “full nodes”. They earn that name by downloading the raw transaction blocks and validating them against hundreds of rules. They end up with a current-state UTXO database. This database is a fraction of the size of the entire blockchain because it only records where money currently is, whereas the blockchain as a whole also tells us where money came from and which path it took.
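To make the difference between "the chain" and "the current state" concrete, here is a minimal sketch in Python. It is a toy model, not how any real node stores data: the `Tx` class, the owner names and the amounts are all made up for illustration, and no validation rules are checked.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple   # (txid, output_index) pairs being spent
    outputs: tuple  # (owner, amount) pairs being created

def apply_block(utxo, block):
    """Update the UTXO dict in place with every transaction in the block."""
    for tx in block:
        for outpoint in tx.inputs:
            del utxo[outpoint]            # spent money leaves the database
        for index, out in enumerate(tx.outputs):
            utxo[(tx.txid, index)] = out  # newly created outputs enter it

# A three-block toy chain: alice is paid, pays bob, bob pays carol.
chain = [
    [Tx("a", (), (("alice", 50),))],
    [Tx("b", (("a", 0),), (("bob", 20), ("alice", 30)))],
    [Tx("c", (("b", 0),), (("carol", 20),))],
]

utxo = {}
for block in chain:
    apply_block(utxo, block)
```

The chain holds three transactions, but the resulting `utxo` dict holds only the two outputs that are still unspent. The path the money took is gone from the database; it lives only in the chain.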
Some smart people realized that since the product of the full node is a UTXO database, which is only a fraction of the size of the whole blockchain, it would be neat to only have to download the smaller database and that way get a node started much faster. Bypassing a big download and bypassing the validation of the bigger chain.
To make this reasonably secure, the idea is to hash this database so that corruption or even cheating can be detected when people download just the smaller database. The result is UTXO Commitments. (more)
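The hashing idea can be sketched in a few lines. This is only an illustration of the principle, assuming a made-up text serialization and a plain SHA-256 over the sorted entries; the actual UTXO commitment proposals use their own formats and hashing schemes, which are not reproduced here.

```python
import hashlib

def utxo_commitment(utxo):
    """Hash the whole UTXO set in a canonical (sorted) order."""
    h = hashlib.sha256()
    for (txid, index), (owner, amount) in sorted(utxo.items()):
        # Hypothetical serialization, chosen only for readability.
        h.update(f"{txid}:{index}:{owner}:{amount}\n".encode())
    return h.hexdigest()

honest = {("b", 1): ("alice", 30), ("c", 0): ("carol", 20)}
tampered = {("b", 1): ("alice", 30), ("c", 0): ("mallory", 20)}

# Same set, inserted in a different order: the commitment must not change.
reordered = {("c", 0): ("carol", 20), ("b", 1): ("alice", 30)}
```

A node that downloads the database computes the same hash and compares it with the committed value. Any changed, added or missing entry (as in `tampered`) gives a different hash, while the order in which entries arrive does not matter.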
Pruning
The natural consequence of starting a new node using a commitment instead of the full blockchain is that this node is running in “pruned” mode. After all, the entire point was to not download the history. So let’s talk a little about pruned mode.
The Bitcoin Core people many years ago implemented pruning in a simple manner. They essentially just delete blocks after a certain amount of time. The client copies the required data from the blocks into the same database that holds the UTXO. These are details like the output script, the amount and similar.
The basic idea is that if all the data is copied from a block after it has been validated, you don’t actually have any more need for the block and thus deleting it after some time is safe. The only problem is that peers asking for that block won’t be able to get a copy.
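As a rough sketch of this style of pruning, assuming a hypothetical `PrunedNode` class and a made-up `keep_last` setting (this is not Bitcoin Core's code or its real configuration), the behaviour looks like this:

```python
class PrunedNode:
    def __init__(self, keep_last):
        self.keep_last = keep_last  # hypothetical: how many recent blocks to keep
        self.blocks = {}            # height -> list of transactions
        self.utxo = {}              # (txid, index) -> (script, amount)

    def accept_block(self, height, txs):
        # Each tx is (txid, spent_outpoints, outputs); validation is omitted.
        for txid, spent, outputs in txs:
            for outpoint in spent:
                self.utxo.pop(outpoint)
            for i, out in enumerate(outputs):
                self.utxo[(txid, i)] = out   # copy script and amount
        self.blocks[height] = txs
        # Everything needed later is in the UTXO database, so old blocks go.
        for old in [h for h in self.blocks if h <= height - self.keep_last]:
            del self.blocks[old]

    def get_block(self, height):
        """What a peer asking for a block would receive (None if pruned)."""
        return self.blocks.get(height)

node = PrunedNode(keep_last=2)
node.accept_block(0, [("a", [], [("script-alice", 50)])])
node.accept_block(1, [("b", [], [("script-bob", 25)])])
node.accept_block(2, [("c", [("b", 0)], [("script-carol", 25)])])
```

After the third block, `node.get_block(0)` returns `None`, yet the output created in block 0 is still in `node.utxo` and can be validated when it is spent.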
SPV Wallets
A peer that is pruned no longer has access to the block, and thus other peers can’t download the historical block from it. This is a known thing and mostly the network doesn’t mind. More peers are available.
There is a second side-effect of pruning, though. It is not well known, and it is the reason I am writing this post.
The way that pruning was implemented by Core all those years ago is by simply deleting entire blocks to free up disk space. Basically, deleting the historical transactions knowing we copied enough info to validate their usage later.
When we look at SPV clients, or Simplified Payment Verification as the whitepaper calls them (8), they actually require the historical transactions. The whole transaction, not just the output script that is still unspent.
The reason for this is the merkle-tree. SPV wallets, in order to validate that they have been sent a transaction that was actually mined, need to have the entire transaction. Without this (and some more details) there is no way for an SPV wallet to check that this is a valid transaction. (see merkleblock spec)
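Here is a minimal sketch of that check. It uses Bitcoin's double SHA-256 and the rule of duplicating the last hash when a level has an odd count, but the function names are my own, the "transactions" are placeholder byte strings, and the real merkleblock message encodes the proof differently (as a partial tree with flag bits).

```python
import hashlib

def dsha(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin transaction and block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root_and_branch(txs, index):
    """Return (root, branch); the branch proves txs[index] is in the tree."""
    level = [dsha(tx) for tx in txs]
    branch = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])       # odd count: duplicate the last hash
        branch.append(level[index ^ 1])   # remember the sibling at this level
        level = [dsha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], branch

def verify(raw_tx, index, branch, root):
    """What the SPV wallet does: rebuild the path from the FULL transaction."""
    h = dsha(raw_tx)                      # needs the whole transaction
    for sibling in branch:
        h = dsha(sibling + h) if index & 1 else dsha(h + sibling)
        index //= 2
    return h == root

txs = [b"tx0", b"tx1", b"tx2"]            # placeholder transactions
root, branch = merkle_root_and_branch(txs, 2)
```

The wallet already has `root` from the block header. With the full transaction and the branch, `verify(b"tx2", 2, branch, root)` succeeds, and a forged transaction fails. Note that the very first step hashes the complete transaction, which is exactly the data a pruned node has deleted.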
If an SPV wallet can’t check if a transaction was mined they need to trust a random person on the Internet and the entire point of crypto is moot. Trust, but verify.
The problem here is that a pruned node cannot supply this information. It has thrown away the entire block and only the output script is still present. A pruned node cannot be used by an SPV wallet.
Intermezzo
We want to make it easy for people to run full nodes, and since we hear that one of the pain points is the long download, some people came up with commitments in order to make the download smaller.
This download is simply the “who owns what at this point in time”, as opposed to the entire blockchain, which also includes the entire minutely detailed history.
Full nodes are important and we want to have more for the simple reason that (and Satoshi has been saying this also) the vast majority of people using Bitcoin Cash are going to run thin clients (SPV). So to allow many many more SPV users we need more full nodes. Simple, no?
These thin clients, or SPV, need to be able to download the transactions and using a concept of a ‘merkle-proof’ they can tie this into the blockheaders that make this secure and everyone is happy.
Unfortunately that means that the idea of pruning as it has been implemented by Bitcoin Core so many years ago is not sufficient. A pruned node is unable to supply an SPV wallet with proof-of-inclusion of transactions older than a couple of days.
Chapter 7; “Reclaiming disk space”
If we look at the Bitcoin whitepaper we find that Satoshi wrote a chapter on reclaiming disk space.
He explains that a full node could reclaim disk space by removing fully spent transactions. The screenshot below goes into the technical details on how to do this properly.
Using the method described by Satoshi we would gain two things when compared to the pruning that is used today. First, the actual transactions would still be available for peers to download even several years later. Only fully spent transactions are deleted. The second gain is that Satoshi describes how such a peer can provide the per-block merkle tree, which is needed to prove the transaction is part of the blockchain.
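A small sketch of that idea, as I read chapter 7: spent transactions are replaced by their hash, and a branch whose transactions are all spent collapses into a single hash. The tuple-based tree and the `fully_spent` flags are assumptions for illustration; no node implements it in this form.

```python
import hashlib

def dsha(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def join(left, right):
    if left[0] == "stub" and right[0] == "stub":
        # Whole branch is spent: keep only its hash, drop the interior.
        return ("stub", dsha(left[1] + right[1]))
    return ("node", left, right)

def build_pruned(txs, fully_spent):
    """Merkle tree where fully spent transactions are stubbed off."""
    level = [("stub", dsha(tx)) if gone else ("tx", tx)
             for tx, gone in zip(txs, fully_spent)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])       # odd count: duplicate the last entry
        level = [join(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def node_hash(node):
    if node[0] == "stub":
        return node[1]
    if node[0] == "tx":
        return dsha(node[1])
    return dsha(node_hash(node[1]) + node_hash(node[2]))

txs = [b"tx0", b"tx1", b"tx2", b"tx3"]    # placeholder transactions
full = build_pruned(txs, [False, False, False, False])
pruned = build_pruned(txs, [True, True, False, True])
```

In `pruned`, the left half of the tree (tx0 and tx1) is a single stub hash, tx3 is a stub, and tx2 is still stored in full. The merkle root is identical to the one of the unpruned tree, so this node can still hand an SPV wallet both tx2 and the hashes needed to prove it is in the block.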
The bottom line here, though, is that should a full node implement this type of disk reclaiming it can have the benefits of a pruned node while still being able to supply some data to SPV wallets. Going from mostly useless network-nodes to much more useful ones.
And if the goal of commitments is to have more useful full nodes in order to support those billion SPV wallets, then some distribution scheme should take into account all not-yet-spent transactions and merkle-proofs to be distributed next to the actual unspent-outputs themselves.
Historical Transactions are sometimes also needed
While the above change of how to look at unspent outputs is relevant and would make a pruned node much more useful to the network, it should be noted that this does not (fully) solve the problem of an SPV wallet.
What it would also need is some historical node that can provide transactions which are fully spent. For instance, if you are restoring a wallet from backup it also needs to find the spent transactions in order to get a proper balance of the wallet.
In short, a fast-synced peer which has only unspent transactions is no substitute for a fully historical one. The network will still need historical data for correct operation.
Thanks for reading!