Beacon State Trie Root and Beacon Reverse Downloader

Beacon State Trie Root and Beacon Reverse Downloader

I just wet the bed.


Hi Frens, This is Giulio again and here is yet another one of my rants. Today we are going to discuss my progress on Beacon State Trie Root and what is to come next in this beautiful Consensus Adventure. So the first thing is: how I managed to make State Trie computation work, its performance (probably lacking but still somewhat okey) and my thought process. Additionally, I will introduce a new routine inside my current prototype which is the Backward Beacon Downloader as my next objective. but on that I will talk later.

State Trie Root Computation Progress

So, I managed to crack Beacon State Merkle Root computation, so now I can actually generate correct state root and this will save a lot of time. However, It was not easy and let me explain why. But first, let’s go back and look at how the Beacon chain Merkle Tree look like.

Above is the Merkle Tree for a Bellatrix Beacon State, as you can see we have a bunch of fields as leaves plus a bunch of zero hashes to compensate. the reason behind the zero hashes is that the leaves count is 25 by itself as the Bellatrix Beacon state has 25 fields, which means the tree has depth 5. Now you could technically make a Merkle Tree with 25 leaves and just carry the extra leaves until we can break them even but the consensus specs wants us to round it to the next power of 2 which in this case is 32, so we need to add 7 extra empty leaves to compensate. that is the first thing indeed. Now, each of the leaves that are non-empty have a subtree but we dont care about that, because I decided that the caching is going to happen at depth 5, so depth 5 I will cache. In other words, in each state root re-computation, I am going to re-compute the leaves ONLY if I changed the associated field. But why do I need to cache them?

Here is a benchmark of the non-cached performace. As you can see it is really slow and beefy operation(takes half a second).

Here is a benchmark of the the cached Performace. As you can see it is faster and it provides almost a 300x improvement in Performace, However, this measurement is not realistic because it is the best case, and If I had to estimated the average case I would say it will lead to a 2-3x improvement in performance, which is still really good.

Side note

I designed the caching system to NOT cache intermediate hashes because it is too much code for too little benefit, we are talking about not recomputing 31 Keccak256 operations which will not be bottleneck and about which I do not give a flying fuck. I want Erigon Embedded Consensus Layer to be optimized but also to provide clean code and not be a gigantic mess of nested structures. Therefore, I decided to accept this tradeoff.

How are each leaves generated?

So for all the leaves that are composed by simple structures like ForkData, Validators, Eth1Data, etc… I just used the fastssz autogenerated code. However there were a bunch that I had to rewrite code for:

  • Array Structures with fixed length
  • Vector Structures with limited length
  • Chunk Packed structures

Lets go over them: Array structures with fixed length were the easiest as their root is just the Merkle Root of all the fields and we round up the expected array length if the length of the filed is too low by adding empty leaves. Vector structures with limited length were a little bit harder as they did not round up but still doable, at the end of the vector elements root computation, the specs also required me to mix the vector length (or limit) with the root as the depth 1 level of the Merkle Tree. Chunk packed structures roots were calculated exactly like Vector structures with limited length but had to be formatted first. Thanks to GoHashTree, I was able to keep my code small, concise and performant and I can still optimize a couple things too. All in all, I am happy with the result because the whole caching + root computation is kept at less than 2000 lines of code, while other CLs takes way more lines of code, but they are probably more optimized but this is a tradeoff I am willing to take.

Beacon Reverse Downloader

Now, that I can compute the correct State Root for a given Beacon State, I can also compute the Latest valid block root for a given Beacon State. This means that through checkpoint sync, I can now theoretically replace ETH1 Erigon downloader with an ETH2 Downloader instead (Always keep in mind Eth1 network is a subset of the Eth2 network), to do so I can download the latest valid block for the checkpoint state and then just validate each block backwards by comparing their block root with the next block parent root. then, I will repeat the process until I reconnect with the ETH1 chain. This will replace Headers and Bodies stages in Erigon under one single protocol. Basically this downloader type is the whole reason I needed state root computation in the first place.

Said that, I am very happy with my progress, and I also want to thank all the people that privately told me that enjoy reading my rants. I really appreciate it.

To the next one.

original post: