Beacon State Optimizations, Proposer Index Computation and Choice of Data Formats


Hello, this is Giulio. This post will be quite short because I have not been able to code much, but I still managed some brainstorming and optimizations, so I thought I would share. Today I will talk about some significant optimizations to the Beacon State Merkle root computation in Erigon, progress on state transition, and my brainstorming about how best to store Beacon data in the database, with pros and cons for each of the formats I went over.

Beacon State Optimizations

The only code I wrote directly this week implemented a series of optimizations to the Beacon State root computation. It was mostly mild refactoring plus some rewriting of the state root code. What I did was optimize the computation of the Beacon State's intermediate hashes to use CPU-native vector operations through Prysm's GoHashTree library. This led to a bunch of code deletions, because the library abstracts that logic away and leaves less for me to maintain, and it gave a 9x performance boost in the best-case scenario. Here are the two benchmarks:

Before optimization (Best Case):

After optimization (Best Case):

Another thing I did was use these vector operations in the computation of SSZ Arrays/Vectors, which I was not doing before. This made the worst-case scenario 20x more efficient than what I had before, and there is still more that I can improve there. In any case, this is the performance I got after doing so:

Before optimization (Worst Case):

After optimization (Worst Case):
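
To show why this kind of hashing benefits from vector operations, here is a minimal sketch of SSZ-style merkleization done layer by layer. It is not Erigon's code: the names hashLayer and merkleRoot are made up for illustration, and the loop uses the standard library's crypto/sha256 so the example has no dependencies. The point is that every pair in a layer is independent of the others, so a batched hasher such as GoHashTree can process several pairs per call where this loop handles one at a time.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashLayer hashes adjacent pairs of 32-byte chunks into the next layer up.
// Each pair is independent of the others, which is the property a
// vectorized hasher exploits. The input length is assumed to be even.
func hashLayer(chunks [][32]byte) [][32]byte {
	out := make([][32]byte, len(chunks)/2)
	var buf [64]byte
	for i := range out {
		copy(buf[:32], chunks[2*i][:])
		copy(buf[32:], chunks[2*i+1][:])
		out[i] = sha256.Sum256(buf[:])
	}
	return out
}

// merkleRoot pads the leaves with zero chunks up to the next power of two
// and then reduces the tree one layer at a time.
func merkleRoot(leaves [][32]byte) [32]byte {
	n := 1
	for n < len(leaves) {
		n *= 2
	}
	layer := make([][32]byte, n)
	copy(layer, leaves) // the remaining entries stay zero (padding)
	for len(layer) > 1 {
		layer = hashLayer(layer)
	}
	return layer[0]
}

func main() {
	leaves := [][32]byte{{1}, {2}, {3}}
	root := merkleRoot(leaves)
	fmt.Printf("root: %x\n", root)
}
```

A real implementation would also avoid re-hashing zero subtrees and cache unchanged branches; this sketch leaves both out to keep the layer structure visible.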

Progress on State Transition

Regarding State Transition, an external contributor, Mikeneunder (on Discord), is helping greatly with it and has made some progress, and I would like to share where things stand thanks to him. So far we have: transition of slots, with tests; a State Transistor meant to take a block and transition the underlying state; and a fully working version of Proposer/Shuffled Index computation. So there is quite some progress on that side as well. I have not looked into it myself yet, but it looks promising. Unfortunately, I cannot share too much because I was not the one touching it, but I think it is going in the right direction :D.
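
For readers who have not seen it, here is a rough sketch of the shuffled index computation, following compute_shuffled_index from the consensus specification (the swap-or-not shuffle). This is my own transcription for illustration, not the contributor's code; the function name, the error handling and the use of the mainnet round count of 90 are assumptions of this sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// shuffleRoundCount is SHUFFLE_ROUND_COUNT from the mainnet preset.
const shuffleRoundCount = 90

// computeShuffledIndex returns the position that index maps to in the
// shuffled list of indexCount elements, for the given seed.
func computeShuffledIndex(index, indexCount uint64, seed [32]byte) (uint64, error) {
	if index >= indexCount {
		return 0, fmt.Errorf("index %d out of range [0, %d)", index, indexCount)
	}
	// Layout: seed (32 bytes) || round (1 byte) || position/256 as uint32 LE (4 bytes).
	var buf [37]byte
	copy(buf[:32], seed[:])
	for round := 0; round < shuffleRoundCount; round++ {
		buf[32] = byte(round)
		pivotHash := sha256.Sum256(buf[:33])
		pivot := binary.LittleEndian.Uint64(pivotHash[:8]) % indexCount
		flip := (pivot + indexCount - index) % indexCount
		position := index
		if flip > position {
			position = flip
		}
		binary.LittleEndian.PutUint32(buf[33:], uint32(position/256))
		source := sha256.Sum256(buf[:])
		b := source[(position%256)/8]
		// Swap index with flip when the selected bit is set.
		if (b>>(position%8))&1 == 1 {
			index = flip
		}
	}
	return index, nil
}

func main() {
	var seed [32]byte
	seed[0] = 42
	for i := uint64(0); i < 8; i++ {
		j, err := computeShuffledIndex(i, 8, seed)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d -> %d\n", i, j)
	}
}
```

Each round is its own inverse, so the whole function is a permutation of the index range. The proposer index is then picked from the shuffled validators with an extra sampling step weighted by effective balance, which is left out here.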

Database Formats Brainstorming

So in any blockchain there are three types of data format: Network Format, Hashing Format and Internal Format. Network Format is the format used when communicating with other peers. It is enforced by the protocol and must be used to encode/decode p2p packets, so there is no choice to make there. Hashing Format is the data format used for Merkle tree encoding, so it cannot be chosen either. In the Consensus Layer, the Hashing Format is SSZ, while the Network Format is SSZ compressed with snappy. The third type, the Internal Format, is how we actually store data inside the database, and unlike the other two it can be chosen at the developer's discretion. So what I need to decide is which data model is best for storing beacon blocks, state, etc.
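
As a small illustration of what SSZ looks like on the wire, here is a sketch that encodes and decodes the spec's Checkpoint container (an epoch plus a 32-byte root). Fixed-size SSZ containers are just their fields concatenated, with integers in little-endian order. The method and function names are made up for this example and are not Erigon's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Checkpoint mirrors the consensus-spec container of the same name.
type Checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

// checkpointSize is 8 bytes of epoch followed by 32 bytes of root.
const checkpointSize = 8 + 32

// EncodeSSZ serializes the checkpoint as epoch (uint64 LE) || root.
func (c Checkpoint) EncodeSSZ() []byte {
	out := make([]byte, checkpointSize)
	binary.LittleEndian.PutUint64(out[:8], c.Epoch)
	copy(out[8:], c.Root[:])
	return out
}

// DecodeCheckpointSSZ parses the 40-byte encoding produced by EncodeSSZ.
func DecodeCheckpointSSZ(b []byte) (Checkpoint, error) {
	if len(b) != checkpointSize {
		return Checkpoint{}, errors.New("checkpoint: expected 40 bytes")
	}
	var c Checkpoint
	c.Epoch = binary.LittleEndian.Uint64(b[:8])
	copy(c.Root[:], b[8:])
	return c, nil
}

func main() {
	cp := Checkpoint{Epoch: 1234, Root: [32]byte{0xaa, 0xbb}}
	enc := cp.EncodeSSZ()
	fmt.Printf("%d bytes: %x\n", len(enc), enc)
}
```

The Network Format would be these same bytes run through snappy, which is why storing plain or snappy SSZ internally saves a re-encoding step when serving peers.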

Here are some options I thought about with their respective Pros and Cons:

  • RLP
    • Pros: No Pros
    • Cons: It is trash
  • CBOR
    • Pros: Fast reading/writing, somewhat compact and more generalized. It is what Erigon EL uses internally to encode data for storage.
    • Cons: Slower RPC queries, as the data needs to be decoded, re-encoded in SSZ and then compressed with snappy. Some parts of the API will also be slower because they are SSZ encoded, so the same decode/re-encode issue applies.
  • SSZ
    • Pros: Fast reading/writing. RPC queries would be faster than with CBOR because the data just needs to be compressed, and the SSZ-encoded responses of the Beacon API would be as optimized as they can be.
    • Cons: Not so compact, not generalized.
  • Snappy SSZ
    • Pros: RPC queries at their fastest, and extremely compact since the encoding is compressed. Faster than CBOR at handling some parts of the Beacon API (I think? Not sure, I have not tested).
    • Cons: Slow reading/writing (decompression).
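
To make the SSZ versus Snappy SSZ trade-off a bit more concrete, here is a sketch of how the two could sit behind one storage interface. Everything in it is hypothetical: the Codec interface and both types are invented for this example, and compress/flate from the standard library stands in for snappy only so the code runs without dependencies. It shows where the cost lands: the compressed codec pays on every read and write, while the plain one stores more bytes.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// Codec is a hypothetical interface for the internal storage format.
// Both methods take and return SSZ bytes on the application side.
type Codec interface {
	Encode(ssz []byte) ([]byte, error)
	Decode(stored []byte) ([]byte, error)
}

// plainSSZ stores the SSZ bytes untouched: no work on read or write.
type plainSSZ struct{}

func (plainSSZ) Encode(ssz []byte) ([]byte, error)    { return ssz, nil }
func (plainSSZ) Decode(stored []byte) ([]byte, error) { return stored, nil }

// compressedSSZ stores compressed SSZ. flate is a stand-in for snappy here.
type compressedSSZ struct{}

func (compressedSSZ) Encode(ssz []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(ssz); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (compressedSSZ) Decode(stored []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(stored))
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	// A mostly repetitive payload, loosely imitating sparse state data.
	payload := make([]byte, 4096)
	for i := range payload {
		payload[i] = byte(i % 7)
	}
	for name, c := range map[string]Codec{"plain": plainSSZ{}, "compressed": compressedSSZ{}} {
		stored, err := c.Encode(payload)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d bytes on disk\n", name, len(stored))
	}
}
```

With an interface like this the choice could stay open until there are real benchmarks on both the read path and the disk footprint.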

As of now, I am undecided between SSZ and Snappy SSZ. It will mostly depend on how much node operators value Beacon API performance over disk footprint. If nobody cares too much about the Beacon API then screw it, Snappy SSZ it is. Otherwise, I will consider plain SSZ.

In conclusion, I have not done much this week, sorry for that :(. I was still able to gather these thoughts and do some minor optimizations. That said, I hope you enjoyed it.