Disk footprint changes in new Erigon alpha

Today we are publishing another alpha release, 2022.04.04-alpha, LINK.

Multiple things were fixed and improved.

The most important problem with alpha release 2022.04.03 was the slow download speed of block snapshots via BitTorrent. Even though there were enough seeders around, the download speed was usually limited to about 2 MB/s. Although the exact root cause has not yet been determined, we have observed that:

  • Downloading via third-party BitTorrent clients does not have this issue, so the root cause is likely in the BitTorrent library Erigon uses or in the manner in which it is used.
  • If files are downloaded one at a time, instead of all at the same time, the speed improves.

In version 2022.04.04, we therefore introduce a temporary workaround: files are downloaded one by one, which leads to much more reasonable download times (otherwise it takes around 32 hours to download the 225 GB of Ethereum main net block snapshots at 2 MB/s).
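The 32-hour figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes that the sizes and speeds quoted in this post are gigabytes and megabytes per second in decimal units (1 GB = 1000 MB); the function name `download_hours` is ours, purely for illustration.

```python
# Rough download-time estimate for the block snapshots.
# Assumption: decimal units, i.e. 1 GB = 1000 MB.

def download_hours(size_gb: float, speed_mb_per_s: float) -> float:
    """Return the hours needed to transfer size_gb at speed_mb_per_s."""
    seconds = size_gb * 1000 / speed_mb_per_s
    return seconds / 3600

snapshot_size_gb = 225  # total snapshot size quoted in the post
observed_speed = 2      # MB/s, the speed limit observed in 2022.04.03

hours = download_hours(snapshot_size_gb, observed_speed)
print(f"{hours:.2f} hours")
```

This gives a little over 31 hours, which is consistent with the "around 32 hours" quoted above.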

There were also issues with the last stage of initial sync, called “TxLookup”. This stage manages the mapping that allows locating transaction payloads by their hashes. Since in the alpha release most of this mapping was moved from the MDBX database to indices based on perfect hashing, issues may arise on the “edge”. If you run into any issues, please update to the new version, and if they persist, use the integration tool to reset the “TxLookup” stage:

make integration

./build/bin/integration stage_tx_lookup --reset --datadir <your_datadir>

The main topic of this post is the comparison of disk footprint between beta (the current stable versions of Erigon) and alpha (with the improvements coming from Erigon 2 Upgrade 1). Here are the charts:

The first thing to note is that in the beta version, all the data is kept in a single MDBX database file, which at the time of writing should come to around 1858 GB after a fresh sync of Ethereum main net. Within that database file, there are individual tables, shown as segments in the charts. In the alpha version, data related to block headers, block bodies, transaction payloads, senders (originators of transactions), and the mapping from transaction hashes to transaction payloads is mostly moved from the MDBX database file into separate static files, which are shared and downloaded via BitTorrent. The total size of these shared files is around 225 GB.

All corresponding segments are coloured in the same colour on both charts, which makes it easy to see that the two segments that shrank the most are Transactions (light green, 477 GB => 17.5 GB) and Transaction Lookup (grey, 101.3 GB => 1.3 GB).
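To put these two reductions in perspective, here is a small sketch that computes how much each table shrank, both in absolute terms and as a percentage. It uses only the four figures quoted above and assumes they are gigabytes; the helper name `reduction` is hypothetical.

```python
from typing import Tuple

# Table sizes quoted in the post, in GB: (beta, alpha).
tables = {
    "Transactions": (477.0, 17.5),
    "Transaction Lookup": (101.3, 1.3),
}

def reduction(before: float, after: float) -> Tuple[float, float]:
    """Return (GB saved, percentage saved) for one table."""
    saved = before - after
    return saved, 100 * saved / before

for name, (before, after) in tables.items():
    saved, pct = reduction(before, after)
    print(f"{name}: {saved:.1f} GB less in the database ({pct:.1f}% smaller)")
```

Both tables shrink by well over 95%, and together they account for roughly 560 GB that no longer lives in the MDBX database file.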

Another thing to note is the biggest tables in the ALPHA footprint breakdown: Event Logs, Account History, Storage History, Call Trace Set. These tables will be the focus of Erigon 2 Upgrade 2.

And lastly, similar to the go-ethereum freezer (a.k.a. ancient storage), the static files can now be mounted on different drives and do not need to be co-located with the rest of the data. However, we have not yet tested the performance of a configuration where these files are moved to cheaper but slower storage, such as an HDD.

original post: https://erigon.substack.com/p/disk-footprint-changes-in-new-erigon