The Road to Sub-dollar Transactions, Part 2: Compression Edition

This post is the second in a series on the path to sub-dollar ORU transactions. For full context, check out the first!

Optimistic rollups are rapidly gaining maturity. As we move beyond the “zero to one” phase, the name of the game is optimization, and the most tangible optimization is cost. Within the next month we will be decreasing fees 30–40% by deploying the first ever system-wide calldata compression on any production ORU network. Get ready.

Looking further ahead, we have plans to unlock even more savings this summer as part of OP Mainnet: Bedrock. This post dives into the nitty-gritty of calldata compression: in particular, how we evaluated compression algorithms, and how we will use them to pave the way to sub-dollar fees.

Calldata Overview

OP Mainnet uses Ethereum as a data availability layer. This means that each transaction executed on OP Mainnet is stored (but not executed) on Ethereum. Right now we store OP Mainnet transactions in calldata: multiple L2 transactions are batched up into a binary blob, and that blob (plus some other info) is stored in the data field of an Ethereum transaction. To retrieve that data later, we look at the transaction bodies, which are stored in the blocks themselves. Because Ethereum blocks are saved, OP Mainnet can always be reconstructed from Ethereum.

While storing data in the blocks is much cheaper than storing it in contract state, keeping historical blocks around forever does incur a cost on node operators. As such, Ethereum charges for calldata. It costs 4 gas per zero byte and 16 gas per non-zero byte of calldata (zeros represent around 40% of bytes in transactions submitted to OP Mainnet).
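
To make that pricing concrete, here is a minimal Go sketch of the calldata cost calculation (the function and constant names are ours, for illustration, not OP Mainnet code):

```go
package main

import "fmt"

// Calldata gas pricing on Ethereum, as described above.
const (
	zeroByteGas    = 4  // gas per zero byte of calldata
	nonZeroByteGas = 16 // gas per non-zero byte of calldata
)

// calldataGas returns the gas charged for posting data as calldata.
func calldataGas(data []byte) uint64 {
	var gas uint64
	for _, b := range data {
		if b == 0 {
			gas += zeroByteGas
		} else {
			gas += nonZeroByteGas
		}
	}
	return gas
}

func main() {
	payload := []byte{0x00, 0x00, 0xf8, 0x6b, 0x00, 0x01}
	fmt.Println(calldataGas(payload)) // 4 + 4 + 16 + 16 + 4 + 16 = 60
}
```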

While cheap calldata is a large part of where rollup savings come from, posting it still represents the primary cost passed on to users. So, the more we can reduce the amount of data posted, the cheaper we can make rollups. Enter compression: the art of reducing the size of data! What follows is an in-depth analysis of compression statistics run on real-world data.

Compression Overview & Results

We looked at 22 thousand batches (nearly 3 million individual transactions) that OP Mainnet had submitted to Ethereum and compressed them in different configurations to determine how best to implement compression and to experiment with what is possible.

We also looked at a variety of compression algorithms and measured both the compression rate (the size of the compressed data as a percentage of the uncompressed size) and the estimated fee savings (assuming that 40% of bytes in a transaction are the zero-byte).

One configuration option that is important to understand is the presence of a dictionary. A dictionary is created ahead of time to show the algorithm data fragments that commonly occur in real-world data. The compression algorithm uses the dictionary to better compress data, particularly when compressing a small amount of data at once. By taking a random sample of transactions, we can create a dictionary for zlib and zstd that improves the compression ratio when compressing individual transactions and batches.
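
As a sketch of how a preset dictionary gets wired in, Go's standard compress/zlib package exposes dictionary-aware variants directly. The dictionary contents below are a hypothetical stand-in; a real dictionary would be trained on a random sample of transactions:

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"log"
)

func main() {
	// Hypothetical dictionary: byte fragments that recur across transactions
	// (popular contract addresses, common selectors, runs of zeros, etc.).
	dict := []byte("common-contract-address\x00\x00\x00\x00")
	tx := []byte("call common-contract-address with some args")

	// Compress with a preset dictionary.
	var compressed bytes.Buffer
	w, err := zlib.NewWriterLevelDict(&compressed, zlib.DefaultCompression, dict)
	if err != nil {
		log.Fatal(err)
	}
	w.Write(tx)
	w.Close()
	fmt.Printf("compressed %d -> %d bytes\n", len(tx), compressed.Len())

	// The decompressor must be given the exact same dictionary.
	r, err := zlib.NewReaderDict(&compressed, dict)
	if err != nil {
		log.Fatal(err)
	}
	var out bytes.Buffer
	if _, err := io.Copy(&out, r); err != nil {
		log.Fatal(err)
	}
	r.Close()
	fmt.Printf("round-trip ok: %v\n", bytes.Equal(out.Bytes(), tx))
}
```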

As most fields in an Ethereum transaction look random (addresses and function selectors are derived from hashes, and signatures should be indistinguishable from random), a single Ethereum transaction does not compress well. And because Ethereum heavily discounts zero bytes, which compression algorithms quickly remove, the fee savings will be smaller than the compression rate suggests. As such, to get the highest fee savings, we need to run an advanced algorithm over as much data as possible.
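
To make the gap between compression rate and fee savings concrete, here is a small Go sketch of the arithmetic under the assumptions above (40% zero bytes before compression, and compressed output treated as all non-zero bytes). This is our reading of the estimate; it reproduces the figures in the tables below:

```go
package main

import "fmt"

// estimatedFeeSavings converts a compression rate (compressed size as a
// fraction of uncompressed size) into estimated calldata fee savings,
// assuming 40% of uncompressed bytes are zeros (4 gas each) and that
// compressed output contains effectively no zero bytes (16 gas each).
func estimatedFeeSavings(compressionRate float64) float64 {
	const (
		zeroByteGas    = 4.0
		nonZeroByteGas = 16.0
		zeroFraction   = 0.40
	)
	// Average gas per uncompressed byte: 0.4*4 + 0.6*16 = 11.2.
	uncompressed := zeroFraction*zeroByteGas + (1-zeroFraction)*nonZeroByteGas
	// Every compressed byte is assumed to cost the full 16 gas.
	compressed := compressionRate * nonZeroByteGas
	return 1 - compressed/uncompressed
}

func main() {
	// zlib on individual transactions shrinks them to 63.97% of their
	// original size, but only saves ~8.61% in fees.
	fmt.Printf("%.2f%%\n", 100*estimatedFeeSavings(0.6397)) // 8.61%
	// zstd with a dictionary: 48.31% of original size -> ~30.99% savings.
	fmt.Printf("%.2f%%\n", 100*estimatedFeeSavings(0.4831)) // 30.99%
}
```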

Here are the results for compressing transactions by themselves:

Algorithm  Mode         Dictionary?  Compression Rate  Estimated Fee Savings
zlib       transaction  no           63.97%             8.61%
zlib       transaction  yes          59.33%            15.24%
zstd       transaction  no           62.87%            10.18%
zstd       transaction  yes          48.31%            30.99%
LZW        transaction  no           68.29%             2.44%
Brotli     transaction  no           65.78%             6.02%
ZLE        transaction  no           59.37%            15.18%


As you can see, compressing individual transactions on their own will only get us 10–15% savings. Note that the size of transactions is reduced by more than this, but the savings are smaller; this is due to the cheaper zero bytes discussed above.

Zstandard with a dictionary performs significantly better because there are commonalities between each transaction and the fragments stored in the dictionary. Even so, zstd still performs better when compressing larger amounts of data at once.

On the other extreme is compressing every transaction in our sample in one go. This is not possible in practice, but it serves as an upper bound on the achievable compression ratio.

Algorithm  Mode   Dictionary?  Compression Rate  Estimated Fee Savings
zlib       total  no           38.40%            45.14%
zlib       total  yes          38.40%            45.14%
zstd       total  no           35.92%            48.68%
zstd       total  yes          36.19%            48.31%
LZW        total  no           62.30%            11.00%
Brotli     total  no           34.41%            50.84%
ZLE        total  no           59.37%            15.18%


So, in this example we are looking at somewhere between 10% and 50% savings due to compression. But what can we actually achieve in practice?

When looking at compressing batches of transactions (hundreds of transactions), the compression ratio is significantly better than compressing individual transactions, but slightly worse than compressing every transaction at once. This is because users tend to interact with some contracts significantly more than others. In addition, certain fields (like the chain ID and gas price) tend to be similar across transactions. Compression algorithms rely on these similarities to do their job.

Algorithm  Mode     Dictionary?  Compression Rate  Estimated Fee Savings
zlib       batches  no           40.13%            42.67%
zlib       batches  yes          39.61%            43.41%
zstd       batches  no           40.12%            42.68%
zstd       batches  yes          37.31%            46.70%
LZW        batches  no           62.24%            11.09%
Brotli     batches  no           39.21%            43.99%
ZLE        batches  no           59.37%            15.18%


When comparing the different compression algorithms, we identified zlib, zstd, and Brotli as the algorithms that compress the most. We ruled out Brotli because it is much slower than zstd or zlib at similar compression ratios. In general, the better the compression ratio an algorithm (or setting for an algorithm) achieves, the slower it runs. In common benchmarks, zstd tends to outperform other compression algorithms over a wide range of speed/ratio options, though note that Ethereum transactions have different characteristics than the data in those benchmarks.

Zlib and zstd are quite close, and we’ll be rolling out zlib compression (without a dictionary) in the short term because it offers good results, good speed, and wide availability across programming languages. Longer term, we expect to lean into zstd to achieve the highest possible compression ratio and lowest possible user fees.
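
As a rough illustration of that short-term setup, here is a minimal Go sketch that zlib-compresses a concatenated batch with the standard library. The batch framing is simplified and is not the actual OP Mainnet batch format:

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// compressBatch concatenates encoded transactions and zlib-compresses the
// resulting blob with the default level and no dictionary. Real batches
// include framing and metadata; this shows only the compression step.
func compressBatch(txs [][]byte) ([]byte, error) {
	var raw bytes.Buffer
	for _, tx := range txs {
		raw.Write(tx)
	}
	var out bytes.Buffer
	w := zlib.NewWriter(&out)
	if _, err := w.Write(raw.Bytes()); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	batch := [][]byte{
		[]byte("first encoded transaction"),
		[]byte("second encoded transaction"),
	}
	compressed, err := compressBatch(batch)
	if err != nil {
		panic(err)
	}
	fmt.Printf("batch compressed to %d bytes\n", len(compressed))
}
```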

Conclusion

In summary: if historic trends continue, we can expect to reduce fees by 30–40% by introducing the compression explained above.

Zlib batch compression is coming to OP Mainnet soon.

  • Kovan on 3/17
  • Mainnet on 3/24

Zstd based compression (with a dictionary) is on the roadmap for OP Mainnet: Bedrock, with release slated for later this year.

In addition to reducing user fees through compression, OP Mainnet is also looking to improve Ethereum’s ability to serve as a data availability layer via EIP-4844 and similar approaches to shave costs further.

That’s it for now! Stay tuned for the next post on another step along the road to sub-dollar transactions! Oh, and as always if you’re interested in joining a talented group of Optimists dedicated to building a scalable and sustainable future for Ethereum we’d love to hear from you! Check out all open roles on our jobs board.

Appendix: Algorithm Summary

Algorithm  Compression Options   Upstream Link
zlib       golang default level  https://pkg.go.dev/compress/zlib
zstd       Level 3               http://facebook.github.io/zstd/
LZW        LSB, 8                https://pkg.go.dev/compress/lzw
Brotli     Level 6               https://github.com/google/brotli
ZLE        n/a                   https://github.com/ethereum/pyethereum/blob/develop-snapshot/ethereum/compress.py


ZLE stands for zero-byte run-length encoding. It is a simple compression algorithm that replaces a run of zero bytes with a count of how many zeros it contains.
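
A minimal Go sketch of the idea (the wire format here is ours and differs in detail from the pyethereum implementation linked above):

```go
package main

import "fmt"

// zleCompress replaces each run of zero bytes with a zero byte followed by
// the run length; non-zero bytes pass through unchanged. Runs are capped at
// 255 so the length fits in a single byte.
func zleCompress(data []byte) []byte {
	var out []byte
	for i := 0; i < len(data); {
		if data[i] != 0 {
			out = append(out, data[i])
			i++
			continue
		}
		run := 0
		for i < len(data) && data[i] == 0 && run < 255 {
			run++
			i++
		}
		out = append(out, 0x00, byte(run))
	}
	return out
}

func main() {
	data := []byte{0xf8, 0x00, 0x00, 0x00, 0x00, 0x6b}
	fmt.Printf("% x\n", zleCompress(data)) // f8 00 04 6b
}
```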