One topic that often resurfaces in layer 2 scaling discussions is the concept of "layer 3s." If we can build a layer 2 protocol that anchors to layer 1 for security and adds scalability on top, then surely we can scale even further by building a layer 3 protocol that "anchors to layer 2 for security and adds more scalability on top of that"? A simple version of this idea is: if you have a scheme that gives you quadratic scaling gains, can you stack the scheme on top of itself and get exponential gains? Similar ideas appear in my 2015 scalability paper and in the multi-layer scaling idea in the Plasma paper. Unfortunately, such simple conceptions of layer 3s rarely work out that easily. There is always something in the design that is not stackable and only gives you a one-time scalability boost, whether that is data availability limits, reliance on layer 1 bandwidth for emergency withdrawals, or a host of other issues.

Newer ideas around layer 3s, such as the framework proposed by Starkware, are more sophisticated: they do not just stack the same thing on top of itself, they assign different purposes to layer 2 and layer 3. Some form of this approach may well work, if it is done in the right way. This post will go into detail about what does and does not make sense in a three-layer architecture.

Why can't you keep scaling by stacking rollups on top of rollups?

Rollups (see my longer post here) are a scaling technology that combines different techniques to address the two main scaling bottlenecks of running a blockchain: computation and data. Computation is handled by mechanisms such as fraud proofs or SNARKs, which rely on a very small number of participants to process and verify each block, and require everyone else to perform only a small amount of computation to check that the proving process was done correctly. These schemes, especially SNARKs, can scale almost without limit: we can keep making "SNARKs of many SNARKs" to fold even more computation into a single proof.

Data is different. Rollups use a range of compression tricks to reduce the amount of data a transaction needs to store on-chain: a simple money transfer goes from ~100 bytes to ~16 bytes, an ERC20 transfer on an EVM-compatible chain goes from ~180 bytes to ~23 bytes, and a privacy-preserving ZK-SNARK transaction can be compressed from ~600 bytes to ~80 bytes. That is roughly 8x compression in all cases. But the rollup still needs to make the data available on-chain, in a medium that users are guaranteed to be able to access and verify, so that users can independently compute the state of the rollup and step in as provers if the existing provers go offline. Data can be compressed once, but it cannot be compressed again: if it could, there would generally be a way to fold the logic of the second compressor into the first and get the same benefit by compressing once. So "rollups on top of rollups" do not actually provide large gains in scalability, though, as we will see below, the pattern can be useful for other purposes.
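To make the data side concrete, here is a toy sketch of the kind of compression tricks that get a simple transfer from ~100 bytes down to ~16. The specific layout, field sizes, and helper names below are assumptions for illustration, not the encoding of any particular rollup; the point is that each trick (address indices, compressed amounts, dropped nonces, signatures replaced by one aggregate proof) can only be applied once.

```python
# Toy sketch of rollup-style transfer compression. The exact layout is illustrative:
# 4-byte indices into an address registry instead of 20-byte addresses, the amount
# stored as a 5-byte mantissa plus a 1-byte decimal exponent, and no per-transaction
# nonce or signature (the nonce lives in rollup state; signatures are covered by one
# aggregate proof per batch).

def encode_amount(amount_wei: int) -> bytes:
    """Strip trailing decimal zeros into a 1-byte exponent; keep a 5-byte mantissa."""
    exponent = 0
    while amount_wei > 0 and amount_wei % 10 == 0:
        amount_wei //= 10
        exponent += 1
    if amount_wei >= 2**40:
        raise ValueError("amount needs more than a 5-byte mantissa at this precision")
    return amount_wei.to_bytes(5, "big") + bytes([exponent])

def compress_transfer(from_index: int, to_index: int, amount_wei: int) -> bytes:
    """4 + 4 + 6 + 1 + 1 = 16 bytes total."""
    return (
        from_index.to_bytes(4, "big")   # sender, as an index into the address registry
        + to_index.to_bytes(4, "big")   # recipient index
        + encode_amount(amount_wei)     # 6-byte compressed amount
        + bytes([0])                    # token id (0 = base asset, illustrative)
        + bytes([0])                    # flags / transaction type
    )

packed = compress_transfer(12345, 67890, 3 * 10**18)  # a 3 ETH transfer
assert len(packed) == 16
```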
So what is the "sane" version of layer 3?

Well, let's look at what Starkware is advocating in their post on layer 3s. Starkware is made up of very smart cryptographers, and they are sane, so if they are advocating for layer 3s, their version will be much more sophisticated than "if rollups compress data 8x, then obviously rollups on top of rollups will compress data 64x". Here's the image from Starkware's post. A few quotes:

"The figure above depicts an example of such an ecosystem. Its L3s include:

1. StarkNet with Validium data availability, e.g. for applications that are extremely sensitive to pricing.
2. Application-specific StarkNet systems customized for better application performance, for example by adopting designated storage structures or data availability compression.
3. StarkEx systems (such as those serving dYdX, Sorare, Immutable, and DeversiFi) with Validium or Rollup data availability, immediately bringing proven scalability benefits to StarkNet.
4. Privacy-focused StarkNet instances (also referred to as L4 in this example) to allow privacy-preserving transactions without including them in the public StarkNet."

We can condense the article into three visions of "L3s":

1. L2 is for scaling, L3 is for customized functionality, for example privacy.
2. L2 is for general-purpose scaling, L3 is for customized scaling, e.g. through specialized data compression or application-specific execution environments.
3. L2 is for trustless scaling (rollups), L3 is for weakly-trusted scaling (validiums).
In my opinion, all three of these visions are fundamentally plausible. The idea that specialized data compression requires its own platform is probably the weakest claim: it would be quite easy to design an L2 with a general-purpose base compression scheme that users can extend with application-specific sub-compressors. Other than that, the use cases are all reasonable. But that still leaves a big question: is a three-layer architecture the right way to achieve these goals? What is the point of anchoring validiums, privacy systems, and customized environments to L2 instead of just anchoring them to L1? It turns out that the answer to this question is quite complicated.

Do deposits and withdrawals become cheaper and easier within a sub-tree of an L2?

One possible argument for the three-layer model over the two-layer model is that the three-layer model lets an entire sub-ecosystem live inside a single rollup, so cross-domain operations within that ecosystem can happen very cheaply, without needing to go through the expensive L1. But it turns out that deposits and withdrawals can be very cheap even between two separate L2s (or even L3s). The key is that tokens and other assets do not have to be issued on the root chain. That is, you can have an ERC20 token on Arbitrum, create a wrapper of it on Optimism, and move back and forth between the two without any L1 transactions!

Let's look at how such a system works. There are two smart contracts: the base contract on Arbitrum and the wrapped-token contract on Optimism. To move from Arbitrum to Optimism, you send your tokens to the base contract, which generates a receipt. Once Arbitrum finalizes, you take a Merkle proof of that receipt, rooted in the L1 state, and send it to the wrapped-token contract on Optimism, which verifies it and issues you a wrapped token. To move the tokens back, you do the same thing in reverse. Even though the Merkle path needed to prove the deposit on Arbitrum passes through the L1 state, Optimism only needs to read the L1 state root to process the deposit; no L1 transactions are required. Note that because rollup data is the scarcest resource, a practical implementation of this scheme would use a SNARK or a KZG proof rather than a direct Merkle proof, to save space.
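To make that flow concrete, here is a minimal sketch. All contract and function names are hypothetical, and the Merkle verification is reduced to hashing up a sibling branch; a real implementation would verify the full path through the L1 state commitment and, as noted above, would likely use a SNARK or KZG proof instead.

```python
# Sketch of cheap cross-rollup transfers without L1 transactions: lock on the
# "base" contract on rollup A (e.g. Arbitrum), then mint on the "wrapper" contract
# on rollup B (e.g. Optimism) by proving the lock receipt against an L1 state root
# that rollup B can already read.
from dataclasses import dataclass
from hashlib import sha256

def hash_pair(a: bytes, b: bytes) -> bytes:
    return sha256(a + b).digest()

@dataclass(frozen=True)
class Receipt:
    sender: str
    recipient: str
    amount: int
    nonce: int          # makes each receipt unique so it can only be claimed once

    def leaf(self) -> bytes:
        return sha256(repr(self).encode()).digest()

class BaseTokenContract:
    """On rollup A: locks tokens and records bridge receipts."""
    def __init__(self):
        self.receipts: list[Receipt] = []

    def deposit_for_bridge(self, sender: str, recipient: str, amount: int) -> Receipt:
        # A real contract would pull `amount` tokens from `sender` here.
        receipt = Receipt(sender, recipient, amount, nonce=len(self.receipts))
        self.receipts.append(receipt)
        return receipt

class WrappedTokenContract:
    """On rollup B: mints wrapped tokens against proven receipts."""
    def __init__(self, read_l1_state_root):
        # Rollups can read the L1 state root cheaply, with no L1 transaction.
        self.read_l1_state_root = read_l1_state_root
        self.balances: dict[str, int] = {}
        self.claimed: set[int] = set()

    def mint_from_receipt(self, receipt: Receipt, branch: list[bytes]) -> None:
        assert receipt.nonce not in self.claimed, "receipt already claimed"
        # Walk the Merkle branch from the receipt leaf up to the root. (Simplified:
        # a real verifier tracks left/right ordering and walks through rollup A's
        # state commitment inside the L1 state.)
        node = receipt.leaf()
        for sibling in branch:
            node = hash_pair(node, sibling)
        assert node == self.read_l1_state_root(), "proof does not match L1 state root"
        self.claimed.add(receipt.nonce)
        self.balances[receipt.recipient] = self.balances.get(receipt.recipient, 0) + receipt.amount
```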
This scheme has an Achilles' heel compared to L1-rooted tokens, at least on optimistic rollups: deposits also need to wait out the fraud-proof window. If the token is rooted on L1, withdrawing from Arbitrum or Optimism to L1 takes a one-week delay, but deposits are instant. In this scheme, however, both deposits and withdrawals carry the one-week delay. That said, it is not clear that a three-layer architecture on optimistic rollups is any better: there is a lot of technical complexity in making sure that a fraud-proof game that happens inside a system which itself runs on a fraud-proof game is secure.

Fortunately, neither of these issues is a problem for ZK rollups. ZK rollups do not require week-long waiting windows for security reasons, but they still need shorter windows (perhaps 12 hours with first-generation technology) for two other reasons. First, more complex general-purpose ZK-EVM rollups need a longer time to cover the non-parallelizable compute time of proving a block. Second, for economic reasons, proofs need to be submitted rarely in order to minimize the fixed costs associated with proof transactions. Next-generation ZK-EVM technology, including specialized hardware, will solve the first problem, while better-architected batch verification can solve the second. And it is this problem of optimizing and batching proof submission that we turn to next.

Rollups and validiums have a confirmation time vs. fixed cost tradeoff. Layer 3s can help with this, but what else can?

Per transaction, rollups are cheap: just 16-60 bytes of data, depending on the application. But rollups also have a high fixed cost that must be paid every time a batch of transactions is submitted to the chain: roughly 21,000 L1 gas per batch for optimistic rollups and over 400,000 gas for ZK rollups (millions of gas if you want quantum-safe security using only STARKs). A rollup could, of course, simply wait until there are 10 million gas worth of L2 transactions before submitting a batch, but this would mean very long batch intervals, forcing users to wait longer for high-security confirmation. So there is a tradeoff: longer batch intervals and optimal cost, or shorter batch intervals and greatly increased cost.

To get some concrete numbers, consider a ZK rollup that costs 600,000 gas per batch and processes fully optimized ERC20 transfers (23 bytes) at a cost of 368 gas per transaction, and suppose this rollup is in the early-to-mid stages of adoption, at 5 TPS. The gas per transaction is then roughly 368 + 600,000 / (5 × batch interval in seconds):

batch every 12 seconds: ~10,368 gas per transaction
batch every minute: ~2,368 gas per transaction
batch every 10 minutes: ~568 gas per transaction
batch every hour: ~401 gas per transaction

If we move into a world with many application-specific validiums and environments, many of them will be doing well under 5 TPS, so this tradeoff between confirmation time and cost starts to matter a great deal. And indeed, the "L3" paradigm does solve this problem! A ZK rollup inside a ZK rollup, even a naive implementation, has a fixed cost of only about 8,000 layer-1 gas (500 bytes for the proof). With the same assumptions, that changes the numbers to:

batch every 12 seconds: ~501 gas per transaction
batch every minute: ~395 gas per transaction
batch every 10 minutes: ~371 gas per transaction
batch every hour: ~368 gas per transaction

The problem is basically solved. So are L3s good? Maybe. But it is worth noting that there is another way to solve this problem, inspired by ERC-4337 aggregate verification. The strategy is as follows. Today, every ZK rollup or validium accepts a proof showing that S_new = STF(S_old, D): the new state root must be the result of correctly processing the transaction data or state deltas on top of the old state root. In this new scheme, a ZK rollup would instead accept a message from a batch-verifier contract saying that it has verified a proof of a batch of statements, where each statement is of the form S_new = STF(S_old, D). This batch proof can be constructed with a recursive SNARK scheme or with Halo aggregation. It would be an open protocol: any ZK rollup could join, and any batch prover could aggregate proofs from any compatible ZK rollup and get compensated with transaction fees by the aggregator. The batcher contract would verify the proof once, and then pass each rollup a message containing that rollup's (S_old, S_new, D) triple; the fact that the triple came from the batcher contract serves as evidence that the transition is valid. If optimized well, the cost per rollup in this scheme could approach 8,000 gas: 5,000 for the state write adding the new update, 1,280 for the old and new roots, and an extra 1,720 for miscellaneous data processing. So it would give us the same savings. Starkware actually already has something like this, called SHARP, although it is not (yet) a permissionless open protocol.
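Here is a minimal sketch of how such an aggregation protocol could be wired up. The contract and function names are hypothetical, and the aggregate-proof verifier is left as a stub standing in for a recursive-SNARK or Halo-style verifier: a single batcher contract verifies one proof covering many S_new = STF(S_old, D) statements, and each participating rollup accepts a state-root update only when its triple arrives from the batcher.

```python
# Sketch of the aggregated-verification idea: verify one batch proof once, then
# hand each rollup its own (S_old, S_new, D) triple.
from dataclasses import dataclass

@dataclass
class Statement:
    rollup_id: str
    s_old: bytes   # previous state root
    s_new: bytes   # claimed new state root
    d: bytes       # transaction data / state delta the proof commits to

def verify_aggregate_proof(proof: bytes, statements: list[Statement]) -> bool:
    # Stand-in for a recursive SNARK / Halo aggregate verifier; not implemented here.
    raise NotImplementedError

class RollupContract:
    def __init__(self, rollup_id: str, genesis_root: bytes, batcher: "BatcherContract"):
        self.rollup_id = rollup_id
        self.state_root = genesis_root
        self.batcher = batcher

    def apply_update(self, s_old: bytes, s_new: bytes, caller) -> None:
        # The only "proof" the rollup needs is that the triple came from the batcher,
        # which already verified the aggregate proof once for everyone.
        assert caller is self.batcher, "update must come from the batcher contract"
        assert s_old == self.state_root, "stale or mismatched previous state root"
        self.state_root = s_new    # the ~5,000-gas state write in the on-chain version

class BatcherContract:
    def __init__(self):
        self.rollups: dict[str, RollupContract] = {}

    def register(self, rollup: RollupContract) -> None:
        # Open protocol: any compatible ZK rollup or validium can join.
        self.rollups[rollup.rollup_id] = rollup

    def submit_batch(self, proof: bytes, statements: list[Statement]) -> None:
        # Verify the aggregate proof once...
        assert verify_aggregate_proof(proof, statements)
        # ...then pass each rollup its (S_old, S_new, D) triple.
        for st in statements:
            self.rollups[st.rollup_id].apply_update(st.s_old, st.s_new, caller=self)
```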
One response to this approach might be: isn't this really just another layer 3 scheme? Instead of base layer <- rollup <- validium, we would have base layer <- batch mechanism <- rollup or validium. From a certain philosophical architectural perspective, this may be true. But there is an important difference: the middle layer is not a complex full EVM system, but a simplified and highly specialized object. That makes it much more likely to be secure, much more likely to be built without yet another specialized token, and much more likely to be minimally governed and not change over time.

Conclusion: what exactly is a "layer"?

Three-layer scaling architectures that consist of stacking the same scaling scheme on top of itself generally do not work well. Rollups on top of rollups, where the two layers of rollup use the same technology, do not work well either. However, a three-layer architecture where L2 and L3 serve different purposes can work. Validiums on top of rollups do make sense, even if it is not certain that they are the best way to do things in the long term.

However, once we start getting into the details of which architectures make sense, we run into philosophical questions: what is a "layer" and what is not? base layer <- batch mechanism <- rollup or validium and base layer <- rollup <- rollup or validium do the same job, but the proof-aggregation layer works much more like ERC-4337 than like a rollup. Typically, we do not call ERC-4337 a "layer 2". Similarly, we do not call Tornado Cash a "layer 2", so if we want to be consistent, we would not call a privacy-focused subsystem that sits on top of an L2 an "L3". As a result, there is an unresolved semantic debate about what deserves to be called a "layer" in the first place.

There are many possible schools of thought here. My personal preference is to restrict the term "layer 2" to things that have the following properties:

1. Their purpose is to improve scalability.
2. They follow the "blockchain within a blockchain" pattern: they have their own transaction-processing mechanism and their own internal state.
3. They inherit the full security of the Ethereum chain.

So optimistic rollups and ZK rollups are layer 2s, but validiums, proof aggregation schemes, ERC-4337, on-chain privacy systems, and Solidity are all something else. It might make sense to call some of them "layer 3s", but probably not all of them; in any case, it seems premature to nail down definitions while the architecture of the multi-rollup ecosystem is far from settled and most of the discussion is still happening only in theory.

That said, the language debate is less important than the technical question of which constructions actually make the most sense. Clearly, "layers" that serve non-scaling needs like privacy have an important role to play, and the important function of proof aggregation needs to be filled somehow, preferably by an open protocol. But at the same time, there are good technical reasons to keep the middle layer that links the user-facing environment to L1 as simple as possible; in many cases, a "glue layer" that is a full EVM rollup may not be the right approach. My guess is that, as the L2 scaling ecosystem matures, both the more complex and the simpler structures described in this post will come to play a larger role.