Opinion: Ethereum shouldn’t risk switching to PoS for scalability

Original title: "Viewpoint | Should Ethereum's roadmap change?"
Written by: Ajian

Since the publication of "A Rollup-centric Ethereum roadmap", the entire community has been questioning the Ethereum roadmap (especially that of Ethereum 2.0).

On November 18, 2020, in the fifth AMA of the Ethereum Foundation's Eth2.0 research team, Vitalik clearly stated that the roadmap has changed: (1) the importance of Phase 2 is de-emphasized for the time being, and Phase 1 is committed to delivering data sharding for use by Rollups; (2) the beacon chain will gain execution functionality, that is, after the Eth1-Eth2 merge, beacon chain blocks will directly include transactions; (3) the three major tasks after Phase 0 (light client support, data sharding, and the merge) will be advanced in parallel, with each module launching as soon as it is ready.

The purpose of this article is not to defend the original three-phase roadmap. Instead, it argues that the three-phase roadmap was illusory, that the new roadmap is underwhelming, and that no Eth2.0 roadmap justifies abandoning Ethereum's current operating model and migrating to a PoS-based system.

Here, I will first explain the reasoning behind the initial three-phase roadmap and its technical difficulties; then analyze the scalability of the new roadmap; and finally argue that the scalability advantage of the new roadmap is too small to justify the risk of switching Ethereum to PoS.

Eth2.0’s three-phase roadmap

Over the past two years, the widely circulated Eth2.0 roadmap has consisted of three components, to be implemented in sequence:

  • Phase 0: Beacon Chain with PoS consensus mechanism

  • Phase 1: Multiple Shard Chains

  • Phase 2: Add execution capabilities to all shards

It is clear from this roadmap that the original goal of Ethereum 2.0 was a "sharded execution" system: each shard has its own state, which changes according to that shard's state transition rules, and the changed state is finalized by the beacon chain, so that Ethereum 2.0 becomes a system in which multiple shards process transactions in parallel. This also means Ethereum 2.0 decouples "consensus" from "transaction processing (verification)": the validators assigned to each shard verify the correctness of transactions and states, while the finalization of those states depends on the beacon chain's epoch finality mechanism, and the two processes are not fully synchronized.

This "PoS beacon chain + multi-shard" architecture seems to make good use of the characteristics of the PoS algorithm itself: in order to solve the "nothing-at-stake" problem (PoS block generation does not require computational effort, so accounts eligible for block generation will try to generate blocks on different forks at the same time, causing the system to fall apart), the Casper algorithm used in Ethereum 2.0 requires users to deposit a portion of the deposit before they can be eligible to generate blocks. If the validator abuses the block generation qualification (for example, supporting two forks at the same time), the deposit will be confiscated. As a result, algorithms like Casper actually create two states on the blockchain that can communicate with each other but change independently: one is the state of ordinary users, and the other is the state of the validator's block weight. The consensus process is based on the block weight state, and reaching a consensus will also change the block weight state. Therefore, the consensus process is inherently independent of the verification of user transactions and can be decoupled. For any transaction batch and result state, the consensus process can be abstracted into a "finality finalization mechanism", and logically, multi-shard parallel execution becomes possible.

As for its scalability, the name of Ethereum's sharding technique, "quadratic sharding", gives a clue: assuming the verification of a shard transaction can be made as cheap as verifying a block header, the sharded execution architecture increases the processing capacity of the whole system quadratically as the capacity of participating nodes grows linearly. In plain terms, if a participating node can (on average) verify 4 block headers in a given period, then it can also verify 4 transactions on its assigned shard in the same time, so total system capacity is 4 shards × 4 transactions/shard = 16 transactions; if node capacity doubles to 8, total capacity quadruples to 64 transactions.
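A back-of-the-envelope model of that arithmetic (purely illustrative; the numbers reproduce the example above):

```python
# "Quadratic sharding" back-of-the-envelope: a node's verification budget
# covers both one header check per shard and the txs of its own shard.

def system_capacity(node_budget: int) -> int:
    num_shards = node_budget            # can verify one header per shard
    txs_per_shard = node_budget         # can verify this many txs on its shard
    return num_shards * txs_per_shard   # total grows quadratically

for budget in (4, 8, 16):
    print(budget, "->", system_capacity(budget))   # 4 -> 16, 8 -> 64, 16 -> 256
```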

It sounds great, but this "quadratic scaling" argument rests on the following assumptions:

(1) There is a technology that makes verifying a shard transaction as cheap as verifying a block header;

(2) There are no cross-shard transactions; that is, transactions within each shard are completely independent of one another. A cross-shard transaction consumes processing capacity on multiple shards and on the beacon chain, which greatly reduces scalability.

Regarding (1), this assumption can be satisfied: statelessness is exactly such a technology. The idea is to attach, when propagating a transaction (or block), a witness for the state the transaction accesses, so that a verifier can check the transaction's validity without holding the state data current at execution time. This is critical. Without statelessness, the validators verifying a shard must store that shard's state, and because validators are continually reassigned across shard chains, they would have to store the state of every shard. In practice, that means continuously downloading blocks from all shards and processing their transactions, which collapses the whole system into a big-block system (for example, spending resources sufficient to process 16 transactions in order to process exactly 16 transactions). Unfortunately, to date, Ethereum 1.0 has not produced a sufficiently lightweight stateless scheme.
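A minimal sketch of the witness idea, assuming a plain binary Merkle tree over account entries (Ethereum's real witness format is more involved): the verifier holds only the state root and checks the state a transaction touches against the witness shipped with it.

```python
# Stateless verification sketch: check a leaf against a state root using
# a Merkle witness, with no access to the full state.
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def verify_witness(state_root: bytes, leaf: bytes, siblings: list, index: int) -> bool:
    """True iff `leaf` (e.g. an account entry) is committed to by
    `state_root`; `siblings` are the witness hashes, bottom-up."""
    node = h(leaf)
    for sib in siblings:
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == state_root

# Demo: commit to four account entries, then verify one statelessly.
leaves = [b"alice:100", b"bob:50", b"carol:7", b"dave:0"]
l = [h(x) for x in leaves]
n01, n23 = h(l[0] + l[1]), h(l[2] + l[3])
root = h(n01 + n23)
assert verify_witness(root, b"bob:50", [l[0], n23], 1)   # witness for leaf 1
```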

There is not much to say about (2). If cross-shard transactions cannot be implemented, a sharded execution system is meaningless, because every shard would be an independent system; ETH must be able to exist on every shard for the system as a whole to remain based on ETH. To date, there is no cross-shard transaction scheme that avoids adding load to the beacon chain. The reason is simple: because the shards process transactions in parallel, shard A cannot know which transactions are occurring on any shard B, nor whether shard A's own state needs to be rewritten as a result. There must therefore be a communication layer that credibly proves a transaction occurred on shard B that attempts to rewrite state on shard A. Once the beacon chain has to process such transactions, the quadratic-scaling effect is broken. (Incidentally, the chain that plays this trusted-communication-layer role becomes the de facto Layer-1, while the other shards become de facto Layer-2s; the result looks very much like "Layer-1 + Layer-2".)

In addition to questionable scalability, sharded execution raises interesting economic problems. For example, if processing a cross-shard transaction takes longer than processing an intra-shard transaction (which is inevitable), then ETH on different shards will not have the same value, just as $1 inside the United States is not quite the same thing as $1 outside it. No matter how many shards there are, there will be at least two ETH prices: the price of ETH on the shard hosting the most financial applications (that is, the Eth1 shard), and the price of ETH on the other shards; the latter must pay a fee and wait some time to be exchanged for the former, so it must trade at a discount. Similarly, even if every shard has a Uniswap, market slippage will differ across shards, and everyone will eventually converge on one shard, because liquidity is deepest and capital efficiency highest when everyone is together. To some extent, one can argue that the demand for cross-shard transactions is very small; but that also means the idle transaction-processing capacity on the other shards is simply meaningless.

The technical difficulties of a sharded execution system will not be elaborated here; interested readers can ponder how transaction fees would be paid in such a system. What I do want to say is that the design of sharded execution runs against users' actual needs and against the way systems evolve. Global state (composability) is not a problem to be solved; it is exactly what everyone needs. It is precisely because Ethereum lets all financial applications compose instantly, creating a space where value circulates with zero friction, that Ethereum has the potential to change the world; introducing friction into the circulation of value at the protocol layer defeats the purpose. When a good base layer exists, we should find ways to maintain it, leave the rest to users, and let the ecosystem evolve by itself. Do not imagine that an ecosystem can be designed; over-design only imposes costs on everyone.

The shelving of sharded execution (Phase 2) indirectly confirms its difficulty: in the foreseeable future, this path will not produce satisfactory results. That said, I do not think Eth2.0 researchers have completely abandoned the three-phase roadmap; Vitalik emphasized that the revised roadmap remains fully compatible with Phase 2, only that Phase 2 no longer has priority.

But in fact, giving up sharded execution is the path that Ethereum should choose.

The executable beacon chain roadmap

The most striking point of the new Ethereum 2.0 roadmap is that beacon chain blocks will contain the transactions of the merged Eth1 shard; that is, the beacon chain itself gains execution functionality, while the other shards merely store data.

In fact, "data sharding" in the new roadmap is positioned as a "data availability layer for Rollups".

Without sharded execution, quadratic scaling is out of the question. So how scalable is this "PoS Layer-1 + Rollup + Rollup data kept out of main-chain block space" architecture?

To answer this question, let's first look at how a Rollup scheme interacts with the main chain.

First of all, you can think of a Rollup system as a stateless contract. The contract's internal state (which user holds how much money) is not visible to the outside world, but the data of all transactions occurring inside the contract is periodically published on the main chain, so that any third party who obtains this data can reconstruct the contract's internal state.
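A toy sketch of that reconstruction property (the transaction format is invented for illustration, not any real Rollup's encoding): replaying the published batches recovers the contract's internal balances.

```python
# Rebuild a rollup's internal state by replaying the transaction batches
# it published on the main chain.

def rebuild_state(deposits: dict, published_batches: list) -> dict:
    """deposits: initial balances; each batch is a list of
    (sender, recipient, amount) tuples as published on-chain."""
    balances = dict(deposits)
    for batch in published_batches:
        for sender, recipient, amount in batch:
            balances[sender] -= amount
            balances[recipient] = balances.get(recipient, 0) + amount
    return balances

batches = [
    [("alice", "bob", 10)],
    [("bob", "carol", 4), ("alice", "carol", 1)],
]
print(rebuild_state({"alice": 20, "bob": 0}, batches))
# {'alice': 9, 'bob': 6, 'carol': 5}
```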

Rollups that use validity proofs (such as ZK Rollup) work as follows: each time the contract publishes transaction data, the data is accompanied by a "proof of computational integrity" showing that the transactions were executed correctly and that the new state root should therefore be XXX. If the proof passes the contract's verification, the contract updates its state root; if not, the contract refuses to update.
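A minimal sketch of that update rule (the proof check is mocked with a hash; a real contract would run a SNARK/STARK verifier instead):

```python
import hashlib

def mock_verify_proof(old_root: bytes, tx_data: bytes, new_root: bytes, proof: bytes) -> bool:
    """Stand-in for a real validity-proof verifier: here a 'proof' is just
    a hash binding the old root, the published tx data, and the new root."""
    return proof == hashlib.sha256(old_root + tx_data + new_root).digest()

class ZkRollupContract:
    def __init__(self, genesis_root: bytes):
        self.state_root = genesis_root

    def update(self, tx_data: bytes, new_root: bytes, proof: bytes) -> bool:
        # Accept the new root only if the proof shows the published batch
        # correctly transforms state_root into new_root.
        if mock_verify_proof(self.state_root, tx_data, new_root, proof):
            self.state_root = new_root
            return True
        return False    # invalid proof: the state does not move

c = ZkRollupContract(b"root0")
ok_proof = hashlib.sha256(b"root0" + b"batch1" + b"root1").digest()
assert c.update(b"batch1", b"root1", ok_proof)
assert not c.update(b"batch2", b"root2", b"garbage")
```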

Rollups that use fraud proofs (such as Optimistic Rollup) work the other way around: anyone publishing transaction data for the contract must post a bond and assert that the contract's new state root is YYY. Within a window thereafter, anyone else can post a bond and submit a fraud proof challenging the assertion, showing either that the batch of transactions is invalid or that the post-execution state root is not YYY. If the challenge succeeds, the party that made the incorrect assertion loses the bond; if no one challenges within the window, the contract updates its state root to YYY.
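And a matching sketch of the fraud-proof flow (bonds and time are simplified to integers; bond payouts and the actual fraud-proof verification are omitted):

```python
from dataclasses import dataclass, field

@dataclass
class Assertion:
    new_root: bytes
    bond: int               # posted by the asserter, forfeited if wrong
    submitted_at: int       # block height of the assertion
    voided: bool = False

@dataclass
class OptimisticRollupContract:
    state_root: bytes
    challenge_period: int = 100              # blocks an assertion stays open
    pending: list = field(default_factory=list)

    def assert_root(self, new_root: bytes, bond: int, now: int) -> None:
        self.pending.append(Assertion(new_root, bond, now))

    def challenge(self, index: int, fraud_proof_valid: bool) -> None:
        # A valid fraud proof voids the assertion; the asserter's bond
        # would go to the challenger (payout omitted).
        if fraud_proof_valid:
            self.pending[index].voided = True

    def finalize(self, now: int) -> None:
        # Assertions that survive the challenge window become canonical.
        still_open = []
        for a in self.pending:
            if a.voided:
                continue                       # thrown out by a fraud proof
            if now - a.submitted_at >= self.challenge_period:
                self.state_root = a.new_root   # unchallenged: accept
            else:
                still_open.append(a)
        self.pending = still_open
```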

Both schemes must publish data on-chain, so they consume on-chain space; moreover, the amount of on-chain space determines the Rollup system's throughput (TPS) per unit time. Pushing the thought further: if this transaction data could be published somewhere with looser data-volume constraints, in other words somewhere that does not occupy Layer-1 block space, throughput could be multiplied; and if there were many such places, the effect would multiply again.

This is the idea behind "data sharding" and the "Rollup-centric roadmap": have Rollups put all their transaction data into shard blocks, so that the number of shards multiplies capacity. The current Ethereum block carries roughly 20-30 KB of data, a volume that is evidently safe; with 64 shards, we could provide 64 × 30 KB = 1920 KB ≈ 1.9 MB of data every 15 seconds. And although this offers users a large data throughput, it does not burden full nodes, because a node may download this data or not as it pleases (that is what "sharding" means here): everyone downloads a little, so the per-node burden stays light, and verifying the state of these Rollup contracts does not require holding all of a Rollup's historical transaction data. Ethereum's state remains safe.
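Restating the arithmetic (assumptions exactly as above: 64 data shards, about 30 KB per shard block, one block every 15 seconds):

```python
# Data-availability arithmetic for the "Rollup-centric" design.
SHARDS = 64
KB_PER_SHARD_BLOCK = 30    # roughly today's Ethereum block data size
SLOT_SECONDS = 15

kb_per_slot = SHARDS * KB_PER_SHARD_BLOCK
print(kb_per_slot, "KB per slot")                  # 1920 KB ≈ 1.9 MB
print(kb_per_slot / SLOT_SECONDS, "KB per second") # 128.0 KB/s
```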

It sounds reasonable, but once again it is too optimistic and rests on too many assumptions:

(1) This "download if you want to, don't download if you don't want to" approach does not work at all on ZK Rollup: when ZK Rollup wants to update the state root, the verifier of the ZK Rollup contract update operation (that is, the full node of Layer-1) must also obtain the transaction data corresponding to the proof when accepting the proof, otherwise it will not pass the verification. (There is also a solution that does not require transaction data and only verifies the proof to advance the contract state root, called Validium, which is not Rollup). In other words, if only ZK Rollup is considered, then the "data sharding" method is no different from large blocks in terms of bandwidth. Regardless of who the data was originally sent to and where it was stored, the full node must download it.

(2) For Optimistic Rollup, you can make this work if you accept a more optimistic assumption: in normal times, download no transaction data at all, keep only the latest finalized state root, and download the relevant transaction data only when a dispute arises. From a full node's perspective, it does not lose the ability to verify the contract's state. From the user's perspective, however, things are completely different: you can no longer be certain that you can reconstruct your state at any time in order to withdraw. In other words, users cannot tell whether they are using Optimistic Rollup or Plasma. Originally, the Optimistic Rollup scheme guaranteed that all full nodes keep backups of historical transactions, so users could easily rebuild their own state and submit a state proof (or assertion) to withdraw; once that guarantee is lost, you cannot be sure the state can be rebuilt. Optimistic Rollup's security is also affected: its security assumption is that at least one of the parties holding the transaction data follows the protocol, and under data sharding you do not know how many parties will ever request that data.

In summary, when "data sharding" is paired with ZK Rollup, it provides no bandwidth advantage over simply enlarging blocks; when paired with Optimistic Rollup, its scalability advantage over big blocks is inversely proportional to the frequency of challenges, and, more seriously, it exposes Optimistic Rollup to the risk of degenerating into Plasma (by definition such a thing is no longer an Optimistic Rollup, and this construction between Optimistic Rollup and Plasma deserves a different name).

Conclusion

The Rollup approach is a phoenix that rose from the painful lessons of Layer-2 development. Its greatest feature is full protection of users' funds: because anyone who obtains the transaction data can reconstruct the state, and the blockchain guarantees the perpetual availability of that data, Rollup offers the strongest user protection among Layer-2 schemes, and only such a scheme will users dare to actually use. Abandoning this benefit and designing the system around optimistic, performance-maximizing assumptions can only produce things users dare not use.

Once you realize that a Rollup is essentially a contract design pattern, the myth that "PoS + data sharding + Rollup provides greater throughput" can be seen through at a glance: a Rollup provides the same scalability whatever the consensus mechanism, and data sharding provides more only by introducing additional security assumptions, trading the Rollup's security for throughput. The problem is that contracts like this, less secure and more scalable than Rollup, have appeared before; it is not that they cannot be built on a PoW chain, but that even once built, nobody uses them.

Since 2017, the Ethereum community has struggled to find secure scalability solutions that serve real needs. Many people may once have believed that "PoS + sharding" offers strong scalability, but that was the "sharded execution system", with its own intractable problems; the "executable beacon chain" now before us merely sacrifices properties of the contract itself in exchange for throughput. To this day, there is no evidence that Ethereum should embrace PoS for the sake of scalability.

Ultimately, only performance improvements that meet user needs are truly meaningful. A design that starts from technical aesthetics or performance-maximizing assumptions rather than from users' actual needs can only produce castles in the air. Where possible, let users make their own decisions; worrying too much at the protocol layer usually just adds friction.

Should Ethereum’s roadmap change? Of course: we should abandon these unrealistic fantasies and go back to asking what users need.

