In-depth analysis of the next trillion-dollar track: the combination of zero-knowledge proof and distributed computing

If we want more real business scenarios to be connected to Web3, the combination of zero-knowledge proof and decentralized computing will bring opportunities.

Written by: X Research DAO

1. Development history and market prospects of distributed computing

1.1 Development History

At first, each computer could only perform one computing task. With the emergence of multi-core and multi-threaded CPUs, a single computer can perform multiple computing tasks.

As the traffic of large websites grew, the single-server model became hard to scale and hardware costs rose. Service-oriented architecture (SOA) emerged, spreading the workload across multiple servers. It consists of three parts: the service registry, the service provider, and the service consumer.

However, as business and the number of servers grew, the point-to-point wiring of the SOA model became harder to maintain and scale. Borrowing the bus idea from microcomputer design, a service bus was introduced to coordinate the individual service units: it connects all systems through a hub-like architecture. This component is called the ESB (Enterprise Service Bus). Acting as an intermediary, it translates and coordinates service protocols of different formats and standards.

Subsequently, REST-style communication based on application programming interfaces (APIs) stood out for its simplicity and higher composability. Each service exposes its interface in REST form. When a client makes a request through a RESTful API, the service passes a representation of the resource's state to the requester or endpoint. The information is transmitted over HTTP in one of several formats: JSON (JavaScript Object Notation), HTML, XLT, Python, PHP, or plain text. JSON is the most commonly used format; although its name says "JavaScript Object Notation", it is language-agnostic and can be read by both humans and machines.
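For illustration, a minimal Python sketch of a JSON representation such as a RESTful endpoint might return (the endpoint and fields are hypothetical):

```python
import json

# A hypothetical body that a RESTful endpoint such as GET /users/42 might
# return: the resource's state, serialized as JSON.
response_body = json.dumps({"id": 42, "name": "alice", "roles": ["reader", "editor"]})

# The client parses the representation back into native data structures.
user = json.loads(response_body)
print(user["name"], user["roles"])  # -> alice ['reader', 'editor']
```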

Virtual machines, container technology, and three papers from Google:

  • 2003, GFS: The Google File System

  • 2004, MapReduce: Simplified Data Processing on Large Clusters

  • 2006, Bigtable: A Distributed Storage System for Structured Data

These cover the distributed file system, distributed computing, and the distributed database, and they raised the curtain on distributed systems. Hadoop reproduced Google's papers as open source, Spark made the same workloads faster and easier to use, and Flink addressed the need for real-time computing.

However, all of the above are distributed systems, not full-fledged peer-to-peer systems. In the Web3 field, that earlier software architecture is overturned entirely: problems such as consistency across the network, resistance to fraud attacks, and resistance to dust-transaction spam all pose challenges for a decentralized computing framework.

A smart contract public chain such as Ethereum can be abstractly understood as a decentralized computing framework, but the EVM is a virtual machine with a limited instruction set and cannot perform the general-purpose computation Web2 requires, and on-chain resources are extremely expensive. Even so, Ethereum has solved several hard problems of the peer-to-peer computing model, such as peer-to-peer communication, network-wide consistency of computation results, and data consistency.

1.2 Market Prospects

The history above gives readers the background of distributed computing, but it also raises questions. The most likely ones are:

From the perspective of business needs, why is a decentralized computing network important? How big is the overall market size? What stage is it at now, and how much room is there in the future? What opportunities are worth paying attention to? How to make money?

1.2.1 Why is decentralized computing important?

Ethereum's original vision was to become the world's computer. After the ICO boom of 2017, it was clear that its main use was still asset issuance. Then DeFi Summer arrived in 2020 and a large number of Dapps emerged. With the explosion of on-chain data, the EVM is increasingly underpowered for ever more complex business scenarios; off-chain scaling is needed to deliver functions the EVM cannot. Roles such as oracles are, to some extent, already a form of decentralized computing.

Looking at it more broadly: Dapps are growing rapidly and their data volumes are exploding. Unlocking the commercial value of that data requires more complex algorithms; the value of data is realized, and generated, through computation. Most smart contract platforms simply cannot do this.

Dapp development has completed the journey from 0 to 1; it now needs more powerful underlying infrastructure to support more complex business scenarios. Web3 as a whole has passed the stage of toy applications and will face more complex logic and business scenarios in the future.

1.2.2 What is the overall market size?

How should we estimate the market size? By estimating the size of the distributed computing business in Web2 and multiplying it by a Web3 penetration rate? By adding up the valuations of the projects in this track that have already raised funding?

We cannot simply transplant the market size of Web2 distributed computing onto Web3, for two reasons: 1. Distributed computing in Web2 already meets most needs, while decentralized computing in Web3 serves the market in a differentiated way; copying the figure would ignore that objective market background. 2. The business scope of decentralized computing in Web3 will ultimately be global. A more rigorous estimate is therefore needed.

The overall budget for potential tracks in the Web3 field is calculated based on the following points:

  1. The valuations of existing projects that fall within this track are used as a benchmark. According to data on the coinmarketcap website, the circulating market value of distributed computing projects already on the market is 6.7 billion US dollars.

  2. The revenue model comes from the design of the token economic model. The most common model today treats the token as the means of paying transaction fees, so fee income indirectly reflects how prosperous and active the ecosystem is and ultimately feeds into the valuation judgment. There are of course other mature token models, such as staking ("mortgage mining"), serving in trading pairs, or acting as the anchor asset of an algorithmic stablecoin. The valuation model of a Web3 project therefore differs from the traditional stock market and looks more like a national currency: tokens can be adopted in many different scenarios, so each project needs its own analysis. Let us explore how a token model should be designed for a Web3 decentralized computing scenario. Suppose we design a decentralized computing framework; what challenges will we meet?

     a) Because a fully decentralized network executes computing tasks in an untrusted environment, resource providers must be incentivized to guarantee uptime and service quality. The game mechanics must make the incentives sound and prevent attackers from mounting fraud attacks, Sybil attacks, and other attacks. Tokens are therefore needed as the stake for joining the PoS consensus network, first to ensure consensus consistency across all nodes, and then to reward resource contributors according to the work they contribute. Token incentives must form a positive feedback loop between business growth and improvements in network efficiency.

     b) Compared with other Layer 1s, the network itself also generates a large number of transactions. Facing a flood of dust transactions, charging a fee per transaction is a token model the market has already validated.

     c) If the token is only a utility token, it is hard to grow its market value further. If it also serves as the anchor asset of an asset portfolio, a few layers of asset nesting can greatly amplify the effect of financialization.

     A rough formula: overall valuation = staking rate × gas consumption rate × (1 / circulating supply) × unit price.

1.2.3 What stage are we at now and how much room is there for the future?

From 2017 to now, many teams have tried to build in the direction of decentralized computing, and all have failed; the reasons are explained in detail later. The exploration path has evolved from volunteer-computing projects reminiscent of the alien-signal search programs, to models imitating traditional cloud computing, to the exploration of Web3-native models.

The current status of the track is that the 0-to-1 breakthrough has been verified at the academic level, and several large projects have made great progress in engineering practice; for example, today's zkRollup and zkEVM implementations have only just released products.

There is still plenty of room to grow, for the following reasons: 1. The efficiency of verifiable computation needs to improve. 2. More instructions need to be added to the instruction set. 3. Optimization for genuinely different business scenarios is still to come. 4. Business scenarios that smart contracts could never serve become reachable through decentralized computing.

A concrete case: a fully decentralized game. Today most GameFi projects need a centralized service as the backend, which manages player state and part of the business logic, while the frontend handles user interaction and event triggering and forwards events to the backend. There is currently no complete solution on the market that can support GameFi's business scenarios. But with a verifiable decentralized computing protocol the backend can be replaced by a zkVM, making a truly decentralized game possible: the frontend sends user events to the zkVM, which executes the relevant business logic; once the proof is verified, the new state is recorded in a decentralized database.
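A minimal sketch of that flow, with the zkVM and the proof replaced by hypothetical stand-ins (a real system would produce a succinct zkProof rather than a transcript hash):

```python
import hashlib
import json

def zkvm_execute(state: dict, event: dict) -> tuple[dict, str]:
    """Stand-in for a zkVM run: apply game logic and return (new_state, 'proof').
    Here the 'proof' is just a hash of the transition; a real zkVM would emit
    a succinct zkProof that the transition was computed correctly."""
    new_state = dict(state)
    if event["type"] == "gain_xp":
        new_state["xp"] = state.get("xp", 0) + event["amount"]
    transcript = json.dumps([state, event, new_state], sort_keys=True)
    return new_state, hashlib.sha256(transcript.encode()).hexdigest()

def verify(state, event, new_state, proof) -> bool:
    """Stand-in verifier: recompute the transcript hash. A real verifier checks
    the zkProof without re-executing the game logic."""
    transcript = json.dumps([state, event, new_state], sort_keys=True)
    return proof == hashlib.sha256(transcript.encode()).hexdigest()

# Frontend sends an event; the "decentralized database" is a plain list here.
db = [{"player": "alice", "xp": 0}]
event = {"type": "gain_xp", "amount": 10}
new_state, proof = zkvm_execute(db[-1], event)
if verify(db[-1], event, new_state, proof):
    db.append(new_state)   # state is accepted only after verification
print(db[-1])              # {'player': 'alice', 'xp': 10}
```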

Of course, this is just one application scenario, and Web2 has many business scenarios that require computing capabilities.

1.2.4 What opportunities are worth paying attention to? How to make money?

2. Decentralized distributed computing attempts

2.1 Cloud Service Model

Currently Ethereum has the following problems:

  1. The overall throughput is low. It consumes a lot of computing power, but the throughput is only equivalent to that of a smartphone.

  2. Weak incentive to verify, known as the Verifier's Dilemma. The node that wins the right to package a block is rewarded, while the other nodes must verify it without reward, so their motivation to verify is low. Over time computations may go unverified, which puts the security of on-chain data at risk.

  3. The amount of computation is limited (gasLimit) and the computational cost is high.

Some teams tried to adopt the cloud computing model that Web2 uses widely: users pay a fee calculated from the time the computing resources are occupied. The fundamental reason for this model is that there was no way to verify whether a computing task had been executed correctly; only measurable parameters such as time, or other controllable parameters, could be checked.

Ultimately this model was not widely adopted because it ignored human nature: a large share of the resources went into mining to maximize profit, leaving fewer resources actually available for computation. That is the natural outcome when every role in the game seeks to maximize its own profit.

The final result is completely contrary to the original intention.

2.2 Challenger Mode

TrueBit uses a game-theoretic system to reach a globally optimal outcome and ensure that submitted computing tasks are executed correctly.

The key points of this framework are:

1. Roles: Problem Solver, Challenger, and Judge

2. Solvers must stake funds before they can take on computing tasks

3. The challenger, acting as a bounty hunter, re-runs the computation and checks whether the solver's results match its own local results.

4. In a dispute, the two sides narrow down to the most recent step at which their computation states still agree; the challenger then submits the Merkle tree hash of the first point of divergence (see the sketch after this list).

5. Finally, the judge will decide whether the challenge is successful
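A minimal sketch of the dispute game in item 4, assuming both parties can recompute the state at any step (a plain hash stands in for the Merkle commitment):

```python
import hashlib

def state_hash(program, step: int) -> str:
    """Hash of the VM state after `step` steps (stand-in for a Merkle root)."""
    return hashlib.sha256(repr(program(step)).encode()).hexdigest()

def find_divergence(solver_prog, challenger_prog, total_steps: int) -> int:
    """Binary-search the first step at which the two executions disagree."""
    lo, hi = 0, total_steps            # invariant: agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if state_hash(solver_prog, mid) == state_hash(challenger_prog, mid):
            lo = mid
        else:
            hi = mid
    return hi                          # first disputed step; the judge need only re-execute this one step

# Hypothetical programs: the solver "cheats" from step 6 onward.
honest = lambda step: sum(range(step))                         # state = running sum
cheater = lambda step: sum(range(step)) + (1 if step >= 6 else 0)

print(find_divergence(cheater, honest, total_steps=10))        # -> 6
```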

However, this model has the following shortcomings:

1. Challengers can submit late; they only need to complete the challenge eventually, so the scheme lacks timeliness.

2.3 Using Zero-Knowledge Proofs to Verify Calculations

So how can we ensure both that the computation process can be verified and that the verification is timely?

For example, a zkEVM implementation must submit a verifiable zkProof within each block time. The business logic is compiled to bytecode, the bytecode execution is turned into circuit code, and the resulting proof attests that the business logic was executed correctly; because the proof arrives within a short, fixed interval, verification is also timely.
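To make "bytecode execution is turned into circuit code" concrete at the smallest scale, here is the textbook R1CS arithmetization of the toy statement x³ + x + 5 = 35. This is a generic illustration of flattening a computation into constraints, not zkEVM's actual constraint system:

```python
# Witness for x = 3: s = {one, x, out, sym1, y, sym2}
s = {"one": 1, "x": 3, "out": 35, "sym1": 9, "y": 27, "sym2": 30}

# Each constraint is (A, B, C) of linear combinations with (A·s) * (B·s) == C·s
constraints = [
    ({"x": 1},              {"x": 1},   {"sym1": 1}),   # x * x = sym1
    ({"sym1": 1},           {"x": 1},   {"y": 1}),      # sym1 * x = y
    ({"y": 1, "x": 1},      {"one": 1}, {"sym2": 1}),   # (y + x) * 1 = sym2
    ({"sym2": 1, "one": 5}, {"one": 1}, {"out": 1}),    # (sym2 + 5) * 1 = out
]

def dot(lin, s):
    """Evaluate a linear combination of witness variables."""
    return sum(coeff * s[var] for var, coeff in lin.items())

assert all(dot(A, s) * dot(B, s) == dot(C, s) for A, B, C in constraints)
print("witness satisfies the R1CS for x**3 + x + 5 == 35")
```

A proof system then proves that such constraints are satisfied by some witness, without the verifier re-executing the computation.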

Although zkEVM targets only smart contract execution, it is in essence still a computing framework. The same logic can be extended to more general virtual machines, such as a WASM virtual machine or an even more general LLVM-based high-performance virtual machine. There will of course be many engineering challenges in the concrete implementation, but it opens up far more room for exploration.

Assuming sufficiently high-performance proof-acceleration hardware and sufficiently optimized zero-knowledge proof algorithms, general-purpose computing scenarios can be fully opened up: a large share of Web2 computing business could be reproduced on zero-knowledge general-purpose virtual machines, exactly the profitable direction described above.

3. Combination of zero-knowledge proof and distributed computing

3.1 Academic level

Let’s look back at the historical development of zero-knowledge proof algorithms.

  1. GMR85 is the earliest work, from the paper by Goldwasser, Micali and Rackoff, The Knowledge Complexity of Interactive Proof Systems (GMR85), proposed in 1985 and published in 1989. It studies how much knowledge must be exchanged over K rounds of interaction to prove a statement correct.

  2. Yao's Garbled Circuit (GC) [89]: a well-known two-party secure computation protocol based on oblivious transfer that can evaluate any function. The central idea of GC is to split the computation circuit (AND, OR, and NOT gates can express any computation) into a garbling phase and an evaluation phase. Each party is responsible for one phase, and in each phase the circuit is encrypted so that neither party can learn the other's information, yet they can still obtain the result from the circuit. GC consists of an oblivious transfer protocol and a block cipher, and the complexity of the circuit grows at least linearly with the input size. After GC was published, Goldreich-Micali-Wigderson (GMW) [91] extended it to multiple parties to resist malicious adversaries.

  3. The Sigma protocol, also known as (special) honest-verifier zero-knowledge proof; that is, it assumes the verifier is honest. A representative example is the Schnorr identification protocol, which in practice is usually made non-interactive.

  4. Pinocchio (PGHR13) in 2013: Pinocchio: Nearly Practical Verifiable Computation, which reduces the time required for proof and verification to a practical level, is also the base protocol used by Zcash.

  5. Groth16 in 2016: On the Size of Pairing-based Non-interactive Arguments simplifies the size of proofs and improves verification efficiency. It is currently the most widely used ZK basic algorithm.

  6. In 2017, Bulletproofs (BBBPWM17): Bulletproofs: Short Proofs for Confidential Transactions and More proposed the Bulletproof algorithm, a very short non-interactive zero-knowledge proof that requires no trusted setup. It was deployed in Monero six months later, an unusually fast journey from theory to application.

  7. In 2018, zk-STARKs (BBHR18) Scalable, transparent, and post-quantum secure computational integrity proposed the ZK-STARK algorithm protocol that does not require trusted setup. This is another eye-catching direction of ZK development. StarkWare, the most important ZK project, was born on this basis.

  8. The characteristics of Bulletproofs are:

1) Short NIZK without trusted setup

2) Building on Pedersen commitment

3) Support proof aggregation

4) Prover time: O(N·log N), about 30 seconds

5) Verifier time: O(N), about 1 second

6) Proof size: O(log N), about 1.3 KB

7) Security assumption: discrete log

Bulletproofs are suitable for:

1) Range proofs (only about 600 bytes)

2) Inner product proofs

3) Intermediary checks in MPC protocols

4) Aggregated and distributed proofs (with many private inputs)

9. Halo2 main features are:

1) No trusted setup is required, and the accumulation scheme is efficiently combined with PLONKish arithmetization.

2) Based on IPA commitment scheme.

3) A prosperous developer ecosystem.

4) Prover time: O(N·log N).

5) Verifier time: O(1), but greater than Groth16's.

6) Proof size: O(log N).

7) Security assumption: discrete log.

Halo2 is suitable for the following scenarios:

1) Any verifiable computation

2) Recursive proof composition

3) Circuit-optimized hashing based on the lookup-based Sinsemilla function

Halo2 is not suitable for the following scenarios:

1) Unless the KZG version of Halo2 is used instead, the verification cost on Ethereum is high.

10. The main features of Plonky2 are:

1) Combine FRI with PLONK without trusted setup.

2) Optimized for processors with SIMD, using the 64-bit Goldilocks field.

3) Prover time: O(N·log N).

4) Verifier time: O(log N).

5) Proof size: polylogarithmic in N.

6) Based on the security assumption: collision-resistant hash function.

Plonky2 is suitable for the following scenarios:

1) Any verifiable computation.

2) Recursive proof composition.

3) Use custom gates for circuit optimization.

Plonky2 is not suitable for the following scenarios:

1) Because non-native arithmetic is expensive in its field, it is not well suited to statements involving elliptic-curve operations.

Currently Halo2 has become the mainstream proof system used by zkVMs: it supports recursive proofs and can verify arbitrary computation, laying the foundation for general-purpose computing on zero-knowledge virtual machines.

3.2 Engineering practice

Since zero-knowledge proof has made rapid progress at the academic level, what is the current progress when it comes to actual development?

We observe from multiple levels:

  • Programming languages: there are now domain-specific languages that spare developers from having to understand circuit design in depth, which lowers the barrier to entry. There is also support for translating Solidity into circuit code. Developer friendliness keeps improving.

  • Virtual machines: there are already several zkVM implementations. The first kind uses a purpose-built programming language compiled by its own compiler into circuit code, from which the zkProof is generated. The second supports Solidity, compiling it via LLVM into target bytecode that is then translated into circuit code and a zkProof. The third is true EVM-equivalent compatibility, translating the execution of EVM bytecode itself into circuit code and a zkProof. Is this the endgame for zkVMs? No. Whether it is expanding beyond smart contract programming into general computing, or completing and optimizing the underlying instruction set for each approach, the field is still at the 1-to-N stage: there is a long way to go and a great deal of engineering work to optimize and implement. Every team has gone from academic results to an engineering implementation, but who will fight through and finally be crowned? It will take not only significant performance gains but also a large developer ecosystem. Timing is a crucial precondition: shipping to market first, attracting capital, and having applications emerge spontaneously within the ecosystem are all ingredients of success.

  • Peripheral tooling: editor plug-ins, unit-testing plug-ins, debugging tools, and so on, which help developers build zero-knowledge proof applications more efficiently.

  • Infrastructure for zero-knowledge proof acceleration: because FFT and MSM account for most of the computing time in a zero-knowledge proof, they can be parallelized on devices such as GPUs and FPGAs to compress the time overhead.

  • Implementations in different programming languages: for example, rewriting in a more efficient, better-performing language such as Rust.

  • Star projects have emerged: zkSync, StarkWare, and other high-quality projects have announced dates for their official product releases. This shows that the combination of zero-knowledge proofs and decentralized computing is no longer just theory; it is gradually maturing in engineering practice.

4. Bottlenecks encountered and how to solve them

4.1 zkProof generation efficiency is low

Earlier we covered market capacity, the current state of the industry, and actual technical progress. But are there really no challenges?

We disassemble the entire zkProof generation process:

After the logic is compiled into an R1CS circuit, roughly 80% of the proving computation is spent on NTT and MSM. In addition, hashing is performed over the different levels of the circuit, and its time overhead grows linearly as the number of levels increases. The industry has since proposed the GKR algorithm, which reduces this overhead by a factor of about 200.

Even so, the time spent on NTT and MSM remains high. To shorten users' waiting time and improve the experience, acceleration is needed at every level: the mathematical implementation, the software architecture, and GPU/FPGA/ASIC hardware.
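A minimal sketch of the NTT butterfly structure that these accelerators parallelize, over a toy prime field (real systems use large fields such as BN254's scalar field or Goldilocks, and iterative in-place kernels):

```python
# A radix-2 number-theoretic transform (NTT) over a small prime field.
# The modulus and root of unity are toy values chosen for illustration.
MOD = 17    # prime; 17 - 1 is divisible by 8
ROOT = 3    # primitive root mod 17, so pow(3, (17-1)//n, 17) is an n-th root of unity

def ntt(a, invert=False):
    n = len(a)                        # n must be a power of two dividing MOD - 1
    if n == 1:
        return a[:]
    w_n = pow(ROOT, (MOD - 1) // n, MOD)
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)  # inverse root via Fermat's little theorem
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out, w = [0] * n, 1
    for k in range(n // 2):           # these butterflies are independent -> parallel-friendly
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * w_n % MOD
    return out

def intt(a):
    n_inv = pow(len(a), MOD - 2, MOD)
    return [x * n_inv % MOD for x in ntt(a, invert=True)]

coeffs = [1, 2, 3, 4, 0, 0, 0, 0]     # polynomial coefficients, padded to length 8
assert intt(ntt(coeffs)) == coeffs    # round-trip check
print(ntt(coeffs))
```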

[Figure: proof generation and verification times across the zkSNARK family of algorithms]

Since we can see flaws and challenges, it also means that there are opportunities hidden in them:

  1. Design chips that accelerate a specific zkSNARK algorithm or zkSNARK algorithms in general. Compared with other cryptographic algorithms, zkSNARK proving generates large amounts of temporary data and places heavy demands on a device's RAM and GPU memory. Chip startups also require large capital investment, with no guarantee of a successful tape-out; but once successful, the technical barriers and IP protection become a moat. A chip startup must also have supply channels with enough bargaining power to obtain the lowest cost and to guarantee overall quality control.

  2. GPU-accelerated SaaS services: using graphics cards for acceleration is cheaper than ASIC design and has a shorter development cycle. In the long run, however, such software-level innovation will eventually be displaced by dedicated hardware acceleration.

4.2 Large hardware resource usage

We have been in contact with several zkRollup projects and found that machines with large RAM and large-VRAM graphics cards are the best fit for software acceleration. For example, the large number of idle data-sealing machines from Filecoin mining have become target devices for today's popular zkRollup projects. In Filecoin's C2 stage, the generated circuit files must be produced and cached in memory; if the business logic is complex, the circuit is correspondingly large and materializes as huge temporary files. Hashing the circuit in particular benefits from acceleration by AMD CPU instruction extensions, and the high-speed exchange between CPU and memory makes this very efficient. NVMe solid-state drives also speed up the zkSNARK workload. In short, even with all the acceleration options discussed above, the resource requirements remain very high.

In the future, if we want to popularize zkSnark applications on a large scale, it is imperative to optimize at different levels.

4.3 Gas Consumption Cost

Every zkRollup Layer 2 must submit its zkProof to Layer 1 for verification and storage. Because resources on the Ethereum chain are very expensive, wide adoption would mean paying a large amount of gas in ETH, and ultimately users bear that cost, which runs contrary to the original intention of the technology.

Therefore, many zkp projects have proposed data validation layers and the use of recursive proofs to compress submitted zkProofs, all of which are aimed at reducing Gas costs.
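A back-of-the-envelope sketch of why batching and recursive aggregation reduce the per-user cost; the gas figures below are assumptions for illustration, not measured values:

```python
# Hypothetical figures for illustration only.
VERIFY_GAS = 300_000         # assumed cost of verifying one zkProof on L1
CALLDATA_GAS_PER_TX = 2_000  # assumed calldata cost posted per rolled-up tx

def l1_gas_per_tx(batch_size: int, proofs_per_batch: int = 1) -> float:
    """Amortized L1 gas per transaction for a batch covered by `proofs_per_batch` proofs.
    Recursive aggregation lets many blocks share a single proof (proofs_per_batch = 1)."""
    return (VERIFY_GAS * proofs_per_batch) / batch_size + CALLDATA_GAS_PER_TX

for batch in (10, 100, 1_000):
    print(batch, round(l1_gas_per_tx(batch)))
# 10 -> 32000, 100 -> 5000, 1000 -> 2300: the verification cost fades as batches
# grow, leaving calldata as the dominant term -- which data layers then target.
```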

4.4 Missing instructions for the virtual machine

Currently most zkVM platforms are oriented toward smart contract programming. Supporting more general computing scenarios requires a great deal of work to complete the underlying zkVM instruction set, for example support for libc, matrix operations, and other more complex computing instructions.

5. Conclusion

Smart contract platforms are still largely asset-oriented. If we want more real business scenarios to connect to Web3, the combination of zero-knowledge proofs and decentralized computing is where the opportunity lies. Zero-knowledge proof will accordingly become a mainstream track rather than a niche technology.
