IOSG: Why we are optimistic about zero-knowledge proof hardware acceleration

GPU is superior to FPGA in terms of performance, while FPGA has an advantage in energy consumption.

Written by Bryan, IOSG Ventures

This article will discuss the current status of ZKP as a scaling solution, describe from a theoretical perspective the main dimensions that need to be optimized when generating proofs, and go into the acceleration requirements of different scaling solutions. It will then focus on hardware solutions and look ahead to a Moore's Law of zk hardware acceleration. Finally, it will lay out some of the opportunities and the current state of the zk hardware acceleration field. To begin with, there are three main dimensions that affect proving speed: the proof system, the size of the circuit to be proved, and hardware/software algorithm optimization.

For proof systems, all algorithms that use elliptic curves (EC), that is, the mainstream zk-SNARK algorithms on the market such as Groth16 (Zcash), Halo2 (Scroll), and Plonk (Aztec, zkSync), currently bottleneck on the long time (high computing power requirement) needed to generate polynomial commitments. For FRI-based algorithms such as zk-STARKs, polynomial commitments are generated with hash functions, which involve no elliptic curves and therefore no MSM operations.

The proof system is the foundation, and the size of the circuit to be proved is the other core driver of hardware optimization requirements. The recently hotly discussed zkEVMs have different levels of compatibility with Ethereum, resulting in different levels of circuit complexity. For example, zkSync/StarkWare built virtual machines that differ from the native EVM, thereby bypassing some of Ethereum's underlying code that is ill-suited to zk processing and reducing circuit complexity. By contrast, zkEVMs such as Scroll/Polygon Hermez, which aim for compatibility at the lowest level, naturally have more complex circuits. (A convenient analogy: circuit complexity is like the number of seats on a bus. On ordinary days, fewer than 30 passengers need to be carried, and some buses are built with 30 seats; these are zkSync/StarkWare. On a few days each year there are far more passengers than an ordinary bus can hold, so some buses are designed with more seats (Scroll), even though on most days many of those seats sit empty.) Hardware acceleration is more urgent for the more complex circuit designs, but this is more of a spectrum than a binary, and acceleration benefits zkEVMs across the board.

Requirements/focus of different proof system optimizations:

Basic:

When a statement to be proved is processed into a circuit (such as R1CS/QAP), a set of scalars and vectors is obtained, which is then used to generate polynomials or other algebraic forms such as an inner product argument (Groth16). This polynomial is still very long; if a proof were generated from it directly, both the proof size and the verification time would be large. So we need to compress this polynomial further. The optimization used here is called a polynomial commitment, which can be understood as a special hash value of the polynomial. Algebra-based polynomial commitments include KZG, IPA, and DARK, which rely on algebraic groups (typically elliptic curves) to generate commitments.
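
To make "a special hash value of the polynomial" concrete, here is a toy, deliberately insecure sketch of the KZG-style structure. As an assumption for illustration only, the elliptic-curve group is replaced by integers mod q (so discrete logs are trivial); the point is just the shape of the computation: one short commitment regardless of polynomial length.

```python
# Toy, insecure sketch of an algebraic (KZG-style) polynomial commitment.
# Assumption: the curve group is modeled as integers mod q purely for illustration.
import random

q = 2**61 - 1                                    # stand-in for the group order
s = random.randrange(1, q)                       # trusted-setup secret ("toxic waste")
G = 1                                            # stand-in for the group generator
SRS = [pow(s, i, q) * G % q for i in range(8)]   # published powers [G, sG, s^2 G, ...]

def commit(poly):
    """One short commitment for the whole polynomial: an MSM of coefficients vs. the SRS."""
    return sum(c * srs_i for c, srs_i in zip(poly, SRS)) % q

f = [3, 0, 2, 7]                                 # f(x) = 3 + 2x^2 + 7x^3
C = commit(f)
# The commitment equals f(s)*G: a single group element, however long f is.
assert C == (3 + 2 * pow(s, 2, q) + 7 * pow(s, 3, q)) % q
```

In a real scheme only the SRS is published and the secret s is destroyed after the setup ceremony, which is exactly the trust issue discussed next.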

FRI, by contrast, uses hash functions as the main way to generate commitments. The choice of polynomial commitment mainly comes down to two points: security and performance. Security here mostly concerns the setup phase. If the randomness used to generate the secret is public, as in FRI, we say the setup is transparent. If the randomness is private and the prover must destroy it after use, the setup needs to be trusted. MPC (a trusted setup ceremony) is a way to address this trust requirement, but in practice it has been found to impose a certain cost on participants.

FRI, which is relatively strong on security (transparency), is less ideal on performance. Conversely, although pairing-friendly elliptic curves perform relatively well, adding recursion introduces considerable overhead because there are not many suitable curves.

Image: Justin Drake on polynomial commitments, part 1 (source: https://hackernoon.com)

Industry status:

Currently, whether based on Plonk (Matter Labs) or UltraPlonk (Scroll, PSE), the final polynomial commitment is KZG-based, so most of the prover's work consists of large FFT computations (generating polynomials) and elliptic-curve point multiplications, i.e. MSM operations (generating polynomial commitments). In pure Plonk mode, the number of points that need to be committed is not large, so MSM accounts for a small share of proving time, and optimizing FFT performance brings the larger short-term gains. Under the UltraPlonk (Halo2) framework, however, the introduction of custom gates increases the number of points committed in the prover stage, making MSM performance optimization important as well. (Even after optimizing MSM with Pippenger's bucket method, roughly O(p·logB / log(p·logB)) group operations are still required, where B is the upper bound of the scalars and p is the number of points participating in the MSM.)
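
For intuition, here is a minimal sketch of Pippenger's bucket method. As an assumption for illustration, the curve group is again modeled as integers mod q; a real prover would use elliptic-curve point additions, and the window size c would be tuned to the MSM size.

```python
# Minimal sketch of Pippenger's bucket method for MSM.
# Assumption: the group is modeled as integers mod q instead of curve points.
import random

def pippenger_msm(scalars, points, q, c=4):
    """Compute sum_i scalars[i] * points[i] (mod q) with the bucket method."""
    max_bits = max(s.bit_length() for s in scalars)
    num_windows = (max_bits + c - 1) // c
    acc = 0
    for w in reversed(range(num_windows)):           # most-significant window first
        acc = (acc << c) % q                         # "double" the accumulator c times
        buckets = [0] * (1 << c)
        for s, p in zip(scalars, points):
            digit = (s >> (w * c)) & ((1 << c) - 1)  # c-bit digit of this scalar
            if digit:
                buckets[digit] = (buckets[digit] + p) % q
        # sum_b b * buckets[b] via a running-sum trick (additions only)
        running, window_sum = 0, 0
        for b in range((1 << c) - 1, 0, -1):
            running = (running + buckets[b]) % q
            window_sum = (window_sum + running) % q
        acc = (acc + window_sum) % q
    return acc

# cross-check against the naive MSM
q = 2**61 - 1
scalars = [random.randrange(q) for _ in range(1000)]
points = [random.randrange(q) for _ in range(1000)]
assert pippenger_msm(scalars, points, q) == sum(s * p for s, p in zip(scalars, points)) % q
```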

The new-generation Plonky2 proof system replaces KZG with FRI, which is common in the STARK family, so the Plonky2 prover no longer needs to perform MSM and, in theory, its performance improvements no longer depend on MSM-related algorithm optimization. Mir (now Polygon Zero), the author of Plonky2, is vigorously promoting the system. However, the Goldilocks field used by Plonky2 is not particularly friendly to circuits for elliptic-curve-related algorithms (such as ECDSA signature verification), so although Goldilocks has clear advantages for machine-word arithmetic, it is still hard to judge whether the Mir or the PSE/Scroll approach is the better solution.
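
The "machine word" advantage comes from the shape of the Goldilocks prime p = 2^64 - 2^32 + 1: elements fit in one 64-bit word, and a 128-bit product reduces with a few word-sized additions and subtractions. A minimal illustrative sketch (Python for readability; real provers do this with native 64-bit arithmetic and conditional add/sub rather than a final `% P`):

```python
# Why Goldilocks p = 2^64 - 2^32 + 1 is 64-bit friendly:
# 2^64 ≡ 2^32 - 1 (mod p) and 2^96 ≡ -1 (mod p), so a 128-bit product
# folds down using only word-sized pieces.
import random

P = 2**64 - 2**32 + 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def goldilocks_reduce(x: int) -> int:
    """Reduce a product x < 2^128 modulo P using word-sized pieces of x."""
    x_lo = x & MASK64        # bits 0..63
    x_hi = x >> 64           # bits 64..127
    h_lo = x_hi & MASK32     # bits 64..95,  weight 2^64 ≡ 2^32 - 1
    h_hi = x_hi >> 32        # bits 96..127, weight 2^96 ≡ -1
    return (x_lo + h_lo * MASK32 - h_hi) % P   # final step: conditional add/sub in real code

for _ in range(10_000):
    a, b = random.randrange(P), random.randrange(P)
    assert goldilocks_reduce(a * b) == (a * b) % P
```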

Considering the prove algorithms of Plonk, UltraPlonk, and Plonky2 together, the modules that require hardware acceleration will most likely be concentrated in three directions: FFT, MSM, and hash.
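
Note that the "FFT" in a prover is a number-theoretic transform (NTT) over a finite field rather than a floating-point FFT, which matters for the GPU discussion later. Below is a minimal recursive radix-2 NTT sketch, using toy parameters (p = 17 and omega = 2, a primitive 8th root of unity mod 17) chosen purely for illustration.

```python
# Minimal recursive radix-2 NTT: the finite-field analogue of the FFT that
# provers use to move between coefficient and evaluation form.

def ntt(coeffs, omega, p):
    """Evaluate the polynomial at omega^0 .. omega^(n-1) modulo p."""
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    even = ntt(coeffs[0::2], omega * omega % p, p)
    odd = ntt(coeffs[1::2], omega * omega % p, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p          # f(w^k)        = E(w^2k) + w^k O(w^2k)
        out[k + n // 2] = (even[k] - t) % p  # f(w^(k+n/2)) = E(w^2k) - w^k O(w^2k)
        w = w * omega % p
    return out

p, omega = 17, 2                       # 2 has multiplicative order 8 modulo 17
f = [3, 1, 4, 1, 5, 9, 2, 6]           # 8 coefficients
evals = ntt(f, omega, p)
naive = [sum(c * pow(omega, k * i, p) for i, c in enumerate(f)) % p for k in range(8)]
assert evals == naive
```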

Another prover bottleneck is witness generation. Ordinary, non-zk computation usually discards a large number of intermediate variables, but in the ZK proving process every witness value needs to be recorded and will participate in the subsequent FFT computations. Therefore, how to efficiently parallelize witness computation is also a potential direction that prover miners need to consider.
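
A minimal sketch of the idea, under the assumption of a hypothetical layered gate list (not any real prover's circuit format): gates within one layer have no dependencies on each other and can be evaluated concurrently, and every intermediate wire value is kept because it becomes part of the witness. (Python threads are GIL-bound, so this only illustrates the dependency structure; a real prover would use native threads or a GPU.)

```python
# Hypothetical layered circuit: each gate is (output_wire, op, input_a, input_b),
# and gates in the same layer are independent of each other.
from concurrent.futures import ThreadPoolExecutor

layers = [
    [("w3", "mul", "w0", "w1"), ("w4", "add", "w1", "w2")],   # layer 1
    [("w5", "add", "w3", "w4")],                               # layer 2
]

def eval_gate(gate, wires):
    out, op, a, b = gate
    return out, (wires[a] * wires[b] if op == "mul" else wires[a] + wires[b])

def generate_witness(inputs, layers):
    wires = dict(inputs)                  # the witness: every wire value is recorded
    with ThreadPoolExecutor() as pool:
        for layer in layers:              # layers run in order; gates within a layer run concurrently
            for out, val in pool.map(lambda g: eval_gate(g, wires), layer):
                wires[out] = val
    return wires

witness = generate_witness({"w0": 3, "w1": 4, "w2": 5}, layers)
print(witness)   # {'w0': 3, 'w1': 4, 'w2': 5, 'w3': 12, 'w4': 9, 'w5': 21}
```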

Attempts to accelerate ZKP: recursive proofs - StarkNet's fractal L3 concept is based on recursive proofs, as is zkSync's fractal hyperscaling, and Scroll has similar optimizations.

> The idea of a recursive zkSNARK is to prove the verification process of Proof A, thereby generating another Proof B. As long as the Verifier accepts B, it is equivalent to accepting A. A recursive SNARK can also aggregate multiple proofs together, for example aggregating the verification of A1, A2, A3, and A4 into B. A recursive SNARK can also decompose a long computation into several steps, where the proof S1 of each step is verified inside the proof of the next step; that is, compute one step, verify one step, then compute the next. The Verifier then only needs to verify the last step, avoiding the difficulty of constructing one huge circuit of indefinite length.

Theoretically, all zkSNARKs support recursion. Some zkSNARK schemes can implement the Verifier directly as a circuit, while others need to split the Verifier algorithm into a part that is easy to express as a circuit and a part that is not; the latter adopt a strategy of deferred, aggregated verification and push that part to the final verification step.
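
A structural sketch of the two patterns described above. As an assumption, `prove` and `verify` below are hypothetical placeholders for a real SNARK backend; this shows the shape of recursion, not a working prover.

```python
# Structural sketch only: `prove` and `verify` are hypothetical placeholders.

def prove(statement, witness):
    """Hypothetical: returns a succinct proof that `statement` holds given `witness`."""
    ...

def verify(statement, proof) -> bool:
    """Hypothetical: the verifier algorithm, assumed to be expressible as a circuit."""
    ...

def aggregate(statements, proofs):
    """Proof B for 'all inner proofs verify': accepting B is equivalent to accepting them all."""
    agg_statement = ("these statements all have valid proofs", statements)
    return prove(agg_statement, witness=proofs)   # the circuit runs `verify` on each inner proof

def prove_chain(steps, initial_state):
    """Step-by-step recursion: each step's proof also verifies the previous step's proof,
    so the verifier only ever checks the final proof."""
    state, proof = initial_state, None
    for step in steps:
        new_state = step(state)
        statement = ("one valid state transition on top of a valid history", state, new_state)
        proof = prove(statement, witness=(state, proof))   # previous proof verified in-circuit
        state = new_state
    return state, proof
```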

In future L2 applications, recursion can further reduce cost and performance requirements by aggregating the things being proved.

The first case (application-agnostic recursion) covers different kinds of statements, such as a state update and a Merkle tree. The proofs of these two statements can be combined into one proof, but there are still two outputs (the public values used for verification).

The second case (applicative recursion) covers statements of the same type, for example two state updates. These can be aggregated before the proof is generated, and there is only one output: the state difference after the two updates. (zkSync's approach is similar, and users only bear the cost of the state difference.)

In addition to recursive proof and the hardware acceleration discussed below, there are other ways to accelerate ZKP, such as custom gates, removing FFT (the theoretical basis of OlaVM), etc., but this article will not discuss them due to space constraints.

Hardware Acceleration

Hardware acceleration has long been a common way to speed up cryptographic computation, whether for RSA (whose underlying math is similar in spirit to elliptic curves and also involves many expensive big-number operations) or the early GPU-based optimizations of zk-SNARKs for Zcash/Filecoin.

Hardware Selection

After the Ethereum Merge, there will inevitably be a large amount of redundant GPU computing power (partly due to the change in Ethereum's consensus, the stock price of GPU giant Nvidia has fallen about 50% since the beginning of the year, and inventory is piling up). The figure below shows the market price of Nvidia's flagship RTX 3090, which also suggests that buying power is relatively weak.

With GPU prices at a low point and a large amount of GPU computing power idle, a natural question is whether GPU is the right hardware to accelerate ZK. There are three main options on the hardware side: GPU/FPGA/ASIC.

FPGA vs GPU

Let's look at the summary first. The following is trapdoor-tech's comparison of a GPU (Nvidia RTX 3090 as the example) and an FPGA (Xilinx VU9P as the example) along several dimensions. The key point: the GPU beats the FPGA on performance (proof generation speed), while the FPGA has the advantage in energy consumption.

A more intuitive example is the concrete benchmark result from Ingonyama:

Especially for operations at larger sizes (2^20), GPUs are five times faster than FPGAs, while consuming much more power.

For ordinary miners, cost-effectiveness is also an important factor in choosing hardware. Both the U55C ($4,795) and the VU9P ($8,394) are much more expensive than a GPU (RTX 3090: $1,860).

In theory, GPUs are suited to parallel computing, while FPGAs pursue programmability. However, these advantages do not map perfectly onto zero-knowledge proof generation. For example, the parallel computing GPUs excel at targets large-scale graphics processing; although this is logically similar to how MSM is processed, the domain it is built for (floating-point numbers) does not match the specific finite fields ZKP works over. As for FPGAs, the use case for programmability in a world of multiple L2s is unclear: since an L2's prover rewards are tied to the demand that single L2 captures (unlike PoW), there may be a winner-take-all outcome in each niche, which could leave miners needing to switch algorithms frequently.

ASICs offer a better balance of performance and cost (including throughput, latency, etc.), but it is still unclear whether they are the best solution. Their problems are:

Long development time - going from chip design to chip production is a full pipeline. Even once a chip has been designed, production is a lengthy, expensive process with uneven yields. In terms of foundry resources, TSMC and Samsung are the best chip foundries, and TSMC's orders are currently booked about two years out. The products competing with ZK chips for foundry capacity are AI chips and electric-vehicle chips, whose demand has been proven and which Web2 companies began designing early on. In comparison, demand for ZK chips is not yet clear.

Secondly, overall chip performance is negatively correlated with feature size, the familiar figures like 20nm and 18nm: the smaller the individual features, the more that fits on a wafer, and the higher the performance of the resulting chip. The technology for manufacturing high-end chips is monopolized (for example, the most complex part of chip manufacturing, lithography, is monopolized by ASML of the Netherlands). Small and medium foundries (such as SMIC in China) lag the leaders by one or two generations, which means they lag the best foundries in yield and feature size. As a result, ZK chips may only be able to pursue suboptimal options, and, with the demand side still unclear, will choose non-leading-edge processes around 28nm for cost reasons.

Current ASIC solutions mainly target the compute-heavy operators in the two common ZK circuit modules, FFT and MSM. They are not designed for any specific project, so in theory their efficiency on a given circuit is not the highest. For example, the logic circuit of Scroll's prover is not yet 100% finalized, so naturally no hardware circuit matches it one-to-one. In addition, ASICs are application-specific and do not support later adjustment: when the logic circuit changes, for example when the node client needs to be upgraded, it is still uncertain whether a compatible path exists.

At the same time, a talent shortage is an industry reality for ZK chips. People who understand both cryptography and hardware are hard to find; suitable candidates need both deep mathematical grounding and years of experience designing and maintaining hardware products.

Closing thoughts - Prover development trends

The above covers the industry's thinking and attempts around accelerating ZKP. The ultimate implication is that the threshold for running a prover will keep falling. Over time, the prover needs to go through the following three stages:

Phase I: Cloud-based prover

Cloud-based provers can greatly lower the entry threshold for third-party provers (non-users/non-project teams), similar to AWS/Google Cloud in Web2. In terms of business model, the project team gives up part of the rewards, but from a decentralization perspective this is a way to attract more participants at both the economic and execution levels. Cloud computing/cloud services are an existing Web2 technology stack with a mature development environment for developers, and they offer the low threshold and clustering effects unique to the cloud, making them a short-term option for proof outsourcing. Ingonyama already has implementations in this direction (its latest F1 version even reaches twice the benchmark speed of PipeMSM). However, this is still a single prover running the entire proof, whereas in Phase II the proof can exist in a divisible form and the number of participants will be larger.

Phase II: Prover marketplace

The proof generation process involves different operations, some of which prioritize efficiency while others are constrained by cost/energy consumption. For example, MSM computation involves precomputation, which requires a certain amount of memory to hold the scalars and the various precomputed points. If all the scalars are stored on one machine, its memory requirements are high; if they are spread across multiple servers, not only does this kind of computation speed up, the number of possible participants also grows.
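
A minimal sketch of why MSM splits so naturally across participants: the sum is linear, so each server can hold only its own slice of scalars/points (plus any precomputation on them) and return one partial result. As before, the group is modeled as integers mod q purely to keep the example self-contained.

```python
# MSM is linear, so it partitions cleanly: each marketplace participant computes
# a partial MSM over its own slice, and the coordinator adds the partial results.
import random

q = 2**61 - 1

def partial_msm(scalars, points):
    """The piece of work one participant would run on its own slice."""
    return sum(s * p for s, p in zip(scalars, points)) % q

def marketplace_msm(scalars, points, num_workers=4):
    n = len(scalars)
    step = (n + num_workers - 1) // num_workers
    # each chunk would be shipped to a different prover; here we just loop over them
    partials = [partial_msm(scalars[i:i + step], points[i:i + step])
                for i in range(0, n, step)]
    return sum(partials) % q

scalars = [random.randrange(q) for _ in range(1000)]
points = [random.randrange(q) for _ in range(1000)]
assert marketplace_msm(scalars, points) == sum(s * p for s, p in zip(scalars, points)) % q
```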

A marketplace is a bold business-model idea for this kind of outsourced computation, but there are in fact precedents in crypto: Chainlink's oracle service, where price feeds for different trading pairs on different chains also exist in marketplace form. In addition, Aleo's founder Howard Wu co-authored DIZK, a methodology for distributed generation of zero-knowledge proofs, which shows the idea is theoretically feasible.

That said, while this is a very interesting idea from a business-model perspective, actually implementing it may run into huge difficulties, for example how to coordinate these operations into one complete proof while being at least no worse than Phase I in time and cost.

Phase III: Everyone runs prover

In the future, provers will run locally on the user side (in the browser or on mobile). For example, ZPrize runs ZKP acceleration competitions and rewards targeting WebAssembly/Android execution environments. This means user privacy can be guaranteed to a certain extent (today's centralized provers are only about scaling and do not protect user privacy). Most importantly, privacy here is not limited to on-chain behavior; it also covers off-chain behavior.

One issue that must be considered is web-side security: the web execution environment imposes higher security requirements than dedicated hardware (an industry parallel is that a web wallet such as MetaMask is less secure than a hardware wallet).

Beyond off-chain proofs about on-chain data, uploading off-chain data to the chain in ZKP form while fully protecting user privacy only becomes possible at this phase. Current solutions inevitably face two problems: 1. centralization, meaning user information is still at risk of censorship; 2. the verifiable data takes only a single form, because off-chain data is diverse and non-standardized and must go through a great deal of cleaning/screening. The challenge here is not even just the proof generation environment; algorithm compatibility (first of all, a transparent algorithm must be used) and cost/time/efficiency all need to be considered. But the demand is also unmatched: imagine being able to pledge real-life credit in a decentralized way to borrow on-chain, without the risk of being censored.
