Digging deeper into the cost of ASIC chip design, we asked the ProgPow core development team nine questions

Overview

When it comes to ProgPow and Ethash, there is plenty of speculation about the cost of designing and building mining hardware, usually accompanied by an appeal to authority: trust the author's prediction because he or she has extensive experience in the relevant industry. Sometimes that experience is in cryptocurrency ASIC production, and sometimes it is in general integrated-circuit design.

For readers who are more comfortable with code than with fan-out and rise times, this article may help build a deeper understanding of the ProgPow algorithm.

(Note from Odaily Planet Daily: Ethash is Ethereum's current proof-of-work mining algorithm, and ProgPow is a mining algorithm that tries to reduce the advantage of ASIC miners. Fan-out is the maximum number of gate inputs that a single logic-gate output can drive; most TTL logic gates can drive about 10 other gates or drivers, so a typical TTL gate has a fan-out of 10. Rise time is a term from pulse engineering: the time a signal takes to climb from its low voltage level to its high voltage level.)

Programmers can seem capable of anything, from writing scripts to building iPhone apps, from embedded systems to the Windows operating system. But being able to write code and ship applications does not make someone an authority on the App Store backend (or on making it more efficient), and being able to build a real-time operating system (RTOS) does not make someone an expert in the cost trade-offs of scaling Windows.

Of course, IfDefElse, the core development team behind ProgPow, is not saying that Windows designers are not "excellent programmers." The point is that different technical backgrounds easily lead to skewed understanding and assumptions across fields, especially when the topic is economies of scale.

The same goes for hardware designers, who may also work across very different fields, such as designing a chip for an electric toothbrush or architecting silicon for networking equipment. An engineer who ships 100,000 electric-toothbrush chips may not grasp the economies of scale available to a network-silicon engineer shipping 1 million chips. Likewise, a cryptocurrency ASIC designer may know very little about GPU-class ASIC design; these industries have little contact with one another, and some even differ from country to country.

The last point for this overview is that programming and engineering are skills. Unless you practice them every day, you quickly fall behind and cannot remain an authority, because knowledge in these areas iterates very fast. Perhaps this is why it is hard for new cryptocurrency ASIC manufacturers to break into the SHA-256 mining market: a newcomer is unlikely to catch up with an engineer who has spent six years refining SHA-256 hardware.

On the other hand, there aren’t many articles about hardware in the cryptocurrency ecosystem. Of course, cryptocurrency is a software-driven industry, and most hardware engineering is done behind closed doors in private companies.

Some "hardware experts" are doing their best to assure software engineers that they can beat the cryptocurrency ecosystem's incumbents; we have already seen this with coins such as Monero, Bitcoin, and ZCash. In reality, it has yet to happen. Think about it: if Bitmain or Innosilicon tried to make CPUs, do you think they could beat Intel and AMD?

Analyzing the cost of ASIC chip design

Economies of scale are always present, whether viewed from a cost perspective or an experience perspective, and chip designers seem to debate the cost of ASIC design endlessly. Below, Odaily (WeChat: o-daily) walks through nine questions that have drawn attention in the industry:

Question 1: Regardless of whether the mining algorithm is ProgPow or Ethash, the hashrate is determined by the bandwidth of the external dynamic random access memory (DRAM). Is that right?

This is not the case. ProgPow's hashrate is determined by two factors (see the sketch after the list below):

1. The compute core

2. Memory bandwidth
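
To make the two-factor claim concrete, here is a minimal sketch using assumed, illustrative numbers (the bandwidth, bytes-per-hash, and compute figures below are not from the ProgPow specification): the realized rate is capped by whichever resource, compute or bandwidth, runs out first, whereas Ethash is effectively capped by bandwidth alone.

```cpp
// A minimal sketch with assumed, illustrative numbers (not measurements).
#include <algorithm>
#include <cstdio>

int main() {
    double mem_bandwidth_gb_s = 448.0;       // assumed DRAM bandwidth of one card, GB/s
    double bytes_per_hash     = 16.0 * 1024; // assumed DAG bytes read per ProgPow hash
    double compute_limit_mh_s = 30.0;        // assumed rate the math core alone could sustain, MH/s

    double bandwidth_limit_mh_s = mem_bandwidth_gb_s * 1e9 / bytes_per_hash / 1e6;
    // The realized rate is limited by whichever resource runs out first.
    double progpow_mh_s = std::min(compute_limit_mh_s, bandwidth_limit_mh_s);

    std::printf("bandwidth-limited: %.1f MH/s, compute-limited: %.1f MH/s, "
                "realized ProgPow rate: ~%.1f MH/s\n",
                bandwidth_limit_mh_s, compute_limit_mh_s, progpow_mh_s);
    return 0;
}
```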

This is why there is a difference between Ethash and ProgPow, as shown in Figures 1 and 2 below:

Figure 1: Comparison of mining hash rates of Nvidia chip products

Figure 2: Comparison of mining hash rates of AMD chip products

At this stage, Ethash mining remains profitable, and the algorithm's memory requirements have grown significantly. The growing demand for high-bandwidth memory has also pushed the development of next-generation high-speed memory technologies such as GDDR6 (bandwidth up to 768 GB/s) and HBM2 (up to 256 GB/s per stack).

The demand for high-bandwidth memory does not all come from "Ethash". The entire high-bandwidth memory market is as large as $15 billion, of which only a small part comes from the mining industry. The core market demand for high-bandwidth memory mainly includes: GPU, field-programmable gate array (FPGA), artificial intelligence (AI), high-performance computing (HPC), and games. Compared with the $1.2 trillion artificial intelligence market, the $30 billion PC game market, the $35 billion handheld game console market, and the $29 billion high-performance computing market, the mining industry's demand for high-bandwidth memory is really "insignificant."

Question 2: Since ProgPow’s existing architecture and algorithm are similar to ETHash, will Innosilicon’s next ASIC chip be tailored for ProgPow?

In fact, the only similarity between ProgPow and Ethash is that both use a DAG held in global memory. Computationally, Ethash needs only a fixed "keccak_f1600" kernel and a modulo operation. ProgPow, by contrast, must execute 16-lane-wide random math sequences while also accessing a high-bandwidth L1 cache. Designing a compute kernel that can execute ProgPow's math sequences is much harder than designing one that implements a fixed-function hash like "keccak".
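
To make "random math sequences" concrete, here is a hedged, simplified sketch (not the ProgPow reference code; the real algorithm uses KISS99, its own operation mix, and regenerates the sequence as the chain advances): the operation applied at each step is chosen by a pseudo-random selector, so a chip cannot hard-wire a single fixed function.

```cpp
#include <cstdint>

// Hypothetical PRNG stand-in; ProgPow itself uses KISS99 seeded from the block number.
static uint32_t next_rand(uint64_t &state) {
    state = state * 6364136223846793005ULL + 1442695040888963407ULL;
    return static_cast<uint32_t>(state >> 32);
}

// One randomly selected integer operation, in the spirit of ProgPow's math() step.
static uint32_t random_math(uint32_t a, uint32_t b, uint32_t selector) {
    switch (selector % 6) {
        case 0: return a + b;
        case 1: return a * b;
        case 2: return a & b;
        case 3: return a | b;
        case 4: return a ^ b;
        default: {                          // rotate left by (b % 32)
            uint32_t r = b % 32;
            return r ? (a << r) | (a >> (32 - r)) : a;
        }
    }
}

int main() {
    uint64_t seed = 0x1234;                 // would be derived from the block number
    uint32_t lanes[16];                     // ProgPow runs 16 lanes side by side
    for (int i = 0; i < 16; ++i) lanes[i] = i * 0x9E3779B9u;

    for (int step = 0; step < 8; ++step) {  // a short random math sequence
        uint32_t sel = next_rand(seed);
        for (int lane = 0; lane < 16; ++lane)
            lanes[lane] = random_math(lanes[lane], lanes[(lane + 1) % 16], sel);
    }
    return static_cast<int>(lanes[0] & 1);  // value unused; this is only an illustration
}
```

Because the sequence keeps changing, the hardware has to implement the full set of operations and schedule them dynamically, which is exactly what GPU shader cores already do.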

It is also important to understand that Ethash's hashrate depends only on memory bandwidth, while ProgPow's depends on both memory bandwidth and the compute core executing the random math sequence.

The essence of proof of work (PoW) is to prove, through expended hardware and energy, that a mathematical computation was performed. As an algorithm, Ethash does not engage most of the hardware cost (the compute engine) in that proof; it only exercises the memory interface. That is why a mining ASIC can simply strip out the computation that the algorithm never exercises.

Question 3: A GPU is a general-purpose accelerator, so the cycle of designing, manufacturing, and testing a GPU usually takes about twelve months, along with a great deal of hardware emulation and software development so that it can cover different computing workloads and scenarios.

ProgPoW aims to capture as much of the hardware cost as possible, and because the updated algorithm exercises the compute hardware across different workloads, down to the wrinkles of the architecture, designing an ASIC for it may well take more than three to four months.

Given that long time span, another question arises: why are floating-point operations omitted? The answer is simple: floating-point results are not portable across chips, because different chips handle the corner cases around special values (such as infinities, NaNs, and related variants) in different ways. Corner cases, also called pathological cases, are situations where operating parameters fall outside the normal range, usually when several variables or conditions sit at extreme values, even if those extremes are still within the parameter specification (or its boundaries). The biggest disagreement is in the handling of Not-a-Number (NaN) values, which arise naturally when inputs are random. Quoting the explanation on Wikipedia:

If there are multiple Not-a-Number (NaN) inputs, the payload of the result should come from one of the input NaNs, but the standard does not specify which.

This means that if floating-point operations were used, essentially every one of them would need to be paired with an "if (is_special(val)) val = 0.0" check. Such a check can be folded into custom hardware almost for free, so a cryptocurrency-mining ASIC would be the one to benefit from it.
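
A minimal sketch of that guard, assuming hypothetical helper names (is_special and sanitize are illustrative, not from ProgPow or any GPU API):

```cpp
#include <cmath>

// True for NaN and +/-infinity, the special values whose handling differs
// across hardware; std::isfinite is false for both.
static inline bool is_special(float v) { return !std::isfinite(v); }

// The check described above: squash any special value to 0.0 so every
// chip produces the same bit-exact result.
static inline float sanitize(float v) { return is_special(v) ? 0.0f : v; }

// Every floating-point step would need to be wrapped like this, which a
// custom ASIC could fold into hardware essentially for free.
static inline float guarded_fma(float a, float b, float c) {
    return sanitize(a * b + c);
}

int main() {
    float x = guarded_fma(1.5f, 2.0f, 0.25f);   // ordinary value passes through
    float y = guarded_fma(1e38f, 1e38f, 0.0f);  // overflows to +inf, squashed to 0
    return (x > 0.0f && y == 0.0f) ? 0 : 1;
}
```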

Next, what are hashrate and “hash-per-watt”?

Hashrate is a measure of energy expended, and as long as everyone measures it the same way, the amount of energy per unit does not matter much: miners will keep pouring in as much energy as they can. Even if the unit of measurement switches from one Ethash hash (a smaller unit, like a joule) to one ProgPow hash (a larger unit, like a calorie), the economics of operating costs do not change. The global hashrate measures everyone's total economic weight in securing the network, and as long as each contribution is measured fairly and in the same units, switching to the ProgPow algorithm changes little for the average miner.
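
To spell out the share argument with one hedged equation (assuming, for this sketch only, that switching algorithms scales every miner's device rate by roughly the same factor $k$): miner $i$'s share of the global hashrate is unchanged,

$$\frac{k\,h_i}{\sum_j k\,h_j} = \frac{h_i}{\sum_j h_j}.$$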

Of course, some people will say that if Ethereum implements the ProgPow algorithm, it may help to concentrate miners in large mining farms with high-end GPUs, and it will also stimulate mining farms to upgrade their GPUs to the latest models. However, the ProgPow algorithm development team IfDefElse needs to reiterate that economies of scale will always exist and are an unavoidable fact in the real world.

Question 4: Compared with GPUs, ASIC manufacturers could use smaller GDDR6 memory chips to gain a cost advantage. At the same memory cost, sixteen 4Gb GDDR6 chips could deliver twice the bandwidth, right?

First, getting twice the bandwidth requires twice the computation, which is simply linear scaling and cannot be counted as an advantage.

Second, 4Gb GDDR6 chips are not in production. Micron, the world's third-largest memory manufacturer, only makes 8Gb parts, and Samsung makes 8Gb and 16Gb parts. On a memory die, the GDDR6 I/O interface area is very expensive, and with each generation the interface takes up a larger share of the die relative to the memory cells, because the interface's physical layer (PHY) cannot be shrunk by process scaling the way memory cells can.

It is undeniable that the real drivers of the memory market are "long-cycle buyers" such as game consoles and GPUs, which also tend to want larger capacities. Today's memory suppliers simply have no incentive to mass-produce a 4Gb part; the market demand for that capacity is small.

Question 5: Many blocks in the RTX 2080 chip occupy a large amount of die area yet are useless for ProgPow, including PCIe, NVLink, the L2 cache, 3072 shader units, 64 ROPs, 192 texture mapping units (TMUs), and so on. How do you view this?

The RTX 2080 is not a good reference here, because some blocks in Nvidia's RTX-series chips, such as the ray-tracing cores, take up a large part of the die for new features. ProgPow is designed to run on the existing chips in the Nvidia and AMD ecosystems, so it cannot rely on the new features in Nvidia's and AMD's newest products.

A better comparison would be the AMD RX 5xx series or the Nvidia GTX 1xxx series. As mentioned before, those GPUs also contain functions ProgPow does not use, such as the floating-point logic, L2 cache, texture cache, and ROPs. The shader units are where the vector math is executed, and ProgPow absolutely requires them. A cryptocurrency-mining ASIC would also want to add area for a "keccak" unit. As the ProgPow development team, we estimate that a ProgPow ASIC's die would be about 30% smaller than the equivalent GPU, but even in the best case the power consumption would drop by at most 20%. By contrast, although some logic blocks on a GPU go underused and waste some die area, they consume very little power.

Question 6: Do small chips yield better than large chips?

How to put it? This sounds like basic chip-manufacturing knowledge; perhaps we need to write a training document called "Chip Manufacturing 101". For the yield math, you can refer to an article published in 2006, "Compare Logic-Array To ASIC-Chip Cost per Good Die", which shows that yield and process control had already advanced considerably 13 years ago.

For chips built as a single functional unit, a smaller die does yield better than a larger one. Modern GPUs, however, are not like that. Today's GPUs can be repaired and reconfigured almost arbitrarily, so defects in their small repeated units are largely negligible. As long as each disable-able functional unit is small enough, a GPU's yield can be nearly as high as (or even higher than) that of a chip built from larger functional blocks.

To explain this better, here is a simple thought experiment (a numeric sketch follows the list):

1. Suppose you have a huge chip, "Giant ChipA", that occupies an entire wafer. "Giant ChipA" is built from 100,000 disable-able sub-units, and only 80% of those sub-units need to be defect-free for it to work properly; during fusing, the bad sub-units are bypassed.

2. Also suppose you have a tiny chip, "Tiny ChipB", consisting of a single functional block (nothing can be fused off), but small enough that 100,000 of them fit on the same wafer. In that case, a single defective sub-unit means the whole "Tiny ChipB" is dead.

3. If 20,000 defective sub-unit positions are spread evenly across each wafer, then the yield of "Giant ChipA" can be 100%, because the 20% of defective sub-units can be fused off, while the yield of "Tiny ChipB" is only about 80%, because its defects cannot be removed.
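
A toy calculation of the thought experiment above, using only the numbers already given (the code is an illustrative sketch, not real wafer data):

```cpp
#include <cstdio>

int main() {
    const int sub_units  = 100000;  // sub-unit positions on one wafer
    const int defects    = 20000;   // evenly distributed defective positions
    const int good_units = sub_units - defects;

    // Giant ChipA: one chip per wafer; it works if at least 80% of its
    // sub-units are good, because defective ones can be fused off.
    bool chip_a_good = good_units * 100 >= sub_units * 80;
    double yield_a = chip_a_good ? 1.0 : 0.0;

    // Tiny ChipB: 100,000 single-unit chips per wafer; each defect kills
    // exactly one chip, and nothing can be fused off.
    double yield_b = double(good_units) / sub_units;

    std::printf("Giant ChipA yield: %.0f%%, Tiny ChipB yield: %.0f%%\n",
                yield_a * 100.0, yield_b * 100.0);   // prints 100% vs 80%
    return 0;
}
```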

If you look at die shots of AMD's Polaris 20 or Nvidia's GP104, you will see that these GPUs are made up of a large number of tiny, disable-able sub-blocks.

Question 7: An ASIC miner's voltage can easily be dropped to 0.4 V, only half that of a GPU. Bitcoin ASIC miner manufacturers have already adopted such low-voltage designs, so there is no reason to believe they would not apply the same strategy to ProgPow ASIC miners. Can you talk about this?

Low-voltage designs only work when the chip consists purely of computation, such as an ASIC dedicated to the SHA256d mining algorithm. Other components, such as the SRAM that ProgPow also needs for its data cache, are extremely difficult, if not impossible, to operate at such low voltages.

Question 8: The same energy-saving effect can also be achieved on LPDDR4x DRAM, which consumes even less power than GDDR6. Let’s talk about this issue.

Power is not the only consideration. LPDDR4x has much lower bandwidth than GDDR6: about 4.2 Gb/s per pin versus 16 Gb/s per pin. A compute chip paired with LPDDR4x needs roughly four times as many memory chips and four times as many memory interfaces to match GDDR6, and on that math the cost rises significantly.
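
A back-of-the-envelope sketch of that roughly-4x factor, using the per-pin rates quoted above; the 256 GB/s target and the 32-bit-per-package data width are assumptions added only for illustration:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double target_gb_s      = 256.0;               // assumed target bandwidth, GB/s
    const double target_gbit_s    = target_gb_s * 8.0;   // same target in Gb/s
    const double lpddr4x_pin_gbps = 4.2;                 // per-pin rate quoted in the text
    const double gddr6_pin_gbps   = 16.0;                // per-pin rate quoted in the text
    const int    pins_per_chip    = 32;                  // assumed data width per package

    int lpddr4x_pins  = static_cast<int>(std::ceil(target_gbit_s / lpddr4x_pin_gbps));
    int gddr6_pins    = static_cast<int>(std::ceil(target_gbit_s / gddr6_pin_gbps));
    int lpddr4x_chips = (lpddr4x_pins + pins_per_chip - 1) / pins_per_chip;
    int gddr6_chips   = (gddr6_pins + pins_per_chip - 1) / pins_per_chip;

    // Roughly 4x the data pins and chips for LPDDR4x, matching the text's estimate.
    std::printf("LPDDR4x: %d data pins (~%d chips); GDDR6: %d data pins (~%d chips)\n",
                lpddr4x_pins, lpddr4x_chips, gddr6_pins, gddr6_chips);
    return 0;
}
```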

It is also worth noting that high-bandwidth compute chips are usually pad-limited: the die has to be large enough just to get all the signals off the chip and onto the surrounding printed circuit board (PCB). An LPDDR4x design needs roughly four times as many pads around the chip to reach the same bandwidth, so the cost shows up not only in the memory chips but also in the compute-chip area, and the total is not cheap. What's worse, because these chips are built for speed, a larger die also means more wasted power.

So consider again why today's GPUs do not run on LPDDR4x. First, LPDDR4x does not deliver the expected advantage in bandwidth cost. For a given bandwidth (which takes four times as many chips), LPDDR4x costs several times more, which drives the total up sharply: roughly $150 for 256 GB/s at 9 W with LPDDR4x, versus less than $40 for the same bandwidth at 11 W with GDDR6. So LPDDR4x would not save miners much money (note that this is the cost of bandwidth, not of memory capacity).

Question 9: GPU manufacturers like Nvidia employ about 8,000 people to develop GPUs, which are also very complex; while ASIC manufacturers like LinZhi employ only a dozen people and only develop ASIC miners for the ETHash mining algorithm. The labor costs of these companies differ by 100 times, so can it be said that ASIC chips have more advantages than GPU chips in terms of cost and time to market?

Economies of scale are the key factor here. The GPU industry's costs are amortized across sales channels around the world, and the combined market value of the players is about $420 billion: AMD at roughly $11.6 billion, Nvidia at roughly $154.5 billion, and Intel, the largest, at roughly $254.8 billion. In the memory market alone, the cost of the physical interfaces (PHYs) and dies is spread across an industry with a total value of around $500 billion. Samsung Electronics, with 320,671 employees and a market value of about $325.9 billion, is also among the most active patent filers in the United States; Micron Technology, with 34,100 employees and a market value of about $60.1 billion, was the first chip maker to ship 20 Gbps high-speed GDDR6; Hynix, with 187,903 employees and a market value of about $56.8 billion, developed the world's first 1Ynm 16Gb DDR5 DRAM. By comparison, the total market value of the cryptocurrency-mining ASIC industry is only $146 billion, of which $73 billion belongs to Bitcoin.

We also need to look at time to market and the total addressable market (TAM), and here the development time of the successor to the famous S9 miner is a useful reference. If a mature, computationally simple SHA256d chip takes three years to iterate, what guarantees that a GPU-like ASIC miner supporting ProgPow could reach production and market quickly? We can also look at the recent state of Ethereum ASIC miners: GDDR6 has been sampling for a year, and there is still no new product built on it that is widely available.

Final thoughts from IfDefElse, the ProgPow core development team

ProgPow deliberately targets the class of mining hardware that is backed by economies of scale, is highly visible, and therefore holds a significant competitive advantage.

The ProgPow core development team, IfDefElse, is not large, and its members all have full-time jobs, so they cannot respond to every question and article promptly, nor do they have time to chat in the various cryptocurrency and blockchain forums. Although IfDefElse is very interested in hardware design and development, they still advise people in this space to be cautious: hardware, like software, is a diverse field, and even someone deeply familiar with cryptocurrency-mining ASICs may not be an expert in GPU-class ASIC design.
