Hard-core analysis of blockchain (I): Is blockchain a shared database?


Introduction

In recent years, academia and industry have developed many misconceptions about what blockchain is and how it should be applied. I have gradually clarified and redefined some of them in earlier articles, but I have always felt the job was unfinished because I never devoted a dedicated piece to them. Recently, while designing a distributed industrial collaboration model, I have run into several subtle applications of blockchain technology, and I feel the need to write a series of articles that address these misconceptions one by one. I hope that by making the case repeatedly, I can offer better-grounded definitions and solutions for applying blockchain in industry.

This time, we first examine whether the statement "blockchain is a shared database" holds up. Baidu Encyclopedia describes blockchain as follows: "Blockchain is a term in the field of information technology. In essence, it is a shared database, and the data or information stored in it has the characteristics of being 'unforgeable', 'recorded at every step', 'traceable', 'open and transparent', and 'collectively maintained'." It is fair to say that most people who think blockchain is a shared database have been heavily influenced by Baidu Encyclopedia.

Next, let's analyze what a shared database actually is.

1. What is a shared database?

I searched for the keyword "shared database" on CNKI but found no directly matching papers, only papers on data sharing models. Baidu Encyclopedia has no entry describing the concept of a "shared database" either; the closest is "shared storage", which is a completely different concept (interested readers can look it up). In other words, "shared database" has never been an established concept in academia or in system software practice. It is more a distorted product of Internet-era word coinage.

This is because, whether we look at it from the perspective of data integration and sharing models or from the perspective of database classification, "shared database" is a false proposition.

First, consider the definition of a database: "A database is a warehouse that organizes, stores, and manages data according to data structures. It is a collection of large amounts of data that is stored in a computer for a long period of time and is organized, shareable, and uniformly managed"[1]. In other words, data sharing is itself one of the basic functions of a database, and there is no need for blockchain technology to give a database the ability to share data.
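To illustrate that sharing is built into an ordinary database, here is a minimal Python sketch using the standard-library sqlite3 module (the table name and data are made up for illustration): one connection writes a row, and a second, independent connection reads it back. Nothing blockchain-related is involved.

```python
import os
import sqlite3
import tempfile

# A single database file stands in for "one database"; each connection
# stands in for a different user or application working with it.
db_path = os.path.join(tempfile.mkdtemp(), "shared_example.db")

writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
writer.execute("INSERT INTO orders (item) VALUES ('steel plate')")
writer.commit()  # once committed, the row is visible to every other connection

reader = sqlite3.connect(db_path)
rows = reader.execute("SELECT id, item FROM orders").fetchall()
print(rows)  # [(1, 'steel plate')]

writer.close()
reader.close()
```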

Second, from the perspective of database classification, the common classifications today are based either on how data is structured, dividing databases into relational and NoSQL databases, or on how they are deployed, dividing them into stand-alone and distributed databases, and so on. No classification has ever been based on the degree of data sharing.

Furthermore, from the perspective of data sharing, the industry typically uses data integration to combine, logically or physically, data of different sources, formats, and characteristics, thereby providing enterprises with comprehensive data sharing. Data integration systems are usually built with approaches such as federation, middleware models, and data warehouses, and many mature frameworks are available for them.
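As a rough illustration of the federation approach, the following sketch queries each source at request time and joins the results in an integration layer, instead of copying everything into one central store as a data warehouse would. The ERP and CRM sources are hypothetical, held here in two separate in-memory SQLite databases, and the function name `federated_totals` is my own invention.

```python
import sqlite3

# Two independent sources with different schemas (hypothetical examples).
erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (order_id INTEGER, cust_code TEXT, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "C01", 120.0), (2, "C02", 80.0), (3, "C01", 45.5)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (code TEXT, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [("C01", "Acme"), ("C02", "Globex")])

def federated_totals():
    """Query each source at request time and join the results in the
    integration layer; no data is copied into a central store."""
    names = dict(crm.execute("SELECT code, name FROM customers"))
    totals = {}
    for cust_code, amount in erp.execute(
            "SELECT cust_code, amount FROM orders ORDER BY order_id"):
        name = names.get(cust_code, "<unknown>")
        totals[name] = totals.get(name, 0.0) + amount
    return totals

print(federated_totals())  # {'Acme': 165.5, 'Globex': 80.0}
```

A data warehouse would instead extract both tables into a central store on a schedule; federation accepts extra cost at query time in exchange for leaving the data where it is.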

Therefore, the concept of a "shared database" has never appeared in the history of database technology or of enterprise data sharing models, because database software was created precisely to solve the organization, storage, management, and sharing of data.

2. Why do we think that blockchain is a shared database?

As argued above, the question "Is blockchain a shared database?" rests on a false premise, because one of the core missions of a database is to make data easy to access and share. So where does this definition come from? My guess is that the idea that "blockchain is a shared database" stems mainly from certain general-purpose underlying blockchain platforms or products.

First, most public chain platforms, such as Bitcoin, Ethereum, and EOS, are not general-purpose underlying blockchain platforms. They are all built around peer-to-peer asset transactions and combine a set of blockchain-related technologies, including cryptography, distributed systems, P2P data transmission, consensus algorithms, chained data structures, and game theory, all of which serve secure and efficient peer-to-peer asset transactions. As a result, blockchain technology taken directly from public chains is often a poor fit for fields that do not deal with assets, such as government affairs, industry, and supply chains. And because the business purpose of a public chain is clear, nobody debates whether Bitcoin is a shared database.

Second, most industries that have developed consortium chain (alliance chain) applications build on the Hyperledger family of platforms and have been deeply influenced by Hyperledger. Take Fabric, the core Hyperledger project, as an example. Fabric is a general-purpose blockchain platform without a specific business purpose. As the figure below shows, a Fabric node consists mainly of smart contracts (known as chaincode in earlier versions) and a distributed ledger, and the node's data is stored mainly in the distributed ledger.

Fabric node composition

Source: Hyperledger Fabric Technical White Paper [2]

The distributed ledger consists mainly of two parts: the blockchain and the world state (global state). Updates to the world state are triggered and determined by the transactions in each block, as shown in the figure below:
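The relationship between the chain and the world state can be sketched in a few lines of Python. This is a toy model under my own simplifying assumptions, not Fabric's implementation: there is no endorsement, ordering service, or validation of read-write sets, and the transaction format (`op`, `key`, `value`) is hypothetical. What it does show is that the world state is derived from the blocks and is never written directly.

```python
import hashlib
import json

def block_hash(block):
    # Hash a canonical JSON encoding so every node computes the same digest.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class Ledger:
    """Toy ledger: an append-only chain of blocks plus a world state that is
    only ever changed by applying the transactions in those blocks."""

    def __init__(self):
        self.chain = []        # append-only list of blocks
        self.world_state = {}  # key -> current value

    def append_block(self, transactions):
        prev = block_hash(self.chain[-1]) if self.chain else "0" * 64
        block = {"number": len(self.chain), "prev_hash": prev, "txs": transactions}
        self.chain.append(block)
        # The world state follows the block, never the reverse.
        for tx in transactions:
            if tx["op"] == "put":
                self.world_state[tx["key"]] = tx["value"]
            elif tx["op"] == "delete":
                self.world_state.pop(tx["key"], None)

ledger = Ledger()
ledger.append_block([{"op": "put", "key": "car1", "value": "owner=Tom"}])
ledger.append_block([{"op": "put", "key": "car1", "value": "owner=Ann"},
                     {"op": "put", "key": "car2", "value": "owner=Bob"}])
print(ledger.world_state)  # {'car1': 'owner=Ann', 'car2': 'owner=Bob'}
print(len(ledger.chain))   # 2
```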

Fabric ledger composition

Source: Hyperledger Fabric Technical White Paper [2]

As the figure below shows, the world state in the distributed ledger is essentially a distributed key-value (KV) storage model. Add a network of distributed nodes, each holding its own copy, and it is easy to see why blockchain is regarded as a shared database.
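To see why a replicated world state looks like a shared database, consider this minimal sketch. The node names and block contents are hypothetical, and consensus is assumed to have already delivered the same ordered blocks to every peer. Each peer keeps its own plain key-value map, and because all peers apply the same blocks in the same order, they all answer lookups identically.

```python
# Hypothetical block contents: each block is a list of (key, value) writes.
BLOCKS = [
    [("sensor:17", "21.5C")],
    [("sensor:17", "22.0C"), ("batch:9", "shipped")],
]

class Node:
    """A peer that keeps its own copy of the world state as a plain KV map."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def apply(self, block):
        for key, value in block:
            self.state[key] = value

    def get(self, key):
        return self.state.get(key)

nodes = [Node("org1-peer"), Node("org2-peer"), Node("org3-peer")]
for block in BLOCKS:  # every peer receives the same ordered blocks...
    for node in nodes:
        node.apply(block)

# ...so every peer answers a key lookup the same way, like a replicated KV store.
print({n.name: n.get("sensor:17") for n in nodes})
# {'org1-peer': '22.0C', 'org2-peer': '22.0C', 'org3-peer': '22.0C'}
```

Note that nothing in this replication decides who is allowed to read which key; it only keeps copies identical.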

Fabric State Model

Source: Hyperledger Fabric Technical White Paper [2]

As mentioned above, Fabric is a general-purpose blockchain platform without a specific business purpose. Fabric's ledger model has no direct relationship with the financial ledgers we know from daily life; the ledger is simply a general KV storage model that can hold any data. In practice, if Fabric is not driven by a domain model, it really is just a distributed data storage architecture.

As a result, industrial blockchain applications often end up using Fabric's world state simply as on-chain distributed storage. As I have repeatedly emphasized in other articles, if blockchain is positioned as a distributed data storage mechanism, it has no technical advantage over the distributed databases in common use today, while being more complex to implement and less efficient.

3. Data sharing has nothing to do with data storage structure

The analysis above shows that a general-purpose blockchain platform such as Fabric can indeed be defined as a distributed data storage model. But can such a distributed storage mechanism bring about data sharing and openness? Here lies a misconception: the one-sided belief that distributing data leads to sharing it. This article wants to emphasize that whether data is shared has nothing to do with its storage structure or deployment mode.

The storage structure and deployment mode of data belong to the physical model, while data sharing belongs to the business model. At a time when data is treated as an asset and the public increasingly demands protection of personal privacy and commercial data security, what determines whether data is shared is not how it is stored and deployed, but whether there is a genuine business need for sharing and whether the interests of all participants are balanced and protected. Expecting a distributed storage mechanism alone to solve the "information island" (data silo) problem is clearly wishful thinking.

Moreover, most "information island" problems are caused precisely by data being stored and managed in a decentralized way. In other words, dispersed data is the current situation, not the future. To solve the information island problem caused by data dispersion, we must first clarify who holds data sovereignty. Under single data sovereignty (absolute data sovereignty), the most efficient method is data integration, which aggregates data through data federation, data middleware, and data warehouses. Under multi-party data sovereignty (relative data sovereignty), legal enforcement or a business model must drive the data to flow safely, legally, and compliantly among the stakeholders who use it.

Where data integration is not feasible, for example because of multi-party data sovereignty, integration costs, or legal restrictions, blockchain technology can indeed help build a trusted data sharing network in which data can be traded, can flow, and can be supervised. However, the focus of such an application is not distributed data storage but the trading of data assets. Without a data asset transaction model, simply using Fabric's world state will not achieve data sharing.
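The difference between storing data and trading data assets can be sketched as follows. This is a hypothetical design of my own, not taken from Fabric or any other platform: the class `DataMarket`, its methods, and the dataset names are illustrative only. The shared, append-only log records ownership, access grants, and every access attempt, while the datasets themselves stay with their owners. Sharing happens because a grant transaction exists, not because the data is replicated.

```python
from dataclasses import dataclass, field

@dataclass
class DataMarket:
    """Hypothetical sketch: the shared log records who may use which dataset
    and every use, while the datasets stay with their owners."""
    owners: dict = field(default_factory=dict)  # dataset_id -> owner
    grants: set = field(default_factory=set)    # {(dataset_id, consumer)}
    log: list = field(default_factory=list)     # append-only audit trail

    def register(self, dataset_id, owner):
        self.owners[dataset_id] = owner
        self.log.append(("register", dataset_id, owner))

    def grant(self, dataset_id, owner, consumer, price):
        # Only the registered owner may sell access to its dataset.
        if self.owners.get(dataset_id) != owner:
            raise PermissionError("only the owner can grant access")
        self.grants.add((dataset_id, consumer))
        self.log.append(("grant", dataset_id, consumer, price))

    def access(self, dataset_id, consumer):
        # Every attempt is logged, allowed or not, so it can be supervised.
        allowed = (dataset_id, consumer) in self.grants
        self.log.append(("access", dataset_id, consumer, allowed))
        return allowed

market = DataMarket()
market.register("quality-records-2023", "factoryA")
print(market.access("quality-records-2023", "supplierB"))  # False
market.grant("quality-records-2023", "factoryA", "supplierB", price=100)
print(market.access("quality-records-2023", "supplierB"))  # True
```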

In fact, classic blockchain technology, represented by Bitcoin, shows that storing data on every node of the blockchain serves to protect each node and to let it verify the authenticity of transaction data locally and efficiently; data sharing is not the end goal.
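Local verification is easy to sketch. This simplified Python example uses a single SHA-256 over a JSON encoding of each block, which is my own simplification: Bitcoin actually applies double SHA-256 to block headers that commit to a Merkle root of the transactions. The point carries over: each node can check the whole chain on its own, and tampering with an old transaction breaks the link to the next block.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def build_chain(tx_batches):
    """Link each block to the hash of the block before it."""
    chain, prev = [], GENESIS_PREV
    for txs in tx_batches:
        block = {"prev_hash": prev, "txs": txs}
        chain.append(block)
        prev = block_hash(block)
    return chain

def verify_chain(chain):
    """Recompute every link locally; no other node needs to be consulted."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        prev = block_hash(block)
    return True

chain = build_chain([["A->B:5"], ["B->C:2"], ["C->A:1"]])
print(verify_chain(chain))       # True
chain[0]["txs"][0] = "A->B:500"  # tamper with an old transaction
print(verify_chain(chain))       # False
```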

4. New technologies always produce a dumbbell effect first

Since the Internet entered the Web 2.0 era, a flood of new technologies and new terms has poured into the industry. From big data, AI, 5G, and blockchain to this year's quantum computing, every time a new technology is combined with industry, the domestic industrial community falls into a "dumbbell effect" in how it understands the technology: one end of the dumbbell is highly conceptual and abstract, while the other end is highly concrete and tool-oriented.

The same is true for the rise of blockchain technology. On one hand, blockchain is conceptualized and abstracted as a decentralized value internet that replaces centralized systems with network autonomy; on the other hand, it is described as a shared database, a distributed storage tool. Why does this perception arise? I think a major reason is that the sudden rise of a new technology is often ignited by just a few papers and a few application scenarios, while application-oriented supporting research across a wide range of fields has not yet caught up. Highly conceptual, abstract definitions and highly concrete, tool-like definitions can both always be mapped onto something in the real world, which makes them a cheap way to explain the technology.

The dumbbell effect is arguably an inevitable stage in the development of a new technology. As knowledge accumulates and models mature through practice in the field, both ends of the dumbbell are gradually corrected, and our understanding of the technology's value becomes smoother and more practical. Einstein once said: "You can't solve a problem at the same level of thinking as you created it." When examining a new technology, we often cannot match it directly to things that already exist; instead, we need innovative thinking to develop and refine its definition and value in each application field.

Summary

Blockchain technology can indeed be used, to some extent, as a distributed database or data sharing mechanism, but in actual applications it has no advantage over traditional data integration frameworks. Moreover, because it relies on distributed consensus algorithms, P2P network transmission, and block data structures, the system is more complex and its performance and maintainability are worse. Paying such a high price merely to establish a distributed, consistent storage mechanism is clearly not worthwhile and has no real business prospects. Applying blockchain technology should focus on building a distributed, peer-to-peer, secure, and fair trading environment; with an optimized data trading environment as the precondition, full sharing and use of data can then be achieved indirectly. In the field of data sharing, blockchain technology is only one of the basic conditions, not the decisive factor. Where data ownership is dispersed, what matters most in determining whether data can be shared is establishing the business processes and the business model.
