Dai Jiale: Detailed explanation of the essence, technical architecture and application of IPFS

Last Sunday, Blockchain Catcher held its second Catcher Salon. The salon invited three experts from the IPFS field to share their practical and research experience. This article is the first in a series covering the talks given at the salon. I hope it is useful to readers interested in IPFS.


This article is based on a talk by Dai Jiale, an IPFS application practitioner. Dai Jiale is a former senior R&D engineer at Baidu and a columnist for ipfser.org and Babbitt. He took part in the FileCoin crowdfunding in August last year and learned about IPFS technology through it. He has independently developed two open source applications based on IPFS: a wiki system and a geographic location retrieval system.


Author: Dai Jiale

Editor: Wang Yanzhi

*Please leave a message for reprinting


01

What is IPFS?


IPFS stands for InterPlanetary File System (its Chinese name translates as "Interstellar File System"). The project was started by Juan Benet in May 2014. Juan Benet's personal history is the stuff of legend. He graduated from Stanford University, and the first company he founded, before creating IPFS, was acquired by Yahoo. In 2015, IPFS received a large investment through the Y Combinator incubator, and Protocol Labs was established. The lab's team consists of 14 core developers, supported by hundreds of code contributors in the community.


IPFS is essentially a content-addressable, versioned, peer-to-peer hypermedia protocol for distributed storage and transmission. Its goal is to supplement or even replace the Hypertext Transfer Protocol (HTTP), which has been in use for the past 20 years, and to build a faster, safer, and freer Internet.


Every day, when we surf the Internet or use apps to browse WeChat Moments and Weibo, we are using HTTP, an application-layer protocol built on top of TCP/IP. It transmits hypertext data from the server to the local browser, which renders it and presents it to the user. This network environment gave rise to the client/server (C/S) and browser/server (B/S) architectures, and traffic ultimately concentrates in large service providers such as BAT (Baidu, Alibaba, Tencent).


The network services provided by the Internet platform have roughly gone through three modes of iteration:


The first model is centralized. For example, the early 12306 (China's railway ticketing site) had only one central service cluster because it could not be distributed. All ticket purchasing traffic landed directly on this cluster, which put it under enormous pressure.


The second model is the distributed cluster. As in the O2O "hundred regiments war" era, each website has to set up service clusters in different regions. The IDC data centers behind them spread the same service across local areas, which reduces the pressure on the central server.


Both models have drawbacks. In the first model, the service depends heavily on the central network. Neither large companies nor startups can afford downtime. Operations teams are measured against a KPI called the SLA: availability below 99.9% is basically considered a failure. Meeting the SLA is very expensive, and large companies need to hire a team of operations experts to keep the system stable. In the second model, there is a risk of losing stored data. People often joke about cables being dug up or employees deleting the database and running away, but these are real hidden dangers.


At the same time, bandwidth costs are relatively high in both models, and a certain amount of bandwidth is wasted. For example, the first audition episode of "The Rap of China" was played 1 billion times. Assuming the video file is 1 GB, playing it in full that many times requires 1,000 PB of data transfer. If bandwidth costs 0.001 US dollars per GB, iQiyi needs to pay its ISP (Internet Service Provider) 1 million US dollars for just one episode.
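The arithmetic behind this estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the play count, file size, and price per GB are the assumed figures from the paragraph above, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
# Back-of-the-envelope bandwidth estimate using the figures quoted in the text.
PLAYS = 1_000_000_000      # 1 billion plays
FILE_SIZE_GB = 1           # assumed size of the video file
COST_PER_GB_USD = 0.001    # assumed bandwidth price

def bandwidth_cost(plays, size_gb, cost_per_gb):
    """Return (total traffic in PB, total cost in USD)."""
    total_gb = plays * size_gb
    total_pb = total_gb / 1_000_000   # decimal units: 1 PB = 1,000,000 GB
    return total_pb, total_gb * cost_per_gb

total_pb, cost_usd = bandwidth_cost(PLAYS, FILE_SIZE_GB, COST_PER_GB_USD)
print(f"{total_pb:,.0f} PB transferred, about ${cost_usd:,.0f}")
```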


IPFS has the potential to become the third model. IPFS aims to create a peer-to-peer network topology, which amounts to overturning the distribution model that HTTP represents. It is content-addressed: a unique hash identifier is generated from the content of each file, which saves storage overhead to a certain extent.


The domain name addressing used by HTTP is ultimately mapped down to a host at the IP address behind a domain name, and to a file in a directory on that host. It does not care whether the same file already exists elsewhere. Content addressing, by contrast, accesses a file through its unique identifier and checks in advance whether that identifier has already been stored. If it has, the file is read directly from other nodes and is not stored again, which saves space.
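The deduplication described above can be illustrated with a toy in-memory store. This is a minimal sketch, not how IPFS itself is implemented: the `ContentStore` class is hypothetical, and plain SHA-256 hex digests stand in for real IPFS identifiers.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: the key is derived from the bytes themselves."""

    def __init__(self):
        self.blocks = {}

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so it is stored only once.
        self.blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self.blocks[key]
        # A reader can check the content against the identifier it asked for.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its identifier")
        return data

store = ContentStore()
first = store.add(b"Pacific Rim (movie bytes)")
second = store.add(b"Pacific Rim (movie bytes)")   # same content added again
print(first == second, len(store.blocks))
```

Adding the same bytes twice yields the same identifier and a single stored copy, whereas two HTTP servers hosting the same file would each hold it under a different URL.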


Let's take a specific scenario as an example. Suppose I want to watch the movie "Pacific Rim", and Xiao Ming has already downloaded it. He starts an IPFS node and adds the video file to the IPFS network. He gets back a hash fingerprint b, and by publishing it through a public gateway he also gets the path /ipfs/b.


He tells me the hash fingerprint and the path. All I have to do is start a local node and send a request for that address to the gateway. IPFS automatically looks up the hash value in the distributed hash table and finds the list of nodes that hold fingerprint b.


Large videos usually do not live on a single node; they are split into chunks that may be stored on several other nodes. IPFS fetches from all the nodes in the list in parallel, and the local node finally assembles the complete file. Fetching in parallel is much faster than downloading the complete file from one place. I can quickly watch the movie in my local browser and go on to share it with others.
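The chunk-and-reassemble flow can be sketched as follows. This is a simplified illustration under stated assumptions: peers are modelled as plain dictionaries, the chunk size is tiny for demonstration, and the helper names (`publish`, `fetch_chunk`, `fetch_file`) are hypothetical rather than part of any IPFS API.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # unrealistically small, for demonstration only

def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def publish(data: bytes, peer_count: int = 3):
    """Split data into chunks and spread them over simulated peers."""
    peers = [dict() for _ in range(peer_count)]
    manifest = []  # ordered list of chunk identifiers
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = chunk_id(chunk)
        peers[(i // CHUNK_SIZE) % peer_count][key] = chunk
        manifest.append(key)
    return manifest, peers

def fetch_chunk(key: str, peers) -> bytes:
    """Ask each peer in turn and verify the chunk against its identifier."""
    for peer in peers:
        chunk = peer.get(key)
        if chunk is not None and chunk_id(chunk) == key:
            return chunk
    raise KeyError(key)

def fetch_file(manifest, peers) -> bytes:
    # Fetch chunks concurrently; pool.map returns results in manifest order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = list(pool.map(lambda key: fetch_chunk(key, peers), manifest))
    return b"".join(chunks)

original = b"Pacific Rim, split into many small pieces"
manifest, peers = publish(original)
print(fetch_file(manifest, peers) == original)
```

Because every chunk is verified against its own hash, it does not matter which peer supplies it.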


02

IPFS Architecture


IPFS has a stack of at least eight sub-protocol layers, from top to bottom: identity, network, routing, exchange, object, file, naming, and application. Each layer has its own function, and the layers complement one another.



The identity layer and routing layer can be explained together. Peer identities are generated, and routing rules are defined, according to the Kademlia protocol. Kademlia (KAD) essentially builds a distributed sloppy hash table, referred to as a DHT. Every node that joins the DHT network must generate its own identity, and then uses that identity to store information about the network's resources and the contact details of other members. It works like sharing contact cards on WeChat: if you cannot find someone by searching for their WeChat ID directly, you can still reach them through a friend who already has their contact card.
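Kademlia decides which peers are "close" to a key by XOR-ing identifiers and comparing the results as integers. The sketch below shows only that distance metric; the helper names are hypothetical, and deriving an ID by hashing an arbitrary name with SHA-256 is an assumption made for illustration.

```python
import hashlib

def make_id(name: bytes) -> int:
    """Derive a 256-bit identifier by hashing some bytes (illustrative only)."""
    return int.from_bytes(hashlib.sha256(name).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR, compared as an integer."""
    return a ^ b

def closest_peers(target: int, peers, k: int = 2):
    """Return the k peer IDs nearest to target under XOR distance."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]

# Small hand-checkable IDs: distances to 0b1000 are 1, 8, 4, and 15.
peers = [0b1001, 0b0000, 0b1100, 0b0111]
print([bin(p) for p in closest_peers(0b1000, peers)])
```

A lookup repeatedly asks the closest known peers for peers that are closer still, much like being passed from one friend's contact card to the next.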


The network layer is the real core. The LibP2P library it uses can run over any transport-layer protocol and also deals with NAT. NAT technology lets devices on an intranet share the same external IP address, and the home routers we all use work on this principle.


The exchange layer is comparable to a download tool such as Xunlei. Xunlei actually simulates a P2P network around a central server. When a user registers a resource request with the server, the server lets users who request the same resource form a small swarm and share data with each other. This approach has a weakness: the server is maintained by Xunlei, and if it fails or goes down, no downloads can take place.


A centralized service can also restrict certain download requests. People therefore invented a smarter approach, BitTorrent, in which the information about what each seed node stores is kept in a hash table. BT tools are subject to relatively little regulation, and the service is more stable.


The IPFS team built on BitTorrent to create what it calls Bitswap, which adds a credit and accounting system to give nodes an incentive to share. I infer that FileCoin is very likely based on Bitswap. Users who add data through Bitswap increase their credit, and the more they share, the higher their credit becomes. If users only retrieve data and never store any, their credit keeps falling, and other nodes give priority to high-credit nodes when establishing connections.


This design can address Sybil attacks. Credit cannot be inflated by automated request spamming: if a node keeps flooding the network with retrieval requests, its credit only drops. A fairly sophisticated function relates the number of requests to the volume stored, with a shape similar to a parabola. Early on a lot is tolerated, but past a certain point the node is no longer trusted.
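One concrete form of such a function appears in the IPFS whitepaper's Bitswap section, which, as far as I recall, defines a debt ratio and a sigmoid-shaped probability of sending to a peer. The sketch below reproduces that formula from memory; the constants 6 and 3 and the function names should be treated as assumptions, and real Bitswap implementations may use a different strategy.

```python
import math

def debt_ratio(bytes_sent: int, bytes_recv: int) -> float:
    """How much we have sent to a peer relative to what it has sent us."""
    return bytes_sent / (bytes_recv + 1)

def send_probability(r: float) -> float:
    """Probability of serving a peer with debt ratio r (sigmoid-shaped)."""
    return 1 - 1 / (1 + math.exp(6 - 3 * r))

# A peer that mostly takes sees its chances fall off quickly, not linearly.
for r in (0, 1, 2, 3, 4):
    print(r, round(send_probability(r), 4))
```

Small debts are tolerated almost completely, while a peer that has taken several times more than it has given is served only rarely. That matches the behaviour described above: lenient at first, then a sharp loss of trust.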


The object layer and the file layer are best discussed together. They manage 80% of the data structures in IPFS. Most data objects are stored as a Merkle DAG, which makes content addressing and deduplication easy. The file layer is a new data structure that sits alongside the DAG and uses the same data structures as Git to support version snapshots.
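A Merkle DAG can be sketched in a few lines: each node's identifier is the hash of its data plus the identifiers of its children, so two versions of a file share every unchanged chunk. This is a simplified illustration; the JSON node encoding and the `put` helper are assumptions, not the real IPFS object format.

```python
import hashlib
import json

def put(store: dict, data: bytes, links=()) -> str:
    """Store a node whose ID is the hash of its data and its child links."""
    node = json.dumps({"data": data.hex(), "links": list(links)},
                      sort_keys=True).encode()
    key = hashlib.sha256(node).hexdigest()
    store[key] = node
    return key

store = {}
intro = put(store, b"intro")
scene = put(store, b"scene 1")
version1 = put(store, b"", links=[intro, scene])

# A new version replaces one chunk; the unchanged "intro" chunk is reused.
scene_cut = put(store, b"scene 1 (director's cut)")
version2 = put(store, b"", links=[intro, scene_cut])

print(len(store))  # 3 chunks plus 2 root nodes
```

Changing any chunk changes the root hash, which is what makes Git-style version snapshots cheap: each snapshot is a new root pointing mostly at existing nodes.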


The naming layer is self-certifying: when other users fetch an object, they verify its signature with the publisher's public key and check that the public key matches the NodeId. This confirms that the object was really published by that user and also lets them retrieve its mutable state. On top of this, the ingenious design of IPNS lets the hashed names of DAG objects be given definable, more readable names.
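The "does the public key match the NodeId" check can be sketched as below. This is a partial illustration under loud assumptions: the NodeId is taken to be a plain SHA-256 hex digest of the public key, the key is just placeholder bytes, and the signature check that real IPNS records also require is omitted because the standard library has no public-key signing. The `NameTable` class is hypothetical.

```python
import hashlib

def node_id_from_key(public_key: bytes) -> str:
    """Assumed rule for this sketch: NodeId = SHA-256 of the public key."""
    return hashlib.sha256(public_key).hexdigest()

class NameTable:
    """Toy mutable naming: a fixed name points at a changing content hash."""

    def __init__(self):
        self.records = {}

    def publish(self, public_key: bytes, content_hash: str) -> str:
        name = "/ipns/" + node_id_from_key(public_key)
        self.records[name] = {"public_key": public_key, "value": content_hash}
        return name

    def resolve(self, name: str) -> str:
        record = self.records[name]
        # Self-certification: the key stored in the record must hash to the name.
        if "/ipns/" + node_id_from_key(record["public_key"]) != name:
            raise ValueError("public key does not match the name")
        return record["value"]

table = NameTable()
key = b"placeholder public key bytes"
name = table.publish(key, "hash-of-version-1")
table.publish(key, "hash-of-version-2")   # same name, new content
print(table.resolve(name))
```

The name stays fixed while the content hash it points to changes, which is how a mutable pointer can sit on top of immutable, content-addressed objects.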


Finally, there is the application layer. The core value of IPFS lies in the applications running on it. We can use its CDN-like functions to obtain the desired data at a very low bandwidth cost, thereby improving the efficiency of the entire application.


There are only two reasons why new technology replaces old technology: first, it can improve system efficiency; second, it can reduce system costs. IPFS has achieved both of these points.


I have compiled an IPFS family tree diagram, which also serves as a vertical data flow diagram. Each of the eight protocol layers mentioned above is implemented in a corresponding module, and the diagram lays this out in an intuitive way.


The IPFS team took a highly modular approach to development, assembling the whole project like building blocks. The Protocol Labs team was founded in 2015, and through 2017 it was developing three modules, IPLD, LibP2P, and Multiformats, which serve as the bottom layer of IPFS.


Multiformats is a collection of hash algorithms and self-describing formats (from a value you can tell how it was generated). It covers 6 mainstream hash functions, such as SHA1, SHA256, SHA512, and Blake2b, which are used to hash and describe the generation of NodeIDs and fingerprint data.
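"Self-describing" is easiest to see in the multihash format: the digest is prefixed with a code for the hash function and the digest length, so a reader can tell how the value was produced. The sketch below handles only SHA-256 (function code 0x12) and assumes single-byte prefixes; the helper names are hypothetical.

```python
import hashlib

SHA2_256_CODE = 0x12   # multihash function code for sha2-256

def multihash_sha256(data: bytes) -> bytes:
    """Prefix the digest with <function code><digest length>."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest

def describe(multihash: bytes):
    """Read the prefix back: which function was used, and how long the digest is."""
    code, length, digest = multihash[0], multihash[1], multihash[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the prefix")
    return code, length, digest

mh = multihash_sha256(b"hello ipfs")
code, length, digest = describe(mh)
print(hex(code), length, mh.hex()[:4])
```

The 0x12 0x20 prefix is why base58-encoded SHA-256 multihashes, the familiar IPFS fingerprints, all begin with "Qm".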


LibP2P is the core of IPFS's core. Faced with a wide variety of transport-layer protocols and complex network equipment, it helps developers stand up a usable P2P network layer quickly and cheaply. This is why IPFS technology is favored by many blockchain projects.


IPLD is essentially conversion middleware that unifies existing heterogeneous data structures into one format, making data exchange and interoperability between different systems easier. IPLD currently supports the block data of Bitcoin and Ethereum as well as IPFS's own data. This is the second reason IPFS is popular in blockchain systems: its IPLD middleware can unify different block structures into one standard for transmission, giving developers a proven standard to build on so that they need not worry about performance, stability, or bugs.


IPFS applies the functions of these modules and integrates them into a containerized application, which runs on independent nodes and is accessible to everyone in the form of a web service.


Finally, there is Filecoin. As a project that was only announced in July last year, its development progress has been kept secret. Filecoin makes the data of these applications valuable, and through incentive policies and economic models similar to Bitcoin, it encourages more people to create nodes and use IPFS.


I would prefer that everyone look at IPFS and FileCoin separately. If IPFS is used well, many projects like FileCoin can be created on top of it. FileCoin's own value and significance, however, are not as great as those of IPFS.


03

The application significance of IPFS


First, it can bring a certain degree of freedom to content creation. Akasha is a typical application, which is a social blog creation platform based on Ethereum and IPFS. The blog content created by users is published through an IPFS network instead of a central server.


Users are also bound to Ethereum wallet accounts. Readers can reward high-quality content with ETH, and content creators can earn ETH from it, a kind of "mining with your brain". There are few regulatory restrictions and no middlemen taking commissions, so the content revenue goes directly to the creator.


Second, it can reduce storage and bandwidth costs. I mentioned the iQiyi example earlier, and a relatively successful video project here is "Dtube", a decentralized video platform built on Steemit. The video files its users upload are stored through the IPFS protocol and have unique identifiers. Compared with traditional video websites, it reduces redundant copies of the same resource and greatly cuts the bandwidth costs incurred when large numbers of users play videos.


Third, it combines very well with blockchain. A blockchain is essentially a distributed ledger, and one of its bottlenecks is the ledger's storage capacity. The biggest problem for most public chains is that they cannot store large amounts of hypermedia data on-chain. Bitcoin's total block data is only about 30-40 GB. Programmable blockchain projects such as Ethereum can only execute and store small pieces of contract code, which severely limits the ability of a DApp to grow into a super app.


Using IPFS to get around the storage bottleneck is currently a transitional solution, and the most typical application is EOS. EOS prides itself on supporting millions of transactions per second (TPS). Besides the DPOS consensus mechanism, this is also attributed to an underlying storage design that uses IPFS to handle the efficient transmission of large data.


EOS passes its packaged block data through IPLD, which unifies the heterogeneous data into a structure that is easy to address by content, and then attaches it to IPFS. The IPFS network takes over storage and P2P retrieval, so these tasks do not consume too much of the EOS blockchain's own computing resources.


Fourth, it can provide a distributed cache for traditional applications. I previously wrote IPFS-GEO, a project that provides a distributed cache for traditional LBS (location-based service) applications. It uses the GeoHash algorithm to convert geographic coordinates into a one-dimensional string, and stores the associated data worth retrieving in the IPFS network. The IPFS network determines its unique identifier and distributes it to neighboring nodes.


When a search request arrives, the system first compares strings by similarity range (shared prefix) to narrow the search scope and speed up the search. It then fetches the hypermedia data from nearby nodes by NodeID, achieving an effect similar to a distributed cache and greatly improving the efficiency of the LBS application's search as a whole.
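The GeoHash step can be sketched as follows. This is a standard GeoHash encoder written from the published algorithm, not code taken from IPFS-GEO; the sample index, its coordinates, and the placeholder fingerprint values are made up for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Interleave longitude/latitude bisection bits and encode them in base32."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    use_lon = True  # GeoHash starts with a longitude bit
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid      # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid      # keep the lower half
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)

# Hypothetical index: geohash string -> fingerprint of the associated data.
index = {
    geohash(57.64911, 10.40744): "fingerprint-of-shop-A",
    geohash(57.64920, 10.40750): "fingerprint-of-shop-B",
    geohash(-33.8688, 151.2093): "fingerprint-of-shop-C",
}

query = geohash(57.6490, 10.4070)
nearby = [value for key, value in index.items() if key.startswith(query[:2])]
print(query, nearby)
```

Nearby points usually share a long prefix, so a prefix comparison narrows the candidates before any data is fetched. Points on opposite sides of a cell boundary can have different prefixes, so real systems also check neighboring cells.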


04

IPFS’s star applications


OpenBazaar is a star application on IPFS. I gave it a Chinese name, Open Market. It recently received a $5 million investment from Bitmain.


In version 1.0, OpenBazaar was described as a black market. It did not use IPFS at that time; P2P transactions were implemented with ZeroMQ, which also bypassed centralized review to some extent and passed the transaction fees back to users as dividends. It also integrated Bitcoin as a payment channel, which caused a sensation, and the number of users grew rapidly in a short time.


With the release of version 2.0, the team added a review mechanism for legal reasons, added support for digital currencies such as BCH alongside Bitcoin, and rebuilt the system around IPFS, replacing the earlier ZeroMQ.


Many stores on Open Market can now stay available even when their owners are not online. In the past, buyer and seller had to be logged in at the same time to trade, but with IPFS a store effectively keeps running while its owner is offline. It also means that the more people visit your store, the more copies of the store data exist, which helps publicize and promote high-quality stores. In a certain sense, this is a return to value.


I call it a star project not only because it is built on IPFS and built well, but also because it completely reworked IPFS. It forked all of IPFS's source code, protocols, and supporting facilities. It not only maintains its own branch of IPFS and changed the protocol name, but also changed the protocol header, which in a certain sense isolates the OB network from the main IPFS network.


OpenBazaar wants more centralized control and plans to build its own mining farms and data centers to keep the service stable, choosing a middle point between decentralization and centralization. I think most traditional applications and companies can consider the same approach. Complete decentralization is unlikely, so we can take what is valuable and leave the rest.

