What is distributed storage? An in-depth study of Filecoin

In 2020, distributed storage came into view in its own distinctive way, yet few of us have stopped to ask what distributed storage actually is and why we need it.

In fact, the move to distributed storage was largely forced on us. The Internet keeps developing faster, applications across the ecosystem keep evolving, the number of users keeps rising, and data volumes keep growing. Together these put tremendous pressure on existing local storage. Distributed storage systems and distributed file systems emerged to relieve that pressure.

Today, this article will introduce distributed storage and conduct an in-depth study of Filecoin.

Definition

A distributed storage system stores data across multiple independent devices. A traditional network storage system uses a centralized storage server to hold all data, so that server becomes both the performance bottleneck and the single point on which reliability and security depend, which cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture: multiple storage servers share the storage load, and a location server keeps track of where data is stored. This improves the reliability, availability, and access efficiency of the system, and it makes the system easy to expand.
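To make the definition concrete, here is a minimal sketch of several storage servers sharing the load while a location server records where each item lives. The class names, the hash-based placement rule, and the replica count are all assumptions made for illustration; real systems use far more elaborate placement and failure handling.

```python
import hashlib


class LocationServer:
    """Toy location server: remembers which storage servers hold each key."""

    def __init__(self, server_names, replicas=2):
        self.server_names = list(server_names)
        self.replicas = replicas
        self.locations = {}  # key -> list of server names

    def place(self, key):
        # Start at a hash-derived index so that keys spread across servers,
        # then take `replicas` consecutive servers (an assumed placement rule).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.server_names)
        chosen = [
            self.server_names[(start + i) % len(self.server_names)]
            for i in range(self.replicas)
        ]
        self.locations[key] = chosen
        return chosen

    def locate(self, key):
        return self.locations.get(key, [])


class Cluster:
    """Hypothetical cluster: in-memory dicts stand in for storage servers."""

    def __init__(self, server_names, replicas=2):
        self.locator = LocationServer(server_names, replicas)
        self.servers = {name: {} for name in server_names}

    def put(self, key, data):
        for name in self.locator.place(key):
            self.servers[name][key] = data

    def get(self, key, offline=()):
        # Read from the first replica that is still reachable.
        for name in self.locator.locate(key):
            if name not in offline:
                return self.servers[name][key]
        raise KeyError(f"no reachable replica for {key!r}")


cluster = Cluster(["s1", "s2", "s3"], replicas=2)
cluster.put("report.txt", b"hello")
first = cluster.locator.locate("report.txt")[0]
# The data stays readable even when the first replica is offline.
print(cluster.get("report.txt", offline={first}))
```

Because every key is written to two servers, losing one of them does not make the data unavailable, which is the reliability benefit described above.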

Introduction to Distributed Storage

Modern computing is highly centralized. Over the past decade, a few large companies have gained huge wealth by repackaging traditional computing systems as cloud storage products. This reflects the centralization of the modern network: if one of these providers has a problem, the result can be a major Internet incident. Examples include the sudden Amazon outage in 2017, the prolonged GitHub outage in June 2020, and the weeks of problems with Microsoft's cloud services in October 2020.

The content we host on these services also deserves our concern. It sits behind fragile links that often break, which has profound implications for the computer systems we build and for the society that increasingly relies on them. Centralized architectures succeed in part because they are easy to build. To counter this consolidation, developers need basic building blocks that can be composed. Distributed storage is one such building block, and a prerequisite for the distributed web.

Basic characteristics of distributed storage

1. Resilience

The modern Internet is extremely fragile. Today, web content sits behind URLs, each of which resolves to a specific server at any given moment, and the content it points to becomes inaccessible if the provider goes offline for any reason. Centralization amplifies this effect, creating single points of failure and making censorship easier. As a result, broken links are common on today's Internet: a link may fail temporarily or become permanently unavailable, and nation-state censorship or distributed denial-of-service attacks can disrupt access to any file.

In an ideal decentralized system, the loss of an operator should not prevent users from accessing content that was previously stored and served. By spreading responsibility across many nodes of the network, decentralized systems are also naturally resistant to censorship and other denial-of-service attempts, as there is no central target against which attackers can concentrate their resources.

Centralized storage systems are susceptible to censorship. A classic example was when Catalonia, one of Spain’s 17 autonomous communities, held a referendum on independence. The Spanish government, which opposed the independence plan, blocked websites that had voting information at the ISP level. By severing these critical links, the government effectively prevented many individuals from accessing this information.

However, many of these sites were also mirrored on the InterPlanetary File System (IPFS), a peer-to-peer storage network. Anyone running an IPFS node could download the censored information from other nodes on the network and start sharing it themselves. The decentralized nature of IPFS frustrated the Spanish government's attempts to block access to the files: whenever one node was blocked, another could easily take its place. In general, distributed storage systems make blocking at the network level more difficult.

2. Efficiency

All computing system architectures have their strengths and weaknesses, and no single solution is right for all possible use cases. Unfortunately, the modern web’s emphasis on centralization is no different. Today, a few centralized data centers in a handful of cities around the world store most of the content. For example, if two users on the same network want to send each other messages, those messages will typically be sent to one of the data centers; if 100 users in a room are watching the same video on their devices, instead of downloading one copy and sharing it across the local network, they will each access a central server and download 100 copies.

In the simplest terms, distributed storage makes it easier to share files without having to send requests to a few specific data centers on the Internet. Instead, nodes are connected through as few intermediaries as possible. Connecting to nodes in other countries still requires several hops, but nodes on the same network can share files directly. The ultimate goal of distributed storage is to have so many nodes that everyone can get information from nearby peers.

Distributed storage solutions can introduce fundamental new efficiencies into such activities. By bypassing data centers, distributed systems can enable nodes to be placed much closer to end users than even modern content delivery networks, greatly speeding up file retrieval. Peer-to-peer file sharing over local networks can also save bandwidth, especially in areas with limited access to the wider internet.
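The bandwidth saving in the 100-viewer example can be worked out with a few lines of arithmetic. The function name and the 200 MB file size below are assumptions chosen only to put numbers on the argument.

```python
def upstream_bytes(viewers: int, file_size: int, local_sharing: bool) -> int:
    """Bytes pulled over the shared uplink so that `viewers` each get one file."""
    if viewers <= 0:
        return 0
    # With local peer-to-peer sharing only one copy has to cross the uplink;
    # the other viewers fetch it from a peer on the same network.
    return file_size if local_sharing else viewers * file_size


video = 200 * 10**6  # hypothetical 200 MB video
central = upstream_bytes(100, video, local_sharing=False)
shared = upstream_bytes(100, video, local_sharing=True)
print(central // 10**6, "MB vs", shared // 10**6, "MB")  # 20000 MB vs 200 MB
```

In this idealized case the uplink carries one hundredth of the traffic. Real savings depend on how many peers are actually able and willing to share.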

Desirable properties of distributed storage

While resilience and efficiency are hallmarks of distributed storage, an ideal storage system may have many other characteristics:

1. Accessible

An ideal distributed system should be accessible. It should be easy to participate in the network, allowing as many nodes as possible to store and distribute files on behalf of the network.

If you’re reading this and wondering: Can I be a node? The answer is: It depends. With Filecoin, anyone who is relatively tech-savvy should be able to run a client node to interact with the network. As for running a storage miner node, that’s not something that just anyone can do, as you need to have hardware that meets certain specifications.

With IPFS, the hardware requirements for nodes are lower, which means that more users can contribute to the network by running a node, perhaps even one built into a web browser. Cloud service providers have made cheap, reliable storage more accessible than ever before. A major part of their success is the ability to configure and manage storage through code via an API. Any competing system should be able to provide the same level of convenience.

2. Content Addressing

As mentioned earlier, URLs embody some inherent design tradeoffs. They describe the location of data, not the content of the data. To see how location-based addressing makes data hard to find, suppose you want to download a photo of a fluffy kitten. Consider the following two URLs:

https://example1.com/cat.jpeg

https://example2.com/cat.jpeg

These URLs both refer to a file called cat.jpeg, but there is no guarantee that the two files are the same. If example1.com goes offline, you can't be sure that example2.com will meet your needs - cat.jpeg might be something completely different. In fact, it might even be a picture of a dog! There is no inherent relationship between a URL and what it refers to.

Therefore, there is no way for you to ask today's Internet, "Does anyone have this file?" because you know nothing about the file other than its location.

Problems can arise when you share files using a URL. The server might serve a different file from that URL, or someone could perform a man-in-the-middle attack and modify the file (oddly, this attack is not uncommon). It's hard to confirm that everyone who accesses the URL receives the file they want.

In contrast, content addressing locates files based on content identifiers (CIDs), which act as digital fingerprints of files. Addressing files in this way solves the problems of location addressing. When a client needs a file, it asks the nodes in the network for the file with a specific CID, rather than asking a server for a URL. Once the client has downloaded the file, it computes the file's fingerprint itself and checks that it matches the CID it asked for.

Returning to the earlier example, it is as if every website shared a common understanding of exactly which file is meant when someone asks for cat.jpeg. There is still no guarantee that any given node holds that specific cat.jpeg, but a node can compare fingerprints to tell whether it has a match.

While steps like fingerprinting require more technical knowledge than the average person has, Filecoin and IPFS clients automate this process. This way, clients can be confident that they have received the files they requested, and finding an alternate provider of the data is simple.

Main Takeaways: CIDs mean you can find content that might otherwise be lost in a centralized system. CIDs also protect against man-in-the-middle attacks or servers suddenly changing files at a specific URL.
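The request-and-verify flow can be sketched in a few lines. This is only an illustration: a plain SHA-256 hex digest stands in for a real CID (actual CIDs carry extra encoding information), and the Node class and fetch function are hypothetical names, not part of any IPFS or Filecoin API.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hex SHA-256 digest, standing in for a real CID."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical peer that stores blobs keyed by their fingerprint."""

    def __init__(self):
        self.blobs = {}

    def add(self, data: bytes) -> str:
        cid = fingerprint(data)
        self.blobs[cid] = data
        return cid

    def get(self, cid: str):
        return self.blobs.get(cid)


def fetch(cid: str, nodes):
    """Ask nodes in turn; accept only data whose fingerprint matches the CID."""
    for node in nodes:
        data = node.get(cid)
        if data is not None and fingerprint(data) == cid:
            return data
    raise LookupError(f"no node served valid content for {cid}")


cat = b"pretend these bytes are cat.jpeg"
honest, tampered = Node(), Node()
cid = honest.add(cat)
tampered.blobs[cid] = b"actually a dog"  # serves the wrong bytes for this CID

# The tampered copy is rejected and the honest node's copy is accepted.
print(fetch(cid, [tampered, honest]) == cat)  # True
```

The client never needs to trust a particular node: any node may answer, and a wrong or modified file is detected locally because its fingerprint does not match.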

3. Trustless

Trustless systems allow two parties to collaborate without having to know each other or rely on a third party. Instead, each party trusts that the system's incentives drive participants toward the behavior the network needs in order to function.

4. Verifiable

An ideal storage system should be able to easily and consistently prove that nodes are storing the exact data they promise. This type of auditability is key to achieving trustlessness. If you can always be sure that the data is being stored correctly, then you have less need to trust the party providing the storage.

5. Openness

Finally, the ideal distributed storage system is open: its code is open source and auditable. In addition, the storage system should not be monolithic. Instead, it should expose an open protocol that anyone can implement and build upon, rather than encouraging lock-in.

How does Filecoin embody these characteristics?

The Filecoin project is a distributed storage system designed to have these characteristics. First described in 2014, the Filecoin protocol was originally developed as an incentive layer for the InterPlanetary File System (IPFS), a peer-to-peer storage network. Like IPFS, Filecoin is an open protocol. It builds on the properties of its predecessor, using the same underlying peer-to-peer and content addressing capabilities.

The Filecoin node network provides a decentralized storage market for the retrieval and storage of files. The network is supported by a new blockchain that records the commitments made by network participants. Users use the blockchain's cryptocurrency FIL to conduct transactions on the network.

1. Retrieval Market

In the retrieval market, nodes called retrieval miners compete to serve files to clients as quickly as possible. Retrieval miners are rewarded with a small amount of FIL fees. This gives nodes in key locations an incentive to join the network and facilitates the rapid distribution of files. It also encourages the establishment of a robust network to replicate and save much-needed files.

2. Storage Market

In Filecoin's storage market, nodes called storage miners compete on features such as price and location to offer clients file hosting contracts for a specified period of time. Storage miners must pledge FIL as collateral before accepting a contract; if a storage miner fails to fulfill its obligations to a client, this collateral can be used to reimburse the client automatically.

When a storage miner and a client reach an agreement, the client transfers its data to the storage miner. The storage miner adds the data to a sector, which is the basic unit of storage in Filecoin. The miner then performs a computationally intensive operation (called sealing) to create a unique copy of the sector's data.

If a client wishes to store multiple unique copies of its data, the sealing process ensures that each copy has a unique fingerprint. The computation required to obtain that fingerprint makes it impractical for a node to cheat by regenerating a sealed copy from the base data on demand. The sealed data is ultimately used to produce a proof of replication, which is published to the Filecoin blockchain.
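The pledge mechanism can be illustrated with a simplified escrow model. Everything here is an assumption for illustration: the StorageDeal fields, the balances dictionary, and the rule that a faulting miner's whole pledge goes to the client are not taken from the Filecoin protocol, which handles collateral in a more involved way.

```python
from dataclasses import dataclass


@dataclass
class StorageDeal:
    """Hypothetical deal record; field names are illustrative only."""
    client: str
    miner: str
    price: int       # FIL units the client pays for the full term
    collateral: int  # FIL units the miner pledges up front
    settled: bool = False


def open_deal(deal: StorageDeal, balances: dict) -> None:
    """Move the client's payment and the miner's pledge into escrow."""
    if balances[deal.client] < deal.price or balances[deal.miner] < deal.collateral:
        raise ValueError("insufficient funds")
    balances[deal.client] -= deal.price
    balances[deal.miner] -= deal.collateral


def settle(deal: StorageDeal, balances: dict, miner_kept_data: bool) -> None:
    """Release the escrowed funds when the term ends or the miner faults."""
    if deal.settled:
        raise ValueError("deal already settled")
    if miner_kept_data:
        # The miner earns the payment and gets the pledge back.
        balances[deal.miner] += deal.price + deal.collateral
    else:
        # Assumed rule: the client is refunded and compensated from the pledge.
        balances[deal.client] += deal.price + deal.collateral
    deal.settled = True


balances = {"alice": 10, "miner": 10}
deal = StorageDeal("alice", "miner", price=3, collateral=5)
open_deal(deal, balances)
settle(deal, balances, miner_kept_data=False)
print(balances)  # {'alice': 15, 'miner': 5}
```

Because the pledge is locked before the contract starts, reimbursement does not depend on the miner's cooperation after a failure.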
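The idea that each stored copy carries its own fingerprint can be shown with a toy encoding. This does not reproduce Filecoin's actual sealing algorithm: the XOR keystream, the miner and replica identifiers, and the rounds parameter are all assumptions chosen only to show how the same data can yield distinct, deliberately slow-to-produce replicas.

```python
import hashlib


def keystream(seed: bytes, length: int, rounds: int) -> bytes:
    """Derive `length` bytes from `seed`; more `rounds` means slower derivation."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = seed + counter.to_bytes(8, "big")
        for _ in range(rounds):
            block = hashlib.sha256(block).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def seal(data: bytes, miner_id: str, replica_id: int, rounds: int = 1000) -> bytes:
    """Toy seal: XOR the data with a keystream tied to this miner and replica."""
    seed = f"{miner_id}:{replica_id}".encode("utf-8")
    stream = keystream(seed, len(data), rounds)
    return bytes(a ^ b for a, b in zip(data, stream))


def unseal(sealed: bytes, miner_id: str, replica_id: int, rounds: int = 1000) -> bytes:
    """XOR is its own inverse, so unsealing repeats the same operation."""
    return seal(sealed, miner_id, replica_id, rounds)


data = b"client data stored in a sector"
copy_a = seal(data, "miner-1", replica_id=0)
copy_b = seal(data, "miner-1", replica_id=1)

# Same underlying data, but each sealed copy hashes to a different fingerprint.
print(hashlib.sha256(copy_a).hexdigest() != hashlib.sha256(copy_b).hexdigest())  # True
print(unseal(copy_a, "miner-1", 0) == data)  # True
```

A miner that kept only one copy could not present the fingerprint of the second without redoing the slow encoding, which is the intuition behind making sealing expensive.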

During a storage transaction, storage miners are periodically asked to submit proofs of spacetime to the blockchain. Miners obtain these proofs using randomness (provided by the blockchain itself), sealed sectors, and proofs of replication published to the blockchain. These proofs provide clients with a strong probabilistic argument that the storage miner has a complete, unique copy of the data. This is an extremely strong guarantee - something that even modern cloud storage providers cannot offer their customers.
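A much simpler relative of these proofs is a random challenge against a Merkle tree, sketched below. It is not Filecoin's proof of spacetime: the verifier here keeps only a root hash, picks a random chunk index, and checks the returned chunk and sibling hashes. The function names and chunk sizes are assumptions, and the sketch omits details a production scheme needs, such as distinguishing leaf hashes from internal hashes.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(chunks):
    """Return all Merkle tree levels, leaf hashes first, root level last."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks, levels, index):
    """Prover: return the challenged chunk plus its sibling hashes."""
    path = []
    i = index
    for level in levels[:-1]:
        path.append(level[i ^ 1])  # sibling sits next to the node
        i //= 2
    return chunks[index], path


def verify(root, index, chunk, path):
    """Verifier: recompute the root from the chunk and the sibling hashes."""
    node = h(chunk)
    i = index
    for sibling in path:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root


chunks = [bytes([i]) * 32 for i in range(5)]  # stand-in for stored data
levels = build_tree(chunks)
root = levels[-1][0]                          # all the verifier has to keep
index = secrets.randbelow(len(chunks))        # fresh random challenge
chunk, path = prove(chunks, levels, index)
print(verify(root, index, chunk, path))       # True
```

Because the challenged index is unpredictable, a prover that discarded part of the data will eventually fail a challenge. Repeating such challenges over time gives the kind of probabilistic assurance the text describes, although the real proofs also establish that the copy is unique.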

Clients reward Filecoin storage miners by paying FIL as transaction fees. Storage miners also get the opportunity to mine blocks for the blockchain, which earns them both FIL block rewards and transaction fees from others who want their messages included in the mined blocks. Filecoin's proof system means that miners need some additional hardware, but the requirements are still within reach of technically skilled individuals. The hardware requirements for participating in the network as a client are modest, and Filecoin nodes also expose an API for programmatic interaction with the network, allowing third-party services to build on top of the core network functionality.

Conclusion

Distributed storage provides a powerful alternative to traditional centralized storage. It gives developers the opportunity to explore a design space that emphasizes the stability and efficiency of content storage and delivery. Filecoin shows that distributed storage can not only make data more secure but also make the Web3.0 network available to more people.
