Original author: Vaibhav Saini
Original link: medium
Translator: Interstellar Continent

Editor's note: The original title is "[Must-collect] The most comprehensive and useful ultimate guide to Filecoin".

Since the decentralization revolution began in 2009, many promising projects have emerged and changed the way we see and live in this world. Protocol Labs is behind one such effort, having given birth to projects like IPFS. The ultimate goal of IPFS is to replace HTTP, but it lacks an incentive layer that could help it reach mass adoption. This is where Filecoin comes in.

Since its announcement, Filecoin has generated a lot of interest in the community. With the launch of the test network in December 2019, there is a lot to explore. There is also a lot of information on the web about its technology and economics, which can be confusing and overwhelming. So here we have consolidated the available information into one source.

Today, Amazon S3 is the workhorse of file storage on the internet. There are many reasons for this:

1. It's incredibly cheap: $0.023 per GB stored per month and 0.04 cents per 10,000 read requests.
2. It's very fast.
3. It's reliable: it has had a few outages that effectively took much of the internet offline, but it still has 99.9% uptime.
4. It's highly scalable.
5. It offers a great developer experience and integrates easily with the rest of the Amazon suite of services for scaling (e.g. CloudFront).

In a world where we have such a capable cloud storage service, any competitor has to perform better than it, or at least on par. At small scale, the decentralized web does not work well. But if IPFS is adopted at large scale (higher adoption than BitTorrent), it could prove to be a better version of the internet and thereby unlock a whole new economy.

We divide this guide into 4 parts:

1. An overview of how the Filecoin network works;
2. An in-depth study of the Filecoin protocol;
3. Other issues (not discussed in the white paper);
4. Possible improvements to the Filecoin protocol.

There are 3 groups of users in Filecoin: clients, storage miners, and retrieval miners.

Clients pay to store and retrieve data. They can choose from the available service providers. If they want to store private data, they need to encrypt it before submitting it to the provider.

Storage miners store clients' data for rewards. They decide how much space they are willing to reserve for storage. After a client and a storage miner reach an agreement, the miner is obliged to keep providing proofs that it is storing the data. Everyone can view the proofs and check that the storage miner is reliable.

Retrieval miners serve clients' data on request. They can get the data from clients or from storage miners. Retrieval miners and clients exchange data and coins using micropayments: the data is divided into several parts, and the client pays a small amount of coins per part. Retrieval miners can also act as storage miners.

Finally, the network represents all full nodes that verify the behavior of clients and miners. These nodes count available storage, check storage proofs, and repair data faults.

Some terms used in this article:

Pieces: A piece is a portion of the data that a client stores in the decentralized storage network. For example, data (perhaps a directory) can be intentionally divided into many pieces, and each piece can be stored by a different set of storage miners.

Sectors: A sector is some disk space that a storage miner provides to the network (it can be thought of as a unique ID associated with a specific portion of a particular storage provider's disk space). Miners store clients' pieces in their sectors and earn tokens for their services. In order to store pieces, storage miners must pledge their sectors to the network.
AllocTable: The AllocTable is a data structure that keeps track of pieces and the sectors they are assigned to. The AllocTable is updated at every block in the ledger, and its Merkle root is stored in the latest block. In practice, the table is used to keep the state of the DSN, allowing fast lookups during proof verification.

Orders: An order is a statement of intent to request or provide a service. Clients submit bid orders to the markets to request a service (the storage market for storing data and the retrieval market for fetching data, respectively), while miners submit ask orders to offer a service.

Order Book: An order book is a set of orders. Filecoin maintains separate order books for the storage market and the retrieval market.

Pledge: A pledge is a commitment to provide storage (specifically a sector) to the network. Storage miners must submit a pledge to the ledger (the Filecoin blockchain) to start accepting orders in the storage market. The pledge consists of the size of the pledged sector and the collateral deposited by the storage miner.

Clients submit a bid order to the storage order book (using the Put protocol, explained in the next section). Clients must deposit the coins specified in the order and specify the number of replicas they want stored. Clients can submit multiple orders or specify a replication factor in a single order. Higher redundancy (a higher replication factor) leads to higher tolerance of storage faults (described below).

Storage miners pledge their storage to the network by depositing collateral via a pledge transaction in the blockchain, using Manage.PledgeSector. The collateral (in filecoin) is deposited for the time the service is provided and is returned if the miner generates proofs of storage for the data it committed to store. If some proofs of storage fail, a percentage of the collateral is lost.
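The pledge and collateral rule above can be sketched as follows. This is only an illustration under stated assumptions: the `Pledge` class, the `settle` function, and the 10% penalty per failed proof are hypothetical and are not taken from the Filecoin protocol, which only states that a percentage of the collateral is lost.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Pledge:
    """A storage miner's pledge: sector size plus deposited collateral."""
    miner: str
    sector_size_gb: int
    collateral: int  # in the smallest coin unit, kept as an integer


# Hypothetical parameter: percent of the original collateral lost per failed proof.
PENALTY_PERCENT = 10


def settle(pledge: Pledge, proofs_ok: Sequence[bool]) -> int:
    """Return the collateral refunded to the miner after the service period."""
    failed = sum(1 for ok in proofs_ok if not ok)
    lost = failed * (pledge.collateral * PENALTY_PERCENT // 100)
    return max(pledge.collateral - lost, 0)


pledge = Pledge(miner="miner-1", sector_size_gb=32, collateral=1000)
refund = settle(pledge, [True, False, True, False])
print(refund)
```

A miner that passes every proof gets the full collateral back, while each failed proof reduces the refund until nothing is left.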
Once the pledge transaction appears in the blockchain, miners can offer their storage in the storage market: they set their price and add an ask order to the market's order book.

All storage allocations are public to every participant in the network. At each block, the network checks whether the proofs required for each assignment are present, checks whether they are valid, and acts accordingly:

1. If any proof is missing or invalid, the network penalizes the storage miner by taking part of their collateral.
2. If a large number of proofs are missing or invalid (defined by the system parameter Δfault), the network considers the storage miner faulty, settles the order as failed, and reintroduces a new order for the same piece into the market.
3. If every storage miner storing the piece is faulty, the piece is lost and the client is refunded.

Let's see how it works. Retrieval miners announce their services by broadcasting their ask orders across the network: they set a price and add the ask order to the market's order book.

Putting it all together, the following graph shows all the activity happening in the network.

Filecoin introduces the concept of a decentralized storage network (DSN). A DSN is a scheme that describes a network of independent clients and storage providers. A DSN aggregates the storage offered by multiple independent storage providers and self-coordinates to provide data storage and data retrieval to clients. Coordination is decentralized and does not require trusted parties: the secure operation of these systems is achieved through protocols that coordinate and verify the actions performed by individual parties. DSNs can employ different coordination strategies, including Byzantine agreement, gossip protocols, or CRDTs, depending on the requirements of the system.

A DSN involves the implementation of three functions: Put, Get, and Manage. Put allows a client to store data under a unique identifier. Get allows a client to retrieve data using an identifier. Manage orchestrates the network by measuring the space available for rent, auditing providers, and repairing possible data faults. The Manage protocol is typically run by the network of storage providers, often in conjunction with clients or a network of auditors (this involves Byzantine faults, which are discussed below).

A DSN has several properties. The first two are required:

1. Data integrity means that the client always receives the same data it stored, and the storage provider cannot convince the client to accept wrong data.
2. Retrievability means that the client will be able to retrieve its data over time.

Optional properties of a DSN:

1. Public verifiability allows everyone on the network to verify that data is being stored without knowing the data itself.
2. Auditability allows verification that data was stored for the correct period of time.
3. Incentive compatibility aims to reward good service providers and punish poor ones.
4. Confidentiality: clients who wish to store their data privately must encrypt it before submitting it to the network.

Fault Tolerance

A DSN can tolerate two possible types of faults.

Management Faults: These are Byzantine faults caused by the participants in the Manage protocol (storage providers, clients, and auditors). A DSN scheme relies on the fault tolerance of its underlying Manage protocol. Violating the fault tolerance assumptions for management faults can compromise the liveness and security of the system. For example, consider a DSN scheme in which the Manage protocol uses Byzantine Agreement (BA) to audit storage providers, that is, to check that they store all the data they should store according to the protocol conditions. In such a protocol, the network receives storage proofs from storage providers and runs the BA to agree on the validity of these proofs. If the BA tolerates up to f faults out of n total nodes, then our DSN can tolerate f < n / 2 faulty nodes. If these assumptions are violated, audits can be compromised, rendering the entire system useless.

Storage Faults: Storage faults are Byzantine faults that prevent clients from retrieving their data, e.g. storage miners lose their pieces, or retrieval miners stop serving pieces. A successful Put execution is (f, m)-tolerant if its input data is stored in m independent storage providers (out of n total) and it can tolerate up to f Byzantine providers. The parameters f and m depend on the protocol implementation; the protocol designer can fix f and m or leave the choice to the user, extending Put(data) to Put(data, f, m). A Get execution on the stored data succeeds if there are fewer than f faulty storage providers.

For example, consider a simple scenario where the protocol is designed so that every storage provider stores all the data. In this scenario, m = n and f = m - 1. Is it always f = m - 1? No. Some schemes can be designed using erasure coding, where each storage provider stores a specific portion of the data, so that x out of m storage providers are required to retrieve the data. In this case, f = m - x.

Non-reusable work: Most permissionless blockchains require miners to solve a hard computational puzzle, such as inverting a hash function. Often, the solutions to these puzzles are useless and have no intrinsic value other than securing the network. Some blockchains, such as Ethereum (executing smart contract logic) and Primecoin (finding new prime numbers), attempt to use some of the computational power to do useful work.

Wasted work: Solving puzzles is really expensive in terms of machines and energy consumption, especially if the puzzles rely solely on computational power. When the mining algorithm is embarrassingly parallel, the main factor in solving the puzzle is computational power.
Trying to reduce waste: Ideally, most of the network's resources should be spent on useful work. Some efforts require miners to use more energy-efficient solutions. For example, Spacemint requires miners to dedicate disk space instead of computation. Although disk space is more energy-efficient, the disks are still "wasted" because they are filled with random data. Other efforts replace puzzle solving with traditional Byzantine agreement based on proof of stake, in which stakeholders vote on the next block in proportion to their share of the currency in the system. In Filecoin, instead of performing wasteful proof-of-work computations, miners do work that enables them to participate in consensus.

Useful Work: We consider the work done by miners in the consensus protocol to be useful if the results of the computations are valuable to the network beyond just securing the blockchain. Filecoin proposes a useful work consensus protocol in which the probability that the network elects a miner to create a new block (we call this the voting power of the miner) is proportional to the storage the miner currently has in use relative to the rest of the network. The Filecoin protocol is designed so that miners would rather invest in storage than in computing power to parallelize mining computations. Miners provide storage and reuse the computation that proves they are storing data to participate in consensus.

Power in Filecoin: In Filecoin, the power p of a miner M at time t is the sum of M's storage allocations. The influence of M is the fraction of M's power over the total power in the network. In Filecoin, power has the following properties:

1. Public: The total amount of storage currently in use in the network is public. By reading the blockchain, anyone can calculate each miner's storage allocations, so anyone can calculate the power of each miner and the total power at any point in time.
2. Publicly verifiable: For each storage allocation, miners are required to generate a Proof of Spacetime to prove that the service is being provided. By reading the blockchain, anyone can verify that the power claimed by a miner is correct.
3. Variable: At any point in time, a miner can add new storage to the network by pledging a new sector and filling it. In this way, miners can change the amount of power they have over time.

To learn more about how this works (mathematically) in the consensus algorithm, see the whitepaper.

We also need a mechanism to prevent malicious miners from using three attacks to obtain rewards for storage that they did not provide: Sybil attacks, outsourcing attacks, and generation attacks.

Sybil attacks: By creating multiple Sybil identities, a malicious miner can pretend to store (and get paid for) more copies than they physically store, while storing the data only once.

Outsourcing attacks: Malicious miners may commit to storing more data than they can physically store, relying on quickly fetching the data from other storage providers.

Generation attacks: A malicious miner could claim to store a large amount of data while instead using a small program to efficiently generate that data on demand. If the program is smaller than the data allegedly stored, this inflates the malicious miner's likelihood of winning a Filecoin block reward, which is proportional to the storage the miner currently has in use.

Storage providers must convince their clients that they are storing the data they are paid to store. In practice, the storage provider generates a Proof of Storage (PoS) for the blockchain network (or the client itself) to verify. To make storage behavior publicly verifiable, Filecoin introduces two novel proofs of storage: Proof of Replication (PoRep) and Proof of Spacetime (PoSt).
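To make the idea of a Proof of Storage more concrete, here is a minimal sketch of a challenge/response check built on a Merkle tree. It is not PoRep or PoSt: it is a deliberately simple stand-in in which the verifier keeps only a Merkle root, challenges a random chunk, and checks the returned chunk and hash path. The function names (`merkle_levels`, `prove`, `verify`) and the chunk layout are assumptions made for illustration.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and inner nodes."""
    return hashlib.sha256(data).digest()


def merkle_levels(chunks):
    """Return all tree levels, leaves first; the last level holds the root."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks, index):
    """Prover: return the challenged chunk and its sibling hashes up to the root."""
    path = []
    i = index
    for level in merkle_levels(chunks)[:-1]:
        path.append(level[i ^ 1])  # sibling of node i on this level
        i //= 2
    return chunks[index], path


def verify(root, index, chunk, path):
    """Verifier: recompute the root from the chunk and the sibling path."""
    node = h(chunk)
    i = index
    for sibling in path:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root


data = [b"chunk-%d" % i for i in range(5)]
root = merkle_levels(data)[-1][0]          # the only value the verifier keeps
challenge = secrets.randbelow(len(data))   # random challenge index
chunk, path = prove(data, challenge)
print(verify(root, challenge, chunk, path))
```

A check like this only shows that the prover can produce the challenged chunk at the moment it is asked. It does nothing against Sybil, outsourcing, or generation attacks, which is the gap the proofs described next are meant to close.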
Proof of Replication (PoRep) is a novel type of Proof of Storage that allows a server (i.e. the prover P) to convince a user (i.e. the verifier V) that some data D has been replicated to its own uniquely dedicated physical storage. Our scheme is an interactive protocol in which the prover P (a) commits to store n distinct replicas (physically independent copies) of some data D, and then (b) convinces the verifier V, via a challenge/response protocol, that P is indeed storing each replica. PoRep improves on PoR and PDP schemes by preventing Sybil attacks, outsourcing attacks, and generation attacks.

Proof of Spacetime: Proof of Storage (PoS) schemes allow a user to check whether a storage provider is storing the outsourced data at the time of the challenge. How can we use PoS schemes to prove that some data was stored throughout a period of time? The natural answer is to require the user to repeatedly (e.g. every minute) send challenges to the storage provider. However, the communication complexity required for each interaction can become a bottleneck in systems such as Filecoin, where storage providers need to submit their proofs to the blockchain network. To address this problem, we introduce a new proof, Proof of Spacetime, with which a verifier can check whether a prover has been storing their outsourced data for a period of time.

1. The intuition is to require the prover to generate sequential Proofs of Storage (in our case, Proofs of Replication) as a way to determine time.
2. The executions are then recursively composed to generate a short proof.

Filecoin supports contracts specific to data storage, as well as more general smart contracts.

File Contracts: We allow users to program the conditions under which they request or offer storage services.
There are several examples worth mentioning: (1) Contracting with miners: clients can specify in advance the miners who will provide the service, without participating in the market; (2) Payment policies: clients can design different reward policies for miners, for example one contract can pay miners progressively more over time, and another contract can set the storage price according to a trusted oracle; (3) Ticketing services: a contract can allow miners to deposit tokens and pay storage and retrieval fees on behalf of their users; and (4) More complex operations: clients can create contracts that allow data updates.

Smart contracts: Users can associate programs with their transactions, just as in other systems (e.g. Ethereum), where the programs do not directly depend on the use of storage. We foresee applications such as decentralized naming systems, asset tracking, and crowdfunding platforms.

Filecoin on other platforms: Other blockchain systems, such as Bitcoin, Zcash, and especially Ethereum and Tezos, allow developers to write smart contracts; however, these platforms offer very little storage capability, and at a very high cost. We plan to provide a bridge that brings storage and retrieval support to these platforms. We note that IPFS is already used by multiple smart contracts (and protocol tokens) as a way to reference and distribute content. Adding support for Filecoin will enable these systems to guarantee storage of IPFS content in exchange for Filecoin tokens.

Other platforms in Filecoin: We plan to provide bridges that connect other blockchain services with Filecoin. For example, integration with Zcash would support sending requests to store data privately.

Here we list some potential issues that are not well discussed in the whitepaper.

Scalability of the retrieval market: The micropayment system (retrieval market) adds a lot of overhead to the retrieval protocol.
In order to achieve retrieval speeds that match today's centralized infrastructure, there needs to be massive adoption of Filecoin and IPFS to create a dense state channel network.

Censorship (illegal content): As we have seen in the past with Napster and The Pirate Bay, the lack of censorship will eventually lead to illegal content on the web, effectively bringing the dark web to the surface. A possible solution could be AI-based protocols that learn over time, automatically detect illegal content, and take the necessary actions. But for the web to be democratic, the protocol needs to be governed by the users themselves (which introduces Byzantine behavior) to decide whether a piece of content requires action. To summarize, censorship is a different problem for different people, and it requires a personalized approach rather than a central public one. Filecoin's job is to create a market for data storage, not to propose censorship policies. Therefore, this "personalized" censorship layer can be moved to the applications built on top of Filecoin.

Here we list some possible improvements to the Filecoin protocol.

Tahoe-LAFS encryption scheme: When adding a value, the client first encrypts it (using a symmetric key), then splits it into segments of manageable size, which are then erasure coded for redundancy. So, for example, "2-of-3" erasure coding means that a segment is split into 3 fragments in total, but any 2 of them are sufficient to reconstruct the original segment (more on ZFEC). These fragments then become shares, stored on specific storage nodes. Storage nodes are repositories of shares; users do not rely on them to guarantee the integrity or confidentiality of their data. Eventually, the encryption key and some information that helps find the right storage nodes become part of the "capability string" (more on the encoding process later).
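The "2-of-3" idea mentioned above can be illustrated with a minimal XOR-parity sketch. This is a simplification chosen for brevity: Tahoe-LAFS uses the ZFEC library rather than plain XOR, and the function names `encode_2_of_3` and `decode_2_of_3` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2_of_3(segment: bytes) -> list:
    """Split a segment into two halves plus one XOR parity fragment."""
    half = (len(segment) + 1) // 2
    first = segment[:half]
    second = segment[half:].ljust(half, b"\x00")  # pad to equal length
    return [first, second, xor_bytes(first, second)]


def decode_2_of_3(fragments: list, length: int) -> bytes:
    """Rebuild the segment from any two fragments; a lost one is None."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("at least two of the three fragments are required")
    first, second, parity = fragments
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return (first + second)[:length]  # drop any padding


segment = b"hello filecoin"
shares = encode_2_of_3(segment)
shares[0] = None  # simulate one storage node going offline
print(decode_2_of_3(shares, len(segment)))
```

Losing any single fragment still allows the segment to be rebuilt, at the cost of storing 1.5 times the original size.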
The important point is that the capability string is both necessary and sufficient to retrieve a value from the Grid. The operation fails only if so many nodes become unavailable (or go offline) that you can no longer retrieve enough shares.

There are write capabilities, read capabilities, and verify capabilities. A "less authoritative" capability can be derived offline. That is, someone with a write capability can turn it into a read capability without interacting with a server. The verify capability confirms the existence and integrity of a value, but cannot decrypt its contents. Both mutable and immutable values can be put into the Grid. Naturally, immutable values have no write capability at all.

Awesome IPFS is a community-maintained and updated list of projects, tools, and just about anything IPFS-related that is awesome. To see more, or to add your own entry to the list, visit Awesome IPFS on GitHub.

About the author: Vaibhav Saini is the co-founder of TowardsBlockchain (a startup incubated by MIT Cambridge Innovation Center). He is a senior blockchain developer and has worked on multiple blockchain platforms like Ethereum, Quorum, EOS, Nano, Hashgraph, IOTA, etc.