How Ethereum works

Introduction

Whether or not you know what the Ethereum blockchain is, you’ve probably heard of it. It has been in the news a lot lately, including on the covers of several professional magazines, but those stories can be hard to follow without a basic understanding of what Ethereum actually is. So, what is Ethereum? Essentially, it’s a public database that keeps a permanent record of digital transactions. Crucially, this database doesn’t require any central authority to maintain and protect it. Instead, it operates as a “trustless” transactional system: an architecture in which individuals can make peer-to-peer transactions without needing to trust a third party or each other.

Still confused? That's why this post exists. My goal is to explain how Ethereum works on a technical level, but without complicated math or scary-looking formulas. Even if you're not a programmer, I hope that after reading this you'll at least have a better understanding of the technology. If some parts are too technical to understand, that's perfectly normal, and there's really no need to fully understand every little detail. I recommend just understanding things at a high level.

Many of the points in this post are breakdowns of concepts discussed in the Ethereum Yellow Paper. I added my own explanations and diagrams to make understanding Ethereum a little easier. Those brave enough to take the technical plunge can go read the Ethereum Yellow Paper.

Okay, let’s get started!

Blockchain definition

A blockchain is a cryptographically secure transactional singleton machine with shared-state. [1] That’s a bit long, isn’t it? Let’s break it down:

  1. "Cryptographically secure" means that the creation of digital currency is secured by complex mathematical algorithms that are extremely hard to break. Think of a firewall of sorts. These algorithms make it nearly impossible to cheat the system (for example, by constructing a fake transaction or erasing a transaction).

  2. "Transactional singleton machine" means that there is only one authoritative machine instance responsible for transactions generated in the system. In other words, there is only one global truth that everyone believes in.

  3. "With shared-state" means that the state stored on this machine is shared and open to everyone.

Ethereum implements this paradigm of blockchain.

Ethereum Model Explanation

The essence of Ethereum is a transaction-based state machine. In computer science, a state machine is something that can read a series of inputs and then transform them into a new state based on these inputs.

According to Ethereum's state machine, we start from the genesis state. This is almost like a blank slate, and no transactions have been generated in the network yet. When transactions are executed, this genesis state will be transformed into the final state. At any time, this final state represents the current state of Ethereum.
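To make the state-machine idea concrete, here is a minimal Python sketch. It is not how Ethereum clients are implemented: the `Tx` type, the `apply_transaction` function, and the plain dictionary used as state are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical, simplified transaction: move `value` between two accounts."""
    sender: str
    recipient: str
    value: int


def apply_transaction(state: dict, tx: Tx) -> dict:
    """State transition function: (old state, transaction) -> new state."""
    if state.get(tx.sender, 0) < tx.value:
        raise ValueError("invalid transaction: insufficient balance")
    new_state = dict(state)  # copy, so the old state is left untouched
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


# Start from a toy "genesis" state and apply transactions one by one.
genesis = {"alice": 10, "bob": 0}
state = genesis
for tx in [Tx("alice", "bob", 4), Tx("bob", "alice", 1)]:
    state = apply_transaction(state, tx)

print(state)  # {'alice': 7, 'bob': 3}
```

Each call takes the previous state and one input and returns the next state, which is all that "transaction-based state machine" means here.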

The state of Ethereum has millions of transactions. These transactions are grouped into "blocks". A block contains a series of transactions, and each block is chained to its previous block.

For one state to transition to the next, a transaction must be valid. For a transaction to be considered valid, it must go through a validation process known as mining. Mining is when a group of nodes (i.e. computers) expend their computing resources to create a block of valid transactions.

Any node that claims to be a miner on the network can try to create and verify blocks. Many miners around the world create and verify blocks at the same time. Each miner provides a mathematical "proof" when submitting a block to the blockchain. This proof is like a guarantee: if this proof exists, then the block must be valid.

For a block to be added to the main chain, a miner must provide this "proof" faster than any competing miner. The process of validating each block by having a miner provide a mathematical proof is known as "proof of work".

A miner who validates a new block is rewarded for doing this work. What is the reward? Ethereum uses an intrinsic digital token, Ether, as the reward. Every time a miner proves a new block, new Ether is generated and awarded to that miner.

You might be wondering: what ensures that everyone is on the same chain of blocks? How can we be sure that a small group of miners won’t create their own chain?

Earlier, we defined a blockchain as a transactional singleton machine with shared-state. Using this definition, we know that the correct current state is a single global truth that everyone must accept. Having multiple states (or chains) would ruin the whole system, because it would be impossible to agree on which state is the correct one. If the chains diverged, you might own 10 coins on one chain, 20 on another, and 40 on a third. In this scenario, there would be no way to determine which chain is the most "valid".

Whenever multiple paths emerge, a "fork" occurs. We generally want to avoid forks because they disrupt the system and force people to choose which chain they believe in.

In order to determine which path is the most valid and prevent the generation of multiple chains, Ethereum uses a mathematical mechanism called the "GHOST protocol."

GHOST = Greedy Heaviest Observed Subtree

In simple terms, the GHOST protocol says we must pick the path that has had the most computation done on it. One way to determine that path is to use the block number of the most recent block (the "leaf block"), which represents the total number of blocks in the current path (not counting the genesis block). The higher the block number, the longer the path and the greater the mining effort that must have gone into arriving at the leaf. Using this reasoning allows us to agree on the canonical version of the current state.
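The block-number rule described above can be sketched in a few lines of Python. This is only the simplified rule from the paragraph, not the full GHOST protocol, and the `blocks` mapping and `canonical_chain` function are hypothetical names invented for this example.

```python
def canonical_chain(blocks: dict) -> list:
    """Pick the path whose leaf has the highest block number.

    `blocks` maps a block id to its parent id (None for the genesis block).
    Returns the chosen path as a list of ids, from genesis to leaf.
    """
    def block_number(block_id):
        # Number of blocks on the path, not counting the genesis block.
        number = 0
        while blocks[block_id] is not None:
            block_id = blocks[block_id]
            number += 1
        return number

    parents = set(blocks.values())
    leaves = [b for b in blocks if b not in parents]
    best_leaf = max(leaves, key=block_number)  # ties: first leaf found wins

    path = [best_leaf]
    while blocks[path[-1]] is not None:
        path.append(blocks[path[-1]])
    return path[::-1]


# Two competing paths fork off the genesis block "G".
blocks = {"G": None, "A1": "G", "A2": "A1", "A3": "A2", "B1": "G", "B2": "B1"}
print(canonical_chain(blocks))  # ['G', 'A1', 'A2', 'A3']
```

The "A" path wins because its leaf has block number 3, while the leaf of the "B" path has block number 2.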

Now that you have a general idea of what a blockchain is, let’s take a deeper look at the main components of the Ethereum system:

  1. Accounts

  2. State

  3. Gas and fees

  4. Transactions

  5. Blocks

  6. Transaction execution

  7. Mining

  8. Proof of Work

A word of caution before we begin: whenever I say something is a hash, I’m referring to the KECCAK-256 hash, which is the hash algorithm used by Ethereum.

Account

Ethereum's global "shared state" is made up of many small objects (accounts) that can interact with each other through a message passing architecture. Each account has a state associated with it and a 20-byte address. In Ethereum, an address is a 160-bit identifier that identifies an account.

There are two types of accounts:

  1. Externally owned accounts, which are controlled by private keys and have no code associated with them

  2. Contract accounts, which are controlled by their contract code and have code associated with them


Comparison of Externally Owned Accounts and Contract Accounts

It is important to understand the basic difference between externally owned accounts and contract accounts. An externally owned account can send a message to another externally owned account or to a contract account by creating a transaction and signing it with its private key. A message between two externally owned accounts is simply a value transfer. But a message from an externally owned account to a contract account activates the contract account's code, allowing it to perform various actions (such as transferring tokens, writing to internal storage, minting new tokens, performing some calculation, or creating new contracts).

Unlike externally owned accounts, contract accounts cannot initiate new transactions on their own. Instead, a contract account can only fire transactions in response to transactions it has received (from an externally owned account or from another contract account). We will learn about communication between contracts in the "Transactions and messages" section.

Therefore, any action that occurs on Ethereum is always set in motion by a transaction fired from an externally owned account.


Account State

The account state consists of four components, which are present regardless of the account type:

  1. nonce: If the account is an externally owned account, the nonce is the number of transactions sent from this account's address. If the account is a contract account, the nonce is the number of contracts created by this account.

  2. balance: The amount of Wei owned by this address. 1 Ether = 10^18 Wei.

  3. storageRoot: The hash of the root node of a Merkle Patricia tree (we will explain Merkle trees later). This tree encodes the hash of the storage contents of this account, and it is empty by default.

  4. codeHash: The hash of the EVM (Ethereum Virtual Machine, discussed later) code of this account. For contract accounts, this is the code that is hashed and stored as the codeHash. For externally owned accounts, the codeHash field is the hash of the empty string.
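As a rough illustration, the four fields can be modeled as a small Python record. This is a sketch under stated assumptions: the class and field names are invented here, the storage root is left as a placeholder, and `hashlib.sha3_256` is only a stand-in for KECCAK-256 (the two produce different digests).

```python
import hashlib
from dataclasses import dataclass

WEI_PER_ETHER = 10**18


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: SHA3-256 from the standard library stands in for KECCAK-256.
    # Real Ethereum hashes will differ from the values computed here.
    return hashlib.sha3_256(data).digest()


EMPTY_CODE_HASH = stand_in_hash(b"")


@dataclass
class AccountState:
    nonce: int = 0                        # transactions sent / contracts created
    balance: int = 0                      # amount of Wei owned by the address
    storage_root: bytes = b""             # placeholder for the storage tree root
    code_hash: bytes = EMPTY_CODE_HASH    # hash of the account's EVM code

    def is_contract(self) -> bool:
        # An externally owned account has the hash of the empty string here.
        return self.code_hash != EMPTY_CODE_HASH


eoa = AccountState(nonce=3, balance=2 * WEI_PER_ETHER)
contract = AccountState(code_hash=stand_in_hash(b"\x60\x00"))

print(eoa.is_contract(), contract.is_contract())  # False True
```

Both account types carry the same four fields; only the codeHash tells them apart in this sketch.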


World State

Well, we know that the global state of Ethereum is composed of a mapping between account addresses and account states. This mapping is stored in a data structure called a Merkle Patricia tree.

A Merkle tree (also called a "Merkle trie") is a type of binary tree made up of a set of nodes:

  1. A large number of leaf nodes at the bottom of the tree, which contain the underlying data

  2. A set of intermediate nodes, each of which is the hash of its two child nodes

  3. A single root node, also formed from the hash of its two child nodes, which represents the top of the tree

The data at the bottom of the tree is generated by splitting the data we want to store into chunks, then dividing the chunks into buckets, and then getting the hash value of each bucket and repeating until there is only one hash left: the root hash.
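The repeated hashing described above can be sketched as follows. This is a plain binary Merkle tree, not Ethereum's Merkle Patricia tree; SHA-256 is used as a stand-in for KECCAK-256, and duplicating the last node on odd-sized levels is a convention chosen only for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 stands in for Ethereum's KECCAK-256.
    return hashlib.sha256(data).digest()


def merkle_root(chunks: list) -> bytes:
    """Hash every chunk, then hash pairs repeatedly until one hash is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [h(chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # sketch convention: pair the odd node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


chunks = [b"tx0", b"tx1", b"tx2", b"tx3"]
root = merkle_root(chunks)

# Changing any chunk changes the root hash.
tampered_root = merkle_root([b"tx0", b"fake", b"tx2", b"tx3"])
print(root != tampered_root)  # True
```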

This tree requires every value stored in it to have a key. Starting from the root node, the key tells you which child node to follow to reach the corresponding value, which is stored in a leaf node. In Ethereum's case, the key/value mapping for the state tree is between addresses and their associated accounts, including each account's balance, nonce, codeHash, and storageRoot (where the storageRoot is itself a tree).

The same tree structure is also used to store transactions and receipts. More specifically, each block has a header that stores the hashes of the root nodes of three different Merkle trie structures, including:

  1. State Tree

  2. Transaction Tree

  3. Receipt Tree

The efficiency of storing all information in Merkle tries is extremely useful in “light clients” and “light nodes” in Ethereum. Remember that a blockchain is maintained by a collection of nodes. Broadly speaking, there are two types of nodes: full nodes and light nodes.

A full node synchronizes by downloading the entire chain, from the genesis block to the current head block, and executing all of the transactions it contains. Miners typically run full nodes, because they need them for the mining process. It is also possible to download a full node without executing every transaction. Either way, a full node contains the entire chain.

However, unless a node needs to execute every transaction or to query historical data easily, there is no need to store the entire chain. This is where the concept of a light node comes in. Instead of downloading and storing the full chain and executing all of its transactions, light nodes download only the chain of headers, from the genesis block to the current head, without executing any transactions or retrieving any associated state. Because light nodes have access to block headers, which contain the hashes of the three tries, they can still easily generate and receive verifiable answers about transactions, events, balances, and so on.

This works because hashes propagate upward in a Merkle tree - if a malicious user attempts to swap a transaction at the bottom of the tree with a fake one, this will change the hash of the node above it, which will change the hash of the node above it, and so on, all the way to the root of the tree.

Any node that wants to verify some data can do so through Merkle proof, which consists of:

  1. A piece of data that needs to be verified

  2. The root hash of the tree

  3. The "branch" (all of the partner hashes going up along the path from the chunk to the root)

Anyone reading the proof can verify that the hashing along that branch is consistent all the way up the tree, and therefore that the given chunk really is at that position in the tree.
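A Merkle proof of this kind can be sketched in Python as follows. As before, this uses a simple binary tree with SHA-256 as a stand-in hash rather than Ethereum's actual Merkle Patricia tree, and the helper names (`build_levels`, `make_branch`, `verify`) are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 stands in for Ethereum's KECCAK-256.
    return hashlib.sha256(data).digest()


def build_levels(chunks: list) -> list:
    """Return every level of the tree, from leaf hashes up to the root."""
    levels = [[h(chunk) for chunk in chunks]]
    while len(levels[-1]) > 1:
        cur = list(levels[-1])
        if len(cur) % 2 == 1:
            cur.append(cur[-1])  # sketch convention: pair the odd node with itself
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def make_branch(levels: list, index: int) -> list:
    """Collect the partner hash at each level for the leaf at `index`."""
    branch = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # the odd node was paired with itself
        branch.append((level[sibling], sibling < index))  # (hash, partner is on the left)
        index //= 2
    return branch


def verify(chunk: bytes, branch: list, root: bytes) -> bool:
    """Re-hash from the chunk up to the top and compare with the known root."""
    node = h(chunk)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
levels = build_levels(chunks)
root = levels[-1][0]
branch = make_branch(levels, 2)

print(verify(b"tx2", branch, root))   # True
print(verify(b"fake", branch, root))  # False
```

The verifier only needs the chunk, the branch, and the root hash. It never sees the other chunks, which is what makes this useful for a light node.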

In summary, the benefit of using a Merkle Patricia tree is that the root node of the structure is cryptographically dependent on the data stored in the tree, and the hash of the node can also serve as a secure identifier for that data. Since the block header contains the root hash of the state, transaction, and receipt tree, any node can verify a small portion of the Ethereum state without having to save the entire state, which could be very large.

Gas and Fees

An important concept in Ethereum is fees. Every calculation generated by a transaction on the Ethereum network will incur a fee - there is no free lunch. This fee is paid in the form of something called "gas".

Gas is the unit used to measure the cost of a particular computation. The gas price is the amount of Ether you are willing to spend on each unit of gas, and it is measured in "gwei". "Wei" is the smallest unit of Ether: 1 Ether is 10^18 Wei, and 1 gwei is 1,000,000,000 Wei.

For each transaction, the sender sets a gas limit and a gas price. The product of the gas price and the gas limit represents the maximum amount of Wei the sender is willing to pay to execute the transaction.

For example, suppose the sender sets the gas limit to 50,000 and the gas price to 20gwei. This means that the sender is willing to pay a maximum of 50,000*20gwei = 1,000,000,000,000,000 Wei = 0.001 Ether to execute this transaction.
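The same arithmetic, written out as a short Python sketch (the function and constant names are made up for this example):

```python
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def max_fee_wei(gas_limit: int, gas_price_gwei: int) -> int:
    """Maximum fee in Wei: gas limit multiplied by gas price."""
    return gas_limit * gas_price_gwei * WEI_PER_GWEI


fee = max_fee_wei(gas_limit=50_000, gas_price_gwei=20)
print(fee)                  # 1000000000000000
print(fee / WEI_PER_ETHER)  # 0.001
```

Keeping the amounts as integers in Wei avoids rounding problems. Converting to Ether is only needed for display.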

Remember that the gas limit represents the maximum amount of gas the sender is willing to pay for. If the account balance holds enough Ether to cover this maximum, then there is no problem. Any unused gas at the end of the transaction is refunded to the sender, exchanged at the original rate.

When the sender does not provide enough gas to execute the transaction, the transaction runs "out of gas" and is considered invalid. In this case, transaction processing is aborted and any state changes are reversed, so that we end up back at the state before the transaction, exactly as if the transaction had never taken place. However, because the machine still expended effort on the computation before running out of gas, none of the gas is refunded to the sender.
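The out-of-gas behavior can be sketched with a toy gas meter. This is an illustrative model only: the `execute` function, the list of `(gas_cost, key, value)` steps, and the dictionary state are assumptions, not the EVM's real interface.

```python
class OutOfGas(Exception):
    """Raised when a step costs more gas than is left."""


def execute(state: dict, steps: list, gas_limit: int):
    """Run `steps` (tuples of gas_cost, key, value) against a copy of `state`.

    Returns (resulting_state, gas_charged).
    """
    working = dict(state)  # work on a copy so that we can revert
    gas_left = gas_limit
    try:
        for cost, key, value in steps:
            if cost > gas_left:
                raise OutOfGas()
            gas_left -= cost
            working[key] = value
    except OutOfGas:
        # All changes are reverted and no gas is refunded.
        return state, gas_limit
    # Success: only the gas actually used is charged.
    return working, gas_limit - gas_left


state = {"x": 1}
steps = [(100, "x", 2), (200, "y", 3)]

print(execute(state, steps, gas_limit=500))  # ({'x': 2, 'y': 3}, 300)
print(execute(state, steps, gas_limit=250))  # ({'x': 1}, 250)
```

In the second call the first step succeeds, but the second step cannot be paid for, so the state is unchanged and the full 250 gas is still charged.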

Where does the gas money go? All the money the sender spends on gas is sent to a "beneficiary" address, which is usually the miner's address. Because miners have put in the effort to calculate and verify transactions, miners receive gas fees as a reward.

Generally, the higher the gas price the sender is willing to pay, the more value the miner derives from the transaction, and so the more likely the miner is to select it. In this way, miners are free to choose which transactions to validate and which to ignore. To guide senders on what gas price to set, miners can advertise the minimum gas price for which they will execute transactions.

Storage also costs money

Gas is used to pay not only for computation but also for storage. The total storage fee is proportional to the smallest multiple of 32 bytes used.

Storage fees have some subtle aspects. For example, since added storage increases the size of the Ethereum state database on all nodes, there is an incentive to keep the amount of stored data small. For this reason, if a transaction has a step that clears a storage entry, the fee for performing that operation is waived and a refund is given to the sender for freeing up storage space.

What is the role of fees?

An important aspect of how Ethereum works is that every operation executed on the network is also executed by every full node. However, computation on the Ethereum Virtual Machine is very expensive. Ethereum smart contracts are therefore best used for simple tasks, such as running simple business logic or verifying signatures and other cryptographic objects, rather than for heavier uses such as file storage, email, or machine learning, which would strain the network. Imposing fees prevents users from overloading the network.

Ethereum is a Turing-complete language. (In short, a Turing machine is a machine that can simulate any computer algorithm.) Turing completeness allows loops and makes Ethereum subject to the halting problem: in general, you cannot determine in advance whether a program will run forever. Without fees, a malicious actor could easily disrupt the network, with no repercussions, by executing a transaction containing an infinite loop. Fees therefore protect the network from deliberate attacks.

You might be thinking, “Why do we need to pay for storage?” Just like computation, storage on the Ethereum network is a cost that must be borne by the entire network.

Transactions and messages

As mentioned before, Ethereum is a transaction-based state machine. In other words, transactions between two different accounts are what transform the Ethereum global state from one state to another.

In the most basic sense, a transaction is a cryptographically signed instruction that is generated by an externally owned account, serialized, and then submitted to the blockchain.

There are two types of transactions: message communication (also known as message calls) and contract creation (that is, a transaction that creates a new Ethereum contract).

Regardless of the type of transaction, it includes:

  1. nonce: a count of the number of transactions sent by the sender

  2. gasPrice: The number of Wei per gas the sender is willing to pay to execute the transaction

  3. gasLimit: The maximum amount of gas the sender is willing to pay for executing the transaction. This amount is set and paid upfront, before any computation is done.

  4. to: The address of the recipient. In the contract creation transaction, the address of the contract account does not exist yet, so the value is empty.

  5. value: The amount of Wei transferred from the sender to the receiver. In a contract creation transaction, value is used as the starting balance of the newly created contract account.

  6. v, r, s: Used to generate the signature that identifies the sender of the transaction

  7. init (only exists in contract creation transactions): The EVM code snippet used to initialize the new contract account. The init value is executed once and then discarded. When init is executed for the first time, it returns an account code body, which is a piece of code permanently associated with the contract account.

  8. data (optional field, only exists in message communication): input data (i.e. parameters) in the message call. For example, if the smart contract is a domain name registration service, then the calling contract may expect input fields such as domain name and IP address

In the "Account" chapter we learned that transactions, both message communication and contract creation transactions, are always triggered by externally owned accounts and submitted to the blockchain. Another way to think about it is that transactions are the bridge between the outside world and the internal state of Ethereum.

But this does not mean that one contract cannot communicate with another. Contracts that exist within the global scope of Ethereum's state can talk to other contracts within that same scope. They do so through "messages" or "internal transactions". We can think of messages or internal transactions as being similar to transactions, with the major difference that they are not generated by externally owned accounts. Instead, they are generated by contracts. They are virtual objects that, unlike transactions, are not serialized and exist only in the Ethereum execution environment.

When a contract sends an internal transaction to another contract, the code associated with the recipient contract account is executed.

An important thing to note is that internal transactions or messages do not contain a gasLimit. This is because the gas limit is determined by the external creator of the original transaction (i.e. an externally owned account). The gas limit that the externally owned account sets must be high enough to carry out the transaction, including any sub-executions that occur as a result of it, such as contract-to-contract messages. If, in the chain of transactions and messages, a particular message execution runs out of gas, then that message's execution reverts, along with any subsequent messages triggered by it. However, the parent execution does not need to revert.

Block

All transactions are grouped into a "block". A blockchain consists of a series of these blocks chained together.

In Ethereum, a block contains:

  1. Block header

  2. Information about the set of transactions included in this block

  3. A set of other block headers for the current block's ommers

Ommers explained

What exactly is “ommer”? An ommer is a block whose parent is the same as the parent of the current block’s parent. Let’s quickly go over what ommers are used for and why a block needs to include a header for ommers.

Due to Ethereum's construction, its block production time (about 15 seconds) is much faster than other blockchains such as Bitcoin (about 10 minutes). This allows transactions to be processed faster. However, one disadvantage of shorter block production time is that more competing blocks will be discovered by miners. These competing blocks are also called "orphan blocks" (blocks that are mined but not added to the main chain).

The purpose of ommers is to help reward miners for including these orphaned blocks. The ommers that miners include must be "valid", meaning within the sixth generation or smaller of the present block. After six children, stale orphaned blocks can no longer be referenced (because including older transactions would complicate things a bit).

Ommer blocks receive a smaller reward than a full block. Nonetheless, there is still some incentive for miners to include these orphaned blocks and reap a reward.

Block header

Let's get back to the issue of blocks. We mentioned earlier that each block has a "block header", but what exactly is it?

The block header is part of a block and contains:

  1. parentHash: The hash value of the parent block header (this is also what makes the block a blockchain)

  2. ommerHash: Hash value of the current block ommers list

  3. beneficiary: The account address that receives the fee for mining this block

  4. stateRoot: The hash value of the root node of the state tree (recall what we said before about the state tree stored in the header and how it makes it very simple for light clients to verify anything about the state)

  5. transactionsRoot: The root node hash value of the tree containing all transactions listed in this block

  6. receiptsRoot: The root node hash value of the tree containing all transaction receipts listed in this block

  7. logsBloom: A Bloom filter (a data structure) that consists of log information

  8. difficulty: The difficulty level of this block

  9. number: the count of the current block (the block number of the genesis block is 0, and the block number increases by 1 for each subsequent block)

  10. gasLimit: The current gas limit for each block

  11. gasUsed: The total amount of gas used by transactions in this block

  12. timestamp: The Unix timestamp when this block was created

  13. extraData: Additional data related to this block

  14. mixHash: A hash value that, when combined with a nonce, proves that enough computation has been performed on this block

  15. nonce: A hash value that, when combined with mixHash, proves that enough computation has been performed on this block

Note how each block contains three tree structures, corresponding to:

  1. State (stateRoot)

  2. Transactions (transactionsRoot)

  3. Receipts (receiptsRoot)

These three tree structures are the Merkle Patricia trees we discussed earlier.
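To illustrate how the parentHash and number fields chain blocks together, here is a small Python sketch. It is a deliberate simplification: only three of the header fields are modeled, the header is serialized with JSON rather than RLP, and SHA-256 stands in for KECCAK-256. The helper names are hypothetical.

```python
import hashlib
import json


def header_hash(header: dict) -> str:
    # ASSUMPTION: JSON + SHA-256 stand in for RLP + KECCAK-256.
    encoded = json.dumps(header, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_header(parent, transactions_root: str) -> dict:
    """Build a toy header that points at `parent` (None for the genesis block)."""
    return {
        "parentHash": header_hash(parent) if parent is not None else "0" * 64,
        "number": parent["number"] + 1 if parent is not None else 0,
        "transactionsRoot": transactions_root,
    }


def is_linked(headers: list) -> bool:
    """Check that every header points at the hash of the one before it."""
    for parent, child in zip(headers, headers[1:]):
        if child["parentHash"] != header_hash(parent):
            return False
        if child["number"] != parent["number"] + 1:
            return False
    return True


genesis = make_header(None, "root-0")
block1 = make_header(genesis, "root-1")
block2 = make_header(block1, "root-2")

print(is_linked([genesis, block1, block2]))  # True

# Altering an earlier header breaks the link from its child.
forged = dict(block1, transactionsRoot="forged")
print(is_linked([genesis, forged, block2]))  # False
```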

In addition, there are several terms described above that are worth explaining. Let’s take a look at them below.

Logs

Ethereum allows logs, which make it possible to track various transactions and messages. A contract can explicitly generate a log by defining "events" that it wants to record.

A log entry contains:

  1. The logger's account address

  2. A series of topics that represent the various events carried out by this transaction, and any data associated with these events

Logs are stored in a Bloom filter, which stores the endless log data in an efficient manner.
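For readers unfamiliar with Bloom filters, the following Python sketch shows the general idea. It is a generic Bloom filter, not Ethereum's exact logsBloom construction: the bit size, the number of hash functions, and the use of SHA-256 are arbitrary choices made for this example.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch: compact, with possible false positives."""

    def __init__(self, size_bits: int = 2048, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an integer used as a bit array

    def _positions(self, item: bytes):
        # Derive `num_hashes` bit positions from the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bloom = BloomFilter()
bloom.add(b"Transfer")

print(bloom.might_contain(b"Transfer"))  # True
```

The filter never stores the items themselves, only a fixed number of bits, which is why it stays small no matter how many log entries are added. The trade-off is that a lookup can occasionally report an item that was never added.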

Transaction Receipt

The logs stored in the header come from the log information contained in the transaction receipts. Just as you receive a receipt when you buy something at a store, Ethereum generates a receipt for every transaction. As you would expect, each receipt contains certain information about the transaction, including:

  1. Block number

  2. Block Hash

  3. Transaction Hash

  4. Gas used by the current transaction

  5. The cumulative gas used by the current block after the current transaction is executed

  6. Logs created when executing the current transaction

  7. etc.

Block Difficulty

A block's difficulty is used to enforce consistency in the time it takes to validate blocks. The genesis block has a difficulty of 131,072, and a special formula is used to calculate the difficulty of every block thereafter. If a block is validated more quickly than the previous block, the Ethereum protocol increases that block's difficulty.

The difficulty of a block affects the nonce, which is a hash that must be calculated when mining a block, using the proof-of-work algorithm.

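The relationship between the block's difficulty and the nonce is expressed mathematically in the Yellow Paper as:

n ≤ 2^256 / Hd

where Hd stands for the difficulty and n is the value produced by the proof-of-work function for the chosen nonce.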

The only way to find a nonce that meets the difficulty threshold is to use a proof-of-work algorithm to enumerate all the possibilities. The expected time to find a solution is proportional to the difficulty - the higher the difficulty, the harder it is to find the nonce, and therefore the harder it is to verify a block, which in turn increases the time required to verify a new block. So, by adjusting the block difficulty, the protocol can adjust the time required to verify a block.

On the other hand, if verification time gets slower and slower, the protocol will lower the difficulty so that verification time self-regulates to maintain a constant rate — one block every 15 seconds on average.
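The search for a nonce can be illustrated with a toy proof-of-work loop. This sketch assumes the threshold form n ≤ 2^256 / difficulty from the Yellow Paper, and it uses SHA-256 as a stand-in hash; it is not Ethereum's actual proof-of-work algorithm, and the function names are invented for this example.

```python
import hashlib


def pow_value(header: bytes, nonce: int) -> int:
    # ASSUMPTION: SHA-256 stands in for Ethereum's real proof-of-work function.
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def is_valid(header: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a nonce takes a single hash."""
    return pow_value(header, nonce) <= 2**256 // difficulty


def mine(header: bytes, difficulty: int, max_tries: int = 1_000_000):
    """Finding a nonce means trying candidates one by one."""
    for nonce in range(max_tries):
        if is_valid(header, nonce, difficulty):
            return nonce
    return None  # gave up within max_tries


header = b"toy block header"
nonce = mine(header, difficulty=1000)
print(is_valid(header, nonce, difficulty=1000))  # True
```

With a difficulty of 1000, roughly one candidate in a thousand passes the check, so the expected number of attempts grows in proportion to the difficulty. Verifying the result, by contrast, always costs a single hash.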

Transaction Execution

We have reached the most complex part of the Ethereum protocol: the execution of transactions. Let’s say you send a transaction to the Ethereum network for processing. What exactly happens in the process of transforming the Ethereum state into one that includes your transaction?

First, in order to be executed, all transactions must meet a basic set of requirements, including:

  • Transactions must be properly formatted RLP. "RLP" stands for Recursive Length Prefix, which is a data format used to encode nested arrays of binary data. Ethereum uses the RLP format to serialize objects.

  • A valid transaction signature.

  • A valid transaction nonce. Recall that the nonce of an account is the count of transactions sent from that account. To be valid, the transaction nonce must equal the sending account's nonce.

  • The transaction's gas limit must be equal to or greater than the intrinsic gas used by the transaction. The intrinsic gas includes:
    1. A predefined cost of 21,000 gas for executing the transaction
    2. A gas fee for the data sent with the transaction (4 gas for every byte of data or code that equals zero, and 68 gas for every non-zero byte of data or code)
    3. An additional 32,000 gas if the transaction is a contract creation transaction

  • The sending account balance must have enough Ether to pay the "upfront" gas fee. The calculation of the upfront gas fee is relatively simple: first, the gas limit of the transaction is multiplied by the gas price of the transaction to get the maximum gas fee. Then, this maximum gas fee is added to the total value transferred from the sender to the receiver.
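The intrinsic gas and upfront cost rules above translate directly into code. The following Python sketch uses the gas figures quoted in this section; the function and constant names are hypothetical.

```python
G_TRANSACTION = 21_000    # predefined cost of executing a transaction
G_DATA_ZERO = 4           # per zero byte of data or code
G_DATA_NONZERO = 68       # per non-zero byte of data or code
G_CREATE = 32_000         # extra cost for a contract creation transaction


def intrinsic_gas(data: bytes, is_contract_creation: bool) -> int:
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    gas = G_TRANSACTION + G_DATA_ZERO * zero_bytes + G_DATA_NONZERO * nonzero_bytes
    if is_contract_creation:
        gas += G_CREATE
    return gas


def upfront_cost(gas_limit: int, gas_price: int, value: int) -> int:
    """Maximum gas fee plus the value being transferred, all in Wei."""
    return gas_limit * gas_price + value


data = b"\x00\x01\x02"  # one zero byte and two non-zero bytes
print(intrinsic_gas(data, is_contract_creation=False))  # 21140
print(intrinsic_gas(data, is_contract_creation=True))   # 53140
print(upfront_cost(gas_limit=50_000, gas_price=20 * 10**9, value=10**18))
```

A transaction whose gas limit is below `intrinsic_gas(...)`, or whose sender holds less than `upfront_cost(...)`, fails the checks in the list above before any execution starts.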

If the transaction meets all the requirements mentioned above, then we proceed to the following steps.

In the first step, we deduct the upfront cost of the execution from the sender's balance and increase the nonce in the sender's account by 1 for the current transaction. At this point, we can calculate the remaining gas by subtracting the intrinsic gas used from the total gas of the transaction.

The second step is to start executing the transaction. Throughout the execution of a transaction, Ethereum keeps track of a "substate". The substate is a way of recording information accrued during the transaction that will be needed immediately after the transaction completes. Specifically, it contains:

  1. Self-destruct set: the set of accounts (if any) that will be discarded after the transaction completes

  2. Log series: archived and indexable checkpoints of the virtual machine's code execution

  3. Refund balance: the amount to be refunded to the sender's account after the transaction completes. Recall that storage in Ethereum costs money, and that a sender is refunded for clearing up storage. Ethereum keeps track of this using a refund counter, which starts at zero and increments every time the contract deletes something in storage.

In the third step, various calculations required for the transaction begin to be processed.

Once all of the steps required by the transaction have been processed, and assuming there is no invalid state, the state is finalized by determining the amount of unused gas to be refunded to the sender. In addition to the unused gas, the sender is also refunded some allowance from the "refund balance" described above.

Once the sender is refunded:

  1. The Ether for the gas is given to the miner

  2. The gas used by the transaction will be added to the gas count of the block (the count keeps track of the total amount of gas used by all transactions in the current block, which is very useful when verifying the block)

  3. All accounts in the self-destruct set (if any) will be deleted

At the end, we have a new state and a set of logs created by the transaction.
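The bookkeeping in these steps can be summarized in a short Python sketch. It models only the balances of the sender and the miner: the `settle` function and its parameters are hypothetical, the transferred value simply leaves the sender, and the protocol's exact refund rules are not modeled.

```python
def settle(sender_balance: int, miner_balance: int, gas_limit: int,
           gas_price: int, value: int, gas_used: int, refund_gas: int = 0):
    """Return (sender_balance, miner_balance) after one successful transaction.

    All amounts are in Wei except the gas quantities.
    """
    if not 0 <= refund_gas <= gas_used <= gas_limit:
        raise ValueError("inconsistent gas figures")

    # Step 1: deduct the upfront cost (maximum gas fee plus value).
    upfront = gas_limit * gas_price + value
    if sender_balance < upfront:
        raise ValueError("sender cannot cover the upfront cost")
    sender_balance -= upfront

    # Final step: refund unused gas plus the refund balance from the substate.
    gas_returned = (gas_limit - gas_used) + refund_gas
    sender_balance += gas_returned * gas_price

    # The Ether for the remaining gas goes to the miner.
    miner_balance += (gas_limit - gas_returned) * gas_price
    return sender_balance, miner_balance


gwei = 10**9
sender, miner = settle(sender_balance=10**18, miner_balance=0,
                       gas_limit=50_000, gas_price=20 * gwei,
                       value=0, gas_used=30_000)
print(sender, miner)  # 999400000000000000 600000000000000
```

In this example the sender set a limit of 50,000 gas but only 30,000 was used, so the sender ends up paying for 30,000 gas and the miner receives exactly that amount.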

Now that we have covered the basics of transaction execution, let’s look at some of the differences between contract creation transactions and message communication.

Contract creation

Recall that in Ethereum, there are two types of accounts: contract accounts and externally owned accounts. When we say a transaction is a "contract creation", it means that the purpose of the transaction is to create a new contract account.

To create a new contract account, we first declare the address of the new account using a special formula. Then we initialize the new account by:

  1. Setting the nonce to zero

  2. If the sender sent some amount of Ether as value with the transaction, setting the account balance to that value

  3. Setting the storage to empty

  4. Setting the contract's codeHash to the hash of an empty string

Once the account is initialized, we can actually create the contract, using the init code sent with the transaction (see the "Transactions and messages" section for a refresher on the init code). What happens during the execution of this init code varies. Depending on the contract's constructor, it might update the account's storage, create other contract accounts, make other message calls, and so on.

Executing the code that initializes the contract uses gas. The transaction is not allowed to use more gas than the remaining gas. If it does, an Out of Gas (OOG) exception occurs and execution exits. If the transaction exits due to an Out of Gas exception, the state is immediately reverted to the point just before the transaction. The sender is not refunded the gas that was spent before running out.

However, if the sender sent Ether along with the transaction, the Ether will be returned even if the contract creation fails.

If the initialization code completes successfully, the final contract creation cost is paid. This is a storage cost, proportional to the size of the created contract's code (again, there is no free lunch). If there is not enough gas remaining to pay this final cost, the transaction again raises an Out of Gas exception and aborts.

If all goes well without any anomalies, any remaining unused gas is refunded to the original sender of the transaction, and the changed state is now allowed to be saved permanently.
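To make the flow concrete, here is a minimal sketch of contract creation in Python. It is a toy model rather than how a real client works: `init_gas_cost` and `code` are hypothetical stand-ins for what the init code would consume and return, the address `"0xnew"` is a placeholder, and the cost of 200 gas per byte of code is an assumption taken from the Yellow Paper.

```python
import copy


class OutOfGas(Exception):
    """Raised when a step needs more gas than remains."""


CODE_DEPOSIT_GAS_PER_BYTE = 200  # assumption: Yellow Paper value, not from the text


def create_contract(state, address, value, gas, init_gas_cost, code):
    """Simulate a contract creation and return the unused gas.

    `state` maps addresses to account dicts. `init_gas_cost` and `code`
    are hypothetical inputs standing in for the real init code's behavior.
    """
    snapshot = copy.deepcopy(state)
    try:
        # Initialize the new account as described in the list above.
        state[address] = {"nonce": 0, "balance": value, "storage": {}, "code": b""}
        # Run the init code; it cannot use more than the remaining gas.
        if init_gas_cost > gas:
            raise OutOfGas("init code")
        gas -= init_gas_cost
        # Pay the final creation cost, proportional to the code size.
        deposit = CODE_DEPOSIT_GAS_PER_BYTE * len(code)
        if deposit > gas:
            raise OutOfGas("code deposit")
        gas -= deposit
        state[address]["code"] = code
        return gas  # refunded to the original sender
    except OutOfGas:
        # Restore the state from before the transaction; spent gas is lost.
        state.clear()
        state.update(snapshot)
        raise


state = {}
unused = create_contract(
    state, "0xnew", value=5, gas=50_000, init_gas_cost=20_000, code=b"\x00" * 100
)
print(unused)
```

Because the whole state is rolled back on failure, the Ether sent along with the transaction never leaves the sender in the failing case, which matches the behavior described above.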

Message calls

The execution of a message call is similar to that of a contract creation, with a few differences.

Since no new account is being created, the execution of a message call does not include any init code. However, it can contain input data, if the transaction sender provided any. Once executed, a message call also has an extra component containing the output data, which is used if a subsequent execution needs it.

As with contract creation, if a message call execution exits because it runs out of gas or because the transaction is invalid (for example, a stack overflow, an invalid jump destination, or an invalid instruction), none of the gas used is refunded to the original caller. Instead, all of the remaining unused gas is consumed, and the state is reset to the point immediately before the balance transfer.

Until the most recent Ethereum update, there was no way to stop or revert the execution of a transaction without having the system consume all the gas you provided. For example, suppose you wrote a contract that throws an error when the caller is not authorized to perform some transaction. In previous versions of Ethereum, the remaining gas would still be consumed, and no gas would be returned to the sender. The Byzantium update, however, includes a new "revert" code that allows a contract to stop execution and revert state changes without consuming the remaining gas. It can also return a reason for the failed transaction. If a transaction exits due to a revert, the unused gas is returned to the sender.
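The three ways a message call can end differ only in what happens to the state changes and to the leftover gas. Here is a small sketch of that decision table; the `Outcome` enum and `settle_call` function are names invented for this example.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    REVERT = "revert"        # the contract stopped itself with the revert code
    EXCEPTION = "exception"  # out of gas, invalid jump, stack overflow, ...


def settle_call(outcome, gas_remaining):
    """Return (state_changes_kept, gas_returned_to_caller)."""
    if outcome is Outcome.SUCCESS:
        return True, gas_remaining
    if outcome is Outcome.REVERT:
        # State changes are undone, but the unused gas is handed back.
        return False, gas_remaining
    # Exceptional halt: state changes are undone and all remaining gas is consumed.
    return False, 0


for outcome in Outcome:
    print(outcome.name, settle_call(outcome, gas_remaining=12_000))
```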

Execution model

So far, we have seen the series of steps that a transaction must go through from start to finish. Now, let's look at how transactions are actually executed in a virtual machine (VM).

The part of the protocol that actually handles transaction processing is Ethereum’s own virtual machine, called the Ethereum Virtual Machine (EVM).

As defined above, the EVM is a Turing-complete virtual machine. The only limitation that exists for the EVM that does not exist for a typical Turing-complete machine is that the EVM is inherently gas-bound. Therefore, the total amount of computation that can be done is inherently limited by the total amount of gas provided.

In addition, the EVM has a stack-based architecture. A stack machine is a computer that uses a last-in, first-out stack to hold temporary values.

The size of each stack item in the EVM is 256 bits, and the stack has a maximum size of 1024 items.
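As a rough illustration of these two limits, here is a tiny stack in Python that keeps every item within 256 bits and refuses to grow past 1024 items. The class and exception names are invented for this sketch.

```python
class StackError(Exception):
    """Raised on stack overflow or underflow."""


WORD_MASK = 2**256 - 1   # every stack item is a 256-bit word
MAX_STACK_ITEMS = 1024   # maximum number of items on the stack


class EvmStack:
    def __init__(self):
        self._items = []

    def push(self, value):
        if len(self._items) >= MAX_STACK_ITEMS:
            raise StackError("stack overflow")
        # Keep only the low 256 bits so the item fits in one word.
        self._items.append(value & WORD_MASK)

    def pop(self):
        if not self._items:
            raise StackError("stack underflow")
        return self._items.pop()  # last in, first out

    def __len__(self):
        return len(self._items)


stack = EvmStack()
stack.push(7)
stack.push(2**256 + 5)  # too wide for one word, so it wraps around to 5
print(stack.pop(), stack.pop())
```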

The EVM has memory, where items are stored as word-addressed byte arrays. Memory is volatile, meaning its contents are not persistent.

The EVM also has storage. Unlike memory, storage is non-volatile and is maintained as part of the system state. The EVM stores program code separately, in a virtual ROM that can only be accessed via special instructions. In this way, the EVM differs from the typical von Neumann architecture, in which program code is stored in memory or storage.

The EVM also has its own language: "EVM bytecode". When a programmer like you or me writes a smart contract that runs on Ethereum, we typically write the code in a higher-level language such as Solidity. We can then compile it down to EVM bytecode that the EVM can understand.

Okay, on to execution.

Before executing a particular computation, the processor makes sure that the following information is available and valid:

  1. System state

  2. Remaining gas for computation

  3. Address of the account that owns the code being executed

  4. Address of the sender of the original transaction that triggered this execution

  5. Address of the account that caused the code to execute (may be different from the original sender)

  6. Gas price of the transaction that triggered this execution

  7. Input data for this execution

  8. Value (in Wei) passed to this account as part of the current execution

  9. Machine code to be executed

  10. Block header of the current block

  11. Depth of the current message call or contract creation stack
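One way to picture this list is as a single read-only record handed to the EVM before it starts. The sketch below uses a Python dataclass; the field names and the example addresses are my own invention, not identifiers from any client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionEnvironment:
    system_state: dict   # 1. the global state
    gas_available: int   # 2. remaining gas for computation
    code_owner: str      # 3. account that owns the executing code
    origin: str          # 4. sender of the original transaction
    caller: str          # 5. account that caused the code to execute
    gas_price: int       # 6. gas price of the originating transaction
    input_data: bytes    # 7. input data for this execution
    value: int           # 8. value passed along, in Wei
    code: bytes          # 9. machine code to be executed
    block_header: dict   # 10. header of the current block
    depth: int           # 11. depth of the call or creation stack


# Hypothetical scenario: a user calls contract A, which in turn calls contract B.
env = ExecutionEnvironment(
    system_state={}, gas_available=90_000, code_owner="0xB", origin="0xUser",
    caller="0xA", gas_price=20, input_data=b"", value=0, code=b"\x00",
    block_header={"number": 1}, depth=1,
)
print(env.origin, env.caller)
```

The scenario shows why items 4 and 5 are separate: inside contract B the original sender is still the user, while the immediate caller is contract A.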

At the beginning of execution, the memory and stack are empty, and the program counter is 0.

PC: 0 STACK: [] MEM: [], STORAGE: {}

Then the EVM begins to execute transactions recursively, calculating the system state and machine state for each loop. The system state is the global state of Ethereum. The machine state includes:

  1. Gas available

  2. Program Counter

  3. Memory content

  4. The number of active words in memory

  5. Contents of the stack

Stack items are added or removed from the leftmost portion of the series.

In each loop, the appropriate amount of gas is deducted from the remaining gas, and the program counter is incremented.
At the end of each loop, there are three possibilities:

  1. The machine reaches an exceptional state (for example insufficient gas, an invalid instruction, insufficient stack items, a stack that would grow beyond 1024 items, or an invalid JUMP/JUMPI destination) and so must halt, with any changes discarded

  2. Execution continues into the next loop

  3. The machine reaches a controlled halt (the end of the execution process)

Assuming the execution does not hit an exceptional state and reaches a "controlled" or normal halt, the machine produces the resultant state, the remaining gas after execution, the accrued substate, and the resultant output.
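The loop described above can be sketched as a toy interpreter. It only understands three real instructions (STOP, ADD and PUSH1, with their actual opcode numbers and gas costs) and it ignores memory, storage and substate entirely, so treat it as a simplified picture rather than a real EVM. The function and exception names are invented.

```python
class ExceptionalHalt(Exception):
    """Execution must stop and every change is discarded."""


# Opcode numbers and gas costs of the three instructions supported here.
STOP, ADD, PUSH1 = 0x00, 0x01, 0x60
GAS_COST = {STOP: 0, ADD: 3, PUSH1: 3}
MAX_STACK_ITEMS = 1024
WORD_MODULUS = 2**256


def run(code, gas):
    """Execute `code` and return (stack, gas_remaining) on a normal halt."""
    pc, stack = 0, []  # at the start the stack is empty and the PC is 0
    while pc < len(code):
        op = code[pc]
        if op not in GAS_COST:
            raise ExceptionalHalt(f"invalid instruction 0x{op:02x}")
        if GAS_COST[op] > gas:
            raise ExceptionalHalt("out of gas")
        gas -= GAS_COST[op]  # deduct gas for this loop
        if op == STOP:
            break            # controlled halt
        if op == PUSH1:
            if len(stack) >= MAX_STACK_ITEMS:
                raise ExceptionalHalt("stack overflow")
            # The byte after the opcode is the value to push.
            stack.append(code[pc + 1] if pc + 1 < len(code) else 0)
            pc += 2
        elif op == ADD:
            if len(stack) < 2:
                raise ExceptionalHalt("insufficient stack items")
            stack.append((stack.pop() + stack.pop()) % WORD_MODULUS)
            pc += 1
    return stack, gas


program = bytes([PUSH1, 2, PUSH1, 3, ADD, STOP])
print(run(program, gas=100))  # ([5], 91)
```

Note that this sketch keeps the top of the stack at the end of the Python list, purely because that is the cheap end to push to and pop from.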

Phew. We finally got through the hardest part of Ethereum. Even if you didn't fully follow this part, that's okay. Unless you are working at a very deep level, you really don't need to understand every detail of execution.

How is a block finalized?

Finally, let's look at how a block of many transactions gets finalized.

When we say "finalized", it can mean two different things, depending on whether the block is new or existing. If it is a new block, we mean the process required for mining this block. If it is an existing block, we mean the process of validating the block. In either case, there are four requirements for a block to be "finalized":

1) Validate (or, if mining, determine) ommers
Each ommer block within the block header must be a valid header and must be within six generations of the present block.

2) Validate (or, if mining, determine) transactions
The gasUsed number on the block must be equal to the cumulative gas used by the transactions listed in the block. (Recall that when executing a transaction, we keep track of the block gas counter, which tracks the total gas used by all transactions in the block.)

3) Apply rewards (only if mining)
The beneficiary address is awarded 5 Ether for mining the block (under Ethereum proposal EIP-649, this reward of 5 ETH will soon be reduced to 3 ETH). Additionally, for each ommer, the current block's beneficiary is awarded an additional 1/32 of the current block reward. Lastly, the beneficiary of each ommer block also receives a certain amount (there is a special formula for calculating it).

4) Verify (or, if mining, compute a valid) state and nonce
Ensure that all transactions and the resulting state changes are applied, and then define the new block as the state after the block reward has been applied to the final transaction's resulting state. Verification is done by checking this final state against the state tree stored in the header.
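Requirements 2 and 3 are mostly arithmetic, so here is a sketch of them in Python using whole Wei. The function names are invented. The ommer formula is an assumption on my part: it is the one I recall from the Yellow Paper of that era, (8 + ommer number - block number) / 8 of the block reward, and the text above only says that a special formula exists.

```python
WEI_PER_ETHER = 10**18
BLOCK_REWARD = 5 * WEI_PER_ETHER  # the 5 Ether reward mentioned above
MAX_OMMER_GENERATIONS = 6


def gas_used_is_valid(header_gas_used, tx_gas_amounts):
    """Requirement 2: the header's gasUsed must equal the sum over all transactions."""
    return header_gas_used == sum(tx_gas_amounts)


def miner_reward(ommer_count):
    """Requirement 3: block reward plus 1/32 of it for each included ommer."""
    return BLOCK_REWARD + ommer_count * (BLOCK_REWARD // 32)


def ommer_reward(block_number, ommer_number):
    """Reward for an ommer's beneficiary (assumed Yellow Paper formula)."""
    generations = block_number - ommer_number
    if not 1 <= generations <= MAX_OMMER_GENERATIONS:
        raise ValueError("ommer is outside the allowed six generations")
    return (8 - generations) * BLOCK_REWARD // 8


print(gas_used_is_valid(63_000, [21_000, 42_000]))
print(miner_reward(ommer_count=2) / WEI_PER_ETHER)
print(ommer_reward(block_number=100, ommer_number=99) / WEI_PER_ETHER)
```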

Proof of Work Mining

In the "Blocks" section, I briefly explained the concept of block difficulty. The algorithm that gives meaning to block difficulty is called Proof of Work (PoW).

Ethereum's proof-of-work algorithm is called "Ethash" (formerly called Dagger-Hashimoto).
The algorithm is formally defined as:

$$(m, n) = \mathrm{PoW}(H_{\not n}, H_n, \mathbf{d})$$

where $m$ is the mixHash, $n$ is the nonce, $H_{\not n}$ is the header of the new block (excluding the nonce and mixHash components, which have to be computed), $H_n$ is the nonce of the block header, and $\mathbf{d}$ is the DAG, which is a large data set.

In the "Blocks" section, we talked about the various items that exist in a block header. Two of them are called mixHash and nonce. As you may recall:

  1. mixHash: A hash that, when combined with the nonce, proves that this block has carried out enough computation

  2. nonce: A hash that, when combined with the mixHash, proves that this block has carried out enough computation

The PoW function is used to evaluate these two items.
Exactly how the mixHash and nonce are calculated using the PoW function is somewhat complex, and something we can delve deeper into in a separate post. But at a high level, it works roughly like this:
A "seed" is calculated for each block. The seed is different for every "epoch", where each epoch is 30,000 blocks long. For the first epoch, the seed is the hash of a series of 32 bytes of zeros. For every subsequent epoch, it is the hash of the previous seed hash. Using this seed, a node can calculate a pseudo-random "cache".
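The seed chain can be sketched as follows. Two assumptions to keep in mind: I read the first seed as the hash of 32 zero bytes, and I use Python's `hashlib.sha3_256` purely as a stand-in, since Ethash uses the original Keccak-256, which produces different digests. The function name is invented.

```python
import hashlib

EPOCH_LENGTH = 30_000  # blocks per epoch


def seed_for_block(block_number):
    """Return the seed of the epoch that contains `block_number`.

    Assumption: hashlib.sha3_256 stands in for Keccak-256, so the actual
    byte values will not match real Ethash seeds.
    """
    epoch = block_number // EPOCH_LENGTH
    seed = hashlib.sha3_256(b"\x00" * 32).digest()  # first epoch
    for _ in range(epoch):
        seed = hashlib.sha3_256(seed).digest()      # hash of the previous seed
    return seed


# Blocks in the same epoch share a seed; the next epoch gets a new one.
print(seed_for_block(0) == seed_for_block(29_999))
print(seed_for_block(0) == seed_for_block(30_000))
```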

This cache is very useful because it can make the concept of "light nodes" a reality, which was discussed earlier in this article. The purpose of a light node is to enable a node to efficiently verify transactions without storing the data set of the entire blockchain. A light node can verify the validity of a transaction based solely on the cache, because the cache can regenerate specific blocks that need to be checked.

Using the cache, a node can generate the DAG "dataset", in which each item depends on a small number of pseudo-randomly selected items from the cache. To be a miner, you must generate this full dataset; all full clients and miners store this dataset, and it grows linearly with time.

Miners can then take random slices of the dataset and put them through a mathematical function to hash them together into a "mixHash". A miner repeatedly generates a mixHash, trying a different nonce each time, until the output is below the desired target. When the output meets this requirement, the nonce is considered valid and the block can be added to the chain.
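The "keep trying nonces until the output is below the target" part is easy to show in isolation. The sketch below is not Ethash: it skips the cache and dataset completely and uses SHA-256 as a stand-in hash. The names are invented, and the relationship target = 2^256 / difficulty is an assumption borrowed from the Yellow Paper.

```python
import hashlib


def pow_value(header, nonce):
    """Stand-in for the real PoW function: SHA-256 instead of Ethash."""
    data = header + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def mine(header, difficulty):
    """Try nonces one by one until the output is at or below the target."""
    target = 2**256 // difficulty
    nonce = 0
    while pow_value(header, nonce) > target:
        nonce += 1
    return nonce


def verify(header, nonce, difficulty):
    """Checking a nonce takes a single hash, however long mining took."""
    return pow_value(header, nonce) <= 2**256 // difficulty


header = b"example block header"
nonce = mine(header, difficulty=1_000)
print(nonce, verify(header, nonce, difficulty=1_000))
```

Since each attempt succeeds with a probability of roughly 1/difficulty, finding a valid nonce is slow for the miner, while checking it stays cheap for everyone else.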

Mining as a security mechanism

In general, the purpose of PoW is to prove, in a cryptographically secure way, that a particular amount of computation has been expended to generate some output (i.e. the nonce). This works because there is no better way to find a nonce below the required threshold than to enumerate all the possibilities. The outputs of the hash function are uniformly distributed, so we can be sure that, on average, the time needed to find such a nonce depends on the difficulty threshold. The higher the difficulty, the longer it takes to solve for the nonce. In this way, the PoW algorithm gives meaning to the concept of difficulty, which is used to enforce blockchain security.

What do we mean by blockchain security? It's simple: we want to create a blockchain that everyone trusts. As discussed earlier in this post, if more than one chain existed, users would lose trust, because they would be unable to reasonably determine which chain was the "valid" one. For a group of users to accept the underlying state stored on a blockchain, we need a single canonical blockchain that the group believes in.

This is exactly what the PoW algorithm does: it ensures that a particular blockchain remains canonical into the future, making it incredibly difficult for an attacker to create new blocks that rewrite some part of history (for example, by erasing transactions or creating fake ones) or to maintain a fork. To have their blocks validated first, an attacker would need to consistently solve for the nonce faster than anyone else in the network, so that the network believes their chain is the heaviest chain (based on the principles of the GHOST protocol we mentioned earlier). This is practically impossible unless the attacker controls more than half of the network's mining power, a scenario known as a majority or 51% attack.


Mining as a Wealth Distribution Mechanism

In addition to providing a secure blockchain, PoW is also a way to allocate wealth to those who spend their own computing power to provide this security. Recall that a miner will receive rewards when he mines a block, including:

  1. A static block reward of 5 Ether for the "winning" block (soon to be changed to 3 Ether)

  2. The cost of the gas expended within the block by the transactions included in the block

  3. Additional rewards for incorporating ommers as part of the block

For the use of the PoW consensus mechanism for security and wealth distribution to be sustainable in the long run, Ethereum strives to instill these two properties:

  1. Make it accessible to as many people as possible. In other words, people should not need specialized or uncommon hardware to run the algorithm. The purpose is to make the wealth distribution model as open as possible, so that anyone can provide any amount of computing power in return for Ether.

  2. Reduce the possibility for any single node (or small group) to make a disproportionate profit. Any node that can make a disproportionate profit has a large influence on determining the canonical blockchain. This is troublesome because it reduces network security.

In the Bitcoin blockchain network, one problem that arises in relation to the two properties above is that the PoW algorithm is a SHA256 hash function. The weakness of this type of function is that the nonce problem can be solved much more quickly and efficiently using specialized hardware, also known as ASICs.

To mitigate this issue, Ethereum chose to make its PoW algorithm (Ethash) sequentially memory-hard. This means the algorithm is engineered so that calculating the nonce requires a lot of memory and bandwidth. The large memory requirement makes it hard for a computer to use its memory in parallel to discover multiple nonces at the same time, and the high bandwidth requirement makes it hard for even a super-fast computer to discover multiple nonces at the same time. This reduces the risk of centralization and creates a more level playing field for the nodes doing the verification.

One thing worth noting is that Ethereum is transitioning from a PoW consensus mechanism to one called "Proof of Stake" (PoS). That is a beast of a topic in itself, which we can hopefully explore in a future post.

Conclusion

Phew! You made it to the end. I hope?

There is a lot to digest in this post. If it takes you several reads to fully understand what is going on, that is completely normal. I personally read the Ethereum Yellow Paper, the white paper, and various parts of the code base several times before gradually understanding what was going on.

Anyway, I hope you found this post helpful. If you find any errors or mistakes, I would love for you to send me a private message or post directly in the comments section (I promise I will look at all of them).

And remember, I am human (yes, it's true) and I make mistakes. I took the time to write this post for free, for the benefit of the community. So please avoid unnecessary aggression in your feedback, and try to keep it constructive.

Ethereum's Yellow Paper