Dive into EVM data structures, transaction receipts and event logs

Understanding the data structures that make up a blockchain helps us think of creative ways to parse this data.

**Written by:**NOXX

Compile: Flush

Navigating on-chain data is an essential skill for anyone wishing to understand the Web3 space. Understanding the data structures that make up a blockchain helps us think about creative ways to parse this data. At the same time, this on-chain data constitutes a large portion of the available data. This post will delve into a key data structure in the EVM, the transaction receipt and its associated event log.

Why log

Before we start, let's briefly talk about why we need to use event logs as a solidity developer:

  • The event log is a cheaper option for data storage that does not need to be accessed by the contract, and can also reconstruct the stored state by testing specific variables in the smart contract, indexing variables.
  • Event logging is a way to trigger a Web3 application listening to a specific event log.

EVM nodes do not need to keep logs forever, and can save space by deleting old logs. Contracts do not have access to log storage, so nodes do not need them to execute contracts. Contract storage, on the other hand, is required for execution and therefore cannot be deleted.

Ethereum Block Merkle Root

In Part 4, we delved into the Ethereum framework, especially the state Merkle root part. The State Root is one of three Merkle roots included in the block header. The other two are Transaction Root and Receipt Root.

For input to building this framework, we will refer to block 15001871 on Ethereum, which contains 5 transactions with their associated receipts and event logs sent.

block header

We will start with 3 parts in the block header, Transaction Root, Receipt Root and Logs Bloom (a brief introduction to the block header can be reviewed in Part 4).

Source:

In the Ethereum client under Transaction Root and Receipt Root, Merkle Patricia Tries contains all transaction data and receipt data in the block. This article will only focus on all transactions and receipts that a node can access.

The block header information of block 15001871 found through the Ethereum node is as follows:

The logsBloom in the block header is a key data structure, which will be mentioned later in this article. First let's start with the data located under the Transaction Root, the Transaction Trie.

Transaction Tree Transaction Trie

Transaction Trie is a data set that generates transactionsRoot and records transaction request vectors. Transaction request vectors are pieces of information required to execute a transaction. The data fields contained in a transaction are as follows:

  • Type - transaction type (LegacyTxType traditional transaction, AccessListTxType EIP-2930 introduction, DynamicFeeTxType EIP-1559 introduction)
  • ChainId - EIP155 chain ID of the transaction
  • Data - input data of the transaction
  • AccessList - access list for transactions
  • Gas - the gas limit of the transaction
  • GasPrice - the gas price of the transaction
  • GasTipCap - the incentive premium for miners whose transaction unit gas exceeds the base fee to package first, maxPriorityFeePerGas in Geth is defined by EIP1559
  • GasFeeCap - the upper limit of gas fee per unit of transaction, maxFeePerGas in Geth (GasFeeCap ≥ baseFee + GasTipCap)
  • Value - the amount of Ethereum traded
  • Nonce - the nonce of the originator of the trading account
  • To - The recipient address of the transaction. For contract creation transactions, To returns a nil value
  • RawSignaturues - Signature values V, R, S of transaction data

After understanding the above data fields, let's take a look at the first transaction of block 15001871

Through Geth's ethclient query, you can see that both ChainId and AccessList have "omitempty", which means that if the field is empty, it will be omitted in the response to reduce or shorten the size of the serialized data.

code source:

This transaction represents the transfer of USDT tokens to the 0xec23e787ea25230f74a3da0f515825c1d820f47a address. The To address is the ERC20 USDT contract address 0xdac17f958d2ee523a2206206994597c13d831ec7. Through INPUT DATA, we can see that the function signature 0xa9059cbb corresponds to the function Transfer (Address, UINT256), and 42.251 USDT (accuracy of 6) to 0x2b279b8 (45251000) is transferred to 0xEC23E787EA25230F to 0xEC23E787EA25230F. 74A3DA0F515825C1D820F47A address.

You may have noticed that this transaction data structure doesn't tell us anything about the outcome of the transaction, so did the transaction succeed? How much gas does it consume? Which event records are triggered? At this point we will introduce the Receipt Trie.

Receipt Trie

Just as a shopping receipt records the outcome of a transaction, an object in the Receipt Trie does the same for an Ethereum transaction but also records some additional details. Going back to asking the question about transaction receipts above, we'll focus on the logs that triggered the following events.

Query the on-chain data of 0x311b again and obtain its transaction receipt. At this time, the following fields will be obtained:

code source:

  • Type - transaction type (LegacyTxType, AccessListTxType, DynamicFeeTxType)
  • PostState(root) - StateRoot, the root node of the state tree generated after executing the transaction, the corresponding value found in the figure is 0x, probably because of EIP98
  • CumulativeGasUsed - Cumulative total Gas consumed by this transaction and all previous transactions in the same block
  • Bloom(logsBloom) - Bloom filter for event logs, used to efficiently search and access contract event logs on the blockchain, allowing nodes to quickly retrieve whether a certain event occurred in a block without fully parsing the block All transaction receipts in the block
  • Logs - an array of log objects containing log entries generated by contract events triggered during transaction execution
  • TxHash - the transaction hash associated with the receipt
  • ContractAddress - If the transaction is to create a contract, the address where the contract was deployed. If the transaction is not a contract creation, but such as a transfer or interaction with a deployed smart contract, then the ContractAddress field will be empty
  • GasUsed - the gas consumed by this transaction
  • BlockNumber - the block number of the block where this transaction occurred
  • TransactionIndex - The transaction index within the block, the index determines which transaction is executed first. This transaction is at the top of the block, so index 0

Now that we know the composition of the transaction receipt, let's take a closer look at the logsBloom and the log array logs in the transaction receipt.

Event Logs

Through the USDT contract code on the Ethereum mainnet, we can see that the Transfer event is declared on line 86 of the contract, and the two input parameters have the keyword "indexed".

(code source:

When an event input is "indexed", it allows us to quickly find logs through that input. For example, when using the index "from" above, it is possible to get all event logs of type Transfer with "from" address 0x5041ed759dd4afc3a72b8192c143f72f4724081a between blocks X and Y. We can also see that when the transfer function is called on line 138, the event log is fired. It is worth noting that the current contract uses an earlier version of solidity, so the emit keyword is missing.

Go back to the obtained on-chain data:

code source:

Let's dig a little deeper into the address, topics, and data fields.

Theme Topics

Topics is an index value. From the above figure, we can see that there are 3 index parameters of topics in the query data on the chain, while the Transfer event only has 2 index parameters (from and to). This is because the first topic is always the event's function signature hash. The event function signature in the current example is Transfer(address, address, uint256). By hashing it with keccak256, we get the result ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef.

(Online tool:

When we query the from field as mentioned above, but at the same time want to limit the query event log type to only Transfer type event logs, we need to filter by event type by indexing event signatures.

We can have up to 4 topics, each topic has a size of 32 bytes (if the type of the index parameter is greater than 32 bytes (i.e. string and bytes), the actual data is not stored, but a keccak256 digest of the data is stored ). We can declare 3 index parameters because the first parameter is taken by the event signature. But there is a situation where the first topic is not a hash event signature. This is the case when declaring anonymous events. This opens up the possibility of using 4 index parameters instead of the previous 3, but loses the ability to index event names. Another advantage of anonymous events is that they are less expensive to deploy since they do not enforce an additional topic. The other topics are the values of the indexes "from" and "to" from the Transfer event.

DataData

The data section contains the remaining (non-indexed) parameters from the event log. In the above example, there is a value 0x00000000000000000000000000000000000000000000000000000002b279b8, which is 45251000 in decimal, which is the aforementioned amount of $45.251. If there are more such parameters, they will be appended to the data item. The example below will show the case of more than 1 non-indexed parameter.

The current example adds an extra "tax" field to the Transfer event. Suppose the set tax is 20%, then the tax value should be 45251000 * 20% = 9050200, its hexadecimal value is 0x8a1858, since the type of this number is uint256, and the type of data is 32 bytes, you need to The hexadecimal value is filled with 32 bytes, and the result of the data item is 0x000000000000000000000000000000000000000000000000000000002b279b80000000000000000000000000000000000000 00000000000000000000008a1858.

Address

The address field is the address of the contract that emitted the event, an important note about this field is that it will be indexed even though it is not included in the topic section. The reason is that the Transfer event is part of the ERC20 standard, which means that when it is necessary to filter the logs of ERC20 transfer events, transfer events will be obtained from all ERC20 contracts. And by indexing the contract address, the search can be narrowed down to a specific contract/token, such as USDT in the example.

Opcodes Opcodes

Finally there is the LOG opcode. They range from LOG0 when there are no topics to LOG4 when there are 4 topics. LOG3 is what we use in our example. Contains the following:

  • offset - memory offset, indicating the starting position of data field input
  • length - length of data to read from memory
  • topic x(0 - 4) - the value of topic x

(Source:

offset and length define where the data is located in the data section in memory.

After understanding the structure of the log and how a topic is indexed, let's understand how index items are searched.

Bloom Filters Bloom Filters

The secret to index items being searched faster is the Bloom filter.

The Llimllib article has a good definition and explanation of this data structure.

"Bloom filter is a data structure that can be used to determine whether an element is in a collection. It has the characteristics of fast operation and small memory footprint. The cost of efficient insertion and query is that Bloom Filter is a probability-based data structure: It can only tell us that an element is definitely not in the set or possibly in the set. The underlying data structure of a Bloom filter is a bit vector.”

Below is an example of a bit vector. White cells represent bits with a value of 0, and green cells represent bits with a value of 1.

These bits are set to 1 by taking some input and hashing, the resulting hash value is used as a bit index on which bit should be updated. The bit vector above is the result of applying 2 different hashes to the value "ethereum" to get a 2-bit index. The hash represents a hexadecimal number, and to get the index, you take the number and convert it to a value between 0 and 14. There are many ways to do this, like mod 14.

Review

With a Bloom filter for transactions, that is, a bit vector, it can be hashed in Ethereum to determine which bits in the bit vector to update. The input is the address field and the topic of the event log. Let's review the logsBloom in the transaction receipt, which is a transaction-specific Bloom filter. A transaction can have multiple logs, which contains the address / topic of all logs.

If you look back up to the block header, you will find another logsBloom. This is a Bloom filter for all transactions within the block. Which contains all addresses/topics in each log for each transaction.

These Bloom filters are expressed in hexadecimal not binary. They are 256 bytes long and represent a 2048-bit vector. If we refer to the Llimllib example above, our bit vector length is 15, and bit indices 2 and 13 are flipped as 1. Let's see what we get when we convert it to hex.

Although the hexadecimal representation doesn't look like a bit vector, it does in logsBloom.

Query Queries

A query was mentioned before, "Find all event logs of the Transfer type whose "from" address is 0x5041ed759dd4afc3a72b8192c143f72f4724081a between blocks X and Y". We can get the event signature topic, which represents a topic of type Transfer and from (0x5041…) value, and determine which bit indices in the Bloom filter should be set to 1.

If you use logsBloom in the block header, you can check to see if any of these bits are not set to 1. If not, it can be determined that there are no logs matching the condition in the block. And if those bits are found to be set, we know that a matching log is probably in the block. But not completely sure, because the block header logsBloom consists of multiple addresses and topics. Other event logs may have the match bit set. This is why a Bloom filter is a probabilistic data structure. The larger the bit vector, the less likely it is that bit index collisions will occur with other logs. Once you have a matching Bloom filter, you can use the same method to query logsBloom for individual receipts. When a match is obtained, the actual log entry can be viewed to retrieve the object.

Execute the above operations on blocks X to Y to quickly find and retrieve all logs that meet the criteria. This is how the Bloom filter works conceptually.

Now let's look at the implementation used in Ethereum.

Geth implementation - Bloom Filters

Now that we know how the Bloom filter works, let's learn how the Bloom filter completes the screening step by step from address / topic to logsBloom in an actual block.

First of all, from the definition of the Ethereum Yellow Paper:

Source:

"We define a Bloom filter function M that reduces log entries into a single 256-byte hash:

in is a specialized Bloom filter that sets three bits in 2048 given an arbitrary sequence of bytes. This is done by taking the lower 11 bits of each of the first three pairs of bytes in the Keccak-256 hash of the byte sequence. "

An example and a reference to a Geth client implementation are provided below to simplify the understanding of the above definitions.

Here is the transaction log we looked at on Etherscan.

The first topic is the event signature 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef and converts this value into the bit index that should be updated.

Below is the bloomValues function from the Geth codebase.

This function receives the event signature topic, such as: 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef and other data, and returns the bit index that needs to be updated in the Bloom filter.

code source:

  1. The bloomValues function receives as input a topic (event signature in the example) and a hashbuf (an empty byte array of length 6).
  1. Refer to the Yellow Paper snippet, "The first three pairs of bytes in a Keccak-256 hash of a sequence of bytes". These three pairs of bytes are 6 bytes, which is the length of hashbuf.

  2. Sample data: 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef.

  1. The sha command between lines 140 - 144 hashes the input data and loads the output into hashbuf.
  1. The hexadecimal result of the sha output using keccak256 is (when using keccak 256 as the function signature, the input is text type, but here is hexadecimal type): ada389e1fc24a8587c776340efb91b36e675792ab631816100d55df0b5cf3cbc.

  1. The current content of hasbuf [ad, a3, 89, e1, fc, 24] (hexadecimal). Each hexadecimal character represents 4 bits.

3 Calculate v1.

1)hashbuf [1] = 0xa3 = 10100011 for bitwise AND with 0x7. 0x7 = 00000111.

  1. A byte consists of 8 bits. If you want to get a bit index, you need to ensure that the obtained value is between 0 and 7 of the zero index array. Use bitwise AND to hashbuf [1] Restricted to values between 0 and 7. Calculated in the example, 10100011 & 00000111 = 00000011 = 3.

  2. This bit index value is used with a bit shift operator, ie shifted 3 bits to the left, resulting in 8-bit byte index 00001000, to create a flipped bit.

  3. v1 is the whole byte rather than the actual bit index, because this value will be bitwise ORed on the Bloom filter later. The OR operation will ensure that all corresponding bits in the Bloom filter are also flipped.

  1. We now have byte values, but we still need byte indexes. The Bloom filter is 256 bytes long (2048 bits), so we need to know which byte to run the bitwise OR on. The value i1 represents this byte index.
  1. Put the hashbuf through big-endian uint16 byte order, which makes it limit the first 2 bytes of the bit array, which is 0xada3 = 1010110110100011 in the example.

  2. Bitwise AND this value with 0x7ff = 0000011111111111. There are 11 bits where 0x7ff is set to 1. As mentioned in the yellow paper, "it does this by taking the lower 11 bits of each of the first three pairs". This will result in the value 0000010110100011 which is 1010110110100011 & 0000011111111111.

  3. Then shift the value right by 3 bits. This converts an 11-digit number to an 8-digit number. We want a byte index, and the byte length of the Bloom filter is 256, so the byte index value needs to be in this range. And an 8-bit number can be any value between 0 and 255. In our example, this value is 0000010110100011 shifted right by 3 bits 10110100 = 180.

  4. Calculate our byte index by BloomByteLength, knowing it is 256 minus the calculated 180, minus 1. Subtract 1 to keep the result between 0 and 255. This gives us the byte index to update, which in this case turns out to be byte 75, which is how we calculated i1.

  1. Update the bit index 3 in the 75th byte of the Bloom filter (0 is the index so the 4th bit), which can be done by performing a bitwise OR operation of v1 on the 75th byte in the Bloom filter.
  1. We only covered the first byte pair 0xada3, which was done again for byte pairs 2 and 3. Each address / topic will update 3 bits in a 2048 bit vector. As mentioned in the Yellow Paper, "Bloom filter sets three bits in 2048 given an arbitrary sequence of bytes".

  2. Byte pair 2 status updates bit index 1 in byte 195 (execute in accordance with procedures 3 and 4, the result is shown in the figure).

  3. Byte pair 3 status update bit index 4 in byte 123.

  4. If the bit to be updated has already been flipped by another topic, it will remain as it is. Will flip to 1 if not.

Through the above operation process, it can be determined that the event signature topic will flip the following bits in the Bloom filter:

  • Bit index 3 in byte 75
  • bit index 1 in byte 195
  • bit index 4 in byte 123

Looking at the logBlooms in the transaction receipt, converted to binary, verifies that these bit indices are set.

Meanwhile, for those readers who are interested in learning more about the implementation of log search and Bloom filter, you can refer to the BloomBits Trie article.

At this point, our in-depth discussion of the EVM series of articles has come to an end, and we will bring you more high-quality technical articles in the future.

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • 1
  • Share
Comment
0/400
GateUser-17100469vip
· 07-09 02:40
Launch with power 🚀
View OriginalReply0
Trade Crypto Anywhere Anytime
qrCode
Scan to download Gate app
Community
English
  • 简体中文
  • English
  • Tiếng Việt
  • 繁體中文
  • Español
  • Русский
  • Français (Afrique)
  • Português (Portugal)
  • Bahasa Indonesia
  • 日本語
  • بالعربية
  • Українська
  • Português (Brasil)