The Evolution of Blockchain Data Indexing: From Nodes to AI-Driven Full Chain Services


1. Introduction

From the first wave of blockchain applications in 2017 to today's flourishing financial, gaming, and social applications built on many different blockchains, have we ever stopped to ask where the data behind these applications' interactions comes from?

In 2024, artificial intelligence and Web3 have become hot topics. In the field of AI, data is the foundation of its development. Just as plants need sunlight and water, AI systems rely on vast amounts of data to continuously learn and evolve. Without data, even the most sophisticated AI algorithms struggle to exhibit their intended intelligence and effectiveness.

This article will delve into the development of blockchain data accessibility, analyze the evolution of data indexing in the industry, and compare the similarities and differences in technical features between established indexing protocols and emerging data service protocols.


2. The Evolution of Data Indexing: From Blockchain Nodes to Full Chain Database

2.1 Data Source: Blockchain Node

A blockchain is a decentralized ledger, and nodes are the foundation of the entire network, responsible for recording, storing, and disseminating all transaction data. Each full node keeps a complete copy of the blockchain data, which preserves the decentralized nature of the network. However, building and maintaining a node is not easy for ordinary users: it requires specialized technical skills as well as high hardware and bandwidth costs. The query capability of an ordinary node is also limited, which makes it hard to meet developers' needs. As a result, users often rely on third-party services.

RPC node providers emerged to fill this gap. They manage the nodes and expose data through RPC endpoints, so users can access blockchain data without running their own nodes. Public RPC endpoints are free but rate-limited, while private RPC endpoints offer better performance but remain inefficient for complex queries. Nevertheless, the standardized API interfaces offered by node providers lower the barrier to accessing on-chain data and lay the foundation for subsequent data parsing and applications.
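To make the RPC access pattern concrete, here is a minimal sketch of what a client sends to and receives from an Ethereum-style JSON-RPC endpoint. It only builds the request body and parses a response; no network call is made, and the sample response value is invented for illustration. The helper names `build_rpc_request` and `parse_block_number` are hypothetical.

```python
import json


def build_rpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })


def parse_block_number(response_text):
    """Extract the block height from an eth_blockNumber response.

    Ethereum JSON-RPC returns quantities as hex strings, so the
    result has to be converted to an integer.
    """
    payload = json.loads(response_text)
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return int(payload["result"], 16)


# Body that would be POSTed to a provider's HTTPS endpoint.
request_body = build_rpc_request("eth_blockNumber", [])

# Sample response in the standard JSON-RPC shape; the value is made up.
sample_response = '{"jsonrpc": "2.0", "id": 1, "result": "0x10d4f"}'
latest_block = parse_block_number(sample_response)
```

Even this simple call returns a hex-encoded value rather than a ready-to-use number, which hints at why a separate parsing step is needed.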

2.2 Data Parsing: From Raw Data to Usable Data

The raw data provided by blockchain nodes is usually serialized and encoded (for example, as hexadecimal strings), with hashes and signatures protecting its integrity and security, but this also makes it harder to parse. For ordinary users and developers, handling this data directly requires significant technical knowledge and computational resources.

The data parsing process thus becomes crucial. By transforming complex raw data into an easily understandable and operable format, users can utilize this data more intuitively. The quality of parsing directly affects the efficiency and effectiveness of blockchain data applications and is a key link in the entire data indexing process.
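As an illustration of this parsing step, the sketch below decodes a raw ERC-20 `Transfer` event log into a readable record. It assumes the log arrives in the usual JSON-RPC shape with `topics` and `data` fields; the sample log is fabricated for the example, and `decode_transfer_log` is a hypothetical helper, not part of any library.

```python
# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def decode_transfer_log(log):
    """Turn a raw Transfer log into a dict with sender, recipient and amount."""
    topics = log["topics"]
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        # Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes.
        "from": "0x" + topics[1][-40:],
        "to": "0x" + topics[2][-40:],
        # The non-indexed uint256 amount is stored in the data field as hex.
        "value": int(log["data"], 16),
    }


# Fabricated sample log: 1,000,000 token units from 0x11..11 to 0x22..22.
sample_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(1_000_000, "064x"),
}
decoded = decode_transfer_log(sample_log)
```

Real parsers work from the contract ABI rather than a hard-coded event, but the principle is the same: fixed-width hex fields are mapped back to typed values.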

2.3 Evolution of Data Indexers

As the volume of blockchain data grows, so does the demand for data indexers. Indexers organize on-chain data and load it into a database for querying. They index blockchain data and expose it through a structured query language interface such as a GraphQL API, making the data readily available. This unified query interface greatly simplifies the process for developers to retrieve the information they need.
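The core idea of an indexer, loading parsed on-chain records into a database and querying them, can be sketched with Python's built-in `sqlite3` module. This is a toy illustration only: the table layout, the sample rows, and the function names are assumptions, and production indexers use far more elaborate pipelines and storage.

```python
import sqlite3


def build_index(transfers):
    """Load (block, sender, recipient, amount) rows into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers "
        "(block INTEGER, sender TEXT, recipient TEXT, amount INTEGER)"
    )
    # A secondary index makes lookups by recipient cheap as the table grows.
    conn.execute("CREATE INDEX idx_recipient ON transfers (recipient)")
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", transfers)
    conn.commit()
    return conn


def total_received(conn, address):
    """Sum all amounts sent to the given address (0 if there are none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transfers WHERE recipient = ?",
        (address,),
    ).fetchone()
    return row[0]


# Made-up sample data standing in for decoded on-chain transfers.
sample = [
    (100, "0xaaa", "0xbbb", 50),
    (101, "0xccc", "0xbbb", 25),
    (102, "0xbbb", "0xaaa", 10),
]
index = build_index(sample)
```

A question like "how much has this address received?" becomes a single query against the index instead of a scan over every block through an RPC endpoint.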

Different types of indexers optimize data retrieval in various ways:

  1. Full Node Indexer: Extracts data directly from full blockchain nodes, ensuring completeness and accuracy, but requires significant storage and processing power.
  2. Lightweight Indexer: Relies on full nodes to retrieve specific data on demand, reducing storage requirements but potentially increasing query time.
  3. Dedicated Indexer: Optimizes retrieval for specific types of data or specific blockchains, such as NFT data or DeFi transactions.
  4. Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, and provides a unified query interface suitable for multi-chain applications.
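The aggregated type in the list above can be pictured as one query interface sitting over several per-chain sources. The following sketch is a simplified assumption of how that might look; the `Transfer` and `AggregatedIndex` names, the chain labels, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    chain: str
    block: int
    recipient: str
    amount: int


class AggregatedIndex:
    """Holds per-chain transfer lists behind a single query method."""

    def __init__(self):
        self._by_chain = {}

    def add_source(self, chain, rows):
        """Register (block, recipient, amount) rows for one chain."""
        self._by_chain[chain] = [Transfer(chain, b, r, a) for b, r, a in rows]

    def transfers_to(self, recipient, chains=None):
        """Return matching transfers across all chains, or only the listed ones."""
        selected = chains if chains is not None else list(self._by_chain)
        results = []
        for chain in selected:
            results.extend(
                t for t in self._by_chain.get(chain, []) if t.recipient == recipient
            )
        return sorted(results, key=lambda t: (t.chain, t.block))


# Made-up sample data for two chains.
idx = AggregatedIndex()
idx.add_source("ethereum", [(100, "0xbbb", 50), (101, "0xaaa", 5)])
idx.add_source("polygon", [(7, "0xbbb", 20)])
hits = idx.transfers_to("0xbbb")
```

The caller asks one question and gets results from every chain, which is the convenience a multi-chain application would otherwise have to build by calling several separate APIs.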

Currently, Ethereum archive nodes occupy 3-13.5 TB of storage space under different clients. In the face of such a large amount of data, mainstream indexing protocols not only support multi-chain indexing but also customize data parsing frameworks for different application needs.

Compared to traditional RPC endpoints, indexers greatly enhance data indexing and query efficiency. They support complex queries, data filtering, and post-extraction analysis. Some indexers also support aggregating data sources from multiple blockchains, avoiding the need for multi-chain applications to deploy multiple APIs. By operating in a distributed manner, indexers provide stronger security and performance, reducing the risks that centralized RPC providers may pose.

![Read, Index to Analyze, Brief Introduction to the Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-587ce87f6dbedee4acec7d939fed6980.webp)

2.4 Full Chain Database: Moving Toward Stream-First

As application demands become more complex, basic data indexers struggle to meet the increasingly diverse query needs, such as search, cross-chain access, or off-chain data mapping. In modern data pipeline architecture, a "stream-first" approach has become a solution to overcome the limitations of traditional batch processing, enabling real-time data processing and analysis.
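The difference between batch and stream-first processing can be shown in a few lines. In this sketch, which uses made-up events and hypothetical function names, the batch function needs the complete dataset before producing one answer, while the streaming function emits an updated result after every incoming event.

```python
def batch_volume(events):
    """Batch style: wait for the full dataset, then compute once."""
    return sum(e["amount"] for e in events)


def streaming_volume(event_stream):
    """Stream-first style: yield an updated running total after every event."""
    total = 0
    for event in event_stream:
        total += event["amount"]
        yield event["block"], total


# Made-up events standing in for transfers arriving block by block.
events = [
    {"block": 1, "amount": 5},
    {"block": 2, "amount": 7},
    {"block": 3, "amount": 3},
]

# Consumers of the stream see a fresh total as soon as each block arrives.
running = list(streaming_volume(iter(events)))
final_total = batch_volume(events)
```

Both approaches reach the same final figure, but only the streaming version has a usable answer while data is still arriving.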

Blockchain data service providers are also moving towards building data streams. Traditional indexer service providers have launched real-time blockchain data stream products, such as The Graph's Substreams and Goldsky's Mirror. There are also real-time data lakes based on blockchain-generated data streams, such as Chainbase and Subsquid.

These services aim to parse blockchain transactions in real time and to provide more comprehensive query capabilities. By redefining on-chain data management from the perspective of modern data pipelines, we can envision a future of high-performance datasets tailored for any business use case.

3. AI + Database: A Comparison of The Graph, Chainbase, and Space and Time

3.1 The Graph

The Graph network provides multi-chain data indexing and query services through decentralized nodes. Its main products are a data query execution market and a data indexing and caching market, which together serve users' query needs.

Subgraphs are the foundational data structure of The Graph network, defining how to extract and transform data from the blockchain into a queryable format. The network comprises four roles: indexers, curators, delegators, and developers, working together to support the data needs of Web3 applications.
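To give a feel for what "queryable format" means for a consumer, the sketch below builds the body of a GraphQL request of the kind a subgraph endpoint accepts. The `transfers` entity and its fields are a hypothetical schema chosen for illustration, not one defined in this article, and no request is actually sent.

```python
import json


def build_graphql_request(first, recipient):
    """Build a GraphQL POST body asking for recent transfers to one address.

    The `transfers` entity and its fields are assumed for this example;
    a real subgraph exposes whatever entities its own schema defines.
    """
    query = """
    query RecentTransfers($first: Int!, $to: String!) {
      transfers(first: $first, where: { to: $to }) {
        id
        from
        to
        value
      }
    }
    """
    return json.dumps({
        "query": query,
        "variables": {"first": first, "to": recipient},
    })


# JSON string that would be POSTed to a subgraph's query URL.
body = build_graphql_request(5, "0xbbb")
```

The client names the entities and fields it wants and receives exactly that shape back, without needing to know how the underlying blocks were parsed.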

The products of The Graph are also rapidly developing in the wave of AI. The AutoAgora, Allocation Optimizer, and AgentC tools developed by Semiotic Labs optimize pricing strategies, resource allocation, and user experience, enhancing the intelligence of the system and its user-friendliness.


3.2 Chainbase

Chainbase is a full-chain data network that integrates all blockchain data into one platform. Its features include:

  • Real-time data lake: Provides a dedicated real-time data lake for blockchain data streams.
  • Dual-chain architecture: The execution layer is built on an EigenLayer AVS, forming a parallel architecture with the CometBFT consensus algorithm.
  • Innovative data format standard: Introduces the "manuscripts" data format standard.
  • Crypto world model: Combines AI technology to create a model that can understand and predict blockchain transactions.

Chainbase's AI model, Theia, is based on NVIDIA's DORA model and analyzes on-chain and off-chain data together with spatiotemporal activity to provide users with intelligent data services.

![Reading, Indexing to Analysis, Brief Overview of the Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-b343cab5112c1a3d52f4e72122ae0df2.webp)

3.3 Space and Time

Space and Time (SxT) aims to build a verifiable computing layer that extends zero-knowledge proofs to a decentralized data warehouse. Its core technology, Proof of SQL, makes SQL query results tamper-proof and verifiable, providing an efficient solution for data verification.

SxT collaborates with Microsoft's AI Co-Innovation Lab to develop generative AI tools that let users process blockchain data through natural language. In Space and Time Studio, AI can convert a natural-language request into SQL and execute the query.


4. Conclusion and Outlook

Blockchain data indexing technology has improved step by step: it started with nodes as the raw data source, progressed through data parsing and indexing, and has now reached AI-powered full-chain data services. These advances have not only improved the efficiency and accuracy of data access but also brought users a more intelligent experience.

In the future, as new technologies such as AI and zero-knowledge proofs mature, blockchain data services will become more intelligent and more secure. As infrastructure, blockchain data services will continue to support industry innovation.
