The Evolution of Blockchain Data Indexing: From Nodes to AI-Driven Full Chain Services
1. Introduction
From the first wave of blockchain applications in 2017 to today's flourishing financial, gaming, and social applications built on different blockchains, have we ever asked where the data these applications rely on actually comes from?
In 2024, artificial intelligence and Web3 have become hot topics. In the field of AI, data is the foundation of its development. Just as plants need sunlight and water, AI systems rely on vast amounts of data to continuously learn and evolve. Without data, even the most sophisticated AI algorithms struggle to exhibit their intended intelligence and effectiveness.
This article will delve into the development of blockchain data accessibility, analyze the evolution of data indexing in the industry, and compare the similarities and differences in technical features between established indexing protocols and emerging data service protocols.
2. The Evolution of Data Indexing: From Blockchain Nodes to Full Chain Database
2.1 Data Source: Blockchain Node
A blockchain is a decentralized ledger, and nodes are the foundation of the network: they record, store, and propagate all transaction data. Each full node keeps a complete copy of the blockchain data, which preserves the network's decentralized nature. However, building and maintaining a node is not easy for ordinary users, because it requires specialized technical skills as well as costly hardware and bandwidth. The query capability of an ordinary node is also limited, which makes it hard to meet developers' needs. As a result, users often rely on third-party services.
To fill this gap, RPC node providers have emerged. They manage nodes and expose data through RPC endpoints, so users can access blockchain data without running their own nodes. Public RPC endpoints are free but rate-limited, while private RPC endpoints offer better performance but remain inefficient for complex queries, which can require many round trips. Nevertheless, the standardized APIs that node providers offer lower the barrier to accessing on-chain data and lay the foundation for subsequent data parsing and applications.
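To make the RPC access pattern concrete, here is a minimal sketch of building a JSON-RPC 2.0 request for an Ethereum-style node. `eth_getBlockByNumber` is a standard Ethereum JSON-RPC method; the helper names `build_rpc_request` and `block_by_number_request` are illustrative assumptions, and no provider endpoint is contacted.

```python
import json
from itertools import count

# Simple incrementing request id; any unique value per request would do.
_request_ids = count(1)


def build_rpc_request(method: str, params: list) -> dict:
    """Build a JSON-RPC 2.0 request body for an Ethereum-style node."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


def block_by_number_request(block_number: int, full_transactions: bool = False) -> dict:
    """Request one block; block numbers are passed as hex strings."""
    return build_rpc_request(
        "eth_getBlockByNumber", [hex(block_number), full_transactions]
    )


if __name__ == "__main__":
    # The printed body is what would be POSTed to a provider's RPC endpoint.
    print(json.dumps(block_by_number_request(19_000_000), indent=2))
```

Each call like this returns a single block, which is why reconstructing something like an account's full history through plain RPC can take many requests.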
2.2 Data Parsing: From Raw Data to Usable Data
The raw data provided by blockchain nodes is usually encoded (for example, as hex-encoded binary) and protected by hashes and signatures to ensure integrity and security, which also makes it harder to parse. For ordinary users and developers, handling this data directly requires significant technical knowledge and computational resources.
The data parsing process thus becomes crucial. By transforming complex raw data into an easily understandable and operable format, users can utilize this data more intuitively. The quality of parsing directly affects the efficiency and effectiveness of blockchain data applications and is a key link in the entire data indexing process.
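As a small illustration of what parsing involves, the sketch below decodes a raw ERC-20 `Transfer` event log into readable fields. The log layout (event signature in `topics[0]`, indexed addresses left-padded to 32 bytes, the amount in `data`) follows the standard Ethereum event encoding; the sample log and its addresses are made up for the example.

```python
# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer event signature
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def decode_transfer(log: dict) -> dict:
    """Turn a raw Transfer log into a dict with sender, recipient, and amount."""
    topics = log["topics"]
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        "value": int(log["data"], 16),  # uint256 amount, hex-encoded
    }


# Hypothetical log: addresses 0x11...11 and 0x22...22, amount 1000 base units.
SAMPLE_LOG = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "0" * 24 + "11" * 20,
        "0x" + "0" * 24 + "22" * 20,
    ],
    "data": "0x" + format(1000, "064x"),
}

if __name__ == "__main__":
    print(decode_transfer(SAMPLE_LOG))
```

Real parsers work from a contract's ABI rather than hard-coding one event, but the principle is the same: fixed-width hex fields are mapped back to typed values.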
2.3 Evolution of Data Indexers
As the volume of blockchain data grows, so does the demand for data indexers. Indexers organize on-chain data and load it into a database for querying. They index blockchain data and expose it through SQL-like query interfaces (such as GraphQL APIs), making it readily available. This unified query interface greatly simplifies how developers retrieve the information they need.
Different types of indexers optimize data retrieval in different ways.
Ethereum archive nodes currently occupy 3-13.5 TB of storage, depending on the client. Faced with this much data, mainstream indexing protocols not only support multi-chain indexing but also offer data parsing frameworks customized for different application needs.
Compared to traditional RPC endpoints, indexers greatly enhance data indexing and query efficiency. They support complex queries, data filtering, and post-extraction analysis. Some indexers also support aggregating data sources from multiple blockchains, avoiding the need for multi-chain applications to deploy multiple APIs. By operating in a distributed manner, indexers provide stronger security and performance, reducing the risks that centralized RPC providers may pose.
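The core idea of an indexer, loading parsed on-chain data into a database so it can be filtered and aggregated, can be sketched with Python's built-in `sqlite3` module. This is a toy under stated assumptions: the table layout, the helper names, and the sample transfers are all hypothetical, and production indexers use far more elaborate pipelines.

```python
import sqlite3

# Hypothetical parsed transfers: (block, sender, recipient, value)
SAMPLE_TRANSFERS = [
    (100, "0xaaa", "0xbbb", 5),
    (101, "0xccc", "0xbbb", 7),
    (102, "0xbbb", "0xaaa", 3),
]


def build_index(transfers) -> sqlite3.Connection:
    """Load parsed transfers into an in-memory table with a lookup index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers "
        "(block INTEGER, sender TEXT, recipient TEXT, value INTEGER)"
    )
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", transfers)
    # The database index is what makes per-address lookups cheap.
    conn.execute("CREATE INDEX idx_recipient ON transfers (recipient)")
    conn.commit()
    return conn


def total_received(conn: sqlite3.Connection, address: str) -> int:
    """Aggregate query that plain RPC calls cannot answer in one request."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM transfers WHERE recipient = ?",
        (address,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    index = build_index(SAMPLE_TRANSFERS)
    print(total_received(index, "0xbbb"))
```

Once the data sits in a queryable store, filtering and aggregation become single queries instead of many node round trips.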
![Read, Index to Analyze, Brief Introduction to the Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-587ce87f6dbedee4acec7d939fed6980.webp)
2.4 Full Chain Database: Moving Toward Stream-First
As application demands become more complex, basic data indexers struggle to meet the increasingly diverse query needs, such as search, cross-chain access, or off-chain data mapping. In modern data pipeline architecture, a "stream-first" approach has become a solution to overcome the limitations of traditional batch processing, enabling real-time data processing and analysis.
Blockchain data service providers are also moving towards building data streams. Traditional indexer service providers have launched real-time blockchain data stream products, such as The Graph's Substreams and Goldsky's Mirror. There are also real-time data lakes based on blockchain-generated data streams, such as Chainbase and Subsquid.
These services aim to parse blockchain transactions in real time and to provide more comprehensive query capabilities. By redefining on-chain data management from the perspective of modern data pipelines, we can envision a future of high-performance datasets tailored for any business use case.
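The difference between batch and stream-first processing can be shown with a small generator-based sketch. It is not the API of any product named above: the block structure, the account names, and the functions `block_stream` and `running_balances` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical already-parsed blocks, in the order they are produced.
SAMPLE_BLOCKS = [
    {"number": 1, "transfers": [{"from": "alice", "to": "bob", "value": 10}]},
    {"number": 2, "transfers": [{"from": "bob", "to": "carol", "value": 4}]},
]


def block_stream(blocks: Iterable[dict]) -> Iterator[dict]:
    """Flatten blocks into a stream of transfer events, one at a time."""
    for block in blocks:
        for transfer in block["transfers"]:
            yield {"block": block["number"], **transfer}


def running_balances(events: Iterable[dict]) -> Iterator[tuple]:
    """Update net balances per event and emit a snapshot immediately."""
    balances = defaultdict(int)
    for event in events:
        balances[event["from"]] -= event["value"]
        balances[event["to"]] += event["value"]
        # Downstream consumers see fresh state without waiting for a batch job.
        yield event["block"], dict(balances)


if __name__ == "__main__":
    for block_number, snapshot in running_balances(block_stream(SAMPLE_BLOCKS)):
        print(block_number, snapshot)
```

A batch job would recompute balances over the whole history on a schedule; here the state is updated incrementally as each event arrives.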
3. AI + Database: A Comparison of The Graph, Chainbase, and Space and Time
3.1 The Graph
The Graph network provides multi-chain data indexing and query services through decentralized nodes. Its core product models are a data query execution market and a data indexing cache market, which together serve users' query needs.
Subgraphs are the foundational data structure of The Graph network, defining how to extract and transform data from the blockchain into a queryable format. The network comprises four roles: indexers, curators, delegators, and developers, working together to support the data needs of web3 applications.
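A query against a subgraph is an ordinary GraphQL request. The sketch below only builds the request body; the `transfers` entity and its fields are a hypothetical schema, since each subgraph defines its own, while `first`, `orderBy`, and `orderDirection` are arguments The Graph generates for collection queries. No endpoint is contacted.

```python
import json

# Hypothetical subgraph schema: a `transfers` entity with these fields.
TRANSFERS_QUERY = """
query RecentTransfers($first: Int!) {
  transfers(first: $first, orderBy: blockNumber, orderDirection: desc) {
    id
    from
    to
    value
  }
}
"""


def build_graphql_body(query: str, variables: dict) -> str:
    """Serialize a GraphQL query and its variables as a JSON request body."""
    return json.dumps({"query": query.strip(), "variables": variables})


if __name__ == "__main__":
    # This body would be POSTed to the subgraph's query URL.
    print(build_graphql_body(TRANSFERS_QUERY, {"first": 5}))
```

The client states which entities and fields it wants, and the indexer serving the subgraph resolves them from its own database rather than from repeated node calls.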
The Graph's products are also evolving rapidly amid the AI wave. The AutoAgora, Allocation Optimizer, and AgentC tools developed by Semiotic Labs optimize pricing strategies, resource allocation, and user experience, making the system more intelligent and easier to use.
3.2 Chainbase
Chainbase is a full-chain data network that integrates all blockchain data into one platform.
Chainbase's AI model, Theia, is based on NVIDIA's DoRA model and analyzes on-chain and off-chain data together with spatiotemporal activity to provide users with intelligent data services.
![Reading, Indexing to Analysis, Brief Overview of the Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-b343cab5112c1a3d52f4e72122ae0df2.webp)
3.3 Space and Time
Space and Time (SxT) is dedicated to creating a verifiable computing layer that extends zero-knowledge proofs to a decentralized data warehouse. Its core technology, Proof of SQL, ensures that SQL query results are tamper-proof and verifiable, providing an efficient solution for data verification.
SxT collaborates with the Microsoft AI Co-Innovation Lab to develop generative AI tools that let users query blockchain data in natural language. In the Space and Time Studio, AI can convert natural language into SQL and execute the resulting queries.
4. Conclusion and Outlook
Blockchain data indexing technology has improved step by step: from raw node data sources, through data parsing and indexing, to AI-powered full-chain data services. These advances have made data access more efficient and accurate and have brought users a more intelligent experience.
In the future, as technologies such as AI and zero-knowledge proofs mature, blockchain data services will become more intelligent and more secure. As infrastructure, they will continue to support industry innovation.