The Rise of Tokens in the AI Industry: China's Unique Position

Explore how the concept of tokens is revolutionizing the AI industry and China's emerging role in this global shift.

Introduction

In early 2026, one set of figures sparked heated discussion across the global AI industry. According to OpenRouter, the world’s largest AI model API aggregation platform, Chinese large models processed 41.2 trillion tokens from February 9 to 15, surpassing the 29.4 trillion processed by U.S. models for the first time. The trend continued: usage exceeded 73 trillion tokens by mid-March, and four of the top five models globally were from China.

These figures matter less as a scoreboard than as a sign of a quiet revolution in the AI industry’s basic unit of measurement: tokens are becoming the “kilowatt-hour” of the intelligent era. Models, computing power, data, applications, industry, and governance are all being reshaped around this unit. Understanding AI in 2026 begins with understanding tokens.

Sixfold Reconstruction from a Measurement Unit

The measurement unit of the industrial revolution was the “kilowatt-hour,” which allowed energy to be precisely measured, priced, and transported across domains. The information revolution used “bits” and “traffic,” enabling information to be packaged, transmitted, and billed. The intelligent revolution’s measurement unit is “tokens,” which allows intelligence to be segmented, measured, priced, and traded for the first time.

The popularization of the token concept and its rapid growth are gradually pushing intelligence towards industrialization, marketization, and circulation.

Models

The economic value of large models is shifting from one-time training costs to long-term inference output. Model vendors no longer simply “sell capability” but directly “sell tokens,” and pricing input and output separately per million tokens has become a global industry norm. The asset attribute of models is transitioning from “weight files” to “the ability to continuously produce tokens.”
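To make per-million-token pricing concrete, here is a minimal sketch of how one request might be billed. The `TokenPrice` class, the `request_cost` function, and every number in it are hypothetical illustrations, not any vendor's actual price sheet or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price sheet: currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


# Illustrative numbers only (assumed, not real rates).
example_price = TokenPrice(input_per_million=0.50, output_per_million=2.00)
cost = request_cost(example_price, input_tokens=12_000, output_tokens=3_000)
print(f"{cost:.4f}")
```

In this sketch the 3,000 output tokens cost as much as the 12,000 input tokens, which is why a price sheet of this shape usually quotes the two rates separately.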

Computing Power

The focus is shifting from “training computing power” to “inference computing power.” Training computing power is pulse-like and centralized, while inference computing power is continuous and distributed, posing new demands on latency, energy efficiency, and geographical distribution. The collaboration of cloud, edge, and terminal computing power, inference-specific chips, silicon photonics interconnect, and computing networks are becoming the new focus of infrastructure. JPMorgan predicts that China’s inference token consumption will grow by more than two orders of magnitude by 2030 compared to 2025.
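A rough sense of what “two orders of magnitude” over five years implies can be worked out with simple compounding. The sketch below assumes steady year-over-year growth and takes exactly 100x from 2025 to 2030 as the lower bound. Both are simplifying assumptions made here, not part of the JPMorgan forecast, and the function name is hypothetical.

```python
def implied_annual_multiplier(total_multiplier: float, years: int) -> float:
    """Constant yearly growth factor that compounds to total_multiplier over `years`."""
    return total_multiplier ** (1.0 / years)


# Assumed inputs: a 100x floor over the five years from 2025 to 2030.
multiplier = implied_annual_multiplier(100.0, 5)
print(f"about {multiplier:.2f}x per year")
```

Under these assumptions, consumption would need to grow roughly 2.5 times every year, more than doubling annually for five consecutive years.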

Data

Just as raw energy sources must be processed into standardized fuel before they can generate power, data entering large models requires cleaning, labeling, and tokenization. In long-tail scenarios like autonomous driving, robot training, and scientific discovery, synthetic data generated through simulation has achieved large-scale application. The construction of a data factor market is entering a substantive phase, where “trainability” and “token output density,” rather than data scale alone, are becoming new metrics for pricing data assets. This shift is significant: the valuation of data is beginning to correlate with its actual contribution to the token production chain, providing a more solid economic foundation for the marketization of data factors.

Applications

The focus is shifting from “function delivery” to “token consumption.” Traditional software charges based on seats or functions; today, applications bill based on token usage and business results. Intelligent agents are becoming the primary consumers of tokens, with complex tasks consuming hundreds of thousands or even millions of tokens. The “intelligent agent as a service” market is rapidly expanding, with performance-based billing models being scaled in customer service, marketing, compliance, and programming. The essence of applications is shifting from “delivering functions” to “consuming intelligence.”
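One way to see why agent tasks reach hundreds of thousands of tokens is to count what a multi-step loop sends to the model. The sketch below assumes the simplest possible agent: every step resends the entire conversation history as input, with no caching or context trimming. The function name and all parameter values are hypothetical.

```python
def agent_task_tokens(steps: int, system_tokens: int,
                      output_per_step: int, tool_result_per_step: int) -> tuple[int, int]:
    """Return (total_input, total_output) tokens for a task in which
    each step resends the full history accumulated so far."""
    history = system_tokens          # tokens sent as input on the next step
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += history       # the whole history is billed as input again
        total_output += output_per_step
        # The model's output and the tool result both join the history.
        history += output_per_step + tool_result_per_step
    return total_input, total_output


# Assumed workload: 30 steps, a 2,000-token prompt, 500 output tokens
# and 1,500 tokens of tool results per step.
total_in, total_out = agent_task_tokens(30, 2_000, 500, 1_500)
print(total_in, total_out)
```

Under these assumptions a 30-step task consumes close to a million tokens, almost all of them input, because resending a growing history makes input grow roughly with the square of the number of steps.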

Industry

A new industry chain is forming around tokens, encompassing production (models and computing power), distribution (inference networks, APIs, intelligent agent protocols), consumption (applications and intelligent agents), and measurement (evaluation benchmarks, auditing, and trust verification). The boundaries between model layers, inference service layers, intelligent agent middleware layers, and industry application layers are becoming clearer, with industry-specific intelligent agents becoming mainstream investments. Model vendors, cloud providers, chip manufacturers, green energy operators, and content delivery network vendors are forming a collaborative ecosystem in the token industry chain. According to the China Academy of Information and Communications Technology, the scale of China’s core AI industry is expected to exceed 1.2 trillion yuan by 2026, with the effects of the entire industry chain’s collaboration becoming evident.

Governance

The governance focus is shifting from “algorithm governance” to “full-chain governance of tokens.” As the AI industry develops, the governance targets are expanding from “algorithms and code” to the entire chain of token production, circulation, consumption, and cross-border flow. New governance tools and rules are needed for issues such as token traceability, synthetic content identification, cross-border token flow, computing power and energy consumption constraints, and trustworthy evaluation and benchmarks. The year 2026 may become a key year for the implementation of global AI governance rules.

China’s Unique Position in the Global Token Wave

In the global wave set off by tokens, China is building a unique position that rests on several pillars.

Token Production

On the production side, a cluster of domestic models is rising. Models from MiniMax, Moonshot AI, DeepSeek, Zhipu, Alibaba (Qwen), and ByteDance (Doubao) are leveraging mixture-of-experts architectures and aggressive engineering optimization to keep improving performance while cutting inference costs to a fraction of those of comparable global models. On the OpenRouter platform, U.S. users account for 47% and Chinese users for only about 6%, yet Chinese models lead in usage volume: a verdict delivered by global developers voting with their feet.

Token Consumption

On the consumption side, applications reach deeper than ever, and tokens are entering daily life at unprecedented speed. A general practitioner in a county hospital can, in a matter of seconds and a few thousand tokens, have nodules identified on a suspicious lung CT and receive differential diagnosis suggestions, compressing what used to take two weeks into a single outpatient visit. A farmer in Shouguang, Shandong, uses a smart agriculture app to determine whether a curled cucumber is suffering from thrips or a viral disease and which treatment to apply. An elderly person living alone can speak to a smart speaker in dialect, and after a conversation of a few thousand tokens, their children’s phones receive alerts while their location is shared with emergency services. Delivery riders receive route guidance that accounts for real-time traffic and elevator wait times instead of mechanical instructions. AI assistants in government service halls answer inquiries about medical insurance transfers and real estate registration, replacing “people running errands” with “tokens running errands.” Tokens are becoming the “invisible labor force” across industries.

Industry Chain

A full-stack collaborative ecosystem is rapidly taking shape. From domestic chips like Ascend, Cambricon, and Hygon, to inference service platforms like Volcano Engine, Alibaba Cloud, and Tencent Cloud, as well as a range of open-source middleware and industry-specific intelligent agents, the entire industry chain covering chips, computing power, models, middleware, and applications is maturing quickly. The “East Data, West Computing” project provides low-cost computing power, while green energy supplied directly to data centers solidifies the energy foundation.

However, it is essential to recognize that China still has significant room for improvement in areas such as original innovation in cutting-edge models, high-end computing power foundations, cross-language and cross-cultural ecological influence, and participation in global rule-making.

In the second half of the token wave, nothing has “already been won”; the contest has “just begun.” In the global landscape unfolding from the humble token, China is not only a vast market but should also be an active builder and responsible co-governor. Understanding tokens is essential to understanding the next phase of artificial intelligence.
