What is CIYUAN (词元)?

CIYUAN (词元, ciyuan) is the official Chinese name for “token” as designated by China's National Data Bureau. A token is the smallest information unit that large language models process. For example, “I love China!” might be split into four tokens: “I”, “love”, “China”, and “!”.
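The four-token split described here can be reproduced with a minimal sketch. It assumes a simple rule (runs of word characters, plus single punctuation marks) rather than the learned subword vocabulary a production model would use, so real tokenizers may split the same sentence differently. The function name `simple_tokenize` is hypothetical.

```python
import re

def simple_tokenize(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("I love China!")
print(tokens)       # ['I', 'love', 'China', '!']
print(len(tokens))  # 4
```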

What is the scale of China's token usage?

China's daily token usage grew from 100 billion in early 2024 to 100 trillion by the end of 2025, and to 140 trillion in March 2026, growth of more than 1,000x in about two years. The National Data Bureau stated that some model companies achieved in 20 days what took all of 2025, indicating that a new commercial logic built on token-based billing is accelerating.

Why does the official naming of 词元 matter?

When China officially names a term, it sets the standard for government documents, academic publications, regulatory frameworks, and industry communications. The naming of 词元 (CIYUAN) represents a shift in AI narrative: from benchmarking against U.S. models, to measuring domestic industrial capacity. This reflects China's transition from an “AI consumer” to an “AI production powerhouse.”

How is 词元 defined by AI researchers?

According to researchers from Tsinghua University and Renmin University of China, a token is a discrete symbolic unit produced after text is segmented via tokenization, subword splitting, or byte-level encoding. In multimodal AI, tokens are applied not only to text but also to images, audio, and video, enabling cross-modal understanding and generation within a unified representation space.
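Of the approaches named here, byte-level encoding is the easiest to show directly. The sketch below assumes UTF-8 as the byte encoding and covers only the base step; byte-level tokenizers used in practice typically go on to merge frequent byte sequences into larger units. The function names are hypothetical.

```python
def byte_tokens(text: str) -> list[int]:
    """Return the UTF-8 bytes of text, one integer (0-255) per byte."""
    return list(text.encode("utf-8"))

def from_byte_tokens(ids: list[int]) -> str:
    """Rebuild the original string from its byte-level tokens."""
    return bytes(ids).decode("utf-8")

for sample in ["token", "词元"]:
    ids = byte_tokens(sample)
    print(sample, len(ids), ids)
    # The mapping is lossless: decoding the bytes gives back the text.
    assert from_byte_tokens(ids) == sample
```

Under this scheme "token" yields 5 byte-level tokens and "词元" yields 6, because each of the two Chinese characters occupies three bytes in UTF-8.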

Official Terminology — March 2026

What is 词元?

CIYUAN = 词元 = Token

The official Chinese name for "token", announced by China's National Data Bureau at the 2026 China Development Forum.

140 trillion/day

Daily Token Usage (March 2026)

1,000×

Growth in 2 Years

Mar 23, 2026

Official Announcement

Definition

Three Terms, One Concept

CIYUAN, 词元, and "token" all refer to the same concept: the smallest information unit that large AI models process.

English

token

The fundamental unit of data that large language models process. For example, "I love China!" might split into four tokens: "I", "love", "China", "!"

Romanized (Pinyin)

CIYUAN

The official romanized designation. The term 词元 was proposed by Professor Qiu Xipeng (Fudan University) in 2021 and formally adopted by China's National Data Bureau in March 2026.

Chinese

词元

"词" covers both characters and words, and "元" denotes the smallest basic unit. Together, the two characters precisely describe the role a token plays in large models.

Official Source

On March 23, 2026, at the 2026 China Development Forum annual conference, Liu Liehong (刘烈宏), Director of the National Data Bureau (国家数据局), officially introduced 词元 (CIYUAN) as the standardized Chinese name for "token". The term was subsequently reported by People's Daily and widely circulated in official Chinese media.

By the Numbers

A 1,000× Surge in Two Years

China's daily token consumption tells the real story of the AI boom: not in benchmark scores, but in industrial throughput.

Early 2024

100 billion/day

Daily token usage in China at the start of 2024.

End of 2025

100 trillion/day

A roughly 1,000× increase from the early-2024 level in about two years, based on figures reported by the National Data Bureau.

March 2026

140 trillion/day

140 trillion tokens per day, over 1,000× growth from the start of 2024.
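The growth multiples follow directly from the daily figures quoted in this section. The sketch below only redoes that arithmetic; the dictionary keys and variable names are chosen here for illustration, and the values are the ones stated above (100 billion, 100 trillion, and 140 trillion tokens per day).

```python
# Daily token usage as stated in this section (tokens per day).
usage = {
    "early 2024": 100 * 10**9,     # 100 billion
    "end of 2025": 100 * 10**12,   # 100 trillion
    "March 2026": 140 * 10**12,    # 140 trillion
}

baseline = usage["early 2024"]
# Growth multiple of each period relative to the early-2024 baseline.
multiples = {period: value / baseline for period, value in usage.items()}

for period, multiple in multiples.items():
    print(f"{period}: {multiple:,.0f}x the early-2024 level")
```

On these figures, the end of 2025 sits at 1,000 times the early-2024 level and March 2026 at 1,400 times, which is where the "over 1,000×" headline comes from.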

“Token (词元) is not only the value anchor of the intelligent era, but also the ‘settlement unit’ connecting technology supply and commercial demand, providing quantifiable possibilities for business model implementation.”
- Liu Liehong, Director, National Data Bureau, China Development Forum 2026

Why It Matters

A Quiet Shift in AI's Narrative Power

"词元" is not just a good translation. It is a signal that China's AI narrative has completed an identity switch: from "we are also catching up" to "we are exporting production capacity."

Before

Benchmark Rankings

Which model scores higher on MMLU, HumanEval, GPQA? Which has more parameters?
English-language benchmarks as the universal standard
Every Chinese model launch measured against GPT-4o
The ruler is made by others; you just compete on it

Now

Token Volume

How many tokens consumed per day? How many API calls?
Industrial capacity as the metric, a domain China knows well
China's weekly token usage: 4.12 trillion vs US: 2.94 trillion (People's Daily, 2026)
The ruler is built at home; the standard is set by usage scale

Expert Definition

“A token is the discrete unit for data processing in natural language algorithms. With the rise of large models, tokens provide a unified representation for diverse modalities — enabling cross-modal understanding and generation. From text subwords to visual patches, tokenization enhances data processing efficiency.”

- Dong Yuxiao (东昱晓, Tsinghua), Wen Jirong (文继荣, Renmin U. of China), Tang Jie (唐杰, Tsinghua)
Token (词元) | Terminology Series, 2026
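The "visual patches" mentioned in this definition can be pictured with a small sketch. It assumes a grayscale image stored as a 2-D list and a hypothetical `patchify` helper that cuts it into non-overlapping square tiles, one tile per visual token. Real multimodal models go further, for example by projecting each patch into an embedding or quantizing it into a discrete code, which is not shown here.

```python
def patchify(image: list[list[int]], patch: int) -> list[list[int]]:
    """Split a 2-D grid of pixel values into non-overlapping patch x patch tiles.

    Each tile is flattened row by row and stands for one visual token.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            tokens.append(tile)
    return tokens

# A 4x4 toy "image" whose pixel values are simply 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, patch=2)
print(len(tokens))  # 4
print(tokens[0])    # [0, 1, 4, 5]
```

With a 2x2 patch size, the 4x4 grid becomes four tokens, in the same way a sentence becomes a short sequence of text tokens.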

NVIDIA & Jensen Huang

At NVIDIA's GTC 2026 conference, CEO Jensen Huang explicitly stated that the token is the foundational building block of the new AI era. The English term is "token"; the Chinese term is 词元. Both sides are crowning the same concept, in different languages and with equal weight.

FAQ

Frequently Asked Questions

What is a token (词元)?

A token is the smallest information unit that large AI models process. It can be a Chinese character, a word, part of an English word, or even punctuation. For example, "我爱中国！" ("I love China!") might split into "我", "爱", "中国", "！", which is four tokens. In multimodal AI, images, audio, and video are also tokenized into discrete units, enabling cross-modal processing.
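One way to arrive at the four-token split of the Chinese example is greedy forward maximum matching against a vocabulary. The sketch below assumes a tiny hand-written vocabulary (`TOY_VOCAB`, hypothetical) and a hypothetical `max_match` function. Large models usually learn their vocabularies from data with subword algorithms, so this illustrates the idea rather than any particular model's tokenizer.

```python
def max_match(text: str, vocab: set[str]) -> list[str]:
    """Greedy forward maximum matching: take the longest vocabulary entry at each position."""
    longest = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the vocabulary matches.
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens

TOY_VOCAB = {"我", "爱", "中国", "！"}  # hypothetical vocabulary for illustration
print(max_match("我爱中国！", TOY_VOCAB))  # ['我', '爱', '中国', '！']
```

Because "中国" is in the vocabulary as a two-character entry, it stays together as one token instead of being split into "中" and "国".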

When was 词元 officially named?

On March 23, 2026, at the 2026 China Development Forum annual conference, Liu Liehong (刘烈宏), Director of the National Data Bureau, officially announced 词元 (CIYUAN) as the standardized Chinese name for "token". The announcement was widely reported by People's Daily.

Why is China's official naming significant?

When China's National Data Bureau designates a term, it sets the standard for government documents, academic publications, regulatory frameworks, and industry communications nationwide. The naming of 词元 represents a shift from benchmarking against U.S. AI models to measuring China's own industrial capacity. This mirrors China's approach in other industries, such as establishing "penetration rate" as the metric for new energy vehicles: naming the unit defines who controls the ruler.

Why is 词元 a better translation than 令牌 or 代币?

The same English word "token" has had four different Chinese names across industries: 令牌 (IBM network security, identity credentials), 代币/通证 (blockchain, ICO era), 符号/词法单元 (compiler theory), and now 词元 (large models). 词元 was first proposed by Prof. Qiu Xipeng at Fudan University in 2021: "词" covers both characters and words, and "元" means the smallest indivisible unit. The term follows Chinese word-formation conventions (as with the chemical elements 氢, 氧, 铜, 铁) rather than phonetic transliteration.

Some people call it "托肯" (tuōkěn). Which is correct?

托肯 is a phonetic transliteration that some prefer because it carries no semantic baggage: "词元" implies a linguistic connection, but tokens are now used for images, audio, and video too. However, 词元 has two decisive advantages: it was officially endorsed by the National Data Bureau, and it was reported by People's Daily, giving it national-level backing. The competition between 词元 and 托肯 mirrors the historic 区块链 vs 区块脸 debate, and official endorsement often wins in the long run.
