Official Terminology — March 2026

What is 词元?

CIYUAN = 词元 = Token

The official Chinese name for "token", announced by China's National Data Bureau at the 2026 China Development Forum.


140 trillion/day

Daily Token Usage (March 2026)

1,000×

Growth in 2 Years

Mar 23, 2026

Official Announcement

Three Terms, One Concept

CIYUAN, 词元, and "token" all refer to the same concept: the smallest unit of information that large AI models process.

English

token

The fundamental unit of data that large language models process. For example: "I love China!" might split into four tokens: "I", "love", "China", "!"
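
The split described above can be reproduced with a minimal sketch. It assumes a toy rule (runs of word characters, plus each punctuation mark on its own), and `toy_tokenize` is a hypothetical helper written for illustration. Production models use learned subword vocabularies, so their splits will generally differ.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks.

    This is an illustrative rule only, not the tokenizer of any real model.
    """
    return re.findall(r"\w+|[^\w\s]", text)


print(toy_tokenize("I love China!"))  # ['I', 'love', 'China', '!']
```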

Romanized (pinyin)

CIYUAN

The official romanized designation. Proposed by Professor Qiu Xipeng (Fudan University, 2021) and formally adopted by China's National Data Bureau in March 2026.

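Chinese

词元

"词" (cí) covers both characters and words, and "元" (yuán) denotes the smallest basic unit. Together, the two characters precisely describe the role a token plays in a large model.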

Official Source

On March 23, 2026, at the 2026 China Development Forum annual conference, Liu Liehong (刘烈宏), Director of the National Data Bureau (国家数据局), officially introduced 词元 (CIYUAN) as the standardized Chinese name for "token". The term was subsequently reported by People's Daily and widely circulated in official Chinese media.

A 1,000× Surge in Two Years

China's daily token consumption tells the real story of the AI boom: not in benchmark scores, but in industrial throughput.

Early 2024

100 billion/day

Daily token usage in China at the start of 2024 was 100 billion.

End of 2025

100 trillion/day

A 400× increase in roughly 1.5 years, as reported by the National Data Bureau.

March 2026

140 trillion/day

More than 140 trillion tokens per day, over 1,000× growth from the start of 2024.
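
The "over 1,000×" figure can be checked with simple arithmetic. The sketch below takes the early-2024 baseline as 100 billion (1000亿) tokens per day and the March 2026 level as 140 trillion (140万亿); the constant and function names are illustrative, not from any official source.

```python
# Daily token usage figures quoted on this page (tokens per day).
EARLY_2024_DAILY = 100_000_000_000        # 100 billion (1000亿), assumed baseline
MARCH_2026_DAILY = 140_000_000_000_000    # 140 trillion (140万亿)


def growth_multiple(start: int, end: int) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


multiple = growth_multiple(EARLY_2024_DAILY, MARCH_2026_DAILY)
print(f"{multiple:,.0f}x")  # 1,400x
```

On these inputs the multiple is 1,400, which is consistent with the "over 1,000×" headline.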

“Token [CIYUAN] is not only the value anchor of the intelligent era, but also the ‘settlement unit’ connecting technology supply and commercial demand, providing quantifiable possibilities for business model implementation.”
— Liu Liehong, Director, National Data Bureau, China Development Forum 2026

A Quiet Shift in AI's Narrative Power

"词元" is not just a good translation. It is a signal that China's AI narrative has completed an identity switch: from "we are also catching up" to "we are exporting production capacity."

Before

Benchmark Rankings

  • Which model scores higher on MMLU, HumanEval, GPQA?
  • English-language benchmarks as the universal standard
  • Every Chinese model launch measured against GPT-4o
  • The ruler is made by others — you just compete on it

Whose benchmark is higher, whose parameter count is larger, whose evaluation scores lead? The ruler was made by others.

Now

Token Volume

  • How many tokens consumed per day? How many API calls?
  • Industrial capacity as the metric — a domain China knows well
  • China's weekly token usage: 4.12 trillion vs US: 2.94 trillion (People's Daily, 2026)
  • The ruler is built at home — the standard is set by usage scale

Token consumption and call-volume curves: the ruler is made in China, and scale defines who sets the terms.

Expert Definition

“A token is the discrete unit for data processing in natural language algorithms. With the rise of large models, tokens provide a unified representation for diverse modalities — enabling cross-modal understanding and generation. From text subwords to visual patches, tokenization enhances data processing efficiency.”

— 东昱晓 (Tsinghua), 文继荣 (Renmin U. of China), 唐杰 (Tsinghua)
Token (词元) | Terminology Series, 2026

NVIDIA & Jensen Huang

At NVIDIA's GTC 2026 conference, CEO Jensen Huang explicitly stated that the token is the foundational building block of the new AI era. The English term is "token"; the Chinese term is 词元. Both sides are crowning the same concept at the same time, each in its own language.

Frequently Asked Questions

What is a token (词元)?
A token is the smallest information unit that large AI models process. It can be a Chinese character, a word, part of an English word, or even a punctuation mark. For example, "我爱中国!" ("I love China!") might split into "我", "爱", "中国", "!", which is four tokens. In multimodal AI, images, audio, and video are also tokenized into discrete units, enabling cross-modal processing.
When was 词元 officially named?
On March 23, 2026, at the 2026 China Development Forum annual conference, Liu Liehong (刘烈宏), Director of the National Data Bureau, officially announced 词元 (CIYUAN) as the standardized Chinese name for "token". The announcement was reported by People's Daily.
Why is China's official naming significant?
When China's National Data Bureau designates a term, it sets the standard for government documents, academic publications, regulatory frameworks, and industry communications nationwide. The naming of 词元 represents a shift from benchmarking against U.S. AI models' evaluation scores to measuring China's own industrial capacity. This follows the same logic as China's approach in other industries, such as establishing "penetration rate" as the headline metric for new energy vehicles: naming the unit defines who controls the ruler.
Why is 词元 a better translation than 令牌 or 代币?
The same English word "token" has had four different Chinese names across industries: 令牌 (IBM network security, identity credentials), 代币/通证 (blockchain, ICO era), 符号/词法单元 (compiler theory), and now 词元 (AI and large models). 词元 was first proposed by Prof. Qiu Xipeng at Fudan University in 2021: "词" covers both characters and words, and "元" means the smallest basic unit. The term follows Chinese word-formation conventions (as the names of chemical elements such as 氢、氧、铜、铁 do) rather than phonetic transliteration.
Some people call it "托肯" (tuōkěn). Which is correct?
托肯 is an informal phonetic transliteration that some prefer because it carries no semantic baggage: "词元" implies a linguistic connection, but tokens are now used for images, audio, and video too. However, 词元 has two decisive advantages: it was officially endorsed by the National Data Bureau, and it was reported by People's Daily. Together these give it national-level backing. The competition between 词元 and 托肯 mirrors the earlier 区块链 vs 块链 debate, and official endorsement often wins in the long run.

Four Names for One Concept

The same English word "token" has had four different Chinese translations across industries; only one carries official authority.

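Chinese name | Romanization | Domain | Typical usage | Status
词元 | CIYUAN | AI / large models | National Data Bureau official documents, People's Daily, academic papers | ★ Officially designated
令牌 | Lìngpái | Network security / identity authentication | Login credentials, permission checks, OAuth 2.0 | ✅ Technical term
代币 / 通证 | Dàibì / Tōngzhèng | Blockchain / Web3 | ICOs, DeFi, NFTs, white papers | ✅ Industry convention
符号 / 词法单元 | Fúhào / Cífǎ dānyuán | Compilers / programming languages | Lexers, compiler front ends | ✅ Academic term
托肯 | Tuōkěn | Informal / commercial pricing | API pricing pages (e.g. "$0.002 per thousand 托肯") | ⚡ Gaining informal popularity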

Why CIYUAN (词元) Won

In 2017, CSDN's 孟岩 and 元道 proposed "区块链" as the Chinese name for "blockchain" instead of the literal "块链". The name stuck and became the standard translation. The same logic applies here: "词元" follows Chinese word-formation conventions, in the way the names of chemical elements (氢、氧、铜、铁) carry semantic hints, rather than transliterating the sound. The official endorsement from the National Data Bureau is the decisive factor.

How Many Tokens Is That?

Enter any text to see how a large language model would split it into tokens (an estimate based on typical tokenization logic).


* Estimates are based on typical subword tokenization (BPE/WordPiece). Actual token counts vary by model provider. English: about 4 characters per token on average; Chinese: about 1.5 characters per token on average.
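
The averages in the note above can be turned into a rough estimator. This is a minimal sketch under stated assumptions, not the logic of any particular tool or model: `estimate_tokens` and `is_cjk` are hypothetical helpers, only the basic CJK Unified Ideographs range is treated as Chinese, and every other character (spaces and punctuation included) is counted at the English rate.

```python
import math

CHARS_PER_TOKEN_ENGLISH = 4.0   # average quoted above
CHARS_PER_TOKEN_CHINESE = 1.5   # average quoted above


def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def estimate_tokens(text: str) -> int:
    """Rough token estimate from per-language character averages."""
    cjk_chars = sum(1 for ch in text if is_cjk(ch))
    other_chars = len(text) - cjk_chars
    estimate = (cjk_chars / CHARS_PER_TOKEN_CHINESE
                + other_chars / CHARS_PER_TOKEN_ENGLISH)
    return math.ceil(estimate)


print(estimate_tokens("I love China!"))  # 13 characters / 4 -> 4
print(estimate_tokens("我爱中国"))        # 4 characters / 1.5 -> 3
```

Rounding up keeps any non-empty input at one token or more. A real count requires running the specific model's tokenizer.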