WYWEB: An NLP Evaluation Benchmark For Classical Chinese
Abstract
To comprehensively evaluate the overall performance of NLP
models in a given domain, many evaluation benchmarks have been
proposed, such as GLUE, SuperGLUE, and CLUE. The field of
natural language understanding has traditionally focused on
benchmarks for English, Chinese, and multilingual tasks;
however, little attention has been given to classical Chinese,
also known as "wen yan wen (文言文)", which has a rich history
spanning thousands of years and holds significant cultural and
academic value.
To promote progress in the NLP community, in this paper we
introduce the WYWEB evaluation benchmark, which consists of
nine NLP tasks in classical Chinese, covering sentence
classification, sequence labeling, reading comprehension, and
machine translation. We evaluate existing pre-trained language
models, all of which struggle with this benchmark. We also
introduce a number of supplementary datasets and additional
tools to facilitate further progress on classical Chinese NLU.
The GitHub repository and leaderboard of WYWEB will be released
as soon as possible.