Machine Heart report. Project author: CLUEbenchmark. Participants: Si, Du Wei. With this open source project, you no longer have to worry about finding a useful Chinese NLP data set. With 142 data sets, there is always one that suits you.

2024/05/27 13:22:32

Chinese NLP data set search: https://www.cluebenchmarks.com/dataSet_search.html

Anyone learning NLP soon finds that most advanced algorithms and high-quality sample code rely on English data sets. When we then try to carry a model over to Chinese, the lack of public, high-quality data sets becomes a real barrier. For example, the simplest language models and word embedding models need nothing more than passages of natural Chinese text, yet useful public large-scale corpora turn out to be scarce.

We have to look through the various projects that collect Chinese NLP data sets on GitHub and other platforms and choose according to our needs. Note that many Chinese data sets published in China are quite old and harder to use, so finding a suitable one takes judgment and some trial and error.

This article introduces a new Chinese NLP data set search project, which may be the most comprehensive collection of Chinese NLP data set information. The project collects information on more than one hundred Chinese NLP data sets and presents it through a search interface. We only need to type in a keyword, or details such as the field a data set belongs to, to find the corresponding data sets.

Each search result shows the data set's basic information, access link, and other key details, which helps us filter data sets quickly. Because every field has many similar data sets, these brief overviews are valuable.

Readers who want to see the full list can go directly to the search project's GitHub repository, which holds the information for all data sets.

This may be the most comprehensive Chinese NLP data set

The NLP data sets in this project cover NER, QA, sentiment analysis, text classification, text distribution, text summarization, machine translation, knowledge graph, corpus, and reading comprehension: 142 data sets in 10 categories in total.

Specifically, for each data set, the project author provides information such as the data set name, update time, data set provider, description, keywords, category, and paper address.
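The fields listed above map naturally onto a small record type, and the keyword search described earlier can be approximated with a plain substring filter. The following is only a minimal sketch under stated assumptions: the `DatasetRecord` class, the `search` function, and the placeholder field values are hypothetical illustrations, not the project's actual schema or search implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """One entry in a hypothetical local index; field names mirror the article's list."""
    name: str
    updated: str            # update time, kept as free text here
    provider: str
    description: str
    keywords: List[str] = field(default_factory=list)
    category: str = ""
    paper_url: str = ""


def search(records, query):
    """Return records whose name, description, category, or keywords contain the query."""
    q = query.strip().lower()
    if not q:
        # An empty query matches everything.
        return list(records)
    hits = []
    for r in records:
        haystack = " ".join([r.name, r.description, r.category, *r.keywords]).lower()
        if q in haystack:
            hits.append(r)
    return hits


# Illustrative entries only: the names come from the article,
# every other value is a placeholder assumption.
records = [
    DatasetRecord(
        name="THUCNews",
        updated="unknown",
        provider="unknown",
        description="Chinese text classification",
        keywords=["news", "classification"],
        category="text classification",
    ),
    DatasetRecord(
        name="Weibo Emotions Corpus",
        updated="unknown",
        provider="unknown",
        description="Microblog posts labeled with emotions",
        keywords=["emotion"],
        category="sentiment analysis",
    ),
]

for hit in search(records, "sentiment"):
    print(hit.name, "|", hit.category)
```

Matching on a single lowercased string keeps the sketch short; a real search page would more likely filter by category and keyword separately.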

Project address: https://github.com/CLUEbenchmark/CLUEDatasetSearch

Classification of the Chinese NLP data sets in this project.

Because the project contains many types of data sets, Machine Heart only briefly introduces the sentiment analysis and text classification data sets here.

Sentiment analysis

Sentiment analysis is a common application of natural language processing (NLP), and it is usually treated as a classification task that extracts the emotional content of text. This project lists 11 sources of sentiment analysis data sets, including NLPCC 2013/2014, the Weibo Emotions Corpus, the Zhijiang Cup E-commerce Review Opinion Mining Competition, and the 2019 Sohu Campus Algorithm Competition data set.

Details of some of the Chinese sentiment analysis data sets in the project.

Text classification

Text classification is the most commonly used and most basic application in natural language processing, and many data sets already exist for it. This project lists 19 sources of text classification data sets, including Toutiao Chinese news (text) classification, THUCNews Chinese text classification, the 2017 Zhihu Kanshan Cup machine learning challenge, and the University of Science and Technology of China news classification corpus.

Details of some of the text classification data sets in the project.

Finally, developers can contribute by uploading data set information. Anyone who uploads information on 5 or more data sets becomes a contributor to the project once the submission passes review. The current 142 data sets look fairly complete, but covering more NLP subfield tasks will require joint maintenance by the community.
