Since August 12, 2022, the "Magichub Chinese-English Mixed ASR Challenge", co-sponsored by Magic Data, TAL, Tsinghua University, and the Institute of Acoustics, Chinese Academy of Sciences, has received registrations from more than 30 teams at research institutions, well-known enterprises, and universities in China and abroad, including Litchi FM, Terminus, NetEase Games, China Mobile Online, the Chinese Academy of Sciences, Huazhong University of Science and Technology, University of Science and Technology of China, Northwestern Polytechnical University, Xiamen University, and Tianjin University. On August 24, the organizers officially released the training and development sets and the baseline system to participating teams.
Registration is in progress
https://magichub.com/join-competition/?id=11627
Training and development sets
The organizer has released the following training and development datasets:
1. MagicData-RAMC: 351 multi-turn Mandarin conversations with a total duration of 180 hours. The annotations for each conversation include the transcript, voice activity timestamps, speaker information, recording information, and topic information. Speaker information covers gender, age, and region; recording information covers environment and equipment. Please check your email to download the dataset.
2. TAL_CSASR: a mixed Chinese-English speech dataset built from TAL English course audio, with a total duration of 587 hours. The speech mixes Chinese and English, each recording has a single speaker, and there are more than 200 speakers in total. Please check your email to download the dataset.
3. Development set (Dev): 14 speakers, with a total duration of about 6.8 hours.
All participants are expected to abide by the following rules:
1. Data: Only MagicData-RAMC and TAL_CSASR may be used for training. Data augmentation may use two noise datasets, namely MUSAN (openslr17) and RIRNoise (openslr28).
2. Using the test set in any form is strictly prohibited, including but not limited to fine-tuning or training the model on the test data.
3. Multi-system fusion is allowed. However, fusing systems with the same structure is discouraged.
4. All models must be trained on the allowed datasets. In particular, pretrained models that were trained on other datasets (including unlabeled data) are not allowed.
5. The organizer reserves the right of final interpretation.
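Rule 1 permits noise-based augmentation. As a rough illustration only, the sketch below mixes a noise clip into a speech clip at a chosen signal-to-noise ratio. It is not the organizers' recipe: the function names `mean_power` and `mix_at_snr` are hypothetical, audio is represented as plain lists of float samples, and loading MUSAN or RIRNoise files from disk is left out.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a clip."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has roughly the requested SNR in dB.

    Assumption: both inputs are non-empty lists of floats at the same
    sample rate. The noise is looped if it is shorter than the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Repeat (or truncate) the noise so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]

    speech_power = mean_power(speech)
    noise_power = mean_power(tiled)
    if noise_power == 0.0:
        # Silent noise clip: nothing to add.
        return list(speech)

    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, tiled)]


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    babble = [0.5, 0.5]
    print(mix_at_snr(clean, babble, snr_db=0))
```

In practice the SNR is usually drawn at random per utterance so the model sees a range of noise conditions.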
Baseline system introduction
To help contestants evaluate their systems, the organizer provides a baseline system and its performance for reference. The baseline uses a Transformer model and is built on the ETEH platform.
For detailed information, please see:
https://github.com/MagicHub-io/CSASR_Challenge
Scoring tool
Scoring uses the open-source tool sclite. The metric is Mixed Error Rate (MER), which counts errors over characters for Chinese and over words for English. A scoring example is available at:
https://github.com/MagicHub-io/CSASR_Challenge/blob/main/dev_scoring_sclite.sh
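To make the metric concrete, here is a minimal sketch of a Mixed Error Rate computation. It is an illustration, not the official scorer: official results come from sclite via the script linked above. The tokenization rule (one token per CJK character, one token per run of Latin letters, digits, or apostrophes, case-insensitive) and the function names are assumptions made for this example, and the official text normalization may differ.

```python
import re

# Assumption: one token per CJK character, one token per Latin-script word.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize(text):
    """Split mixed text into Chinese characters and upper-cased English words."""
    return [tok.upper() for tok in TOKEN_RE.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (sub/ins/del cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def mixed_error_rate(references, hypotheses):
    """Total token errors divided by total reference tokens over all utterances."""
    errors = 0
    total = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref = tokenize(ref_text)
        hyp = tokenize(hyp_text)
        errors += edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total


if __name__ == "__main__":
    refs = ["我想要 apple pie"]
    hyps = ["我想 apple pies"]
    # One deleted character plus one substituted word over 5 reference tokens.
    print(mixed_error_rate(refs, hyps))
```

Counting Chinese at the character level avoids any dependence on a word segmenter, while English keeps the usual word-level unit.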
Baseline system Q&A guide
If you have any questions about the baseline system, please visit the link below, where a team of experts will answer them.
Q&A link:
https://github.com/MagicHub-io/CSASR_Challenge#contact
Award settings
The competition offers first, second, and third prizes, and three groups of winning teams or individuals will be selected. The winners will have the opportunity to take part in on-site demonstrations and exchange activities at top international and domestic conferences.
1 first prize: Huawei Watch + Apu fascia gun (worth 3,000 yuan) + award certificate
2 second prizes: Magic Data Koi gift pack + TAL Future & Lingmei joint pen gift box (worth 1,500 yuan) + award certificate
3 third prizes: Magic Data customized gift + Apu weight scale (worth 500 yuan) + award certificate
Schedule setting
Competition organizing committee support team
For questions about the challenge, please send an email to [email protected] with the subject "Questions about the Chinese-English Mixed ASR Challenge". Senior technical experts from the organizing committee will provide technical Q&A and guidance. They have worked in the speech field for many years and have extensive research and practical experience, so contestants should find their guidance helpful.
Registration method
Registration address: https://magichub.com/join-competition/?id=11627
Number of participants: Each team may have at most 4 members.
More details: www.magichub.com