Home

SHINRA2020-ML is a shared task on text categorization, but it is at the same time a resource creation project.

The task is to categorize Wikipedia entities in 30 languages into the named entity ontology called Extended Named Entity. We will provide annotated training data for each language, created from 780K Japanese articles (entities) that have already been categorized, together with the language links between Wikipedia editions. For example, of the 780K Japanese articles, 511K have a language link to one of the 5,790K English articles. Participants can use these 511K linked English articles as training data, and the task is to categorize the remaining 5.3M English articles (5,790K − 511K = 5,279K). Training data is built in the same way for each of the languages listed on the Data description page.

The test data is blind to participants, who are required to submit their output for all of the remaining articles. The submitted outputs will be made open immediately, so that anyone can try ensemble learning to create a resource of categorized Wikipedia pages in 30 languages.
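The way training and test data are derived from language links can be illustrated with a small sketch. It assumes three hypothetical inputs that are not part of the official distribution format: `ja_labels` (Japanese title to its category labels), `ja_to_target` (Japanese title to the linked title in the target language) and `target_articles` (all article titles in the target language). The category names used below are illustrative placeholders, not necessarily real Extended Named Entity labels.

```python
from typing import Dict, List, Set, Tuple


def build_split(
    ja_labels: Dict[str, List[str]],
    ja_to_target: Dict[str, str],
    target_articles: Set[str],
) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Hypothetical helper: project Japanese labels onto a target language.

    Returns (train, test): train maps each linked target-language title to
    the categories of its Japanese counterpart; test is every target-language
    article that received no label and therefore must be categorized.
    """
    train: Dict[str, List[str]] = {}
    for ja_title, categories in ja_labels.items():
        target_title = ja_to_target.get(ja_title)
        # Keep only links that point to an existing target-language article.
        if target_title is not None and target_title in target_articles:
            train[target_title] = list(categories)
    # Iterating a dict yields its keys, so this removes all labeled titles.
    test = target_articles.difference(train)
    return train, test


if __name__ == "__main__":
    # Toy data; titles and labels are made up for illustration only.
    ja_labels = {"東京都": ["City"], "夏目漱石": ["Person"], "トヨタ自動車": ["Company"]}
    ja_to_en = {"東京都": "Tokyo", "夏目漱石": "Natsume Soseki"}
    en_articles = {"Tokyo", "Natsume Soseki", "Osaka", "Paris"}

    train, test = build_split(ja_labels, ja_to_en, en_articles)
    print(sorted(train.items()))
    print(sorted(test))
```

With the full English data, the same procedure yields 511K labeled articles and leaves 5,790K − 511K = 5,279K articles to categorize.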