“Project SHINRA” aims to build a structured knowledge base that combines Wikipedia and the Extended Named Entity definition through a Collaborative Contribution methodology

Structuring Wikipedia and RbCC

Wikipedia, which is created by crowds, is a great knowledge base with a very large number of entries and up-to-date information. However, most of that information is written for people to read, not for machines to process.

We aim to structure the information in Wikipedia so that machines can process it.

More concretely, we will host shared tasks toward that goal. The tasks aim to structure Wikipedia entities based on the Extended Named Entity definition, which includes 200+ categories and a set of attributes defined for each category. The outputs of the participating systems will be unified, and the final results will be distributed as a structured KB. We call this scheme “Resource by Collaborative Contribution (RbCC)”, and we are asking many collaborators to participate in the tasks.
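The text above does not say how the system outputs are unified. As an illustration only, the sketch below assumes simple majority voting over attribute values. The function name `unify_by_vote`, the `min_votes` threshold, and the attribute name "IATA code" are hypothetical and are not taken from the project.

```python
from collections import Counter


def unify_by_vote(system_outputs, min_votes=2):
    """Merge several systems' outputs by majority vote (an assumed scheme).

    system_outputs: list of dicts, one per system, each mapping
    (entity_title, attribute_name) -> extracted value.
    A value is kept only if at least `min_votes` systems agree on it.
    """
    votes = {}
    for output in system_outputs:
        for key, value in output.items():
            votes.setdefault(key, Counter())[value] += 1

    unified = {}
    for key, counter in votes.items():
        value, count = counter.most_common(1)[0]
        if count >= min_votes:
            unified[key] = value
    return unified


# Toy outputs from three hypothetical participating systems.
key = ("Narita International Airport", "IATA code")
systems = [{key: "NRT"}, {key: "NRT"}, {key: "HND"}]
unified = unify_by_vote(systems)
```

Here two of the three systems agree, so the unified result keeps "NRT" for that entity and attribute. A real unification step could instead weight systems by their measured accuracy.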

Shared-tasks

* SHINRA2020-JP

The task is to structure Japanese Wikipedia entities. The categories include those used at SHINRA2019 (JP-5 and JP-30), as well as 47 new categories under facilities and events (JP-47).

* SHINRA2020-ML

The task is to categorize Wikipedia entities in 30 languages. The training data consists of hand-categorized Japanese Wikipedia entities and the language links from them to each language's Wikipedia. For example, 300K entities in German Wikipedia are linked from the 920K hand-categorized Japanese Wikipedia entities. Participants are expected to categorize the remaining 1M German entities. This task will be run as one of the NTCIR-15 shared tasks.

The 30 languages are English, Spanish, French, German, Chinese, Russian, Portuguese, Italian, Arabic, Indonesian, Turkish, Dutch, Polish, Persian, Swedish, Vietnamese, Korean, Hebrew, Romanian, Norwegian, Czech, Ukrainian, Hindi, Finnish, Hungarian, Danish, Thai, Catalan, Greek, Bulgarian. (These Wikipedias have the largest number of users.)
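The way training data is derived for SHINRA2020-ML can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual tooling: the function name `build_training_data`, the dictionary-based data layout, and the toy titles and labels are all hypothetical.

```python
def build_training_data(ja_labels, lang_links, target_pages):
    """Project Japanese category labels onto a target-language Wikipedia.

    ja_labels:    Japanese page title -> list of hand-assigned categories
    lang_links:   Japanese page title -> linked page title in the target language
    target_pages: all page titles in the target-language Wikipedia

    Returns (training, unlabeled): pages that inherit labels through a
    language link, and the remaining pages that participants must categorize.
    """
    pages = set(target_pages)
    training = {}
    for ja_title, target_title in lang_links.items():
        if ja_title in ja_labels and target_title in pages:
            training[target_title] = ja_labels[ja_title]
    unlabeled = sorted(pages - set(training))
    return training, unlabeled


# Toy data (hypothetical): two labeled Japanese pages with German links.
ja_labels = {"成田国際空港": ["Airport"], "トヨタ自動車": ["Company"]}
lang_links = {"成田国際空港": "Flughafen Tokio-Narita", "トヨタ自動車": "Toyota"}
target_pages = ["Flughafen Tokio-Narita", "Toyota", "Berlin"]

training, unlabeled = build_training_data(ja_labels, lang_links, target_pages)
```

In this toy case the two linked German pages inherit their labels and "Berlin" remains to be categorized, which mirrors the 300K labeled versus 1M remaining split described for German.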

* SHINRA2019-JP

The task is to structure Japanese Wikipedia entities. The categories include those used at SHINRA2018 (JP-5), as well as 30 new categories under location and organization, JP-30 (2 categories are not used due to the small number of entities). There were 11 participants in this task.

* SHINRA2018-JP

The task is to structure Japanese Wikipedia entities in 5 categories. The categories are person, city, company, airport and chemical compound. There were 8 participants in this task.

Resource

Extended Named Entity Definition

219 categories are defined for Names, Numerical values and Date&Time. Examples include person, company, city, airport and chemical compound. Each category has its own attribute definitions (10 to 30 attributes for each category).