9-11 October 2018
Europe/Lisbon timezone

INCEpTION - Corpus-based Data Science from Scratch

Not scheduled


ISCTE, University of Lisbon
Poster Posters


Richard Eckart de Castilho (Technische Universität Darmstadt, Ubiquitous Knowledge Processing (UKP) Lab)


In recent years, corpus-based data science has seen rapid adoption both in science and in industry. Developing corpus-based models for text mining from scratch has penetrated a huge number of application fields. This renders common approaches to corpus annotation unscalable. Instead, machine-assisted annotation with a human in the loop is becoming crucial for the adoption of NLP by data scientists. INCEpTION [https://inception-project.github.io] is a web-based annotation platform that provides such machine-assisted annotation. The platform targets users in any domain or application scenario who need text that is annotated with specific categories and relations or linked to knowledge bases. It uses machine learning to provide annotation suggestions, including active-learning driven guidance, thus improving annotator efficiency and annotation quality. The modular architecture allows different external annotation services to be used to provide such suggestions. The platform supports entity disambiguation and linking, cross-document coreference, and fact linking, using either custom domain-specific RDF-based internal knowledge bases or local or remote external knowledge bases accessed through SPARQL. Annotation interoperability is ensured through the use of UIMA [http://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html] as well as through support for various annotation formats, including CLARIN TCF. At the level of the annotation scheme, the platform is compatible with the DKPro Core [https://www.aclweb.org/anthology/W/W14/W14-5201.pdf] type system, which facilitates interoperability with many of the NLP tools integrated within DKPro Core. INCEpTION is a multi-user platform. Users assume different roles on the platform (e.g. admin, project creator, normal user) as well as in individual projects (manager, annotator, adjudicator). User authentication can be delegated to an external mechanism, allowing users to be authenticated against infrastructure identity providers. 
This is essential for the deployment of the platform at the level of local or national infrastructures, where it is used by users from many different organizations. Because the tool is web-based, these geographically distributed users can also conveniently collaborate on annotation projects within the platform. Further connectivity with other services is possible through a remote access API compatible with the OpenMinTeD AERO [https://openminted.github.io/releases/aero-spec/1.0.0/omtd-aero/] protocol, which permits the automated setup and management of annotation projects. This allows projects to embed the annotation tool into a larger annotation campaign management process. It can also be used in a classroom scenario to automatically set up and tear down projects for students. INCEpTION is fully open source, openly developed on GitHub and published under the liberal Apache License 2.0. Our goal is not only to develop a comprehensive semantic text annotation platform, but also to grow a community around it and thus to promote a community-driven sustainability model for the platform. We believe the high level of interoperability, the generic nature of the tool, the open development process and the liberal license are key factors in this strategy.
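
As an illustration of the active-learning driven guidance mentioned above, the following minimal Python sketch ranks annotation suggestions so that the annotator sees the most uncertain ones first. It is not taken from INCEpTION: uncertainty sampling by entropy is one common active-learning strategy, used here only as a stand-in, and the function names, the `label_probs` field and the example probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a label probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rank_by_uncertainty(suggestions):
    """Sort suggestions from most to least uncertain (highest entropy first)."""
    return sorted(
        suggestions,
        key=lambda s: entropy(s["label_probs"].values()),
        reverse=True,
    )


# Hypothetical recommender output: one label distribution per suggested span.
suggestions = [
    {"text": "Lisbon", "label_probs": {"LOC": 0.95, "ORG": 0.05}},
    {"text": "Darmstadt", "label_probs": {"LOC": 0.55, "ORG": 0.45}},
    {"text": "UKP", "label_probs": {"LOC": 0.20, "ORG": 0.80}},
]

ranked = rank_by_uncertainty(suggestions)
for s in ranked:
    print(s["text"], round(entropy(s["label_probs"].values()), 3))
```

Under this strategy, the suggestion whose label distribution is closest to uniform is presented first, on the assumption that a human decision there is most informative for retraining the model.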


INCEpTION [https://inception-project.github.io] is a web-based annotation platform for machine-assisted annotation which enables the development of corpus-based models for text mining from scratch or the adaptation of models to new domains using a human-in-the-loop approach. The use of and compatibility with standards for annotation representation, knowledge representation and authentication, as well as the ability to make use of external text mining services and to be controlled through a web-service API, allow the platform to function as part of a larger infrastructure.

Type of abstract Poster

Primary authors

Iryna Gurevych (Technische Universität Darmstadt, Ubiquitous Knowledge Processing (UKP) Lab)
Jan-Christoph Klie (Technische Universität Darmstadt, Ubiquitous Knowledge Processing (UKP) Lab)
Richard Eckart de Castilho (Technische Universität Darmstadt, Ubiquitous Knowledge Processing (UKP) Lab)
