Speaker
Mr
Riccardo Del Gratta
(Istituto di Linguistica Computazionale "A. Zampolli")
Description
We describe the implications of (re)using the OpeNer and PANACEA Web Services within the CLARIN
Research Infrastructure.
The analyzed tools are of great interest for specific communities, such as academia and small businesses
focused on sentiment/opinion analysis and on Machine Translation along with related technologies,
but their outcomes may be of great importance for the CLARIN audience as well.
In fact, the Virtual Language Observatory lists many lexical resources for sentiment analysis but only a few tools, while many
lexical resources and tools are available for Machine Translation. This suggests that the latter community is already present in
CLARIN, while the former still needs to be engaged. While the community-related challenges are on the political side, the issues related to interoperability are definitely on the technical one.
The initiative is carried out at the ILC4CLARIN center in Pisa, the leading center of the CLARIN-IT national
consortium.
What the two projects have in common is not limited to tools and Web Services, to the creation of
annotated corpora and lexicons, or to their focus on specific communities.
Both are also based on (and strongly pursue and recommend) the concept of interoperability. This is clear from
the use of the Kyoto Annotation Format (KAF) in OpeNer, of the Graph Annotation
Format (GrAF) in PANACEA, and of the Lexical Markup Framework (LMF) in both.
Data and tool interoperability is also a key asset in both CLARIN (https://www.clarin.eu/event/2017/clarin-workshop-towards-interoperability-lexico-semantic-resources)
and EUDAT (https://eudat.eu/communities/an-eudat-based-fair-data-approach-for-data-interoperability).
Within CLARIN,
initiatives such as the Language Resource Switchboard (LRS) and WebLicht
openly move towards methodologies and “systems” that address the interoperability issues.
From a technical point of view the main issues are briefly
reported below:
1. Many tools in OpeNer and PANACEA are command line ones;
2. OpeNer offers both POST and GET APIs;
3. PANACEA built its Web Services using Soaplab and offers SOAP Web Services;
4. KAF, LMF and GrAF guarantee the interoperability among data and services;
5. Simple pipelines are available in OpeNer, while a workflow engine has been used in PANACEA.
The tools are already wrapped, but to fully meet the requirements of both the Language Resource Switchboard
and WebLicht we have to build a new shell around the command-line tools so that the REST APIs can accept
both POST and GET requests and accept/produce different formats. Indeed, while the Language Resource
Switchboard accepts tools with their own output formats but requires them to read plain-text data from a URL,
WebLicht accepts tools that read and write the TCF format.
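As an illustration of the kind of shell described above, the following is a minimal sketch, not the implementation used at ILC4CLARIN. It assumes a tool that reads from standard input and writes to standard output; `TOOL_COMMAND`, the `url` and `input` parameter names, and the port are hypothetical choices made only for this example, and format conversion (for instance to TCF) is left out.

```python
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

# Hypothetical stand-in for a command-line tool: it upper-cases its stdin.
# A real deployment would put the actual tool invocation here.
TOOL_COMMAND = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def run_tool(text, command=TOOL_COMMAND):
    """Feed text to the command-line tool on stdin and return its stdout."""
    result = subprocess.run(command, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


def resolve_input(params, body=""):
    """Pick the input text: a fetched URL, an inline parameter, or the body."""
    if "url" in params:
        # Plain text is read from a URL (assumed parameter name: "url").
        with urlopen(params["url"][0]) as response:
            return response.read().decode("utf-8")
    if "input" in params:
        return params["input"][0]
    return body


class ToolHandler(BaseHTTPRequestHandler):
    """Expose the wrapped tool through both GET and POST."""

    def _respond(self, body=""):
        params = parse_qs(urlparse(self.path).query)
        try:
            output, status = run_tool(resolve_input(params, body)), 200
        except Exception as error:  # report any failure as a server error
            output, status = str(error), 500
        payload = output.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        self._respond()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._respond(self.rfile.read(length).decode("utf-8"))


def serve(port=8080):
    """Start the wrapper on localhost (the port is an arbitrary choice)."""
    HTTPServer(("localhost", port), ToolHandler).serve_forever()
```

Calling `serve()` starts the wrapper; a GET request carrying a `url` query parameter then lets the tool read plain text from that URL, while a POST request sends the text in the request body.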
While OpeNer requires that the core (the command line) be wrapped into a REST shell, Web Services
in PANACEA need REST APIs around a SOAP core.
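A REST layer around a SOAP core essentially has to translate request parameters into a SOAP envelope and unpack the reply. The sketch below shows only that translation step, under stated assumptions: `SERVICE_NS` and the operation and parameter names are hypothetical and do not reflect the actual Soaplab or PANACEA interface; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "http://example.org/hypothetical-service"


def build_soap_request(operation, params):
    """Turn REST-style parameters into a SOAP 1.1 request envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_body(xml_text):
    """Return {local name: text} for the children of the first Body element."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    first = body[0]
    # Strip the "{namespace}" prefix that ElementTree puts on each tag.
    return {child.tag.split("}")[-1]: child.text for child in first}
```

A REST handler would call `build_soap_request` with the incoming parameters, send the envelope to the SOAP endpoint, and pass the reply through `parse_soap_body` before returning it in the format required by the caller.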
In the final paper, we will finalize the technical aspects and describe how the User Involvement group can play an important role in engaging the sentiment/opinion community in CLARIN.
Type of abstract | Poster |
---|---|
Topic Area | Interoperability |
Primary author
Mr
Riccardo Del Gratta
(Istituto di Linguistica Computazionale "A. Zampolli")