Linguistic services

This is the Java port of the Perl-based tokenizer developed within the OpeNER project and available here.

Current software version: 0.2, released on 04/10/2017.

ILC4CLARIN provides three distinct sets of web services that tokenize texts in the following languages:

The application raises an Unsupported Language Exception if the requested language is not in the list.

All the offered services perform the same operation (tokenization), but, depending on the endpoint, they produce valid TCF, KAF or tab-separated files.

The service that produces TCF accepts as input either plain text or a valid TCF document. The MIME type is set accordingly.
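A client that sends both kinds of input to the TCF service has to decide which MIME type to use. The sketch below does this by checking whether the payload parses as XML whose root is the TCF `D-Spin` element (namespace taken from the TCF example on this page). The MIME strings are assumptions: "text/tcf+xml" is the type WebLicht uses for TCF, but this page does not state which values the service expects. The function name `guess_content_type` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Root element of a TCF document (namespace from the TCF example on this page).
TCF_ROOT = "{http://www.dspin.de/data}D-Spin"

# Assumed MIME types: "text/tcf+xml" is the type WebLicht uses for TCF;
# whether this service expects exactly these strings is an assumption.
TCF_MIME = "text/tcf+xml"
PLAIN_MIME = "text/plain"


def guess_content_type(payload: str) -> str:
    """Return TCF_MIME if payload is a D-Spin XML document, PLAIN_MIME otherwise."""
    try:
        root = ET.fromstring(payload.encode("utf-8"))
    except ET.ParseError:
        # Not well-formed XML, so treat it as plain text.
        return PLAIN_MIME
    # Well-formed XML that is not TCF is also sent as plain text.
    return TCF_MIME if root.tag == TCF_ROOT else PLAIN_MIME
```

For example, `guess_content_type("Mi chiamo Riccardo. Abito a Roma")` yields "text/plain". Checking the root element, rather than just looking for a leading `<`, avoids sending arbitrary XML as TCF.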

This page explains how to invoke the offered services.

The endpoints are the following:

The language is provided as a parameter:

PLEASE NOTE THIS CALL: for the TCF service, when a TCF document is sent as input, NO LANGUAGE IS PROVIDED AS A PARAMETER.
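The rule above (a language parameter for plain text, none for TCF) can be enforced on the client side. The following is a minimal sketch that prepares, but does not send, a request with the Python standard library. Several details are assumptions not confirmed by this page: the input is sent as a POST body, the query parameter is named `lang`, and the endpoint is a placeholder (`https://example.org/...`), not the real service address.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_tcf_request(endpoint: str, body: str, is_tcf: bool,
                      language: Optional[str] = None) -> Request:
    """Prepare, without sending, a POST request for the TCF-producing service.

    Assumptions: the input travels in the request body and the language
    is a query parameter with the hypothetical name "lang".
    """
    if is_tcf:
        # TCF input: no language parameter is sent.
        if language is not None:
            raise ValueError("do not pass a language together with a TCF document")
        url = endpoint
        content_type = "text/tcf+xml"  # assumed MIME type for TCF
    else:
        # Plain text input: the language must be given explicitly.
        if language is None:
            raise ValueError("plain text input requires a language")
        url = f"{endpoint}?{urlencode({'lang': language})}"  # hypothetical parameter name
        content_type = "text/plain"
    return Request(url, data=body.encode("utf-8"),
                   headers={"Content-Type": content_type}, method="POST")
```

The request can then be sent with `urllib.request.urlopen(...)` once the placeholder endpoint is replaced by one of the endpoints listed above.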

For the Language Resource Switchboard (please note "lrs" in the path) we added three additional endpoints.

The endpoints are the following:

Both the language and the URL are provided as parameters:

This is because integrating a service into the Language Resource Switchboard requires the URL of the input document to be passed as a parameter.
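Because the input document is itself a URL passed inside another URL, it must be percent-encoded. Otherwise its own `?`, `&` or `/` characters would be confused with the outer query. This is a minimal sketch, assuming a GET request and the hypothetical parameter names `lang` and `url`. The endpoint and document addresses are placeholders.

```python
from urllib.parse import urlencode


def lrs_request_url(endpoint: str, language: str, document_url: str) -> str:
    """Build a URL for an LRS-style endpoint, passing language and input URL as query parameters."""
    # "lang" and "url" are hypothetical parameter names.
    # urlencode percent-encodes the document URL (":" -> %3A, "/" -> %2F).
    query = urlencode({"lang": language, "url": document_url})
    return f"{endpoint}?{query}"
```

For instance, `lrs_request_url("https://example.org/lrs/tokenizer", "it", "http://example.org/doc.txt")` returns `https://example.org/lrs/tokenizer?lang=it&url=http%3A%2F%2Fexample.org%2Fdoc.txt`.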

You can test the service endpoints using curl or wget as follows:

Please note that the services designed for the Language Resource Switchboard also work on their own and can be invoked with the commands above.

As plain text input you can use:
Mi chiamo Riccardo. Abito a Roma
As TCF input you can use:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://de.clarin.eu/images/weblicht-tutorials/resources/tcf-04/schemas/latest/d-spin_0_4.rnc" type="application/relax-ng-compact-syntax"?>
<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
    <md:MetaData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:cmd="http://www.clarin.eu/cmd/"
        xmlns:md="http://www.dspin.de/data/metadata"
        xsi:schemaLocation="http://www.clarin.eu/cmd/ http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1320657629623/xsd">
    </md:MetaData>
    <tc:TextCorpus xmlns:tc="http://www.dspin.de/data/textcorpus" lang="it">
        <tc:text>
            Mi chiamo Alfredo. Abito a Roma.
        </tc:text>
    </tc:TextCorpus>
</D-Spin>
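In the TCF example the language travels inside the document, as the `lang` attribute of `tc:TextCorpus`. That this is why the TCF call takes no language parameter is an inference, not something stated on this page. The sketch below, a hypothetical helper using only the standard library, reads the language and the text from a document shaped like the example. It can be used to check a TCF file before sending it. Whitespace around the text inside `tc:text` is stripped.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Text corpus namespace declared in the TCF example above.
NS = {"tc": "http://www.dspin.de/data/textcorpus"}


def read_tcf_input(document: str) -> Tuple[str, str]:
    """Return (language, text) from a TCF 0.4 document shaped like the example."""
    # Encode to bytes so the XML declaration's encoding is handled by the parser.
    root = ET.fromstring(document.encode("utf-8"))
    corpus = root.find("tc:TextCorpus", NS)
    if corpus is None:
        raise ValueError("no tc:TextCorpus element found")
    text_el = corpus.find("tc:text", NS)
    text = (text_el.text or "").strip() if text_el is not None else ""
    return corpus.get("lang", ""), text
```

Applied to the example above, it returns `("it", "Mi chiamo Alfredo. Abito a Roma.")`.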
        

Contacts

In case of problems, write an email to the ILC4CLARIN technical staff with all the information needed to solve the issue, including the version number.