C L A R I N (Q2)

April 14, 2008

          CLARIN stands for Common Language Resources and Technology Infrastructure. The CLARIN project is a large-scale pan-European collaborative effort to create and coordinate language resources and technology and to make them available and readily usable. CLARIN offers scholars in the Humanities and Social Sciences tools for computer-aided language processing that address one or more of the many roles language plays: carrier of cultural content and knowledge, instrument of communication, component of identity, and object of study.

          The initiative offers:

• Comprehensive service to the humanities disciplines with respect to language resources and technology.
• Technology that overcomes the institutional, structural and semantic interoperability problems currently fragmenting the landscape of resources and tools.
• Tools and resources that will be interoperable across languages and domains, thus addressing the issue of preserving and supporting the multilingual and multicultural European heritage.
• Comprehensive training and education programs that include university education in the different member states.
• Improvement and extension of web-based collaboration, for example by creating virtual working groups that cross disciplinary boundaries.
• Development or improvement of standards for language resource maintenance.
• A persistent and stable infrastructure that researchers can rely on for decades to come.

          To achieve these challenging goals, CLARIN will build on and contribute to a number of key technologies from the major initiatives advancing the eScience paradigm:

• It includes Data Grid technology to connect the repositories, as is being implemented in the DAM-LR pilot project, together with the web services that the various centres provide;
• It builds on ideas launched by the Digital Library community to create Live Archives, and will further such initiatives;
• It incorporates, and contributes to, Semantic Web technology to overcome the structural and semantic encoding problems;
• It incorporates advanced multi-lingual language processing technology that supports cultural and linguistic integration.

The purpose of the infrastructure is to offer persistent, secure services that provide easy access to language processing resources. As language, speech and vision technology improves, it should become commonplace to carry out tasks such as ‘summarize Le Monde from 11th March 2007’, ‘list all uses of “enthusiasm” in 19th century English novels written by women’, or ‘find all video clips of Tony Blair on the BBC in 2007’. But without the proper infrastructure, the technologies that make these tasks possible will only be available to a few specialists. At present one needs to find an appropriate program (for translation, summarization, information extraction, etc.), download it, make sure it is compatible with the computer that will run it, understand the form of input it takes, download the data (e.g. novels, newspapers, corpora, videos), and convert the data to the format the program expects, all before one can get started.
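To give a sense of what the second example query involves once the data is already at hand and in plain text, here is a minimal sketch of a keyword-in-context search. It is an illustration only and not part of CLARIN: the function name `concordance`, the `width` parameter and the sample sentence are all assumptions made for this example, and it leaves out the hard parts the paragraph above describes (locating the novels, filtering by period and author, and converting formats).

```python
import re

def concordance(text, word, width=30):
    """Return (left context, match, right context) for each whole-word,
    case-insensitive occurrence of `word` in `text`.

    Hypothetical helper for illustration; `width` is the number of
    characters of context kept on each side of the match.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits

# Made-up sample text standing in for a downloaded, already converted novel.
sample = "Her enthusiasm was plain to all. Enthusiasm, he said, is no virtue."

for left, hit, right in concordance(sample, "enthusiasm", width=12):
    print(f"{left}[{hit}]{right}")
```

Even this small step presumes the text has been found, downloaded and converted, which is the part an infrastructure of this kind is meant to take off the researcher's hands.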

For most researchers outside computer science, at least one of these preparatory steps will be an insurmountable barrier. Our vision is that the resources for processing language, the data to be processed, and appropriate guidance, advice and training be made available and accessible over a distributed network from the user’s desktop. CLARIN proposes to make this vision a reality: the user will have access to guidance and advice through distributed knowledge centres and, via a single sign-on, to repositories of data with standardized descriptions and to processing tools ready to operate on standardized data. All of this will be available on the internet through a service-oriented architecture based on secure grid technologies.

The project's primary aim is therefore to turn existing, fragmented technology and resources into accessible and stable services that any user can share, adapt and repurpose. CLARIN can build on a rich history of national and European initiatives in this domain, and it will help ensure that Europe maintains its leading position in humanities and social science research in the current highly competitive era.

Language Technology Lab (Q2)

April 2, 2008

