RSS

May 20, 2009
RSS is a family of web feed formats encoded in XML. It is used to deliver frequently updated information to subscribers. The format makes it possible to distribute content without a browser, using software designed to read RSS feeds. It is nevertheless also possible to view RSS content in the browser itself: the latest versions of the major browsers can read RSS feeds without additional software. RSS belongs to the family of XML formats developed for sites that are updated frequently, through which information can be shared and reused on other sites or in other programs. This is known as web syndication.
There are three variants of RSS, and the initials take on a different meaning depending on the specification used:
  • Rich Site Summary. (RSS 0.91)
  • RDF Site Summary. (RSS 0.9 and 1.0)
  • Really Simple Syndication. (RSS 2.0)

 The RSS file is rewritten automatically whenever the content of the website is updated. By reading the RSS file it is possible to find out whether the content has been updated, and with which news items or texts, without having to visit the site except to read the full version.
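
As a rough illustration of how a feed reader can check an RSS file for updates without visiting the site, here is a minimal Python sketch using only the standard library. The sample feed, its example.org URLs and the helper names `list_items` and `new_items` are invented for illustration; the sketch assumes the RSS 2.0 layout (a `channel` containing `item` elements with `title`, `link` and `pubDate`).

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs offline.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://example.org/</link>
    <description>An invented feed for illustration</description>
    <item>
      <title>RSS</title>
      <link>http://example.org/rss</link>
      <pubDate>Wed, 20 May 2009 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Multiword expressions</title>
      <link>http://example.org/mwe</link>
      <pubDate>Wed, 20 May 2009 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return a (title, link, pubDate) tuple for every item in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return items


def new_items(feed_xml, seen_links):
    """Return only the items whose link the reader has not seen before."""
    return [entry for entry in list_items(feed_xml) if entry[1] not in seen_links]


if __name__ == "__main__":
    for title, link, published in new_items(SAMPLE_FEED, {"http://example.org/rss"}):
        print(title, link, published)
```

A real reader would download the feed periodically and remember the links it has already shown; that bookkeeping is left out here.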

 


 


Project list (2nd questionnaire)

May 20, 2009

Here we have the list of projects I have chosen for this article:

1. Computational semantics. (Language technology world).

2. Language checking. (Language technology world).

3. Knowledge Discovery. (Language technology world).

4. Semantic web. (DFKI).

5. Music Information Retrieval. (DFKI).

6. Collaborating Using Diagrams. (Language Technology Group).

7. Crossmarc. (Language Technology Group).

8. Shallow Semantic Parsing. (SNLP).

9. Detecting contradictions in Text. (SNLP).

10. Document indexing for German and English. (DFKILT).



Multiword expression (MWE) (Q2)

May 20, 2009
Multiword expression (MWE): any phrase that is not entirely predictable on the basis of standard grammar rules and lexical entries.
There are no immediate counterexamples to the claim that any expression that can be written hyphenated or as a single lexeme, or alternatively with spaces (e.g. mailman/postman vs. mail man/post man), is an MWE. This could be used to evaluate extraction techniques, possibly using external resources to determine whether the extracted expressions can also be written hyphenated or without spaces (e.g. by defining the "optimal extraction volume" as the point where the ratio of such expressions is maximised).
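
One possible reading of the "optimal extraction volume" idea is sketched below in Python. It assumes the extractor returns candidates ranked from most to least confident and that an external word list (here a plain `set`) records the joined and hyphenated forms. The function names, the `min_volume` parameter and the toy data are all hypothetical; the note above does not specify any of them.

```python
def has_joined_variant(expression, lexicon):
    """True if the spaced expression is also attested joined or hyphenated."""
    words = expression.lower().split()
    return "".join(words) in lexicon or "-".join(words) in lexicon


def optimal_extraction_volume(ranked_candidates, lexicon, min_volume=1):
    """Return (k, ratio) for the cutoff k whose top-k candidates have the
    highest ratio of expressions with a joined or hyphenated variant.

    Ties are resolved in favour of the larger k. min_volume is an added
    assumption: without it, very small cutoffs can trivially reach ratio 1.
    """
    best_k, best_ratio = 0, 0.0
    hits = 0
    for k, candidate in enumerate(ranked_candidates, start=1):
        if has_joined_variant(candidate, lexicon):
            hits += 1
        ratio = hits / k
        if k >= min_volume and ratio >= best_ratio:
            best_k, best_ratio = k, ratio
    return best_k, best_ratio


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    ranked = ["post man", "mail man", "data base", "red car", "web site", "the man"]
    lexicon = {"postman", "mailman", "database", "web-site"}
    print(optimal_extraction_volume(ranked, lexicon))
    print(optimal_extraction_volume(ranked, lexicon, min_volume=4))
```

Whether the ratio should be measured over the top-k candidates or over some window of the ranking is an open design choice; the cumulative version is used here because it is the simplest.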

Human language technologies (Q1)

March 25, 2009

          Language technology is often called human language technology (HLT) or natural language processing (NLP). It has computational linguistics (CL) and speech technology at its core, but also includes many of their application-oriented aspects. Language technology is closely connected to computer science and general linguistics.

          Language technology makes it easier for people to interact with machines. This can benefit a wide range of people, from illiterate farmers in remote villages who want to obtain relevant medical information over a cellphone, to scientists in state-of-the-art laboratories who want to focus on problem-solving with computers.

          The overall objective of HLT is to support e-business in a global context and to promote a human centred infostructure ensuring equal access and usage opportunities for all. This is to be achieved by developing multilingual technologies and demonstrating exemplary applications providing features and functions that are critical for the realisation of a truly user friendly Information Society. Projects address generic and applied RTD from a multi- and cross-lingual perspective, and undertake to demonstrate how language specific solutions can be transferred to and adapted for other languages.

          HLTCentral is a dedicated server providing a gateway to speech and language technology opportunities on the Web. The HLTCentral web site is an online information resource on human language technologies and related topics of interest to the HLT community at large. It covers news, R&D, and technological and business developments in the field of speech, language, multilinguality, automatic translation, localisation and related areas. Its coverage of HLT news and developments is worldwide, with a unique European perspective.

          The HLT Research Group studies how this technology can be applied, adapted and developed to benefit the people of southern Africa.

          The HLT research group investigates how HLT can be adapted and applied to benefit a developing country and pursues basic and directed research relevant to the local context. This goal is considered from two perspectives:

  • HLT as an enabling technology that can play a crucial role in addressing the need for information empowerment. An example is telephone-based systems using HLT that can provide much useful information.
  • HLT as a support for language diversity in an affordable and equitable fashion. HLT can assist industry and government to make services and documents available in the 11 official languages and has a role to play in rectifying the historical discrimination against specific languages.

Martin Kay (Q1)

March 25, 2009

Martin Kay is a computer scientist known especially for his work in computational linguistics. He was responsible for introducing the notion of chart parsing in computational linguistics, and the notion of unification in linguistics generally. With Ron Kaplan, he pioneered finite-state morphology. He has been a longtime contributor to, and critic of, work on machine translation. Permanent chairman of the International Committee on Computational Linguistics, Kay was a Research Fellow at the Xerox Palo Alto Research Center until 2002. Gothenburg University has made him an honorary Filosofi Doktor.



Hans Uszkoreit (Q1)

May 14, 2008

          Hans Uszkoreit is Professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Lab. By cooptation he is also a professor in the Computer Science Department.

          Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin. He co-founded the Berlin city magazine Zitty, for which he worked as a part-time editor and writer. In 1977, he received a Fulbright Grant to continue his studies at the University of Texas at Austin. During his time in Austin he also worked as a research associate in a large machine translation project at the Linguistics Research Center. In 1984 Uszkoreit received his Ph.D. in linguistics from the University of Texas. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA. While working at SRI, he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM Research Fellowship at the Science Division of IBM Germany. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts). At the same time he also taught at the University of Stuttgart.

          In 1988 Uszkoreit was appointed to a newly created chair of Computational Linguistics at Saarland University and started the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at  DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division (SFB 378) “Resource-Adaptive Cognitive Processes” of the DFG (German Science Foundation). He is also co-founder and professor of the “European Postgraduate Program Language Technology and Cognitive Systems”, a joint Ph.D. program with the University of Edinburgh.

          Uszkoreit is a Permanent Member of the International Committee of Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, Member of the Board of the European Language Resources Association (ELRA), and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbruecken, acrolinx gmbh, Berlin and Yocoy Technologies GmbH, Berlin. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge.

          His current research interests are computer models of natural language understanding and production, advanced applications of language and knowledge technologies such as semantic information systems, translingual technologies, cognitive foundations of language and knowledge, deep linguistic processing of natural language, syntax and semantics of natural language and the grammar of German.


The Stanford NLP Group (Q2)

May 14, 2008

          The Natural Language Processing Group at Stanford University is a team of faculty, postdocs, and students who work together on algorithms that allow computers to process and understand human languages. Our work ranges from basic research in computational linguistics to key applications in human language technology, and covers areas such as sentence understanding, probabilistic parsing and tagging, biomedical information extraction, grammar induction, word sense disambiguation, and automatic question answering.

          A distinguishing feature of the Stanford NLP Group is our effective combination of sophisticated and deep linguistic modeling and data analysis with innovative probabilistic and machine learning approaches to NLP. Our research has resulted in state-of-the-art technology for robust, broad-coverage natural-language processing in many languages. These technologies include our part-of-speech tagger, which currently has the best published performance in the world; a high performance probabilistic parser; a competition-winning biological named entity recognition system; and algorithms for processing Arabic, Chinese, and German text.

          The Stanford NLP Group includes members of both the Linguistics Department and the Computer Science Department, and is affiliated with the Stanford AI Lab and the Stanford InfoLab.


CLARIN (Q2)

April 14, 2008

          The acronym CLARIN stands for Common Language Resources and Technology Infrastructure. The CLARIN project is a large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable. CLARIN offers scholars tools for computer-aided language processing that address one or more of the multiple roles language plays in the Humanities and Social Sciences (i.e. carrier of cultural content and knowledge, instrument of communication, component of identity and object of study).

          Its initiative offers:

• Comprehensive service to the humanities disciplines with respect to language resources and technology.
• Technology that overcomes the many boundaries currently fragmenting the landscape of resources and tools, boundaries that arise from institutional, structural and semantic interoperability problems.
• Tools and resources that will be interoperable across languages and domains, thus addressing the issue of preserving and supporting the multilingual and multicultural European heritage.
• Comprehensive training and education programs that include university education in the different member states.
• Improvement and extension of web-based collaborations, i.e. creating virtual working groups breaking the discipline boundaries.
• Development or improvement of standards for language resource maintenance.
• A persistent and stable infrastructure that researchers can rely on for the next decades.

          To achieve these challenging goals CLARIN will be built on and contribute to a number of key technologies coming from the major initiatives advancing the eScience paradigm:

• It includes Data Grid technology to connect the repositories, as is being implemented in the DAM-LR pilot project, and the web services that the various centres provide;
• It builds on ideas launched by the Digital Library community to create Live Archives, and will further such initiatives;
• It incorporates, and contributes to, Semantic Web technology to overcome the structural and semantic encoding problems;
• It incorporates advanced multi-lingual language processing technology that supports cultural and linguistic integration.

The purpose of the infrastructure is to offer persistent services that are secure and provide easy access to language processing resources. As language, speech and vision technology improves, it should be commonplace to carry out tasks such as ‘summarize Le Monde from 11th March 2007’, ‘list all uses of “enthusiasm” in 19th century English novels written by women’, or ‘find all video clips of Tony Blair on the BBC in 2007’. But without the proper infrastructure, the technologies that make these tasks possible will only be available to a few specialists. At present one needs to find an appropriate program (to do translation, summarization, extraction of information, etc.), download the program, make sure it is compatible with the computer that will run it, understand the form of input it takes, download the data (e.g. novels, newspapers, corpora, videos), and convert them to the correct format for the program, all before one can get started.

For most researchers outside computer science, at least one of these tasks will be an insurmountable barrier. Our vision is that the resources for processing language, the data to be processed, and appropriate guidance, advice and training should be made available and accessible over a distributed network from the user’s desktop. CLARIN proposes to make this vision a reality: the user will have access to guidance and advice through distributed knowledge centres, and via a single sign-on the user will have access to repositories of data with standardized descriptions and to processing tools ready to operate on standardized data. All of this will be available on the internet using a service-oriented architecture based on secure grid technologies.

The nature of the project is therefore primarily to turn existing, fragmented technology and resources into accessible and stable services that any user can share or adapt and repurpose. CLARIN can build upon a rich history of national and European initiatives in this domain, and it will ensure that Europe maintains the leading position in humanities and social science research in the current highly competitive era.


Language Technology Lab (Q2)

April 2, 2008

These themes are elaborated in research, development and commercial projects:

1. Computational semantics. (Language technology world).

2. Language checking. (Language technology world).

3. Knowledge Discovery. (Language technology world).

4. Semantic web. (DFKI).

5. The Stanford NLP Group

6. Collaborating Using Diagrams. (Language Technology Group).

7. CLARIN

8. Shallow Semantic Parsing. (SNLP).

9. Detecting contradictions in Text. (SNLP).

10. Document indexing for German and English. (DFKILT).


XML: Extensible Markup Language

January 16, 2008

        XML, short for Extensible Markup Language, is an extensible tag-based metalanguage developed by the World Wide Web Consortium (W3C). It is a simplification and adaptation of SGML and makes it possible to define the grammar of specific languages (in the same way that HTML is itself a language defined by SGML). XML is therefore not really one particular language, but a way of defining languages for different needs. Some of the languages defined using XML are XHTML, SVG and MathML.

        XML was not created only for use on the Internet; it is proposed as a standard for exchanging structured information between different platforms. It can be used in databases, text editors, spreadsheets and almost anything imaginable.

        XML is a simple technology surrounded by others that complement it and make it far larger and more capable. It plays a very important role today because it provides compatibility between systems so that information can be shared in a secure, reliable and easy way.
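
To make the idea of defining a language for a specific need more concrete, the following Python sketch invents a tiny vocabulary (`projects` and `project` elements with a `group` attribute), writes some data in it and reads it back. The vocabulary and the function names are assumptions made up for this example and are not part of any standard.

```python
import xml.etree.ElementTree as ET


def build_project_list(projects):
    """Serialise (name, group) pairs using a small, made-up XML vocabulary."""
    root = ET.Element("projects")
    for name, group in projects:
        node = ET.SubElement(root, "project", {"group": group})
        node.text = name
    # encoding="unicode" makes tostring return a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


def read_project_list(xml_text):
    """Parse the same vocabulary back into (name, group) pairs."""
    root = ET.fromstring(xml_text)
    return [(node.text, node.get("group")) for node in root.findall("project")]


if __name__ == "__main__":
    data = [("Semantic web", "DFKI"), ("Shallow Semantic Parsing", "SNLP")]
    document = build_project_list(data)
    print(document)
    print(read_project_list(document))
```

Because the exchanged text is plain XML, any other program with an XML parser could read the same document, which is the interoperability point made above.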

WHAT ARE THEY FOR?

        Among the available XML technologies, the following stand out:

        XSL: Extensible Stylesheet Language, whose main purpose is to describe how content should be structured, how the source content should be laid out and how it should be paginated on a presentation medium such as a Web browser window, a mobile device, or the set of pages of a catalogue, report or book.

        XPath: XML Path Language, a language for addressing parts of an XML document.

        XLink: XML Linking Language, a language that allows elements to be inserted into XML documents in order to create links between XML resources.

        XPointer: XML Pointer Language, a language that gives access to the internal structure of an XML document, that is, to its elements, attributes and content.

        XQL: XML Query Language, a language that makes it easier to extract data from XML documents. It offers flexible queries for extracting data from XML documents on the Web.

HOW DO THEY WORK?

        XSL works as an advanced language for creating style sheets. It can transform, sort and filter XML data, and format the data based on its values. XPath identifies parts of a particular XML document, such as its attributes and elements.

        XLink, for its part, describes a standard way of adding hyperlinks to an XML file; that is, it is a mechanism for linking to other XML documents. It works much like a link in a Web page, as <a href=""> would, except that a href is a one-way link. XLink, by contrast, allows bidirectional links to be created, which makes it possible to move in both directions. This makes it easier to obtain remote information as resources rather than simply as Web pages.

        XPointer works as a syntax that points to specific parts of an XML document; it is like an extension of XPath. First, XLink establishes the link to the XML resource, and then XPointer goes to a specific point in the document. It behaves much like fragment identifiers in an HTML document: it is appended to the end of a URI and then locates the specified place in the XML document. Because XPointer is an extension of XPath, it has all the advantages of XPath and also allows a range to be defined in an XML document; that is, with XPointer it is possible to set a start point and an end point, which includes all the XML elements between those two points.

        Finally, XQL, the query language, is based on search operators over a data model for XML documents, and can run queries over many kinds of documents, such as structured documents, document collections, databases, DOM structures, catalogues, etc.
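
The addressing ideas behind XPath and XPointer can be tried out with Python's standard library, with two caveats: `xml.etree.ElementTree` implements only a limited subset of XPath, and it has no XPointer support at all. The second function below therefore only imitates the simplest case, a URI fragment that names an element, by treating an attribute called `id` as the identifier. The sample document, the example.org URI and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Made-up sample document.
CATALOGUE = """<projects>
  <project id="p1" group="DFKI">Semantic web</project>
  <project id="p2" group="DFKI">Music Information Retrieval</project>
  <project id="p3" group="SNLP">Shallow Semantic Parsing</project>
</projects>"""


def select_text(xml_text, path):
    """Return the text of every element matched by an XPath-style path."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(path)]


def resolve_fragment(xml_text, uri):
    """Imitate a fragment identifier: split the URI at '#' and look up the
    element whose id attribute equals the fragment (a simplification)."""
    _, fragment = urldefrag(uri)
    root = ET.fromstring(xml_text)
    node = root.find(f".//*[@id='{fragment}']")
    return None if node is None else node.text


if __name__ == "__main__":
    # Select by attribute value, then by position (positions start at 1).
    print(select_text(CATALOGUE, "./project[@group='DFKI']"))
    print(select_text(CATALOGUE, "./project[3]"))
    # The part after '#' picks one element inside the document.
    print(resolve_fragment(CATALOGUE, "http://example.org/projects.xml#p3"))
```

Ranges between a start point and an end point, as described for XPointer above, are not covered by this sketch.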