Uplug – a modular corpus tool for parallel corpora

in Parallel corpora, parallel worlds

Abstract

This article describes the Uplug system, a modular software platform for integrating text processing tools. It comprises three components: an extensible I/O library that provides a transparent interface to textual data; a tool for combining single-task text and corpus processing modules into sequentially executable systems; and a graphical user interface for running Uplug applications, modifying parameter settings, and inspecting the resulting data. The system supports a variety of storage formats, including those of standard database management tools such as SDBM and GDBM, as well as simple XML formats and other text-oriented data formats. Connections to relational databases are also supported via a transparent database toolbox. Uplug applications can be adjusted easily by modifying standardised configuration files. A prototype of the Uplug system currently runs under Linux at Uppsala University, with modules for processing bilingual parallel text, including several kinds of word alignment and data generation from parallel texts, as well as tools for examining and evaluating the results.
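The idea of chaining single-task modules into a sequentially executed system, driven by a configuration, can be illustrated with a minimal sketch. This is not Uplug's actual code or API: the module names (`lowercase`, `length_filter`), the record fields (`source`, `target`), the `max_ratio` parameter, and the configuration layout are all hypothetical, chosen only to show the general pattern in Python.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# One unit of data flowing through the pipeline (hypothetical shape):
# a sentence pair with a "source" and a "target" text field.
Record = Dict[str, str]
# A module consumes a stream of records plus its parameters and yields a new stream.
Module = Callable[[Iterable[Record], Dict[str, object]], Iterator[Record]]


def lowercase(records: Iterable[Record], params: Dict[str, object]) -> Iterator[Record]:
    """Single-task module: lowercase both sides of each sentence pair."""
    for rec in records:
        yield {**rec, "source": rec["source"].lower(), "target": rec["target"].lower()}


def length_filter(records: Iterable[Record], params: Dict[str, object]) -> Iterator[Record]:
    """Single-task module: drop pairs whose token-count ratio exceeds max_ratio."""
    max_ratio = float(params.get("max_ratio", 2.0))
    for rec in records:
        n_src = max(len(rec["source"].split()), 1)
        n_trg = max(len(rec["target"].split()), 1)
        if max(n_src, n_trg) / min(n_src, n_trg) <= max_ratio:
            yield rec


# Hypothetical registry mapping configuration names to module functions.
REGISTRY: Dict[str, Module] = {"lowercase": lowercase, "length_filter": length_filter}


def run_pipeline(config: List[dict], records: Iterable[Record]) -> List[Record]:
    """Run the configured modules in order, feeding each one's output to the next."""
    stream: Iterable[Record] = records
    for step in config:
        module = REGISTRY[step["module"]]
        stream = module(stream, step.get("params", {}))
    return list(stream)


# The "configuration file" is shown as an in-memory structure for brevity.
CONFIG = [
    {"module": "lowercase"},
    {"module": "length_filter", "params": {"max_ratio": 2.0}},
]

PAIRS = [
    {"source": "Det regnar .", "target": "It is raining ."},
    {"source": "Ja", "target": "Yes , that is certainly the case"},
]

result = run_pipeline(CONFIG, PAIRS)
```

In this sketch, changing the behaviour of the application means editing `CONFIG` (reordering steps or adjusting `max_ratio`) rather than the module code, which mirrors the role the abstract assigns to configuration files.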

Parallel corpora, parallel worlds

Selected papers from a symposium on parallel and comparable corpora at Uppsala University, Sweden, 22-23 April, 1999
