In a multilingual, richly structured lexical database such as Papillon, examples, citations, definitions and glosses expressed in each language have to be translated into all other languages and stored in the database. Storing can be achieved in a simple and "seamless" way by introducing "auxiliary" lexies and axies for these "free language elements". Translating all these elements into all languages is a necessary contribution to the Papillon project. In the TraCorpEx project, we pursue a more restricted goal: to extend bilingual corpora of parallel utterances to more languages, because some large corpora of this kind have become freely available, and one of them, the "Tanaka corpus" of Japanese-English sentence pairs, has been shown by J. Breen to be useful as a source of examples while consulting JEDict.

We present a Parallel Corpora Management tool that aids parallel corpora generation for the task of Machine Translation (MT). It takes the source and target text of a corpus for any language pair in text file format, or zip archives containing multiple corresponding text files. It then provides lexicographers with a helpful interface for manual translation and validation, and gives out the corrected text files as output. It provides various dictionary references as help within the interface, which increase the productivity and efficiency of a lexicographer. It also provides automatic translation of the source sentence using an integrated MT system. The tool interface includes a corpora management system which facilitates maintenance of parallel corpora by assigning roles such as manager, lexicographer, etc. We have designed a novel tool that provides aids such as references to various dictionary sources, including Wordnets, Shabdkosh and Wiktionary. We also provide manual word alignment correction, which is visualized in the tool and could lead to its gamification in the future, thus providing a valuable source of word and phrase alignments.

The PolyphraZ tool is being developed in the framework of the TraCorpEx project (Translation of Corpora of Examples) to manage parallel multilingual corpora through the web. Corpus files (monolingual or multilingual) are first converted to a standard coding (CXM.dtd, UTF-8). Then, they are assembled (CPXM.dtd) so that they can be visualized in parallel through the web. In a third stage, they are put in a Multilingual Polyphraz Memory (MPM). A "polyphrase" is a structure containing an original sentence and various proposals of equivalent sentences, in the same and other languages. An MPM stores one or more corpora of polyphrases. The MPM part of PolyphraZ has 3 main web interfaces. One is a web-oriented translator workstation (TWS), where suggestions or translations come from the MPM itself, which functions as its own translation memory, and from calls to MT systems. Another serves to send sentences to MT systems with appropriate parameters, and to run various evaluation measures (NIST, BLEU, and distance computations) in order to offer the translator a "best" proposal. A third interface is planned for giving feedback to the developers of the MT systems, in the form of lists of unknown or wrongly translated words, with suggestions for correct translations, and of a parallel presentation of pairs of translations showing the "editing work" to be done to get one from the other. The first 2 stages are operational, and are used for experimentation and MT evaluation on the CSTAR 5-lingual BTEC corpus and on the Japanese-English Tanaka corpus used as a source of examples in electronic dictionaries (JDict, Papillon). A main goal of this effort is to offer occasional and volunteer translators and posteditors access to a free TWS and to sharable translation memories put in the MPM format.
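To make the notion of a "polyphrase" more concrete, the following is a minimal Python sketch, not the PolyphraZ implementation. The class and function names (`Proposal`, `Polyphrase`, `word_edit_distance`, `consensus_proposal`) and the `source` provenance field are hypothetical. The text does not say how a "best" proposal is selected, so the sketch assumes one simple possibility among the "distance computations" mentioned: choosing the proposal with the smallest total word-level edit distance to the other proposals in the same language.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proposal:
    lang: str    # language code, e.g. "en"
    text: str    # the sentence itself
    source: str  # hypothetical provenance label: "corpus", "human", an MT system name


@dataclass
class Polyphrase:
    """An original sentence plus proposals of equivalent sentences."""
    original: Proposal
    proposals: List[Proposal] = field(default_factory=list)

    def in_language(self, lang: str) -> List[Proposal]:
        """Return all proposals written in the given language."""
        return [p for p in self.proposals if p.lang == lang]


def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace-separated words."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1,          # delete a word of a
                            curr[j - 1] + 1,      # insert a word of b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def consensus_proposal(poly: Polyphrase, lang: str) -> Optional[Proposal]:
    """Assumed selection rule: the proposal closest, in total edit
    distance, to all other proposals in the same language."""
    candidates = poly.in_language(lang)
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: sum(word_edit_distance(c.text, o.text) for o in candidates),
    )


if __name__ == "__main__":
    poly = Polyphrase(
        original=Proposal("ja", "私は緑茶が好きです", "corpus"),
        proposals=[
            Proposal("en", "I like green tea", "human"),
            Proposal("en", "I like green tea very much", "mt-a"),
            Proposal("en", "I love tea", "mt-b"),
        ],
    )
    best = consensus_proposal(poly, "en")
    print(best.text if best else "no proposal")
```

The same word-level distance could also serve to quantify the "editing work" needed to turn one translation into another, while measures such as BLEU or NIST would need reference translations and a dedicated library.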