    Repositories list

    • keops

      Public
      Tool for manual evaluation of parallel sentences.
      PHP
      GNU General Public License v3.0
      51501 Updated Jan 26, 2026
    • giashard

      Public
      Sharding program for Paracrawl
      Go
      2200 Updated Sep 24, 2025
    • Scripts for running bitextor/paracrawl/europat jobs on cirrus.ac.uk
      Shell
      1781 Updated Sep 26, 2024
    • giawarc

      Public
      Processing utilities for Internet Archive
      C++
      0141 Updated Apr 19, 2024
    • corset

      Public
      Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data.
      SCSS
      GNU General Public License v3.0
      42110 Updated Nov 6, 2023
    • Scripts for obtaining patent data
      Java
      2511 Updated Apr 14, 2023
    • tmxutil

      Public
      Tools to generate and filter Europat TMX files.
      Python
      MIT License
      1410 Updated Jan 17, 2023
    • synthesis

      Public
      Data synthesis by contextualizing glossary translations
      Python
      2500 Updated Jul 1, 2021
    • Automate download and training with OPUS corpora
      Shell
      MIT License
      2200 Updated Jan 28, 2021
    • Results of the human evaluation
      Rich Text Format
      3500 Updated Dec 9, 2020
    • Open any Paracrawl corpus-related issue here
      0000 Updated Nov 18, 2020
    • Creative Commons Zero v1.0 Universal
      0000 Updated Nov 13, 2020
    • b64filter

      Public archive
      Program for operating on files that contain one Base64-encoded document per line
      Go
      0110 Updated Aug 4, 2020
    • InDomain detection is a tool designed to extract in-domain data from large collections of data.
      Python
      GNU General Public License v3.0
      1100 Updated Jun 5, 2020
    • Python
      0000 Updated Mar 6, 2020
    • Python
      0100 Updated Mar 6, 2020
    • go-warc

      Public
      A Go library for working with WARC files from the Common Crawl
      Go
      GNU General Public License v2.0
      8000 Updated Aug 4, 2019
    • extractor

      Public
      C++
      Apache License 2.0
      32410 Updated Nov 29, 2017
    • embedding

      Public
      Mine parallel corpora with embeddings
      Perl
      0400 Updated Sep 2, 2017
    • Data collection, alignment and TAUS repository
      Python
      Apache License 2.0
      8830 Updated Aug 1, 2017