    Repositories list

    • Common web archive utility code.
      Java
      Apache License 2.0
      Updated May 2, 2026
    • warcaroo (Public)
      Java
      Apache License 2.0
      Updated Apr 29, 2026
    • An Awesome List for getting started with web archiving
      Creative Commons Zero v1.0 Universal
      Updated Apr 27, 2026
    • jwarc (Public)
      Java library for reading and writing WARC files with a typed API
      Java
      Apache License 2.0
      Updated Apr 27, 2026
    • WACAD (Public)
      Web Archive Collections as Data Project
      Jupyter Notebook
      Updated Apr 20, 2026
    • Centralised repository for WARC usage specifications.
      HTML
      Updated Apr 4, 2026
    • Links on the web break all the time, robustify them!
      JavaScript
      Updated Mar 5, 2026
    • warc2html (Public)
      Converts WARC files to static HTML
      Java
      Apache License 2.0
      Updated Sep 18, 2025
    • javaswf (Public)
      Fork of JavaSWF2 for building Heritrix
      Java
      Other
      Updated Oct 17, 2024
    • The OpenWayback Development
      Java
      Apache License 2.0
      Updated Jan 3, 2024
    • Web access control (exclusion oracle) tools for optional use with the Wayback Machine
      JavaScript
      Apache License 2.0
      Updated Jan 2, 2023
    • logtrix (Public)
      Java library/tool for parsing and summarising Heritrix crawl logs
      Java
      Apache License 2.0
      Updated Nov 16, 2022
    • urlcanon (Public)
      URL canonicalization library for Python and Java
      Java
      Updated May 22, 2022
    • Dependencies needed to build Heritrix that aren't in Maven Central
      Updated Sep 1, 2021
    • training (Public)
      Inventory of Web Archiving Training Resources
      Updated Oct 24, 2019
    • An 'archive' of the Yahoo-hosted archive-crawler group
      Updated Oct 17, 2019
    • qa2019 (Public)
      Resources for the 2019 IIPC QA hackathon
      HTML
      Updated May 3, 2019
    • A place to share practical bits of crawling experiences
      Apache License 2.0
      Updated Dec 12, 2018
    • IIPC Open Development
      Apache License 2.0
      Updated Jun 16, 2017
    • travis (Public)
      Shared config for Travis CI for IIPC.
      Shell
      Apache License 2.0
      Updated May 3, 2017
    • heritrix3 (Public)
      Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
      Java
      Updated Mar 9, 2017
    • cdx-cli (Public)
      Command line utility for working with CDX files
      Java
      Apache License 2.0
      Updated Sep 29, 2016
    • IIPC Parent POM
      Apache License 2.0
      Updated May 24, 2016
    • twittervane (Public archive)
      Using social media to steer web archiving and curation.
      JavaScript
      Updated Nov 20, 2015
    • Sample Wayback Config using OpenWayback
      Updated Feb 7, 2014