Awesome

GitHub projects from awesome lists

Search awesome repositories

Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.

Repos indexed: 17,379
Awesome lists tracked: 125
Current results: 103

103 repos shown

List: awesome-web-archiving

go-shiori/obelisk

Go package and CLI tool for saving a web page as a single HTML file

Stack

Go Cobra Go modules

GitHub topics

#archive #cli #go #golang #hacktoberfest

Updated: 2026-02-01
Lists: 1 list mention
First commit: 2020-03-29
History: 5 history points
License: MIT
Issues: 10 open

Stars: 318
Forks: 25
Commits: 94 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

WebMemex/freeze-dry

Snapshots a web page to get it as a static, self-contained HTML document.

Stack

TypeScript Vite npm

Updated: 2022-09-18
Lists: 1 list mention
First commit: 2017-07-13
History: 5 history points
License: Unlicense
Issues: 21 open

Stars: 302
Forks: 20
Commits: 269 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

harvard-lil/scoop

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

Stack

JavaScript Express Flask npm pip Poetry

Updated: 2025-09-03
Lists: 1 list mention
First commit: 2022-09-20
History: 5 history points
License: MIT
Issues: 15 open

Stars: 204
Forks: 12
Commits: 1,255 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

karust/gogetcrawl

Extract web archive data using Wayback Machine and Common Crawl

Stack

Go Cobra Go modules

GitHub topics

#commoncrawl #concurrency #crawler #golang #wayback-machine #webarchive

Updated: 2024-11-04
Lists: 1 list mention
First commit: 2019-06-14
History: 5 history points
License: MIT
Issues: 0 open

Stars: 183
Forks: 17
Commits: 31 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

N0taN3rd/Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stack

JavaScript npm Yarn

GitHub topics

#browser-automation #chrome #chrome-headless #crawler #crawling #headless-chrome

Updated: 2020-05-19
Lists: 1 list mention
First commit: 2017-07-20
History: 5 history points
License: Apache-2.0
Issues: 11 open

Stars: 178
Forks: 25
Commits: 119 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

internetarchive/warctools

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

Stack

Python uv

Updated: 2025-08-18
Lists: 1 list mention
First commit: 2010-12-04
History: 5 history points
License: MIT
Issues: 17 open

Stars: 175
Forks: 33
Commits: 266 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

chfoo/warcat

Tool and library for handling Web ARChive (WARC) files.

Stack

Python pip

GitHub topics

#python

Updated: 2024-10-11
Lists: 1 list mention
First commit: 2013-04-09
History: 5 history points
License: GPL-3.0
Issues: 15 open

Stars: 165
Forks: 20
Commits: 78 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

helgeho/ArchiveSpark

An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.

Stack

Scala

GitHub topics

#archivespark #internet-archive #spark #spark-framework #warc #web-archiving

Updated: 2025-10-08
Lists: 1 list mention
First commit: 2015-08-06
History: 5 history points
License: MIT
Issues: 5 open

Stars: 161
Forks: 19
Commits: 154 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stack

Scala Maven

GitHub topics

#analysis #apache-spark #big-data #big-data-analytics #dataframe #digital-humanities

Updated: 2025-12-05
Lists: 1 list mention
First commit: 2013-07-13
History: 5 history points
License: Apache-2.0
Issues: 5 open

Stars: 158
Forks: 33
Commits: 1,032 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

netarchivesuite/solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer framework.

Stack

Java Vite Vue Maven npm

Updated: 2026-07-08
Lists: 1 list mention
First commit: 2017-02-08
History: 5 history points
License: Apache-2.0
Issues: 63 open

Stars: 145
Forks: 28
Commits: 3,121 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

chatnoir-eu/chatnoir-resiliparse

A robust web archive analytics toolkit

Stack

Rust pytest Cargo CMake .NET SDK

GitHub topics

#bigdata #extraction #htmlparser #python #rust #warc

Updated: 2026-06-16
Lists: 1 list mention
First commit: 2021-06-04
History: 5 history points
License: Apache-2.0
Issues: 0 open

Stars: 144
Forks: 18
Commits: 1,545 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

ukwa/webarchive-discovery

Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in this repo is now only for reference. For support and issues of 'warc-indexer', please communicate with NetArchiveSuite.

Stack

Java Maven

Updated: 2025-11-21
Lists: 1 list mention
First commit: 2011-10-26
History: 5 history points
License: Unknown
Issues: 93 open

Stars: 133
Forks: 26
Commits: 1,615 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

N0taN3rd/wail

🐋 One-Click User Instigated Preservation

Stack

JavaScript Electron React npm Yarn

GitHub topics

#browser-based-presrevation #electron #high-fidelity-preservation #warc #web-archiving

Updated: 2019-02-03
Lists: 1 list mention
First commit: 2013-03-20
History: 5 history points
License: GPL-3.0
Issues: 34 open

Stars: 128
Forks: 9
Commits: 677 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

N0taN3rd/node-warc

Parse and create Web ARChive (WARC) files with Node.js

Stack

JavaScript npm Yarn

GitHub topics

#chrome-remote-interface #pupeteer #warc #warc-files #web-archives #web-archiving

Updated: 2025-01-29
Lists: 1 list mention
First commit: 2017-05-21
History: 5 history points
License: MIT
Issues: 23 open

Stars: 104
Forks: 22
Commits: 116 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

datatogether/research

📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity

GitHub topics

#docs #research

Updated: 2018-09-27
Lists: 1 list mention
First commit: 2017-05-17
History: 5 history points
License: CC-BY-SA-4.0
Issues: 8 open

Stars: 100
Forks: 10
Commits: 91 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

CGamesPlay/chronicler

Offline-first web browser

Archived

Stack

JavaScript Electron Express React npm Yarn

GitHub topics

#browser #electron #warc

Updated: 2019-01-14
Lists: 1 list mention
First commit: 2018-12-17
History: 5 history points
License: MIT
Issues: 1 open

stars

Forks: 8
Commits: 50 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

oduwsdl/MemGator

A Memento Aggregator CLI and Server in Go

Stack

Go Go modules

GitHub topics

#memento #memento-rfc #timemap #web-archiving

Updated: 2026-04-09
Lists: 1 list mention
First commit: 2015-09-08
History: 5 history points
License: MIT
Issues: 48 open

stars

Forks: 13
Commits: 210 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

recrm/ArchiveTools

A collection of tools for archiving and analysing the internet.

Stack

Python Poetry

Updated: 2022-07-06
Lists: 1 list mention
First commit: 2015-01-14
History: 5 history points
License: GPL-3.0
Issues: 2 open

stars

Forks: 15
Commits: 30 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

justinlittman/fbarc

A command-line tool and Python library for archiving data from Facebook using the Graph API.

Archived

Stack

Python Flask pip

GitHub topics

#code4lib #facebook-graph-api

Updated: 2018-01-29
Lists: 1 list mention
First commit: 2017-02-22
History: 5 history points
License: CC0-1.0
Issues: 3 open

stars

Forks: 11
Commits: 68 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

commoncrawl/cc-notebooks

Various Jupyter notebooks about Common Crawl data

Stack

Jupyter Notebook

GitHub topics

#aws-athena #common-crawl #commoncrawl #jupyter-notebook #webarchiving #webgraph-framework

Updated: 2026-07-03
Lists: 1 list mention
First commit: 2019-07-19
History: 5 history points
License: Apache-2.0
Issues: 1 open

stars

Forks: 11
Commits: 27 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

turicas/crau

Easy-to-use Web archiver

Stack

Python pytest Scrapy pip

Updated: 2026-04-13
Lists: 1 list mention
First commit: 2019-10-26
History: 5 history points
License: LGPL-3.0
Issues: 11 open

stars

Forks: 10
Commits: 76 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

peterk/warcworker

A dockerized, queued high fidelity web archiver based on Squidwarc

Stack

Python pip

GitHub topics

#archiving #high-fidelity-preservation #preservation #webarchives #webarchiving

Updated: 2024-07-09
Lists: 1 list mention
First commit: 2018-07-21
History: 5 history points
License: GPL-3.0
Issues: 6 open

stars

Forks: 9
Commits: 34 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

iipc/jwarc

Java library for reading and writing WARC files with a typed API

Stack

Java Maven

Updated: 2026-06-27
Lists: 1 list mention
First commit: 2015-09-21
History: 5 history points
License: Apache-2.0
Issues: 20 open

stars

Forks: 18
Commits: 518 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

jedireza/warc

⚙️ A Rust library for reading and writing WARC files

Stack

Rust Cargo

GitHub topics

#rust #rust-library #warc

Updated: 2024-11-27
Lists: 1 list mention
First commit: 2016-03-22
History: 5 history points
License: MIT
Issues: 10 open

stars

Forks: 19
Commits: 39 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

iipc/warc2html

Converts WARC files to static HTML

Stack

Java Maven

Updated: 2025-09-18
Lists: 1 list mention
First commit: 2021-11-08
History: 5 history points
License: Apache-2.0
Issues: 5 open

stars

Forks: 8
Commits: 9 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

machawk1/Mink

Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user access to the copy

Stack

JavaScript npm

GitHub topics

#chrome #extension #internet-archive #memento #memento-rfc

Updated: 2025-08-27
Lists: 1 list mention
First commit: 2014-01-17
History: 5 history points
License: MIT
Issues: 95 open

stars

Forks: 3
Commits: 597 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

webrecorder/har2warc

Convert HTTP Archive (HAR) -> Web Archive (WARC) format

Stack

Python pytest pip

Updated: 2018-10-21
Lists: 1 list mention
First commit: 2017-03-16
History: 5 history points
License: Apache-2.0
Issues: 3 open

stars

Forks: 4
Commits: 18 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

wabarc/cairn

npm package and CLI tool for saving a web page as a single HTML file

Stack

TypeScript npm Yarn

GitHub topics

#archive #base64 #cli #html #html-files #internet-archive

Updated: 2026-07-09
Lists: 1 list mention
First commit: 2020-10-09
History: 5 history points
License: MIT
Issues: 33 open

stars

Forks: 3
Commits: 101 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

archivesunleashed/warclight

A Rails engine supporting the discovery of web archives.

Archived

Stack

Ruby Ruby on Rails Bundler npm

GitHub topics

#blacklight #discovery #rails #rails-engine #ruby #solr

Updated: 2023-06-13
Lists: 1 list mention
First commit: 2017-08-03
History: 5 history points
License: NOASSERTION
Issues: 8 open

stars

Forks: 9
Commits: 301 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

ikreymer/webarchive-indexing

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

Stack

Python pip

Updated: 2017-12-04
Lists: 1 list mention
First commit: 2015-02-26
History: 5 history points
License: MIT
Issues: 5 open

stars

Forks: 12
Commits: 47 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub
