Awesome List

Awesome Web Archiving

An Awesome List for getting started with web archiving

iipc/awesome-web-archiving #awesome #awesome-list #webarchiving

List stars: 2,605
README repos: 105
Indexed repos: 103
List commits: 162
Forks: 196
Open issues: 11

Latest scan 2026-07-17 10:49

Indexed repositories

103 repos currently saved from this list.

Latest repo push 2026-07-15

flameshot-org/flameshot

Powerful yet simple to use screenshot software :desktop_computer: :camera_flash:

Stack

C++ Qt CMake

GitHub topics

#capture #cross-platform #free-software #gnu-linux #gui #hacktoberfest

Updated: 2026-07-04
Lists: 7 list mentions
First commit: 2017-05-10
License: GPL-3.0
Issues: 696 open

30,311

stars

Forks: 1,963
Commits: 2,320 commits
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

ArchiveBox/ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

AI dev

Stack

Python npm PEP 517 pip

GitHub topics

#archivebox #backups #bookmark-archiver #browser-bookmarks #chromium #digipres

Updated: 2026-07-15
Lists: 4 list mentions
First commit: 2017-05-05
License: MIT
Issues: 174 open

27,940

stars

Forks: 1,553
Commits: 5,730 commits
Star growth, last 7 days: +49 +0.2%
Commit velocity, last 7 days: +2 0.0%

Website GitHub

gildas-lormeau/SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file

Stack

JavaScript Express npm

GitHub topics

#annotations #archive #archiver #auto-save #browser #chrome

Updated: 2026-02-24
Lists: 1 list mention
First commit: 2010-09-12
License: AGPL-3.0
Issues: 151 open

21,792

stars

Forks: 1,371
Commits: 8,169 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

dgraph-io/badger

Fast key-value DB in Go.

Stack

Go Cobra Go modules

GitHub topics

#database #document-database #go #golang #key-value #library

Updated: 2026-07-11
Lists: 2 list mentions
First commit: 2017-01-26
License: Apache-2.0
Issues: 65 open

15,701

stars

Forks: 1,297
Commits: 1,544 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

Y2Z/monolith

⬛️ CLI tool and library for saving complete web pages as a single HTML file

Stack

Rust Cargo

GitHub topics

#come-and-take-it #e-hoarding #its-mine #make-the-internet-great-again #no-more-404 #procrastination

Updated: 2026-05-25
Lists: 2 list mentions
First commit: 2017-02-20
License: CC0-1.0
Issues: 68 open

15,345

stars

Forks: 466
Commits: 678 commits
Star growth, last 7 days: +14 +0.1%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

cyrus-and/chrome-remote-interface

Chrome Debugging Protocol interface for Node.js

Stack

JavaScript npm

GitHub topics

#browser #chrome-debugging-protocol #firefox #google-chrome #headless #javascript

Updated: 2026-02-09
Lists: 1 list mention
First commit: 2013-04-17
License: MIT
Issues: 12 open

4,546

stars

Forks: 324
Commits: 620 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

DO-SAY-GO/dn

💾 dn - offline full-text search and archiving for your Chromium-based browser.

Stack

JavaScript Express npm

GitHub topics

#archive #archiver #disk #diskernet #dn #download-net

Updated: 2026-03-28
Lists: 1 list mention
First commit: 2023-01-14
License: Unknown
Issues: 27 open

3,902

stars

Forks: 147
Commits: 237 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

jordansissel/xdotool

fake keyboard/mouse input, window management, and more

Stack

Updated: 2026-06-30
Lists: 1 list mention
First commit: 2007-06-22
License: BSD-3-Clause
Issues: 323 open

3,826

stars

Forks: 343
Commits: 686 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

internetarchive/heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Stack

Java Maven pip

GitHub topics

#heritrix #java #warc #webcrawling

Updated: 2026-07-10
Lists: 1 list mention
First commit: 2009-05-11
License: NOASSERTION
Issues: 37 open

3,271

stars

Forks: 792
Commits: 2,985 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

wabarc/wayback

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, Ghostarchive, IPFS, Telegraph, and file systems.

Stack

Go Cobra Go modules pip

GitHub topics

#archive #har #heroku #internet-archive #ipfs #irc

Updated: 2026-07-12
Lists: 3 list mentions
First commit: 2020-06-13
License: GPL-3.0
Issues: 60 open

2,213

stars

Forks: 86
Commits: 538 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

jjjake/internetarchive

A Python and Command-Line Interface to Archive.org

AI dev

Stack

Python pytest PEP 517 pip

Updated: 2026-07-06
Lists: 1 list mention
First commit: 2012-08-15
License: AGPL-3.0
Issues: 101 open

1,882

stars

Forks: 249
Commits: 2,331 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

webrecorder/pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

Stack

JavaScript Vue npm pip Yarn

GitHub topics

#python #pywb #wayback #web-archives #web-archiving

Updated: 2026-04-10
Lists: 1 list mention
First commit: 2013-12-09
License: GPL-3.0
Issues: 182 open

1,677

stars

Forks: 238
Commits: 2,352 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

ArchiveTeam/grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Stack

Python pip

GitHub topics

#archiving #crawl #crawler #spider #warc

Updated: 2025-05-23
Lists: 1 list mention
First commit: 2015-02-05
License: NOASSERTION
Issues: 103 open

1,598

stars

Forks: 157
Commits: 1,174 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

DocNow/twarc

A command line tool (and Python library) for archiving Twitter JSON

Stack

Python pytest pip uv

Updated: 2025-10-31
Lists: 1 list mention
First commit: 2013-01-14
License: MIT
Issues: 48 open

1,393

stars

Forks: 254
Commits: 1,804 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

bellingcat/auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

Stack

Python FastAPI pytest React Starlette npm PEP 517 Poetry

GitHub topics

#archive #docker #open-source-research #python #scraping #service

Updated: 2026-07-10
Lists: 1 list mention
First commit: 2021-01-15
License: MIT
Issues: 16 open

1,095

stars

Forks: 107
Commits: 1,546 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

Stack

TypeScript npm pip Yarn

GitHub topics

#crawler #crawling #wacz #warc #web-archiving #web-crawler

Updated: 2026-07-11
Lists: 1 list mention
First commit: 2020-10-31
License: AGPL-3.0
Issues: 137 open

1,079

stars

Forks: 147
Commits: 696 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

WikiTeam/wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2026, WikiTeam has preserved more than 600,000 wikis.

Stack

Python pip

GitHub topics

#archive-wikis #backup #digital-preservation #dump #export #mediawiki

Updated: 2026-01-10
Lists: 1 list mention
First commit: 2011-04-05
License: GPL-3.0
Issues: 172 open

853

stars

Forks: 175
Commits: 1,143 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

internetarchive/brozzler

brozzler - distributed browser-based web crawler

Stack

Python Flask pytest pip uv

Updated: 2026-07-07
Lists: 1 list mention
First commit: 2014-01-21
License: Apache-2.0
Issues: 60 open

807

stars

Forks: 115
Commits: 1,789 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

oduwsdl/ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

Stack

Python Flask pytest npm pip

GitHub topics

#docker #ipfs #memento #memento-rfc #python #service-worker

Updated: 2026-06-15
Lists: 1 list mention
First commit: 2016-03-04
License: MIT
Issues: 160 open

654

stars

Forks: 41
Commits: 1,622 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

ArchiveTeam/wpull

Wget-compatible web downloader and crawler.

Stack

HTML Tornado pip

Updated: 2024-04-29
Lists: 1 list mention
First commit: 2013-12-07
License: GPL-3.0
Issues: 205 open

611

stars

Forks: 84
Commits: 2,047 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

akamhy/waybackpy

Wayback Machine API interface & a command-line tool

Stack

Python pytest PEP 517 pip

GitHub topics

#archive-webpage #archive-webpages #cdx-api #internet-archive #internet-archiving #osint

Updated: 2024-02-26
Lists: 2 list mentions
First commit: 2020-05-02
License: MIT
Issues: 21 open

594

stars

Forks: 40
Commits: 497 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

iipc/openwayback

The OpenWayback Development

Stack

Java Maven

Updated: 2024-01-03
Lists: 1 list mention
First commit: 2005-10-18
License: Apache-2.0
Issues: 105 open

521

stars

Forks: 313
Commits: 3,401 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

webrecorder/warcio

Streaming WARC/ARC library for fast web archive IO

Stack

Python Flask pytest pip

GitHub topics

#python #pywb #warc #web-archives #web-archiving

Updated: 2026-06-10
Lists: 1 list mention
First commit: 2017-03-02
License: Apache-2.0
Issues: 62 open

459

stars

Forks: 70
Commits: 161 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

internetarchive/warcprox

WARC writing MITM HTTP/S proxy

Stack

Python Flask pytest pip uv

Updated: 2026-06-17
Lists: 1 list mention
First commit: 2012-07-18
License: Unknown
Issues: 28 open

456

stars

Forks: 66
Commits: 1,116 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

webrecorder/browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

Stack

TypeScript Express FastAPI Next.js pytest npm Pipenv uv

GitHub topics

#archiving #cloud #kubernetes #wacz #warc #web-archive

Updated: 2026-07-10
Lists: 1 list mention
First commit: 2021-06-28
License: AGPL-3.0
Issues: 303 open

437

stars

Forks: 70
Commits: 2,088 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

oduwsdl/archivenow

A Tool To Push Web Resources Into Web Archives

Stack

Python Flask pip

GitHub topics

#internet-archive #web-archiving

Updated: 2024-01-23
Lists: 1 list mention
First commit: 2017-02-09
License: MIT
Issues: 17 open

434

stars

Forks: 40
Commits: 186 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

Florents-Tselai/WarcDB

WarcDB: Web crawl data as SQLite databases.

Stack

Python pytest pip Poetry

GitHub topics

#cli #crawling #database #sqlite #warc #web-archiving

Updated: 2024-07-13
Lists: 1 list mention
First commit: 2022-05-29
License: Apache-2.0
Issues: 9 open

406

stars

Forks: 10
Commits: 73 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

machawk1/wail

:whale2: Web Archiving Integration Layer: One-Click User Instigated Preservation

Stack

Roff pip

GitHub topics

#gui #heritrix #openwayback #pyinstaller #python #warc

Updated: 2026-06-19
Lists: 1 list mention
First commit: 2013-03-20
License: MIT
Issues: 184 open

398

stars

Forks: 38
Commits: 871 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

medialab/hyphe

Websites crawler with built-in exploration and control web interface

Stack

JavaScript Scrapy npm pip

Updated: 2026-05-18
Lists: 1 list mention
First commit: 2010-12-22
License: AGPL-3.0
Issues: 56 open

384

stars

Forks: 62
Commits: 3,601 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

leonkt/zotero-memento

Zotero extension that combats link rot by archiving webpages and journal articles.

Stack

JavaScript

Updated: 2022-06-08
Lists: 1 list mention
First commit: 2019-08-29
License: MIT
Issues: 12 open

357

stars

Forks: 20
Commits: 64 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

go-shiori/obelisk

Go package and CLI tool for saving web page as single HTML file

Stack

Go Cobra Go modules

GitHub topics

#archive #cli #go #golang #hacktoberfest

Updated: 2026-02-01
Lists: 1 list mention
First commit: 2020-03-29
License: MIT
Issues: 10 open

318

stars

Forks: 25
Commits: 94 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

WebMemex/freeze-dry

Snapshots a web page to get it as a static, self-contained HTML document.

Stack

TypeScript Vite npm

Updated: 2022-09-18
Lists: 1 list mention
First commit: 2017-07-13
License: Unlicense
Issues: 21 open

302

stars

Forks: 20
Commits: 269 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

harvard-lil/scoop

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

Stack

JavaScript Express Flask npm pip Poetry

Updated: 2025-09-03
Lists: 1 list mention
First commit: 2022-09-20
License: MIT
Issues: 15 open

204

stars

Forks: 12
Commits: 1,255 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

karust/gogetcrawl

Extract web archive data using Wayback Machine and Common Crawl

Stack

Go Cobra Go modules

GitHub topics

#commoncrawl #concurrency #crawler #golang #wayback-machine #webarchive

Updated: 2024-11-04
Lists: 1 list mention
First commit: 2019-06-14
License: MIT
Issues: 0 open

183

stars

Forks: 17
Commits: 31 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

N0taN3rd/Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stack

JavaScript npm Yarn

GitHub topics

#browser-automation #chrome #chrome-headless #crawler #crawling #headless-chrome

Updated: 2020-05-19
Lists: 1 list mention
First commit: 2017-07-20
License: Apache-2.0
Issues: 11 open

178

stars

Forks: 25
Commits: 119 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

internetarchive/warctools

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

Stack

Python uv

Updated: 2025-08-18
Lists: 1 list mention
First commit: 2010-12-04
License: MIT
Issues: 17 open

175

stars

Forks: 33
Commits: 266 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

chfoo/warcat

Tool and library for handling Web ARChive (WARC) files.

Stack

Python pip

GitHub topics

#python

Updated: 2024-10-11
Lists: 1 list mention
First commit: 2013-04-09
License: GPL-3.0
Issues: 15 open

165

stars

Forks: 20
Commits: 78 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

helgeho/ArchiveSpark

An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.

Stack

Scala

GitHub topics

#archivespark #internet-archive #spark #spark-framework #warc #web-archiving

Updated: 2025-10-08
Lists: 1 list mention
First commit: 2015-08-06
License: MIT
Issues: 5 open

161

stars

Forks: 19
Commits: 154 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stack

Scala Maven

GitHub topics

#analysis #apache-spark #big-data #big-data-analytics #dataframe #digital-humanities

Updated: 2025-12-05
Lists: 1 list mention
First commit: 2013-07-13
License: Apache-2.0
Issues: 5 open

158

stars

Forks: 33
Commits: 1,032 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

netarchivesuite/solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer framework.

Stack

Java Vite Vue Maven npm

Updated: 2026-07-08
Lists: 1 list mention
First commit: 2017-02-08
License: Apache-2.0
Issues: 63 open

145

stars

Forks: 28
Commits: 3,121 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

chatnoir-eu/chatnoir-resiliparse

A robust web archive analytics toolkit

Stack

Rust pytest Cargo CMake .NET SDK

GitHub topics

#bigdata #extraction #htmlparser #python #rust #warc

Updated: 2026-06-16
Lists: 1 list mention
First commit: 2021-06-04
License: Apache-2.0
Issues: 0 open

144

stars

Forks: 18
Commits: 1,545 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

ukwa/webarchive-discovery

Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in this repo is now only for reference. For support and issues of 'warc-indexer', please communicate with NetArchiveSuite.

Stack

Java Maven

Updated: 2025-11-21
Lists: 1 list mention
First commit: 2011-10-26
License: Unknown
Issues: 93 open

133

stars

Forks: 26
Commits: 1,615 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

N0taN3rd/wail

:whale2: One-Click User Instigated Preservation

Stack

JavaScript Electron React npm Yarn

GitHub topics

#browser-based-presrevation #electron #high-fidelity-preservation #warc #web-archiving

Updated: 2019-02-03
Lists: 1 list mention
First commit: 2013-03-20
License: GPL-3.0
Issues: 34 open

128

stars

Forks: 9
Commits: 677 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

N0taN3rd/node-warc

Parse And Create Web ARChive (WARC) files with node.js

Stack

JavaScript npm Yarn

GitHub topics

#chrome-remote-interface #pupeteer #warc #warc-files #web-archives #web-archiving

Updated: 2025-01-29
Lists: 1 list mention
First commit: 2017-05-21
License: MIT
Issues: 23 open

104

stars

Forks: 22
Commits: 116 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

datatogether/research

📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity

GitHub topics

#docs #research

Updated: 2018-09-27
Lists: 1 list mention
First commit: 2017-05-17
License: CC-BY-SA-4.0
Issues: 8 open

100

stars

Forks: 10
Commits: 91 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

CGamesPlay/chronicler

Offline-first web browser

Archived

Stack

JavaScript Electron Express React npm Yarn

GitHub topics

#browser #electron #warc

Updated: 2019-01-14
Lists: 1 list mention
First commit: 2018-12-17
License: MIT
Issues: 1 open

stars

Forks: 8
Commits: 50 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

oduwsdl/MemGator

A Memento Aggregator CLI and Server in Go

Stack

Go Go modules

GitHub topics

#memento #memento-rfc #timemap #web-archiving

Updated: 2026-04-09
Lists: 1 list mention
First commit: 2015-09-08
License: MIT
Issues: 48 open

stars

Forks: 13
Commits: 210 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

Website GitHub

recrm/ArchiveTools

A collection of tools for archiving and analysing the internet.

Stack

Python Poetry

Updated: 2022-07-06
Lists: 1 list mention
First commit: 2015-01-14
License: GPL-3.0
Issues: 2 open

stars

Forks: 15
Commits: 30 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

justinlittman/fbarc

A commandline tool and Python library for archiving data from Facebook using the Graph API.

Archived

Stack

Python Flask pip

GitHub topics

#code4lib #facebook-graph-api

Updated: 2018-01-29
Lists: 1 list mention
First commit: 2017-02-22
License: CC0-1.0
Issues: 3 open

stars

Forks: 11
Commits: 68 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

commoncrawl/cc-notebooks

Various Jupyter notebooks about Common Crawl data

Stack

Jupyter Notebook

GitHub topics

#aws-athena #common-crawl #commoncrawl #jupyter-notebook #webarchiving #webgraph-framework

Updated: 2026-07-03
Lists: 1 list mention
First commit: 2019-07-19
License: Apache-2.0
Issues: 1 open

stars

Forks: 11
Commits: 27 commits
Star growth, last 7 days: 0 0.0%
Commit velocity, last 7 days: 0 0.0%

GitHub

Activity

Default branch: main
Last pushed: 2026-04-27
GitHub updated: 2026-07-14
Created: 2017-06-16
First commit: -
Last scanned: 2026-07-17 10:49
Watchers: 92

Indexed repo mix

Repo stars: 149,279
Repo forks: 11,246
Active: 97
Archived: 6

Languages

Python (31) Java (17) JavaScript (16) Go (9) Scala (6) Rust (5) TypeScript (4) Jupyter Notebook (3) C (2) C++ (2) HTML (2) Roff (1)
