Partners

The construction of the End of Term Web Archive is a collaborative project, drawing on the skills and resources of each partner institution.

Current Partners (EOT 2024)

Archive Team

Archive Team is a loose collective of rogue archivists, programmers, and writers dedicated to saving our digital heritage. Since 2009, this volunteer-driven effort has mobilized quickly to rescue at-risk websites from shutdowns and deletions. In 2024, Archive Team joined the End of Term Web Archive for the first time, contributing more data to the 2024 crawl than any other partner.

Common Crawl Foundation (CCF)

Joining as a partner in 2024, the Common Crawl Foundation is a 501(c)(3) non-profit with an open web dataset and archive dating back to 2007.

Environmental Data & Governance Initiative (EDGI)

Contributing significantly to the End of Term project in 2016 and signing on as a partner for 2020 and 2024, the Environmental Data & Governance Initiative (EDGI) analyzes federal environmental data, websites, institutions, and policy, seeking to improve environmental data stewardship and to promote environmental health and environmental justice.

infoDOCKET / Gary Price

Library Journal’s infoDOCKET is an endeavor by Gary Price to provide information industry news with a focus on libraries and academic research. The platform serves a range of library types including public, academic, government, and national through news updates, reports, and publications. It aims to educate and inform librarians, researchers, and interested readers about trends, challenges, and developments in the library and information sectors.

Internet Archive (IA)

The Internet Archive (IA) is a 501(c)(3) non-profit that was founded to build an internet library offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. Founded in 1996 and located in San Francisco, IA’s collections include texts, audio, moving images, television, and software as well as archived web pages. IA participates in the NDIIPP program and is a founding member of the National Digital Stewardship Alliance.

Library Innovation Lab (LIL) at Harvard Law School

The Library Innovation Lab is a department of the Harvard Law School Library. The lab is home to projects such as Perma.cc, a tool for creating reliable web citations used by courts and law journals; H2O Open Casebook, a platform for professors to create free, remixable textbooks; and the Caselaw Access Project, a complete machine-readable archive of American caselaw digitized from the library’s collection. LIL is growing knowledge and community by bringing library principles to technological frontiers.

Stanford University Libraries (SUL)

Joining the End of Term project in 2016, Stanford University Libraries has been archiving the Web for over 15 years and is an active member of the International Internet Preservation Consortium (IIPC), the National Digital Stewardship Alliance (NDSA), and other efforts like the End of Term Web Archive. Stanford librarians collect the institutional legacy and research and learning output of the university as well as at-risk sites like Iranian blogs, FOIA sites, Congressional Research Service (CRS) reports, fugitive US agencies, Bay Area governments, and the CA .gov domain. Stanford’s Web archiving program is a growing complement to the library’s robust collection building activities. The library stores web archives locally in the Stanford Digital Repository, provides discovery through its catalog, and enables browsing through a local instance of the Wayback web archive replay platform.

University of North Texas Libraries (UNT)

The University of North Texas Libraries began web archiving in 1997, when, as part of the Federal Depository Library Program, they created the CyberCemetery to capture and provide permanent public access to the web sites and publications of defunct U.S. government agencies and commissions. UNT also participates in the NDIIPP program and is a founding member of the National Digital Stewardship Alliance. More information about the UNT Libraries web archiving activities can be found at the following link: About Web Archiving at UNT.

Webrecorder

Joining as a partner in 2024, Webrecorder is focusing on capturing interactive and otherwise difficult-to-archive websites using Browsertrix, its high-fidelity web archiving service. Originally a project within Rhizome, the New York-based arts organization dedicated to the history and preservation of born-digital artwork, Webrecorder has since spun out to create a portfolio of open source web archiving software including Browsertrix, the ArchiveWeb.page browser extension, and the ReplayWeb.page web archive viewer.

Previous Partners

California Digital Library (CDL)

Partner in the End of Term Web Archive 2008-2016, the California Digital Library provides digital library services and tools to the University of California and the digital library community at large. CDL developed the open source eXtensible Text Framework (XTF) behind this archive.

George Washington University (GWU)

A project partner in 2016, the George Washington University Libraries has been building and using software tools to support researchers collecting social media data since 2012. With the support of grants from IMLS, NHPRC, and CEAL, GW Libraries developed the open-source Social Feed Manager (SFM) with capabilities to collect from Twitter, Flickr, Tumblr, and Sina Weibo. In addition to using SFM to support academic research in a wide array of disciplines and to support teaching and learning, GW Libraries builds and publicly shares social media data sets for reuse.

Library of Congress (LOC)

A project partner in the End of Term Web Archive from 2008 to mid-2024, the Library of Congress has been archiving the web since 2000, with collections focusing on sites of Legislative Branch agencies, U.S. House and Senate offices and committees, select Executive Branch agencies, and U.S. national election campaigns, among other thematic collections. More information about the Library’s Web Archiving Program is available at the Library of Congress Web Archives page, and public access is available at https://www.loc.gov/web-archives/collections/.

U.S. Government Publishing Office (GPO)

The U.S. Government Publishing Office manages the Federal Depository Library Program and is charged with providing permanent public access to government publications. Access to the Federal Depository Library Program Web Archive is located here.

U.S. National Archives and Records Administration (NARA)

The U.S. National Archives and Records Administration (NARA) preserves and provides public access to high-value government records to promote openness, cultivate public participation, and strengthen our nation’s democracy. In 2006, NARA began capturing and archiving Congressional websites at the end of every Congress. The Center for Legislative Archives provides public access at webharvest.gov. Beginning with the Clinton administration, NARA has also preserved Presidential websites after the end of each administration.