End of Term 2024 Dataset
The End of Term 2024 Dataset contains data collected by six institutions from September 2024 through April 2025: ArchiveTeam (AT), Common Crawl Foundation (CC), The Library Innovation Lab at Harvard Law School (HIL), Internet Archive (IA), University of North Texas Libraries (UNT), and Webrecorder (WR). The data is part of the End of Term Presidential Web Archive initiative.
Archive Location and Download
The 2024 End of Term archive is stored in the eotarchive bucket under EOT-2024.
To help you explore and use the dataset, we provide gzipped files that list all segments and all WARC, WAT, WET, META, CDX, STATS, and URL Index files.
Prepending either s3://eotarchive/ or https://eotarchive.s3.amazonaws.com/ to each line in a listing yields the S3 or HTTP path, respectively.
| File Type | Path Listing | Number of Files | Total Size (Compressed) |
|---|---|---|---|
| Segments | EOT-2024/segment.paths.gz | 128 | |
| WARC files | EOT-2024/warc.paths.gz | 1,216,891 | 2.29 PB |
| WAT files | EOT-2024/wat.paths.gz | 1,216,891 | 18.22 TB |
| WET files | EOT-2024/wet.paths.gz | 1,216,891 | 7.27 TB |
| META files | EOT-2024/meta.paths.gz | 1,216,891 | 1.73 TB |
| CDX files | EOT-2024/cdx.paths.gz | 1,216,891 | 285.71 GB |
| STATS files | EOT-2024/stats.paths.gz | 1,216,891 | 426.45 MB |
| URL Index files | EOT-2024/eot-index.paths.gz | 63 | 227 GB |
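As a minimal sketch of the prefixing step described above, the following Python snippet decompresses a gzipped path listing and prepends either prefix to each line. The function name `expand_paths` and the sample path are hypothetical and used only for illustration; the two prefixes are the ones given in this section.

```python
import gzip
import io

# Prefixes taken from the text above.
S3_PREFIX = "s3://eotarchive/"
HTTP_PREFIX = "https://eotarchive.s3.amazonaws.com/"


def expand_paths(gz_bytes, prefix=HTTP_PREFIX):
    """Return the full location for each relative path in a gzipped listing.

    gz_bytes is the raw content of a *.paths.gz file; blank lines are skipped.
    """
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as fh:
        return [prefix + line.strip() for line in fh if line.strip()]


# Demo on an in-memory listing. The path below is a made-up placeholder,
# not a real object in the archive.
sample = gzip.compress(b"EOT-2024/segments/example/warc/example.warc.gz\n")
print(expand_paths(sample))             # HTTP paths
print(expand_paths(sample, S3_PREFIX))  # S3 paths
```

In practice, `gz_bytes` would be the downloaded content of one of the listings in the table, for example EOT-2024/warc.paths.gz, assuming the listing itself is retrievable under the same prefixes. Given the file counts above, streaming the listing line by line may be preferable to building a full list in memory.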
License
There are no restrictions on the use, access, and/or download of data from the End of Term Web Archive Dataset. We request that you cite the End of Term Web Archive project when using the data provided from this dataset.
How to Cite
@misc{EOTDataset2024,
title = {End of Term 2024 Dataset},
author = {Alam, Sawood and Phillips, Mark Edward},
year = 2026,
publisher = {End of Term Web Archive},
url = {https://eotarchive.org/data/data-2024/}
}