End of Term Datasets

The End of Term project is working with the Amazon Web Services (AWS) Open Data Sponsorship Program to host copies of the 2004, 2008, 2012, 2016, 2020, and 2024 End of Term datasets.

Inventorying, staging, and moving the data into AWS is still in progress, and more information will be posted here as that work continues.

The following datasets are currently available for use.

Dataset     WARC Files    Compressed WARC Size
EOT-2024     1,216,891    2.29 PB
EOT-2020       239,811    266.04 TB
EOT-2016       194,683    139.3 TB
EOT-2012        78,509    41.42 TB
EOT-2008       125,704    15.32 TB
EOT-2004        58,977    6.42 TB

End of Term Web Crawls Collection

Additionally, crawl data is available from the Internet Archive via the End of Term Web Crawls collection.