End of Term Datasets

The End of Term project is working with the Amazon Web Services (AWS) Open Data Sponsorship Program to host a copy of the 2004, 2008, 2012, 2016, 2020, and 2024 End of Term datasets.

The work of inventorying, staging, and moving the data into AWS is ongoing, and more information will be provided here in the future.

The following datasets are currently partially available for use.

Dataset                         WARC #    WARC Size (Compressed)
EOT-2024 (Upload in process)    812512    1492.8 TB
EOT-2020                        239811    266.04 TB
EOT-2016                        194683    139.3 TB
EOT-2012                        78509     41.42 TB
EOT-2008                        125704    15.32 TB
EOT-2004                        58977     6.42 TB
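
For capacity planning, it can help to total the figures in the table above and estimate the average compressed size of a single WARC file per dataset. The following is a minimal sketch, not part of the project's tooling: the numbers are copied from the table, the names `DATASETS`, `GB_PER_TB`, and `summarize` are hypothetical, and the conversion assumes decimal units (1 TB = 1000 GB), which the table does not specify.

```python
# Figures copied from the table above: (dataset, WARC count, compressed size in TB).
DATASETS = [
    ("EOT-2024", 812512, 1492.8),  # upload still in process, so these may change
    ("EOT-2020", 239811, 266.04),
    ("EOT-2016", 194683, 139.3),
    ("EOT-2012", 78509, 41.42),
    ("EOT-2008", 125704, 15.32),
    ("EOT-2004", 58977, 6.42),
]

# Assumption: decimal units (1 TB = 1000 GB); the source does not say which it uses.
GB_PER_TB = 1000


def summarize(datasets):
    """Return total WARC count, total size in TB, and mean GB per WARC by dataset."""
    total_warcs = sum(count for _, count, _ in datasets)
    total_tb = sum(size_tb for _, _, size_tb in datasets)
    mean_gb = {
        name: size_tb * GB_PER_TB / count for name, count, size_tb in datasets
    }
    return total_warcs, total_tb, mean_gb


total_warcs, total_tb, mean_gb = summarize(DATASETS)

print(f"Total WARC files: {total_warcs}")
print(f"Total compressed size: {total_tb:.2f} TB")
for name, gb in mean_gb.items():
    print(f"{name}: about {gb:.2f} GB per WARC on average")
```

Because the EOT-2024 upload is still in process, any totals derived this way should be treated as a snapshot rather than final figures.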

End of Term Web Crawls Collection

Additionally, crawl data is available from the Internet Archive via the End of Term Web Crawls collection.