Datasets
End of Term Datasets
The End of Term project is working with the Amazon Web Services Open Data Sponsorship Program to host a copy of the 2004, 2008, 2012, 2016, and 2020 End of Term datasets.
The work of inventorying, staging, and moving the data into AWS is ongoing, and more information will be provided here in the future.
For now, these datasets are only partially available for use.
Dataset | WARC Files | Compressed WARC Size |
---|---|---|
EOT-2020 | 239811 | 266.04 TB |
EOT-2016 | 194683 | 139.3 TB |
EOT-2012 | 78509 | 41.42 TB |
EOT-2008 | 125704 | 15.32 TB |
EOT-2004 | 58977 | 6.42 TB |
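
To get a feel for the overall scale, the figures in the table can be tallied with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the table above, the helper names `summarize` and `mean_warc_size_gb` are hypothetical, and the conversion assumes decimal units (1 TB = 1000 GB), which the table itself does not specify.

```python
# Figures copied from the table above:
# (dataset, number of WARC files, compressed size in TB)
DATASETS = [
    ("EOT-2020", 239811, 266.04),
    ("EOT-2016", 194683, 139.3),
    ("EOT-2012", 78509, 41.42),
    ("EOT-2008", 125704, 15.32),
    ("EOT-2004", 58977, 6.42),
]


def summarize(rows):
    """Return (total WARC file count, total compressed size in TB)."""
    total_files = sum(count for _, count, _ in rows)
    total_tb = sum(size_tb for _, _, size_tb in rows)
    return total_files, total_tb


def mean_warc_size_gb(count, size_tb, gb_per_tb=1000):
    """Average compressed size of one WARC file in GB.

    gb_per_tb=1000 is an assumption (decimal units); pass 1024 if the
    published sizes turn out to be binary terabytes.
    """
    return size_tb * gb_per_tb / count


total_files, total_tb = summarize(DATASETS)
print(f"All datasets: {total_files} WARC files, {total_tb:.2f} TB compressed")

for name, count, size_tb in DATASETS:
    avg = mean_warc_size_gb(count, size_tb)
    print(f"{name}: about {avg:.2f} GB per WARC file on average")
```

Because the datasets are only partially available at present, these totals describe the inventoried collections rather than what can be downloaded today.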
End of Term Web Crawls Collection
Additionally, crawl data is available from the Internet Archive via the End of Term Web Crawls collection.