Statistics of Common Crawl's web archives, released on a monthly basis:
- size of the crawls - number of pages, unique URLs, hosts, domains, top-level domains (public suffixes), cumulative growth of crawled data over time
- top-level domains - distribution and comparison
- top-500 registered domains
- crawler-related metrics - fetch status, etc.
- overlaps between monthly crawls
- distribution of

All metrics presented here are generated from Common Crawl's URL index data using the code of the cc-crawl-statistics project. Inspired by Sebastian Spiegler's Statistics of the Common Crawl Corpus 2012.
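
As a rough illustration of what the size and top-level domain metrics listed above measure, the following minimal sketch counts pages, unique URLs, unique hosts, and pages per top-level domain for a handful of URLs. It is not the cc-crawl-statistics code: the function name `crawl_size_metrics` and the sample URLs are hypothetical, and the top-level domain is approximated by the last label of the host name, whereas real public suffixes would need the Public Suffix List.

```python
from collections import Counter
from urllib.parse import urlsplit


def crawl_size_metrics(urls):
    """Count pages, unique URLs, unique hosts, and pages per TLD (hypothetical helper)."""
    # urlsplit(...).hostname is lowercased and is None when no host is present
    hosts = [urlsplit(u).hostname or "" for u in urls]
    # Assumption: approximate the TLD by the last dot-separated label of the host
    tld_counts = Counter(h.rsplit(".", 1)[-1] for h in hosts if h)
    return {
        "pages": len(urls),
        "unique_urls": len(set(urls)),
        "unique_hosts": len({h for h in hosts if h}),
        "tld_counts": dict(tld_counts),
    }


# Made-up sample records; a real run would read URLs from the URL index
sample_urls = [
    "https://example.com/a",
    "https://example.com/a",
    "https://www.example.org/",
    "http://blog.example.co.uk/post",
]
metrics = crawl_size_metrics(sample_urls)
print(metrics)
```

Counting pages separately from unique URLs matters because the same URL can be captured more than once within a crawl.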