Pre-generated static exports from our most recent crawl cycle.
All datasets are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to use, share, and adapt the data for any purpose, provided you give appropriate credit to MapTheNet.org.
One row per crawled domain. Columns:
domain, tld, country, category, http_status, server, first_seen, last_crawled.
One row per directed link between two domains. Columns:
source_domain, target_domain, weight (number of distinct linking pages), first_seen, last_seen.
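As a rough illustration of how the weight column might be used, the sketch below sums inbound link weight per target domain. The function name inbound_weight and the sample rows are made up for this example; only the column names come from the list above, and weight is assumed to parse as an integer.

```python
from collections import defaultdict


def inbound_weight(rows):
    """Sum the weight of all links pointing at each target domain.

    `rows` is any iterable of dicts keyed by the link-export column names.
    Assumption: `weight` is an integer stored as text, as it would be
    after reading a CSV file.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[row["target_domain"]] += int(row["weight"])
    return dict(totals)


# Hypothetical sample rows, not taken from a real export.
sample = [
    {"source_domain": "a.example", "target_domain": "b.example", "weight": "3"},
    {"source_domain": "c.example", "target_domain": "b.example", "weight": "2"},
    {"source_domain": "a.example", "target_domain": "c.example", "weight": "1"},
]

print(inbound_weight(sample))
```

Because the function takes plain dicts, it can be fed directly from csv.DictReader without loading the whole file into memory.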
Domain-to-category mapping. Columns:
domain, category_id, category_name, confidence.
Categories are assigned via heuristic classification.
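Since the classification is heuristic, consumers may want to drop low-confidence rows. The following is a minimal sketch under two assumptions that are not stated above: that confidence parses as a number, and that higher values mean a more certain assignment. The function name and the 0.8 threshold are illustrative only.

```python
def confident_categories(rows, min_confidence=0.8):
    """Return (domain, category_name) pairs at or above a confidence threshold.

    Assumptions: `confidence` is a numeric string and larger means more
    certain. The default threshold is an arbitrary placeholder.
    """
    kept = []
    for row in rows:
        if float(row["confidence"]) >= min_confidence:
            kept.append((row["domain"], row["category_name"]))
    return kept


# Hypothetical sample rows, not taken from a real export.
sample_categories = [
    {"domain": "a.example", "category_id": "1", "category_name": "news", "confidence": "0.95"},
    {"domain": "b.example", "category_id": "2", "category_name": "shopping", "confidence": "0.40"},
]

print(confident_categories(sample_categories))
```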
All files are gzip-compressed, UTF-8 encoded CSV with a header row. Fields are
comma-delimited and quoted where necessary. Dates use ISO 8601 format
(YYYY-MM-DD).
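The format described above can be read with the Python standard library alone. The sketch below streams rows from a gzip-compressed CSV and parses the date columns. The helper names read_export and parse_date are made up for this example, and treating an empty date field as missing is an assumption, not documented behaviour.

```python
import csv
import gzip
from datetime import date


def read_export(path):
    """Yield each row of a gzip-compressed CSV export as a dict keyed by header."""
    # Text mode with UTF-8 decoding; newline="" lets the csv module handle
    # quoted fields that contain line breaks.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def parse_date(value):
    """Convert a YYYY-MM-DD string to a date.

    Assumption: an empty field means the value is missing, so return None.
    """
    return date.fromisoformat(value) if value else None


# Usage, with a placeholder path standing in for a downloaded export:
#
#   for row in read_export("path/to/export.csv.gz"):
#       first_seen = parse_date(row["first_seen"])
```

Streaming with a generator keeps memory use flat regardless of file size, which is likely to matter for the per-link export.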
Exports are regenerated after each full crawl cycle, typically every two weeks. The file modification date reflects the most recent export.
For programmatic access to smaller slices of data, use our public API. Documentation is available at /api/.