Data & Privacy

Transparency about what we collect, store, and share.

What Data We Collect

MapTheNet records link relationships at the domain level only, never at the page level. When our crawler visits a page, it extracts the outbound links and records each one as a pair of domain names (e.g., example.com links to example.org). For each domain, we also record:

  • The domain name and its top-level domain (TLD)
  • HTTP status code and response time
  • Server header (e.g., "nginx" or "Apache")
  • Country of domain registration (from public WHOIS data)
  • Heuristic category assignment (e.g., news, education, commerce)
  • Timestamp of the crawl
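
As an illustration of how a page's links reduce to domain pairs, here is a minimal Python sketch. It is not our production crawler code: the names LinkExtractor, host_of, and domain_pairs are hypothetical, and it treats the URL's hostname as the "domain" (grouping subdomains under a registrable domain would additionally need the Public Suffix List). Note that paths, query strings, and page text are discarded and never stored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_of(url):
    """Return the lowercase hostname of an http(s) URL, else None."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return None  # skips mailto:, javascript:, and similar links
    return parts.hostname


def domain_pairs(page_url, html):
    """Return the set of (source, target) host pairs for outbound links."""
    source = host_of(page_url)
    parser = LinkExtractor()
    parser.feed(html)
    pairs = set()
    for href in parser.hrefs:
        # Resolve relative links, then keep only the host: the path and
        # query string are dropped at this point.
        target = host_of(urljoin(page_url, href))
        if target and target != source:
            pairs.add((source, target))
    return pairs
```

For example, a page at https://example.com/page containing a link to https://example.org/a/b?x=1 would yield only the pair (example.com, example.org). Links back to example.com itself are ignored in this sketch.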

What We Do NOT Collect

MapTheNet does not collect any of the following:

  • Page content — we do not store the text, images, or media of any page
  • Individual URLs or paths — only the domain name is recorded, not the full URL
  • Personal data — no names, email addresses, IP addresses of visitors, or user accounts
  • Cookies or tracking data — our crawler does not execute JavaScript or accept cookies
  • Login-protected content — we never attempt authentication
  • Data behind paywalls — if a page requires payment, it is skipped

Respecting robots.txt

Our crawler identifies itself with the user-agent string MapTheNetBot/1.0. It checks each domain's /robots.txt file before crawling and fully complies with any Disallow directives that apply to our user-agent or to all crawlers.

If you wish to block our crawler, add the following to your robots.txt:

User-agent: MapTheNetBot
Disallow: /
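
If you want to check that such a rule has the intended effect, the following Python sketch uses the standard library's urllib.robotparser. It is an approximation under stated assumptions: may_crawl and BLOCK_RULE are hypothetical names, and Python's rule matching may differ in edge cases from the matching our crawler performs.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "MapTheNetBot/1.0"

# The blocking rule shown above, as it would appear in robots.txt.
BLOCK_RULE = "User-agent: MapTheNetBot\nDisallow: /\n"


def may_crawl(robots_txt, url, user_agent=USER_AGENT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl(BLOCK_RULE, "https://example.com/"))
    print(may_crawl(BLOCK_RULE, "https://example.com/", "OtherBot/2.0"))
```

With the rule above, the first call reports that MapTheNetBot is blocked, while the second shows that crawlers with a different user-agent are unaffected.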

Data Retention

Crawl data is retained indefinitely in aggregated form to enable historical analysis of web structure over time. Raw crawl logs (which contain timestamped link observations) are retained for 12 months and then deleted.

Opted-out domains are removed from all current and future public exports within 30 days of the opt-out request.

Legal Basis

MapTheNet processes only publicly available information (domain names and their link relationships) for the purposes of academic research and the public interest. No personal data is processed.

Our activities are analogous to those of search engine crawlers and of web archiving and internet measurement projects such as the Internet Archive and Common Crawl.

Questions or Concerns?

If you have questions about our data practices, use the contact form. To request removal of your domain, visit the Opt-Out page.