ProvenanceBot

    ProvenanceBot is the crawler that builds the public Provenance directory. It fetches publicly accessible trust, security, privacy, and AI-disclosure pages so that buyers can search across vendors. If you operate a website and prefer not to be indexed, use the controls below. Alternatively, you can skip the crawl entirely by adopting our public read API as the source instead.

    Identification

    User-Agent: ProvenanceBot/1.0 (+https://provenance.naburis.cloud/bot)
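
    If you want to pick out ProvenanceBot requests in your own logs or middleware, a minimal sketch is shown here. The function name and the regular expression are illustrative assumptions; only the 1.0 header string above is documented. A User-Agent header can be spoofed by any client, so treat a match as a hint rather than proof.

```python
import re

# Matches the product token "ProvenanceBot/<version>" in a User-Agent header.
# Assumption: future versions keep the same "Name/version" token shape.
BOT_TOKEN = re.compile(r"\bProvenanceBot/(\d+(?:\.\d+)*)")

def provenancebot_version(user_agent):
    """Return the version string if the header names ProvenanceBot, else None."""
    match = BOT_TOKEN.search(user_agent or "")
    return match.group(1) if match else None

ua = "ProvenanceBot/1.0 (+https://provenance.naburis.cloud/bot)"
print(provenancebot_version(ua))             # 1.0
print(provenancebot_version("Mozilla/5.0"))  # None
```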

    Scope

    • Public trust / security / legal / AI-disclosure pages and linked PDFs
    • Sitemap and feed URLs declared by the host
    • Never private API endpoints, authenticated pages, or pages behind a paywall
    • Never form submissions, login flows, or any state-mutating requests

    Politeness

    We respect robots.txt and rate-limit per host. To block ProvenanceBot entirely, or to slow it down instead, add one of these rule groups to your robots.txt:

    User-agent: ProvenanceBot
    Disallow: /
    
    # Or set a per-request delay
    User-agent: ProvenanceBot
    Crawl-delay: 30
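
    Before deploying either rule group, you may want to check how a standard parser reads it. The sketch below uses Python's built-in urllib.robotparser, which is not necessarily the parser ProvenanceBot itself uses; the helper name check_rules and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

def check_rules(robots_txt, url="https://example.com/security"):
    """Return (may_fetch, crawl_delay) as seen by a ProvenanceBot user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return (parser.can_fetch("ProvenanceBot", url),
            parser.crawl_delay("ProvenanceBot"))

BLOCK = """\
User-agent: ProvenanceBot
Disallow: /
"""

SLOW = """\
User-agent: ProvenanceBot
Crawl-delay: 30
"""

print(check_rules(BLOCK))  # (False, None)
print(check_rules(SLOW))   # (True, 30)
```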

    A sitewide block takes effect within one crawl cycle (typically under 24 hours). Concurrent connections are capped at 1 per host, and the default delay between requests to the same host is 5 seconds.
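
    To make the per-host spacing concrete, here is a rough sketch of how a delay like this can be scheduled. It is not the crawler's actual implementation (which is closed-source); the class HostThrottle, its method names, and the choice to let a robots.txt Crawl-delay override the 5-second default are all assumptions for illustration. It models request spacing only, not connection handling.

```python
from urllib.parse import urlsplit

DEFAULT_DELAY = 5.0  # seconds between requests to the same host

class HostThrottle:
    """Tracks the earliest time each host may be requested again."""

    def __init__(self, default_delay=DEFAULT_DELAY):
        self.default_delay = default_delay
        self.next_allowed = {}  # host -> earliest next request time (seconds)

    def wait_time(self, url, now, crawl_delay=None):
        """Return how long to wait before fetching url, and reserve that slot."""
        host = urlsplit(url).hostname
        # Assumption: an explicit Crawl-delay replaces the default delay.
        delay = crawl_delay if crawl_delay is not None else self.default_delay
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + delay
        return start - now

throttle = HostThrottle()
print(throttle.wait_time("https://example.com/security", now=0.0))  # 0.0
print(throttle.wait_time("https://example.com/privacy", now=1.0))   # 4.0
print(throttle.wait_time("https://other.example/trust", now=1.0))   # 0.0
```

    Because each call reserves the next slot for its host, requests to one host are issued one after another, while different hosts do not delay each other.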

    Source code

    The crawler is closed-source today; an open-core release is on the roadmap. We do publish all observed events on each vendor's public profile; see our own change history for an example.

    Contact

    Crawl misbehavior, abuse reports, or removal requests: abuse@naburis.cloud. Last reviewed: 2026-05-03.