ProvenanceBot

ProvenanceBot is the crawler that builds the public Provenance directory. It fetches publicly accessible trust, security, privacy, and AI-disclosure pages so buyers can search across vendors. If you operate a website and prefer not to be indexed, use the controls below. Alternatively, you can skip the crawl entirely by using our public read API as the source instead.

Identification

User-Agent: ProvenanceBot/1.0 (+https://provenance.naburis.cloud/bot)
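
Site operators who want to spot these requests in logs or middleware could match on the product token in the header above. The following is a minimal sketch, not an official detection method: the function name is hypothetical, and accepting version numbers other than 1.0 is an assumption. A User-Agent header can be spoofed, so this identifies a claimed identity only.

```python
import re

# Product token taken from the User-Agent string above. Allowing other
# version numbers is an assumption; only 1.0 is documented.
_BOT_TOKEN = re.compile(r"\bProvenanceBot/\d+(\.\d+)*", re.IGNORECASE)


def is_provenancebot(user_agent: str) -> bool:
    """Return True if the header value carries the ProvenanceBot product token."""
    return bool(_BOT_TOKEN.search(user_agent or ""))


print(is_provenancebot("ProvenanceBot/1.0 (+https://provenance.naburis.cloud/bot)"))
print(is_provenancebot("Mozilla/5.0 (X11; Linux x86_64)"))
```

The first call prints True and the second prints False.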

Scope

Public trust / security / legal / AI-disclosure pages and linked PDFs
Sitemap and feed URLs declared by the host
Never private API endpoints, authenticated pages, or pages behind a paywall
Never form submissions, login flows, or any state-mutating requests

Politeness

We respect robots.txt and rate-limit per host. To block ProvenanceBot entirely, or to slow it down instead, add one of these groups to your robots.txt:

User-agent: ProvenanceBot
Disallow: /

# Or set a per-request delay
User-agent: ProvenanceBot
Crawl-delay: 30
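
Before deploying either rule set, you may want to check how a standards-following parser reads it. The sketch below uses Python's standard-library urllib.robotparser, which is not necessarily the parser ProvenanceBot itself uses; the helper name and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

BOT = "ProvenanceBot"


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The two alternatives shown above, each tested as its own robots.txt.
block_all = parse_rules("User-agent: ProvenanceBot\nDisallow: /\n")
slow_down = parse_rules("User-agent: ProvenanceBot\nCrawl-delay: 30\n")

url = "https://example.com/security"  # placeholder URL
print(block_all.can_fetch(BOT, url))  # False: the whole site is disallowed
print(slow_down.can_fetch(BOT, url))  # True: crawling allowed, only slower
print(slow_down.crawl_delay(BOT))     # 30
```

Each alternative is parsed separately here because they are meant as either/or choices rather than two groups in the same file.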

A sitewide block takes effect within one crawl cycle (typically under 24 hours). Concurrent connections are capped at 1 per host, and the default delay between requests to the same host is 5 seconds.
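
To get a rough sense of what these limits mean for server load, the delay can be turned into an upper bound on daily requests. This is a back-of-the-envelope sketch that assumes a single connection, a fixed delay, and zero response time; the function name is hypothetical, and real request counts should be lower.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(crawl_delay_seconds: float) -> int:
    """Upper bound on requests per day with one connection and a fixed delay."""
    return int(SECONDS_PER_DAY // crawl_delay_seconds)


print(max_requests_per_day(5))   # 17280 at the default 5-second delay
print(max_requests_per_day(30))  # 2880 with Crawl-delay: 30
```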

Source code

The crawler is closed-source today; an open-core release is on the roadmap. We do publish all observed events on each vendor's public profile; our own change history is an example.

Contact

Crawl misbehavior, abuse reports, or removal requests: abuse@naburis.cloud.

Last reviewed: 2026-05-03.