ProvenanceBot
ProvenanceBot is the crawler that builds the public Provenance directory. It fetches publicly accessible trust, security, privacy, and AI-disclosure pages so buyers can search across vendors. If you operate a website and prefer not to be indexed, see the controls below, or skip the crawl entirely by adopting our public read API as the source instead.
Identification
User-Agent: ProvenanceBot/1.0 (+https://provenance.naburis.cloud/bot)
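For operators who want to spot these requests in access logs, here is a minimal Python sketch. It assumes the product token keeps the `ProvenanceBot/<version>` shape shown above; the function names `is_provenancebot` and `provenancebot_version` are hypothetical. User-Agent headers can be spoofed, so a match is a hint rather than proof of origin.

```python
import re
from typing import Optional

# Assumption: the token keeps the "ProvenanceBot/<version>" shape from the
# User-Agent string documented above. The word boundary avoids matching
# longer tokens that merely end in "ProvenanceBot".
_PROVENANCEBOT_RE = re.compile(r"\bProvenanceBot/(\d+(?:\.\d+)*)")


def is_provenancebot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header claims to be ProvenanceBot."""
    return _PROVENANCEBOT_RE.search(user_agent or "") is not None


def provenancebot_version(user_agent: Optional[str]) -> Optional[str]:
    """Return the claimed version string, or None if the token is absent."""
    match = _PROVENANCEBOT_RE.search(user_agent or "")
    return match.group(1) if match else None
```

Passing the documented header to `provenancebot_version` yields the version component, which may be useful if behavior changes between releases.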
Scope
- Public trust / security / legal / AI-disclosure pages and linked PDFs
- Sitemap and feed URLs declared by the host
- Never private API endpoints, authenticated pages, or pages behind a paywall
- Never form submissions, login flows, or any state-mutating requests
Politeness
We respect robots.txt and rate-limit per host. To block ProvenanceBot entirely, or to slow it down with a longer per-request delay, add one of these to your robots.txt:
    User-agent: ProvenanceBot
    Disallow: /

    # Or set a per-request delay
    User-agent: ProvenanceBot
    Crawl-delay: 30
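Before deploying either rule, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard-library `urllib.robotparser`; the helper name `check_rules` and the `example.com` URL are illustrative assumptions. It shows how a generic parser interprets the rules and is not a statement about ProvenanceBot's own parser.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt: str, url: str, agent: str = "ProvenanceBot"):
    """Return (allowed, crawl_delay) for `agent` under the given robots.txt text.

    `check_rules` is a hypothetical helper for local testing only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


BLOCK_RULE = """\
User-agent: ProvenanceBot
Disallow: /
"""

DELAY_RULE = """\
User-agent: ProvenanceBot
Crawl-delay: 30
"""

if __name__ == "__main__":
    page = "https://example.com/security"  # placeholder URL
    print(check_rules(BLOCK_RULE, page))   # sitewide block
    print(check_rules(DELAY_RULE, page))   # allowed, with a 30-second delay
```

Under the first rule the parser reports the page as not fetchable; under the second it stays fetchable and the delay is reported as 30.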
A sitewide block takes effect within one crawl cycle (typically under 24 hours). Concurrent connections are capped at 1 per host, and the default delay between requests to the same host is 5 seconds.
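To make those limits concrete, here is an illustrative Python sketch of a per-host politeness gate. It is not ProvenanceBot's actual code, which is closed-source; the class name `HostPoliteness`, its parameters, and the choice to let a robots.txt `Crawl-delay` replace the 5-second default are all assumptions made for the example.

```python
import threading
import time
from collections import defaultdict
from urllib.parse import urlsplit


class HostPoliteness:
    """Hypothetical gate: one request at a time per host, spaced by a delay."""

    def __init__(self, default_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self._default_delay = default_delay
        self._clock = clock      # injectable so the timing can be tested
        self._sleep = sleep
        self._locks = defaultdict(threading.Lock)  # one lock per host
        self._last_finished = {}                   # host -> time of last request
        self._registry_lock = threading.Lock()

    def _lock_for(self, host):
        # Guard the dict so two threads never create two locks for one host.
        with self._registry_lock:
            return self._locks[host]

    def fetch(self, url, do_request, crawl_delay=None):
        """Call do_request(url) once the host's lock and delay permit it."""
        host = urlsplit(url).netloc.lower()
        # Assumption: a robots.txt Crawl-delay replaces the default delay.
        delay = self._default_delay if crawl_delay is None else crawl_delay
        with self._lock_for(host):  # caps concurrency at 1 per host
            last = self._last_finished.get(host)
            if last is not None:
                wait = delay - (self._clock() - last)
                if wait > 0:
                    self._sleep(wait)
            try:
                return do_request(url)
            finally:
                self._last_finished[host] = self._clock()
```

Holding the per-host lock for the full wait and request is what enforces the single connection; requests to different hosts use different locks and do not delay each other.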
Source code
The crawler is closed-source today; an open-core release is on the roadmap. We do publish all observed events on each vendor's public profile — see our own change history as an example.
Contact
Crawl misbehavior, abuse reports, or removal requests: abuse@naburis.cloud.
Last reviewed: 2026-05-03.