Intelligence Confidence Levels
Intelligence Confidence Audit Engine (ICAE)
Operational confidence posture for every DNS intelligence protocol. ICAE assessments are derived from deterministic pass thresholds plus time-in-service, with no manual overrides or self-grading. Every result is cryptographically hashed and retained — a tamper-evident, historically verifiable audit trail. View the accountability log.
Intelligence Confidence Matrix
Hash Integrity Audit — Verified 9/9
Every analysis result is SHA-3-512 (Keccak, NIST FIPS 202) hashed at creation using a canonical, deterministic serialization of all protocol findings. This audit recomputes hashes from stored results and compares against the retained posture hash to verify no data has been altered post-analysis.
Audit window: 100 of 6223 hashed analyses (most recent 100, ordered by creation date)
All 100 of 100 audited results verified — no posture hash mismatches detected. Tamper-evident audit trail intact.
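The recompute-and-compare pattern behind this audit can be sketched in Go. This is a minimal illustration, not the engine's code: the field names are invented, and stdlib SHA-512 stands in for SHA-3-512 (which is not in older Go standard libraries) so the sketch builds anywhere.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"strings"
)

// canonicalPosture joins posture fields in a fixed, deterministic order.
// Field names here are illustrative, not the engine's actual schema.
func canonicalPosture(fields []string) string {
	return strings.Join(fields, "|")
}

// postureHash digests the canonical representation. The production engine
// uses SHA-3-512 (FIPS 202); SHA-512 stands in here so the sketch compiles
// on any Go toolchain without external modules.
func postureHash(fields []string) string {
	sum := sha512.Sum512([]byte(canonicalPosture(fields)))
	return hex.EncodeToString(sum[:])
}

// verify recomputes the hash from stored fields and compares it to the
// retained posture hash: the core of the tamper-evidence audit.
func verify(fields []string, retained string) bool {
	return postureHash(fields) == retained
}

func main() {
	fields := []string{"spf:pass", "dmarc:reject", "dnssec:signed"}
	retained := postureHash(fields) // sealed at analysis time

	fmt.Println(verify(fields, retained)) // untouched data verifies

	tampered := []string{"spf:pass", "dmarc:none", "dnssec:signed"}
	fmt.Println(verify(tampered, retained)) // any alteration is detected
}
```

Because the serialization is canonical and deterministic, anyone holding the same posture fields can recompute the digest independently.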
Calibration Validation
Empirical accuracy of the confidence scoring system, measured by running 129 golden test cases across 5 resolver agreement scenarios (645 total predictions). This answers: “when we state a confidence level, how often are we actually correct?”
Reliability Diagram
Each bin shows how well stated confidence matches observed accuracy. A perfectly calibrated system would have predicted = observed in every bin.
| Confidence Bin | Predictions | Predicted | Observed | Gap |
|---|---|---|---|---|
| 80–90% | 14 | 88.0% | 100.0% | 0.1200 |
| 90–100% | 631 | 97.1% | 100.0% | 0.0290 |
Per-Protocol Calibration
Calibration gap by protocol — how far each protocol’s stated confidence deviates from observed accuracy. Sorted from best to worst calibrated.
| Protocol | Cases | Mean Confidence | Pass Rate | Brier | Gap | Rating |
|---|---|---|---|---|---|---|
| DMARC | 120 | 98.8% | 100% | 0.0002 | 0.0120 | excellent |
| CAA | 50 | 98.0% | 100% | 0.0006 | 0.0200 | excellent |
| SPF | 100 | 98.0% | 100% | 0.0006 | 0.0200 | good |
| TLS-RPT | 25 | 97.2% | 100% | 0.0012 | 0.0280 | good |
| DNSSEC | 125 | 96.8% | 100% | 0.0015 | 0.0320 | good |
| DKIM | 40 | 96.0% | 100% | 0.0024 | 0.0400 | good |
| MTA-STS | 60 | 96.0% | 100% | 0.0024 | 0.0400 | good |
| BIMI | 55 | 95.2% | 100% | 0.0035 | 0.0480 | good |
| DANE/TLSA | 70 | 94.0% | 100% | 0.0054 | 0.0600 | adequate |
Methodology: each golden case is run at rawConfidence=1.0 (the engine predicts "correct") across 5 resolver agreement levels (5/5 through 1/5). No label leakage — ground-truth outcomes are never used as prediction inputs. The shrinkage estimator \(w \cdot C_{\text{raw}} + (1-w) \cdot \frac{\alpha}{\alpha+\beta}\) is what produces the varying confidence levels as measurement quality degrades.
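Under these assumptions, the degradation sweep is easy to reproduce in Go. The Beta parameters below (α=98, β=2) are illustrative, not the engine's actual priors:

```go
package main

import "fmt"

// calibrated applies the shrinkage estimator w*Craw + (1-w)*alpha/(alpha+beta).
// alpha and beta are a protocol's Beta-prior parameters (illustrative values).
func calibrated(w, craw, alpha, beta float64) float64 {
	return w*craw + (1-w)*alpha/(alpha+beta)
}

func main() {
	const alpha, beta = 98.0, 2.0 // prior mean 0.98
	for agree := 5; agree >= 1; agree-- {
		w := float64(agree) / 5.0 // measurement quality
		fmt.Printf("%d/5 resolvers agree -> confidence %.3f\n",
			agree, calibrated(w, 1.0, alpha, beta))
	}
}
```

With full agreement the raw observation passes through unchanged; as agreement drops, the estimate is pulled toward the prior mean of 0.98, which is exactly the spread seen in the reliability diagram bins.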
Confidence Degradation Log
No degradation events recorded. All protocols have maintained continuous passing status since first evaluation.
Protocol Test Dossier
Each protocol is audited against deterministic test cases grounded in specific RFC sections. Below is what ICAE tests for each protocol — the signals, the standards, and the methodology.
SPF — 20 total (19 analysis + 1 collection) test cases
Validates qualifier classification (~all, -all, +all, ?all), DNS lookup counting (RFC 7208 §4.6.4), 10-lookup limit enforcement, no-mail intent detection, multiple-record error handling (§3.2), record classification (valid vs. spf-like), and cross-protocol RFC 7489 §10.1 premature rejection warnings.
Methodology: Deterministic input → expected output. No live DNS. Pure logic validation.
DMARC — 24 total (21 analysis + 3 collection) test cases
Validates policy enforcement logic (reject/quarantine/none per RFC 7489 §6.3), partial percentage coverage, SPF-only exposure, DMARC-without-SPF gaps, null MX no-mail domains (RFC 7505), and structured color/severity mapping for spoofability assessment.
Methodology: Combinatorial policy matrix → expected verdict + severity color.
DKIM — 8 total (7 analysis + 1 collection) test cases
Validates RSA key strength classification (1024-bit weak, 2048-bit adequate per RFC 8301), Ed25519 key type parsing (RFC 8463), revoked key detection (empty p= per RFC 6376 §3.6.1), test mode flag detection (t=y), and provider fingerprinting (Google, Microsoft 365).
Methodology: Synthetic DKIM records → expected key analysis + provider classification.
DNSSEC — 25 total (18 analysis + 7 collection) test cases
Validates chain-of-trust verdicts (signed/unsigned/broken per RFC 4033 §2), tampering exposure assessment, DS digest classification (SHA-256 per RFC 8624 §3.3), enterprise DNS provider detection (Cloudflare, AWS Route 53, Google Cloud DNS, Azure, Akamai, NS1 per RFC 1035), and inheritance chain classification.
Methodology: Simulated NS/DS inputs → expected verdict + provider fingerprint.
DANE/TLSA — 14 total (10 analysis + 4 collection) test cases
Validates TLSA usage type parsing (DANE-EE usage 3 per RFC 7672 §3.1), deprecated usage recommendations (usage 0 triggers RFC 7672 advisory), MX host extraction (RFC 5321 §5), full-coverage verdict logic, and no-TLSA informational classification.
Methodology: Synthetic TLSA + MX inputs → expected verdict + coverage status.
MTA-STS — 12 total (9 analysis + 3 collection) test cases
Validates mode enforcement logic (enforce=success, testing=warning per RFC 8461 §5), policy line parsing (version, mode, max_age, mx per §3.2), STS record filtering (§3.1), and policy ID extraction.
Methodology: Synthetic DNS records + policy bodies → expected parsed fields.
CAA — 10 total (6 analysis + 4 collection) test cases
Validates CA issuer identification (Let’s Encrypt, DigiCert per RFC 8659 §4), issuewild detection (§4.3), iodef record detection (§4.4), and human-readable message construction.
Methodology: Synthetic CAA records → expected parsed issuers + flags.
BIMI — 11 total (9 analysis + 2 collection) test cases
Validates record filtering (v=BIMI1 per RFC 9495 §3), logo URL extraction, VMC (Verified Mark Certificate) URL extraction, and absent-VMC null handling.
Methodology: Synthetic BIMI records → expected parsed URLs + null checks.
TLS-RPT — 5 total (3 analysis + 2 collection) test cases
Validates TLS-RPT URI extraction from rua fields (RFC 8460 §3), plus cross-protocol cryptographic strength classification: DKIM key strength (2048-bit RSA adequate per RFC 8301, Ed25519 strong) and DS digest type classification (SHA-256 per RFC 8624 §3.3).
Methodology: Synthetic TLS-RPT records → expected URI extraction; known algorithm + key size inputs → expected strength label.
Maturity Levels
Modeled after the Capability Maturity Model (CMM) developed at Carnegie Mellon’s Software Engineering Institute. Five tiers of sustained correctness — earned through deterministic test runs, never self-assigned.
The dual-threshold system (consecutive passes AND elapsed time) prevents maturity inflation. A burst of 5,000 runs in one day cannot achieve Gold Master — the 180-day time requirement ensures tests have run across multiple code versions, infrastructure changes, and resolver conditions. This mirrors how real-world confidence is earned: through sustained performance, not a single marathon session.
Fewer than 100 consecutive passing audit runs. The engine is learning.
100+ consecutive passes. Results are reliable but still maturing.
500+ passes over 30+ days with no regressions. Production-grade correctness.
1,000+ passes over 90+ days. Battle-tested across diverse domains.
5,000+ passes over 180+ days. The highest confidence tier — reference-grade intelligence.
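The dual-threshold gate can be sketched as a single Go function. Only "Gold Master" is named in the text above; the other tier names here are placeholders:

```go
package main

import "fmt"

// tier returns the maturity level for a protocol given consecutive passing
// runs and days in service. Thresholds follow the tiers described above;
// all names except "Gold Master" are placeholders for illustration.
func tier(passes, days int) string {
	switch {
	case passes >= 5000 && days >= 180:
		return "Gold Master"
	case passes >= 1000 && days >= 90:
		return "Battle-Tested"
	case passes >= 500 && days >= 30:
		return "Production-Grade"
	case passes >= 100:
		return "Maturing"
	default:
		return "Learning"
	}
}

func main() {
	// Dual threshold: a one-day burst of 5,000 passes cannot reach the top tier.
	fmt.Println(tier(5000, 1))
	// Sustained over 180 days, the same count earns the highest tier.
	fmt.Println(tier(5000, 180))
}
```

Because both conditions must hold, a run-count burst falls through to whichever lower tier its elapsed time supports.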
Two-Layer Auditing
Each protocol is audited at two independent layers:
Collection Layer
Validates that DNS records are queried, retrieved, parsed, and filtered correctly. Tests cover multi-resolver consensus algorithms (5-resolver agreement checks), record type extraction (MX hosts, CAA issuers), record filtering (identifying valid BIMI, MTA-STS, SPF records from TXT noise), TLSA parsing, NS provider classification, and DKIM key parsing. Currently 27 test cases.
Analysis Layer
Validates that collected data is interpreted correctly against RFC standards. Tests cover SPF qualifier classification, DMARC policy enforcement logic, DKIM key strength assessment, DNSSEC chain-of-trust verdicts, brand impersonation verdicts, CAA issuer identification, DANE coverage assessment, and regression guards from every past correctness bug. Currently 102 test cases.
Timeout & Efficiency Strategy
DNS Tool manages timeout budgets across multiple concurrent lookups to maximize intelligence coverage while respecting external service constraints.
Parallel Execution
DNS, SPF, DMARC, DKIM, DNSSEC, CT logs, MTA-STS, TLS-RPT, BIMI, CAA, infrastructure, and security.txt are dispatched concurrently. DANE and SMTP run sequentially after MX resolution. Total analysis targets 10–30 seconds for most domains.
Per-Section Budgets
Each section has independent timeout budgets: DNS lookups (5s per resolver), CT log queries (15s with cooldown), SMTP transport probes (10s per host per port), and HTTP fetches for MTA-STS/BIMI/security.txt (3–5s each). No single section can block the entire analysis.
Graceful Degradation
When a section times out or errors, the analysis continues with remaining sections. Timed-out sections are flagged with a partial-failure banner so you know exactly which data may be incomplete. Re-analysis retries failed sections.
Remote Probe Failover
SMTP probing uses dedicated remote infrastructure (US region) for reliable port 25/465/587 access. If the remote probe is unavailable (network, auth, rate limit), the system falls back to local direct probing. Rate limiting: 30 requests per 60 seconds per client.
Efficiency Tracking
The ICAE Collection layer audits timeout handling as part of protocol correctness. Proper timeout behavior (returning informational status vs. crashing, logging appropriately, enabling re-analysis) is tested alongside data correctness. Timeout patterns feed into maturity progression.
Scanning Philosophy
DNS Tool is designed to be a responsible participant in every system it touches. Our approach: gather the intelligence we need while leaving the smallest possible footprint.
Minimal Footprint
Analysis uses standard DNS protocol queries and lightweight HTTP HEAD/GET requests with per-section timeout budgets. No brute-force enumeration, no credential stuffing, no port scanning beyond mail transport (25/465/587). Exposure checks use 200ms inter-request delays to avoid overwhelming target infrastructure.
Adaptive Rate Awareness
Third-party services (certificate transparency logs, RDAP registries) are monitored with telemetry-based exponential backoff — automatic cooldown from 5 seconds to 5 minutes when services signal degradation. When a source is unavailable or rate-limited, DNS Tool says so honestly rather than hiding the gap.
Symbiotic Interfacing
Every external data source is documented on the Sources page with its rate limits, methodology, and verification commands. SecurityTrails is user-key-only and never called automatically. Community services like Team Cymru are queried via standard DNS protocol with no API keys required.
Honest Reporting
When a section times out, gets rate-limited, or encounters an error, the report says exactly that — never “no issues found” when the data simply could not be checked. Four clear states: success, rate-limited, error, and partial. Transparency is non-negotiable.
Why This Matters
A security grade without a disclosed confidence level is an assertion, not an analysis. The ICAE provides full transparency into analytical correctness — because the score means nothing if you can’t see how certain we are of our own results.
Every protocol’s confidence level is backed by a verifiable count of consecutive audit passes. No black boxes. No hand-waving. Every claim backed by deterministic test cases.
Intelligence Currency Levels
Intelligence Currency Audit Engine (ICuAE)
Companion to ICAE. While ICAE measures correctness (did we interpret the data right?), ICuAE measures currency (is the data still valid?). Five standards-grounded dimensions evaluate data freshness, TTL compliance, completeness, source credibility, and TTL relevance for every scan.
Runtime Performance
ICuAE measures how close each scan’s data comes to the theoretical ideal — a perfectly tuned, machine-locked collector that requests every DNS record at exactly the right cadence, receives responses within authoritative TTL windows, achieves complete multi-resolver consensus, and returns a full record set. A score of 100 means the collected data is indistinguishable from ground truth. Because real-world DNS inherently fluctuates (caches age, resolvers disagree, optional records vary by domain), we track statistical stability across scans rather than pass/fail maturity.
Grade Distribution
How often each currency grade appears across all evaluated scans. A healthy system clusters toward Excellent and Good.
Per-Dimension Averages
Each of the five currency dimensions, averaged across all scans. Low-scoring dimensions indicate systemic patterns; tuning hints suggest how to improve collection fidelity.
| Dimension | Standard | Avg Score | Grade | Samples |
|---|---|---|---|---|
| Completeness | NIST SP 800-53 SI-7 | 35.2 | degraded | 5105 |
| Currentness | ISO/IEC 25012 | 100.0 | excellent | 5105 |
| Source Credibility | ISO/IEC 25012 + SPJ | 97.6 | excellent | 5105 |
| TTL Compliance | RFC 8767 | 97.7 | excellent | 5105 |
| TTL Relevance | NIST SP 800-53 SI-7 | 43.0 | degraded | 5105 |

Tuning Advisory (Completeness): Multiple expected record types are consistently missing. Expanding the query set or adding retry logic for failed lookups would improve coverage.

Tuning Advisory (TTL Relevance): Observed TTLs deviate significantly from expected ranges for their record types. This often indicates domain-side misconfiguration rather than collection issues.
Why Track Currency Separately?
ICAE — Correctness
“Did we read the data right?” ICAE runs deterministic test vectors against our analysis engine. If SPF says ~all, does the tool correctly identify it as a softfail? This is pass/fail, so we track consecutive passes and maturity tiers.
ICuAE — Currency
“How close is the collected data to ground truth?” ICuAE scores each scan against a theoretical ideal — a machine-locked collector with perfect TTL compliance, complete records, and full resolver consensus. Real-world DNS fluctuates, so instead of pass/fail we track statistical stability — rolling averages and variance across scans.
Per ICD 203, confidence requires both: an accurate interpretation of data that is also current. One without the other is incomplete intelligence.
Excellence Benchmarks
What does “near-ideal” DNS collection look like in the real world? These targets are derived from large-scale passive DNS observation networks and authoritative resolver operations that approach the theoretical ideal.
| Dimension | Excellence Target | Real-World Reference |
|---|---|---|
| TTL Compliance | ≥95% | Farsight DNSDB and OpenINTEL passive sensors collect at TTL-aligned intervals. RFC 8767 defines serve-stale as an explicit protocol extension, making non-compliant caching measurably detectable. |
| Completeness | ≥98% | Large-scale collectors (RiskIQ, Censys) query all standard record types per zone. ≥98% coverage of the core set (A, AAAA, MX, TXT, NS, SOA, CAA, DMARC, SPF) is achievable for any domain that publishes them. |
| Source Credibility | ≥90% | Google Public DNS, Cloudflare 1.1.1.1, and Quad9 operate at global scale with near-identical authoritative views. ≥90% multi-resolver agreement is standard; unanimity is expected for NS and SOA records. |
| Currentness | <0.5× TTL | DNSPerf tests from 200+ locations every 60 seconds. Median data age below half the authoritative TTL indicates the collector is querying well within the freshness window. |
| TTL Relevance | Within Range | NIST SP 800-53 SI-7 treats information integrity as a measurable property. TTLs within the typical range for their record type (3600s for TXT, 86400s for NS) indicate well-configured authoritative zones. |
Where these numbers come from: Farsight Security’s DNSDB processes billions of DNS observations daily from sensor networks worldwide. OpenINTEL (University of Twente) performs daily active measurements across all .com, .net, and .org zones. These systems represent the closest real-world approximation to the theoretical machine-locked ideal. Our scoring model uses their operational characteristics as the upper boundary of what is achievable.
Self-Tuning Intelligence Pipeline
ICuAE is not just a measurement engine — it is the diagnostic instrument for the collection pipeline itself. By tracking per-dimension statistics across scans, ICuAE identifies exactly which stage of the analysis chain needs attention.
Phase 1: Advisory
Dimension-level tuning hints surfaced in the Per-Dimension Averages table. When a dimension scores below 90, ICuAE explains what's happening and suggests specific improvements. Status: Live.
Phase 2: Suggested Config
Generate recommended scanner profiles from rolling statistics — resolver set, retry thresholds, record type priorities — requiring explicit approval before applying. Status: generation live; approval on the roadmap.
Phase 3: Adaptive Tuning
Fully automatic, non-destructive adjustments (timing jitter, retries, resolver weighting) with rollback if stability decreases. Gated by minimum sample count and confidence thresholds. Status: on the roadmap.
The vision: With enough scans and enough science, the confidence engine tunes TTLs, resolver weighting, query cadence, and retry logic until the system achieves the highest possible fidelity against the theoretical ideal — automatically, measurably, and with full provenance.
Standards Foundation
ICuAE is grounded in five authoritative standards from the intelligence community, information quality, and journalism ethics.
ICD 203 Timeliness
Intelligence Community Directive 203 identifies timeliness as one of five core analytic standards. Data that was accurate yesterday may be misleading today.
NIST SP 800-53 SI-7
NIST SI-7 addresses information integrity — ensuring data has not been improperly modified and remains complete. ICuAE operationalizes completeness and TTL relevance as integrity dimensions for DNS data.
ISO 25012 Currentness
ISO/IEC 25012 defines “Currentness” — data of the right age for its context. DNS records have inherent validity windows defined by TTL values.
RFC 8767 TTL
RFC 8767 defines TTL-based cache expiration and serve-stale behavior. ICuAE detects when resolver TTLs exceed authoritative values — which may indicate serve-stale behavior, timing skew, or cache misconfiguration.
SPJ Source Ethics
SPJ Code of Ethics requires multiple independent sources for verification. ICuAE measures multi-resolver agreement as a credibility indicator.
Five Measurement Dimensions
| Dimension | Standard | What It Measures |
|---|---|---|
| Currentness | ISO/IEC 25012 | Data age relative to its TTL-derived validity window. Are the DNS records still within their expected freshness period? |
| TTL Compliance | RFC 8767 | Whether resolver TTLs respect authoritative limits. Exceedances may indicate RFC 8767 serve-stale behavior, timing skew, or cache misconfiguration. |
| Completeness | NIST SI-7 | Percentage of expected record types with authoritative TTL data. Gaps reduce overall intelligence quality. |
| Source Credibility | ISO + SPJ | Multi-resolver agreement scoring. When all five resolvers return identical data, source credibility is highest. |
| TTL Relevance | NIST SI-7 | Observed TTL versus typical range for each record type. Extreme deviations may indicate misconfiguration. |
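One way to operationalize the Source Credibility dimension is modal-answer share across resolvers. This is a sketch of the idea, not the engine's exact scoring curve:

```go
package main

import "fmt"

// credibility scores multi-resolver agreement as the share of resolvers
// that returned the modal (most common) answer. Five identical answers
// score 1.0; a split set scores lower.
func credibility(answers []string) float64 {
	if len(answers) == 0 {
		return 0
	}
	counts := map[string]int{}
	best := 0
	for _, a := range answers {
		counts[a]++
		if counts[a] > best {
			best = counts[a]
		}
	}
	return float64(best) / float64(len(answers))
}

func main() {
	unanimous := []string{"1.2.3.4", "1.2.3.4", "1.2.3.4", "1.2.3.4", "1.2.3.4"}
	split := []string{"1.2.3.4", "1.2.3.4", "1.2.3.4", "5.6.7.8", "5.6.7.8"}
	fmt.Println(credibility(unanimous)) // full agreement
	fmt.Println(credibility(split))     // partial agreement
	fmt.Println(credibility(nil))       // no data
}
```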
Deterministic Test Matrix
29 test cases verify ICuAE scoring logic across all five dimensions. Every grade boundary, edge case, and nil-input path is tested deterministically — no randomness, no approximation.
Currency Grading Scale
The 0–100 score measures proximity to a theoretical ideal: a perfectly tuned collection system that requests every record type at exactly the right cadence, receives responses within authoritative TTL windows, achieves complete multi-resolver consensus, and returns a full record set with zero gaps. A score of 100 means the data is indistinguishable from what an ideally configured, machine-locked collector would produce. Each dimension is scored independently; the overall grade is their average.
| Grade | Range | What It Means | Signal |
|---|---|---|---|
| Excellent | 90–100 | Data was collected within authoritative TTL windows, all resolvers agree, and the record set is complete. Near-ideal collection fidelity. | The system is performing at or near the theoretical machine-locked ideal. Minimal drift from ground truth. |
| Good | 75–89 | Minor deviations from ideal: perhaps one resolver returned a slightly stale cache, or a non-critical record type was absent. Data remains operationally reliable. | Healthy collection with small imperfections. Acceptable for production intelligence. |
| Adequate | 50–74 | Measurable gaps: some resolvers served cached data beyond authoritative TTL, optional record types are missing, or source agreement is partial. Data is usable but not pristine. | The domain’s DNS configuration has real-world imperfections common in production environments. Worth investigating but not alarming. |
| Degraded | 25–49 | Significant staleness or incompleteness: resolver caches substantially exceed authoritative TTLs, multiple record types are absent, or resolvers disagree on fundamental records. | Data collection is meaningfully distant from the ideal. Results should be interpreted with caution; re-scan recommended after cache expiry. |
| Stale | 0–24 | Severe currency failure: data is likely cached well beyond TTL, critical record types are absent, or resolvers returned fundamentally conflicting answers. | The collected data does not reflect current ground truth. Per ICD 203, stale data should not be used for confidence assessments without explicit caveats. |
Why 0–100? ISO/IEC 25012 defines timeliness as a quantitative data quality dimension — it exists on a continuum, not as a binary. A 0–100 normalized score allows statistical tracking (rolling averages, standard deviation, trend analysis) that binary pass/fail cannot. NIST SP 800-53 SI-7 (Information Integrity) similarly treats data completeness and validity as measurable properties requiring periodic verification. The five-tier grading scale maps the continuous score to actionable categories, paralleling how ICD 203 maps analytic confidence to five levels (almost no confidence through high confidence).
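The five-tier mapping from the table above, as a small Go function with boundary values taken from the Range column:

```go
package main

import "fmt"

// grade maps the continuous 0-100 currency score to the five-tier scale:
// Excellent 90-100, Good 75-89, Adequate 50-74, Degraded 25-49, Stale 0-24.
func grade(score float64) string {
	switch {
	case score >= 90:
		return "Excellent"
	case score >= 75:
		return "Good"
	case score >= 50:
		return "Adequate"
	case score >= 25:
		return "Degraded"
	default:
		return "Stale"
	}
}

func main() {
	for _, s := range []float64{97.6, 82.0, 43.0, 12.5} {
		fmt.Printf("%.1f -> %s\n", s, grade(s))
	}
}
```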
Mathematical Foundations
Every confidence score is derived from deterministic, standards-grounded mathematics — not heuristics or machine learning. The formulas below are the actual computations running in the engine.
EWMA Drift Detection
The Exponentially Weighted Moving Average tracks currency score stability over time. Each new scan updates the statistic, giving recent observations more weight than historical ones:

\[ z_t = \lambda x_t + (1 - \lambda)\, z_{t-1}, \qquad z_0 = \mu_0 \]

Control limits detect statistically significant drift — not just any change, but changes that exceed normal process variation:

\[ \mu_0 \pm L\sigma \sqrt{\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2t}\right]} \]

Where \(\lambda\) is the smoothing factor (0.2), \(L\) is the control limit multiplier (3σ), and \(t\) is the observation period. Based on NIST/SEMATECH Engineering Statistics Handbook §6.3.2.4.
Implementation: icuae/ewma.go → EWMAControlChart.Add(), EWMAControlChart.IsOutOfControl() · Parameters: NewEWMAControlChart(λ=0.2, μ0=50, σ=10, L=3.0)
Bootstrap note: The initial parameters (μ0=50, σ=10, L=3.0) are heuristic defaults that allow monitoring to begin immediately without a Phase I calibration dataset. σ is refined adaptively from observed data after 10+ observations (see Add() method). These are operational starting points per NIST/SEMATECH §6.3.2.4, not values fitted from historical in-control DNS data.
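A minimal Go sketch of the EWMA control chart with the bootstrap defaults quoted above (λ=0.2, μ0=50, σ=10, L=3). It mirrors the math, not the actual `EWMAControlChart` type, and omits the adaptive σ refinement:

```go
package main

import (
	"fmt"
	"math"
)

// ewmaChart tracks an exponentially weighted moving average against
// time-varying control limits. Parameters match the documented bootstrap
// defaults; the production type lives in icuae/ewma.go.
type ewmaChart struct {
	lambda, mu0, sigma, limit float64
	z                         float64 // current EWMA statistic
	t                         int     // observations seen
}

func newChart() *ewmaChart {
	return &ewmaChart{lambda: 0.2, mu0: 50, sigma: 10, limit: 3, z: 50}
}

// add folds a new currency score into the statistic and reports whether
// the EWMA breached the control limits (out of control = drift).
func (c *ewmaChart) add(x float64) bool {
	c.t++
	c.z = c.lambda*x + (1-c.lambda)*c.z
	halfWidth := c.limit * c.sigma *
		math.Sqrt(c.lambda/(2-c.lambda)*(1-math.Pow(1-c.lambda, float64(2*c.t))))
	return math.Abs(c.z-c.mu0) > halfWidth
}

func main() {
	c := newChart()
	// Scores near the target stay in control; a sustained jump trips the limit.
	for _, score := range []float64{52, 49, 51, 48, 95, 96, 97, 98} {
		out := c.add(score)
		fmt.Printf("score %.0f -> z=%.2f drift=%v\n", score, c.z, out)
	}
}
```

Note how a single outlier barely moves \(z\); only a sustained shift accumulates enough weight to cross the limit, which is exactly the "not just any change" property described above.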
Reliability-Weighted Shrinkage Calibration
Each protocol carries an empirical prior — a Beta distribution encoding historical detection reliability. Measurement quality (resolver agreement) determines how much the raw observation is trusted versus the prior anchor — a Bayesian-inspired shrinkage estimator:

\[ C_{\text{cal}} = w \cdot C_{\text{raw}} + (1 - w) \cdot \frac{\alpha}{\alpha + \beta} \]

Where \(w = \frac{\text{agreeing resolvers}}{\text{total resolvers}}\) is measurement quality, and \(\frac{\alpha}{\alpha+\beta}\) is the prior mean from a \(\text{Beta}(\alpha, \beta)\) distribution for the protocol category. When resolver agreement is low, the prior mean anchors the estimate; as agreement increases, the raw observation dominates. This is a convex shrinkage estimator — structurally similar to, but distinct from, the true Beta-Bernoulli posterior mean \(E[\theta|D] = \frac{\alpha+s}{\alpha+\beta+n}\), where the weight on data is derived from observation count rather than set independently. Prior parameters evolve via conjugate updating: each passing ICAE test increments \(\alpha\), each failure increments \(\beta\).
Implementation: icae/priors.go → CalibrationEngine.CalibratedConfidence() · Per-protocol Beta priors defined in CalibrationEngine.priors map
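The conjugate updating rule is simple enough to sketch directly. Starting parameters here are illustrative, not the engine's actual per-protocol priors:

```go
package main

import "fmt"

// betaPrior holds a protocol's Beta(alpha, beta) reliability prior.
type betaPrior struct{ alpha, beta float64 }

// mean returns the prior mean alpha/(alpha+beta).
func (p *betaPrior) mean() float64 { return p.alpha / (p.alpha + p.beta) }

// update applies the conjugate rule: a passing ICAE test increments alpha,
// a failure increments beta.
func (p *betaPrior) update(pass bool) {
	if pass {
		p.alpha++
	} else {
		p.beta++
	}
}

func main() {
	p := &betaPrior{alpha: 9, beta: 1} // illustrative prior, mean 0.90
	fmt.Printf("prior mean %.3f\n", p.mean())

	for i := 0; i < 90; i++ {
		p.update(true) // 90 consecutive passing audit runs
	}
	// The prior mean is pulled toward 1 as passing evidence accrues.
	fmt.Printf("updated mean %.3f\n", p.mean())
}
```

A single failure then increments β, nudging the anchor down; sustained passes are the only way to raise it, mirroring the maturity philosophy above.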
Currency Score Normalization
Each ICuAE dimension is scored on a continuous 0–100 scale. The overall currency score is the weighted mean across all dimensions:

\[ S = \sum_{i=1}^{n} w_i\, s_i, \qquad \sum_{i=1}^{n} w_i = 1 \]

Where \(s_i\) is the score for dimension \(i\). Dimension weights are equal by default (each \(w_i = \frac{1}{n}\)). Per ISO/IEC 25012, timeliness is a quantitative data quality dimension — the continuous score enables statistical tracking (rolling averages, standard deviation, trend analysis) that binary pass/fail cannot.
Implementation: icuae/icuae.go → BuildCurrencyReport() · Five dimensions scored independently via score* functions, averaged into composite grade
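The weighted mean with equal default weights, applied to the per-dimension averages reported earlier (a sketch; the dimension ordering is arbitrary):

```go
package main

import "fmt"

// overall computes the currency score as a weighted mean of dimension
// scores. Weights are equal by default, matching the documented formula.
func overall(scores, weights []float64) float64 {
	var sum float64
	for i, s := range scores {
		sum += weights[i] * s
	}
	return sum
}

func main() {
	// Per-dimension averages from the table above:
	// Currentness, TTL Compliance, Completeness, Source Credibility, TTL Relevance.
	scores := []float64{100.0, 97.7, 35.2, 97.6, 43.0}
	w := 1.0 / float64(len(scores))
	weights := []float64{w, w, w, w, w}
	fmt.Printf("overall currency score: %.1f\n", overall(scores, weights))
}
```

The two degraded dimensions pull the composite down to roughly 74.7, which lands in the Adequate band of the grading scale below.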
Cryptographic Integrity
Every analysis result is sealed with a SHA-3-512 digest over a canonical pipe-delimited representation of posture fields. The hash function is the NIST FIPS 202 standard (Keccak sponge construction):

\[ H = \text{SHA3-512}(R) \]

Where \(R\) is the canonical posture representation — protocol statuses, records, policies, and posture labels joined in deterministic field order. The digest is independently verifiable — anyone with the same posture fields can recompute and confirm integrity.
Implementation: analyzer/posture_hash.go → CanonicalPostureHash() · Pipe-delimited canonical string with deterministic field ordering, verified by icae/hash_audit.go
Dual Engine Architecture
DNS Tool employs two companion engines that measure scientifically distinct properties of intelligence quality. ICAE (correctness) and ICuAE (currency) are never conflated — accuracy and timeliness are independent dimensions per ICD 203 and NIST SP 800-53. These engines are one of five analytic perspectives that together form our Symbiotic Security model.
ICAE — Correctness
“Did we interpret the DNS data correctly?” Deterministic golden-rule tests with per-protocol maturity tracking and cryptographic hash integrity.
ICuAE — Currency
“Is the DNS data still valid/current?” Five standards-grounded dimensions evaluated per-scan with TTL-aware validity windows and multi-resolver credibility.
