Intelligence Confidence Levels
Intelligence Confidence Audit Engine (ICAE)
Operational confidence posture for every DNS intelligence protocol. ICAE assessments are derived from deterministic pass thresholds plus time-in-service, with no manual overrides or self-grading. Every result is cryptographically hashed and retained — a tamper-evident, historically verifiable audit trail. View the accountability log.
Intelligence Confidence Matrix
Hash Integrity Audit — Verified 9/9
Every analysis result is SHA-3-512 (Keccak, NIST FIPS 202) hashed at creation using a canonical, deterministic serialization of all protocol findings. This audit recomputes hashes from stored results and compares against the retained posture hash to verify no data has been altered post-analysis.
Audit window: 100 of 6223 hashed analyses (most recent 100, ordered by creation date)
All 100 of 100 audited results verified — no posture hash mismatches detected. Tamper-evident audit trail intact.
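The recompute-and-compare pattern behind this audit can be sketched in Go. This is a minimal illustration, not the engine's code: the field names are invented, and stdlib SHA-512 stands in for SHA-3-512 (which is not in older Go standard libraries) so the sketch builds anywhere.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"strings"
)

// canonicalPosture joins posture fields in a fixed, deterministic order.
// Field names here are illustrative, not the engine's actual schema.
func canonicalPosture(fields []string) string {
	return strings.Join(fields, "|")
}

// postureHash digests the canonical representation. The production engine
// uses SHA-3-512 (FIPS 202); SHA-512 stands in here so the sketch compiles
// on any Go toolchain without external modules.
func postureHash(fields []string) string {
	sum := sha512.Sum512([]byte(canonicalPosture(fields)))
	return hex.EncodeToString(sum[:])
}

// verify recomputes the hash from stored fields and compares it to the
// retained posture hash: the core of the tamper-evidence audit.
func verify(fields []string, retained string) bool {
	return postureHash(fields) == retained
}

func main() {
	fields := []string{"spf:pass", "dmarc:reject", "dnssec:signed"}
	retained := postureHash(fields) // sealed at analysis time

	fmt.Println(verify(fields, retained)) // untouched data verifies

	tampered := []string{"spf:pass", "dmarc:none", "dnssec:signed"}
	fmt.Println(verify(tampered, retained)) // any alteration is detected
}
```

Because the serialization is canonical and deterministic, anyone holding the same posture fields can recompute the digest independently.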
Calibration Validation
Empirical accuracy of the confidence scoring system, measured by running 129 golden test cases across 5 resolver agreement scenarios (645 total predictions). This answers: “when we state a confidence level, how often are we actually correct?”
Reliability Diagram
Each bin shows how well stated confidence matches observed accuracy. A perfectly calibrated system would have predicted = observed in every bin.
| Confidence Bin | Predictions | Predicted | Observed | Gap |
|---|---|---|---|---|
| 80–90% | 14 | 88.0% | 100.0% | 0.1200 |
| 90–100% | 631 | 97.1% | 100.0% | 0.0290 |
Per-Protocol Calibration
Calibration gap by protocol — how far each protocol’s stated confidence deviates from observed accuracy. Sorted from best to worst calibrated.
| Protocol | Cases | Mean Confidence | Pass Rate | Brier | Gap | Rating |
|---|---|---|---|---|---|---|
| DMARC | 120 | 98.8% | 100% | 0.0002 | 0.0120 | excellent |
| CAA | 50 | 98.0% | 100% | 0.0006 | 0.0200 | excellent |
| SPF | 100 | 98.0% | 100% | 0.0006 | 0.0200 | good |
| TLS-RPT | 25 | 97.2% | 100% | 0.0012 | 0.0280 | good |
| DNSSEC | 125 | 96.8% | 100% | 0.0015 | 0.0320 | good |
| DKIM | 40 | 96.0% | 100% | 0.0024 | 0.0400 | good |
| MTA-STS | 60 | 96.0% | 100% | 0.0024 | 0.0400 | good |
| BIMI | 55 | 95.2% | 100% | 0.0035 | 0.0480 | good |
| DANE/TLSA | 70 | 94.0% | 100% | 0.0054 | 0.0600 | adequate |
Methodology: each golden case is run at rawConfidence=1.0 (the engine predicts "correct") across 5 resolver agreement levels (5/5 through 1/5). No label leakage — ground-truth outcomes are never used as prediction inputs. The shrinkage estimator \(w \cdot C_{\text{raw}} + (1-w) \cdot \frac{\alpha}{\alpha+\beta}\) is what produces the varying confidence levels as measurement quality degrades.
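Under these assumptions, the degradation sweep is easy to reproduce in Go. The Beta parameters below (α=98, β=2) are illustrative, not the engine's actual priors:

```go
package main

import "fmt"

// calibrated applies the shrinkage estimator w*Craw + (1-w)*alpha/(alpha+beta).
// alpha and beta are a protocol's Beta-prior parameters (illustrative values).
func calibrated(w, craw, alpha, beta float64) float64 {
	return w*craw + (1-w)*alpha/(alpha+beta)
}

func main() {
	const alpha, beta = 98.0, 2.0 // prior mean 0.98
	for agree := 5; agree >= 1; agree-- {
		w := float64(agree) / 5.0 // measurement quality
		fmt.Printf("%d/5 resolvers agree -> confidence %.3f\n",
			agree, calibrated(w, 1.0, alpha, beta))
	}
}
```

With full agreement the raw observation passes through unchanged; as agreement drops, the estimate is pulled toward the prior mean of 0.98, which is exactly the spread seen in the reliability diagram bins.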
Confidence Degradation Log
No degradation events recorded. All protocols have maintained continuous passing status since first evaluation.
Protocol Test Dossier
Each protocol is audited against deterministic test cases grounded in specific RFC sections. Below is what ICAE tests for each protocol — the signals, the standards, and the methodology.
SPF — 20 total (19 analysis + 1 collection) test cases
Validates qualifier classification (~all, -all, +all, ?all), DNS lookup counting (RFC 7208 §4.6.4), 10-lookup limit enforcement, no-mail intent detection, multiple-record error handling (§3.2), record classification (valid vs. spf-like), and cross-protocol RFC 7489 §10.1 premature rejection warnings.
Methodology: Deterministic input → expected output. No live DNS. Pure logic validation.
DMARC — 24 total (21 analysis + 3 collection) test cases
Validates policy enforcement logic (reject/quarantine/none per RFC 7489 §6.3), partial percentage coverage, SPF-only exposure, DMARC-without-SPF gaps, null MX no-mail domains (RFC 7505), and structured color/severity mapping for spoofability assessment.
Methodology: Combinatorial policy matrix → expected verdict + severity color.
DKIM — 8 total (7 analysis + 1 collection) test cases
Validates RSA key strength classification (1024-bit weak, 2048-bit adequate per RFC 8301), Ed25519 key type parsing (RFC 8463), revoked key detection (empty p= per RFC 6376 §3.6.1), test mode flag detection (t=y), and provider fingerprinting (Google, Microsoft 365).
Methodology: Synthetic DKIM records → expected key analysis + provider classification.
DNSSEC — 25 total (18 analysis + 7 collection) test cases
Validates chain-of-trust verdicts (signed/unsigned/broken per RFC 4033 §2), tampering exposure assessment, DS digest classification (SHA-256 per RFC 8624 §3.3), enterprise DNS provider detection (Cloudflare, AWS Route 53, Google Cloud DNS, Azure, Akamai, NS1 per RFC 1035), and inheritance chain classification.
Methodology: Simulated NS/DS inputs → expected verdict + provider fingerprint.
DANE/TLSA — 14 total (10 analysis + 4 collection) test cases
Validates TLSA usage type parsing (DANE-EE usage 3 per RFC 7672 §3.1), deprecated usage recommendations (usage 0 triggers RFC 7672 advisory), MX host extraction (RFC 5321 §5), full-coverage verdict logic, and no-TLSA informational classification.
Methodology: Synthetic TLSA + MX inputs → expected verdict + coverage status.
MTA-STS — 12 total (9 analysis + 3 collection) test cases
Validates mode enforcement logic (enforce=success, testing=warning per RFC 8461 §5), policy line parsing (version, mode, max_age, mx per §3.2), STS record filtering (§3.1), and policy ID extraction.
Methodology: Synthetic DNS records + policy bodies → expected parsed fields.
CAA — 10 total (6 analysis + 4 collection) test cases
Validates CA issuer identification (Let’s Encrypt, DigiCert per RFC 8659 §4), issuewild detection (§4.3), iodef record detection (§4.4), and human-readable message construction.
Methodology: Synthetic CAA records → expected parsed issuers + flags.
BIMI — 11 total (9 analysis + 2 collection) test cases
Validates record filtering (v=BIMI1 per RFC 9495 §3), logo URL extraction, VMC (Verified Mark Certificate) URL extraction, and absent-VMC null handling.
Methodology: Synthetic BIMI records → expected parsed URLs + null checks.
TLS-RPT — 5 total (3 analysis + 2 collection) test cases
Validates TLS-RPT URI extraction from rua fields (RFC 8460 §3), plus cross-protocol cryptographic strength classification: DKIM key strength (2048-bit RSA adequate per RFC 8301, Ed25519 strong) and DS digest type classification (SHA-256 per RFC 8624 §3.3).
Methodology: Synthetic TLS-RPT records → expected URI extraction; known algorithm + key size inputs → expected strength label.
Maturity Levels
Modeled after the Capability Maturity Model (CMM) developed at Carnegie Mellon’s Software Engineering Institute. Five tiers of sustained correctness — earned through deterministic test runs, never self-assigned.
The dual-threshold system (consecutive passes AND elapsed time) prevents maturity inflation. A burst of 5,000 runs in one day cannot achieve Gold Master — the 180-day time requirement ensures tests have run across multiple code versions, infrastructure changes, and resolver conditions. This mirrors how real-world confidence is earned: through sustained performance, not a single marathon session.
Fewer than 100 consecutive passing audit runs. The engine is learning.
100+ consecutive passes. Results are reliable but still maturing.
500+ passes over 30+ days with no regressions. Production-grade correctness.
1,000+ passes over 90+ days. Battle-tested across diverse domains.
5,000+ passes over 180+ days. The highest confidence tier — reference-grade intelligence.
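The dual-threshold gate can be sketched as a single Go function. Only "Gold Master" is named in the text above; the other tier names here are placeholders:

```go
package main

import "fmt"

// tier returns the maturity level for a protocol given consecutive passing
// runs and days in service. Thresholds follow the tiers described above;
// all names except "Gold Master" are placeholders for illustration.
func tier(passes, days int) string {
	switch {
	case passes >= 5000 && days >= 180:
		return "Gold Master"
	case passes >= 1000 && days >= 90:
		return "Battle-Tested"
	case passes >= 500 && days >= 30:
		return "Production-Grade"
	case passes >= 100:
		return "Maturing"
	default:
		return "Learning"
	}
}

func main() {
	// Dual threshold: a one-day burst of 5,000 passes cannot reach the top tier.
	fmt.Println(tier(5000, 1))
	// Sustained over 180 days, the same count earns the highest tier.
	fmt.Println(tier(5000, 180))
}
```

Because both conditions must hold, a run-count burst falls through to whichever lower tier its elapsed time supports.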
Two-Layer Auditing
Each protocol is audited at two independent layers:
Collection Layer
Validates that DNS records are queried, retrieved, parsed, and filtered correctly. Tests cover multi-resolver consensus algorithms (5-resolver agreement checks), record type extraction (MX hosts, CAA issuers), record filtering (identifying valid BIMI, MTA-STS, SPF records from TXT noise), TLSA parsing, NS provider classification, and DKIM key parsing. Currently 27 test cases.
Analysis Layer
Validates that collected data is interpreted correctly against RFC standards. Tests cover SPF qualifier classification, DMARC policy enforcement logic, DKIM key strength assessment, DNSSEC chain-of-trust verdicts, brand impersonation verdicts, CAA issuer identification, DANE coverage assessment, and regression guards from every past correctness bug. Currently 102 test cases.
Timeout & Efficiency Strategy
DNS Tool manages timeout budgets across multiple concurrent lookups to maximize intelligence coverage while respecting external service constraints.
Parallel Execution
DNS, SPF, DMARC, DKIM, DNSSEC, CT logs, MTA-STS, TLS-RPT, BIMI, CAA, infrastructure, and security.txt are dispatched concurrently. DANE and SMTP run sequentially after MX resolution. Total analysis targets 10–30 seconds for most domains.
Per-Section Budgets
Each section has independent timeout budgets: DNS lookups (5s per resolver), CT log queries (15s with cooldown), SMTP transport probes (10s per host per port), and HTTP fetches for MTA-STS/BIMI/security.txt (3–5s each). No single section can block the entire analysis.
Graceful Degradation
When a section times out or errors, the analysis continues with remaining sections. Timed-out sections are flagged with a partial-failure banner so you know exactly which data may be incomplete. Re-analysis retries failed sections.
Remote Probe Failover
SMTP probing uses dedicated remote infrastructure (US region) for reliable port 25/465/587 access. If the remote probe is unavailable (network, auth, rate limit), the system falls back to local direct probing. Rate limiting: 30 requests per 60 seconds per client.
Efficiency Tracking
The ICAE Collection layer audits timeout handling as part of protocol correctness. Proper timeout behavior (returning informational status vs. crashing, logging appropriately, enabling re-analysis) is tested alongside data correctness. Timeout patterns feed into maturity progression.
Scanning Philosophy
DNS Tool is designed to be a responsible participant in every system it touches. Our approach: gather the intelligence we need while leaving the smallest possible footprint.
Minimal Footprint
Analysis uses standard DNS protocol queries and lightweight HTTP HEAD/GET requests with per-section timeout budgets. No brute-force enumeration, no credential stuffing, no port scanning beyond mail transport (25/465/587). Exposure checks use 200ms inter-request delays to avoid overwhelming target infrastructure.
Adaptive Rate Awareness
Third-party services (certificate transparency logs, RDAP registries) are monitored with telemetry-based exponential backoff — automatic cooldown from 5 seconds to 5 minutes when services signal degradation. When a source is unavailable or rate-limited, DNS Tool says so honestly rather than hiding the gap.
Symbiotic Interfacing
Every external data source is documented on the Sources page with its rate limits, methodology, and verification commands. SecurityTrails is user-key-only and never called automatically. Community services like Team Cymru are queried via standard DNS protocol with no API keys required.
Honest Reporting
When a section times out, gets rate-limited, or encounters an error, the report says exactly that — never “no issues found” when the data simply could not be checked. Four clear states: success, rate-limited, error, and partial. Transparency is non-negotiable.
Why This Matters
A security grade without a disclosed confidence level is an assertion, not an analysis. The ICAE provides full transparency into analytical correctness — because the score means nothing if you can’t see how certain we are of our own results.
Every protocol’s confidence level is backed by a verifiable count of consecutive audit passes. No black boxes. No hand-waving. Every claim backed by deterministic test cases.
Intelligence Currency Levels
Intelligence Currency Audit Engine (ICuAE)
Companion to ICAE. While ICAE measures correctness (did we interpret the data right?), ICuAE measures currency (is the data still valid?). Five standards-grounded dimensions evaluate data freshness, TTL compliance, completeness, source credibility, and TTL relevance for every scan.
Runtime Performance
ICuAE measures how close each scan’s data comes to the theoretical ideal — a perfectly tuned, machine-locked collector that requests every DNS record at exactly the right cadence, receives responses within authoritative TTL windows, achieves complete multi-resolver consensus, and returns a full record set. A score of 100 means the collected data is indistinguishable from ground truth. Because real-world DNS inherently fluctuates (caches age, resolvers disagree, optional records vary by domain), we track statistical stability across scans rather than pass/fail maturity.
Grade Distribution
How often each currency grade appears across all evaluated scans. A healthy system clusters toward Excellent and Good.
Per-Dimension Averages
Each of the five currency dimensions, averaged across all scans. Low-scoring dimensions indicate systemic patterns; tuning hints suggest how to improve collection fidelity.
| Dimension | Standard | Avg Score | Grade | Samples |
|---|---|---|---|---|
| Completeness | NIST SP 800-53 SI-7 | 35.2 | degraded | 5105 |
| Currentness | ISO/IEC 25012 | 100.0 | excellent | 5105 |
| Source Credibility | ISO/IEC 25012 + SPJ | 97.6 | excellent | 5105 |
| TTL Compliance | RFC 8767 | 97.7 | excellent | 5105 |
| TTL Relevance | NIST SP 800-53 SI-7 | 43.0 | degraded | 5105 |

Tuning Advisory (Completeness): Multiple expected record types are consistently missing. Expanding the query set or adding retry logic for failed lookups would improve coverage.

Tuning Advisory (TTL Relevance): Observed TTLs deviate significantly from expected ranges for their record types. This often indicates domain-side misconfiguration rather than collection issues.
Why Track Currency Separately?
ICAE — Correctness
“Did we read the data right?” ICAE runs deterministic test vectors against our analysis engine. If SPF says ~all, does the tool correctly identify it as a softfail? This is pass/fail, so we track consecutive passes and maturity tiers.
ICuAE — Currency
“How close is the collected data to ground truth?” ICuAE scores each scan against a theoretical ideal — a machine-locked collector with perfect TTL compliance, complete records, and full resolver consensus. Real-world DNS fluctuates, so instead of pass/fail we track statistical stability — rolling averages and variance across scans.
Per ICD 203, confidence requires both: an accurate interpretation of data that is also current. One without the other is incomplete intelligence.
Excellence Benchmarks
What does “near-ideal” DNS collection look like in the real world? These targets are derived from large-scale passive DNS observation networks and authoritative resolver operations that approach the theoretical ideal.
| Dimension | Excellence Target | Real-World Reference |
|---|---|---|
| TTL Compliance | ≥95% | Farsight DNSDB and OpenINTEL passive sensors collect at TTL-aligned intervals. RFC 8767 defines serve-stale as an explicit protocol extension, making non-compliant caching measurably detectable. |
| Completeness | ≥98% | Large-scale collectors (RiskIQ, Censys) query all standard record types per zone. ≥98% coverage of the core set (A, AAAA, MX, TXT, NS, SOA, CAA, DMARC, SPF) is achievable for any domain that publishes them. |
| Source Credibility | ≥90% | Google Public DNS, Cloudflare 1.1.1.1, and Quad9 operate at global scale with near-identical authoritative views. ≥90% multi-resolver agreement is standard; unanimity is expected for NS and SOA records. |
| Currentness | <0.5× TTL | DNSPerf tests from 200+ locations every 60 seconds. Median data age below half the authoritative TTL indicates the collector is querying well within the freshness window. |
| TTL Relevance | Within Range | NIST SP 800-53 SI-7 treats information integrity as a measurable property. TTLs within the typical range for their record type (3600s for TXT, 86400s for NS) indicate well-configured authoritative zones. |
Where these numbers come from: Farsight Security’s DNSDB processes billions of DNS observations daily from sensor networks worldwide. OpenINTEL (University of Twente) performs daily active measurements across all .com, .net, and .org zones. These systems represent the closest real-world approximation to the theoretical machine-locked ideal. Our scoring model uses their operational characteristics as the upper boundary of what is achievable.
Self-Tuning Intelligence Pipeline
ICuAE is not just a measurement engine — it is the diagnostic instrument for the collection pipeline itself. By tracking per-dimension statistics across scans, ICuAE identifies exactly which stage of the analysis chain needs attention.
Phase 1: Advisory
Dimension-level tuning hints surfaced in the Per-Dimension Averages table. When a dimension scores below 90, ICuAE explains what's happening and suggests specific improvements. Status: Live.
Phase 2: Suggested Config
Generate recommended scanner profiles from rolling statistics — resolver set, retry thresholds, record type priorities — requiring explicit approval before applying. Status: generation live; approval on the roadmap.
Phase 3: Adaptive Tuning
Fully automatic, non-destructive adjustments (timing jitter, retries, resolver weighting) with rollback if stability decreases. Gated by minimum sample count and confidence thresholds. Status: on the roadmap.
The vision: With enough scans and enough science, the confidence engine tunes TTLs, resolver weighting, query cadence, and retry logic until the system achieves the highest possible fidelity against the theoretical ideal — automatically, measurably, and with full provenance.
Standards Foundation
ICuAE is grounded in five authoritative standards from the intelligence community, information quality, and journalism ethics.
ICD 203 Timeliness
Intelligence Community Directive 203 identifies timeliness as one of five core analytic standards. Data that was accurate yesterday may be misleading today.
NIST SP 800-53 SI-7
NIST SI-7 addresses information integrity — ensuring data has not been improperly modified and remains complete. ICuAE operationalizes completeness and TTL relevance as integrity dimensions for DNS data.
ISO 25012 Currentness
ISO/IEC 25012 defines “Currentness” — data of the right age for its context. DNS records have inherent validity windows defined by TTL values.
RFC 8767 TTL
RFC 8767 defines TTL-based cache expiration and serve-stale behavior. ICuAE detects when resolver TTLs exceed authoritative values — which may indicate serve-stale behavior, timing skew, or cache misconfiguration.
SPJ Source Ethics
SPJ Code of Ethics requires multiple independent sources for verification. ICuAE measures multi-resolver agreement as a credibility indicator.
Five Measurement Dimensions
| Dimension | Standard | What It Measures |
|---|---|---|
| Currentness | ISO/IEC 25012 | Data age relative to its TTL-derived validity window. Are the DNS records still within their expected freshness period? |
| TTL Compliance | RFC 8767 | Whether resolver TTLs respect authoritative limits. Exceedances may indicate RFC 8767 serve-stale behavior, timing skew, or cache misconfiguration. |
| Completeness | NIST SI-7 | Percentage of expected record types with authoritative TTL data. Gaps reduce overall intelligence quality. |
| Source Credibility | ISO + SPJ | Multi-resolver agreement scoring. When all five resolvers return identical data, source credibility is highest. |
| TTL Relevance | NIST SI-7 | Observed TTL versus typical range for each record type. Extreme deviations may indicate misconfiguration. |
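One way to operationalize the Source Credibility dimension is modal-answer share across resolvers. This is a sketch of the idea, not the engine's exact scoring curve:

```go
package main

import "fmt"

// credibility scores multi-resolver agreement as the share of resolvers
// that returned the modal (most common) answer. Five identical answers
// score 1.0; a split set scores lower.
func credibility(answers []string) float64 {
	if len(answers) == 0 {
		return 0
	}
	counts := map[string]int{}
	best := 0
	for _, a := range answers {
		counts[a]++
		if counts[a] > best {
			best = counts[a]
		}
	}
	return float64(best) / float64(len(answers))
}

func main() {
	unanimous := []string{"1.2.3.4", "1.2.3.4", "1.2.3.4", "1.2.3.4", "1.2.3.4"}
	split := []string{"1.2.3.4", "1.2.3.4", "1.2.3.4", "5.6.7.8", "5.6.7.8"}
	fmt.Println(credibility(unanimous)) // full agreement
	fmt.Println(credibility(split))     // partial agreement
	fmt.Println(credibility(nil))       // no data
}
```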
Deterministic Test Matrix
29 test cases verify ICuAE scoring logic across all five dimensions. Every grade boundary, edge case, and nil-input path is tested deterministically — no randomness, no approximation.
Currency Grading Scale
The 0–100 score measures proximity to a theoretical ideal: a perfectly tuned collection system that requests every record type at exactly the right cadence, receives responses within authoritative TTL windows, achieves complete multi-resolver consensus, and returns a full record set with zero gaps. A score of 100 means the data is indistinguishable from what an ideally configured, machine-locked collector would produce. Each dimension is scored independently; the overall grade is their average.
| Grade | Range | What It Means | Signal |
|---|---|---|---|
| Excellent | 90–100 | Data was collected within authoritative TTL windows, all resolvers agree, and the record set is complete. Near-ideal collection fidelity. | The system is performing at or near the theoretical machine-locked ideal. Minimal drift from ground truth. |
| Good | 75–89 | Minor deviations from ideal: perhaps one resolver returned a slightly stale cache, or a non-critical record type was absent. Data remains operationally reliable. | Healthy collection with small imperfections. Acceptable for production intelligence. |
| Adequate | 50–74 | Measurable gaps: some resolvers served cached data beyond authoritative TTL, optional record types are missing, or source agreement is partial. Data is usable but not pristine. | The domain’s DNS configuration has real-world imperfections common in production environments. Worth investigating but not alarming. |
| Degraded | 25–49 | Significant staleness or incompleteness: resolver caches substantially exceed authoritative TTLs, multiple record types are absent, or resolvers disagree on fundamental records. | Data collection is meaningfully distant from the ideal. Results should be interpreted with caution; re-scan recommended after cache expiry. |
| Stale | 0–24 | Severe currency failure: data is likely cached well beyond TTL, critical record types are absent, or resolvers returned fundamentally conflicting answers. | The collected data does not reflect current ground truth. Per ICD 203, stale data should not be used for confidence assessments without explicit caveats. |
Why 0–100? ISO/IEC 25012 defines timeliness as a quantitative data quality dimension — it exists on a continuum, not as a binary. A 0–100 normalized score allows statistical tracking (rolling averages, standard deviation, trend analysis) that binary pass/fail cannot. NIST SP 800-53 SI-7 (Information Integrity) similarly treats data completeness and validity as measurable properties requiring periodic verification. The five-tier grading scale maps the continuous score to actionable categories, paralleling how ICD 203 maps analytic confidence to five levels (almost no confidence through high confidence).
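The five-tier mapping from the table above, as a small Go function with boundary values taken from the Range column:

```go
package main

import "fmt"

// grade maps the continuous 0-100 currency score to the five-tier scale:
// Excellent 90-100, Good 75-89, Adequate 50-74, Degraded 25-49, Stale 0-24.
func grade(score float64) string {
	switch {
	case score >= 90:
		return "Excellent"
	case score >= 75:
		return "Good"
	case score >= 50:
		return "Adequate"
	case score >= 25:
		return "Degraded"
	default:
		return "Stale"
	}
}

func main() {
	for _, s := range []float64{97.6, 82.0, 43.0, 12.5} {
		fmt.Printf("%.1f -> %s\n", s, grade(s))
	}
}
```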
Mathematical Foundations
Every confidence score is derived from deterministic, standards-grounded mathematics — not heuristics or machine learning. The formulas below are the actual computations running in the engine.
EWMA Drift Detection
The Exponentially Weighted Moving Average tracks currency score stability over time. Each new scan updates the statistic, giving recent observations more weight than historical ones:

\[ z_t = \lambda x_t + (1 - \lambda)\, z_{t-1}, \qquad z_0 = \mu_0 \]

Control limits detect statistically significant drift — not just any change, but changes that exceed normal process variation:

\[ \mu_0 \pm L\sigma \sqrt{\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2t}\right]} \]

Where \(\lambda\) is the smoothing factor (0.2), \(L\) is the control limit multiplier (3σ), and \(t\) is the observation period. Based on NIST/SEMATECH Engineering Statistics Handbook §6.3.2.4.
Implementation: icuae/ewma.go → EWMAControlChart.Add(), EWMAControlChart.IsOutOfControl() · Parameters: NewEWMAControlChart(λ=0.2, μ0=50, σ=10, L=3.0)
Bootstrap note: The initial parameters (μ0=50, σ=10, L=3.0) are heuristic defaults that allow monitoring to begin immediately without a Phase I calibration dataset. σ is refined adaptively from observed data after 10+ observations (see Add() method). These are operational starting points per NIST/SEMATECH §6.3.2.4, not values fitted from historical in-control DNS data.
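A minimal Go sketch of the EWMA control chart with the bootstrap defaults quoted above (λ=0.2, μ0=50, σ=10, L=3). It mirrors the math, not the actual `EWMAControlChart` type, and omits the adaptive σ refinement:

```go
package main

import (
	"fmt"
	"math"
)

// ewmaChart tracks an exponentially weighted moving average against
// time-varying control limits. Parameters match the documented bootstrap
// defaults; the production type lives in icuae/ewma.go.
type ewmaChart struct {
	lambda, mu0, sigma, limit float64
	z                         float64 // current EWMA statistic
	t                         int     // observations seen
}

func newChart() *ewmaChart {
	return &ewmaChart{lambda: 0.2, mu0: 50, sigma: 10, limit: 3, z: 50}
}

// add folds a new currency score into the statistic and reports whether
// the EWMA breached the control limits (out of control = drift).
func (c *ewmaChart) add(x float64) bool {
	c.t++
	c.z = c.lambda*x + (1-c.lambda)*c.z
	halfWidth := c.limit * c.sigma *
		math.Sqrt(c.lambda/(2-c.lambda)*(1-math.Pow(1-c.lambda, float64(2*c.t))))
	return math.Abs(c.z-c.mu0) > halfWidth
}

func main() {
	c := newChart()
	// Scores near the target stay in control; a sustained jump trips the limit.
	for _, score := range []float64{52, 49, 51, 48, 95, 96, 97, 98} {
		out := c.add(score)
		fmt.Printf("score %.0f -> z=%.2f drift=%v\n", score, c.z, out)
	}
}
```

Note how a single outlier barely moves \(z\); only a sustained shift accumulates enough weight to cross the limit, which is exactly the "not just any change" property described above.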
Reliability-Weighted Shrinkage Calibration
Each protocol carries an empirical prior — a Beta distribution encoding historical detection reliability. Measurement quality (resolver agreement) determines how much the raw observation is trusted versus the prior anchor — a Bayesian-inspired shrinkage estimator:

\[ C_{\text{cal}} = w \cdot C_{\text{raw}} + (1 - w) \cdot \frac{\alpha}{\alpha + \beta} \]

Where \(w = \frac{\text{agreeing resolvers}}{\text{total resolvers}}\) is measurement quality, and \(\frac{\alpha}{\alpha+\beta}\) is the prior mean from a \(\text{Beta}(\alpha, \beta)\) distribution for the protocol category. When resolver agreement is low, the prior mean anchors the estimate; as agreement increases, the raw observation dominates. This is a convex shrinkage estimator — structurally similar to, but distinct from, the true Beta-Bernoulli posterior mean \(E[\theta|D] = \frac{\alpha+s}{\alpha+\beta+n}\), where the weight on data is derived from observation count rather than set independently. Prior parameters evolve via conjugate updating: each passing ICAE test increments \(\alpha\), each failure increments \(\beta\).
Implementation: icae/priors.go → CalibrationEngine.CalibratedConfidence() · Per-protocol Beta priors defined in CalibrationEngine.priors map
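The conjugate updating rule is simple enough to sketch directly. Starting parameters here are illustrative, not the engine's actual per-protocol priors:

```go
package main

import "fmt"

// betaPrior holds a protocol's Beta(alpha, beta) reliability prior.
type betaPrior struct{ alpha, beta float64 }

// mean returns the prior mean alpha/(alpha+beta).
func (p *betaPrior) mean() float64 { return p.alpha / (p.alpha + p.beta) }

// update applies the conjugate rule: a passing ICAE test increments alpha,
// a failure increments beta.
func (p *betaPrior) update(pass bool) {
	if pass {
		p.alpha++
	} else {
		p.beta++
	}
}

func main() {
	p := &betaPrior{alpha: 9, beta: 1} // illustrative prior, mean 0.90
	fmt.Printf("prior mean %.3f\n", p.mean())

	for i := 0; i < 90; i++ {
		p.update(true) // 90 consecutive passing audit runs
	}
	// The prior mean is pulled toward 1 as passing evidence accrues.
	fmt.Printf("updated mean %.3f\n", p.mean())
}
```

A single failure then increments β, nudging the anchor down; sustained passes are the only way to raise it, mirroring the maturity philosophy above.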
Currency Score Normalization
Each ICuAE dimension is scored on a continuous 0–100 scale. The overall currency score is the weighted mean across all dimensions:

\[ S = \sum_{i=1}^{n} w_i\, s_i, \qquad \sum_{i=1}^{n} w_i = 1 \]

Where \(s_i\) is the score for dimension \(i\). Dimension weights are equal by default (each \(w_i = \frac{1}{n}\)). Per ISO/IEC 25012, timeliness is a quantitative data quality dimension — the continuous score enables statistical tracking (rolling averages, standard deviation, trend analysis) that binary pass/fail cannot.
Implementation: icuae/icuae.go → BuildCurrencyReport() · Five dimensions scored independently via score* functions, averaged into composite grade
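The weighted mean with equal default weights, applied to the per-dimension averages reported earlier (a sketch; the dimension ordering is arbitrary):

```go
package main

import "fmt"

// overall computes the currency score as a weighted mean of dimension
// scores. Weights are equal by default, matching the documented formula.
func overall(scores, weights []float64) float64 {
	var sum float64
	for i, s := range scores {
		sum += weights[i] * s
	}
	return sum
}

func main() {
	// Per-dimension averages from the table above:
	// Currentness, TTL Compliance, Completeness, Source Credibility, TTL Relevance.
	scores := []float64{100.0, 97.7, 35.2, 97.6, 43.0}
	w := 1.0 / float64(len(scores))
	weights := []float64{w, w, w, w, w}
	fmt.Printf("overall currency score: %.1f\n", overall(scores, weights))
}
```

The two degraded dimensions pull the composite down to roughly 74.7, which lands in the Adequate band of the grading scale below.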
Cryptographic Integrity
Every analysis result is sealed with a SHA-3-512 digest over a canonical pipe-delimited representation of posture fields. The hash function is the NIST FIPS 202 standard (Keccak sponge construction):

\[ H = \text{SHA3-512}(R) \]

Where \(R\) is the canonical posture representation — protocol statuses, records, policies, and posture labels joined in deterministic field order. The digest is independently verifiable — anyone with the same posture fields can recompute and confirm integrity.
Implementation: analyzer/posture_hash.go → CanonicalPostureHash() · Pipe-delimited canonical string with deterministic field ordering, verified by icae/hash_audit.go
Dual Engine Architecture
DNS Tool employs two companion engines that measure scientifically distinct properties of intelligence quality. ICAE (correctness) and ICuAE (currency) are never conflated — accuracy and timeliness are independent dimensions per ICD 203 and NIST SP 800-53. These engines are one of five analytic perspectives that together form our Symbiotic Security model.
ICAE — Correctness
“Did we interpret the DNS data correctly?” Deterministic golden-rule tests with per-protocol maturity tracking and cryptographic hash integrity.
ICuAE — Currency
“Is the DNS data still valid/current?” Five standards-grounded dimensions evaluated per-scan with TTL-aware validity windows and multi-resolver credibility.
