Agentic Whitebox Pentester

An autonomous pentester with full source-code access. Reads your code, runs the live app, and ships only what it can actually exploit.

How the pipeline works.

The Whitebox Pentester doesn't pattern-match. It forms hypotheses, reasons about attack paths, and generates working proof-of-concept exploits. The pipeline runs 13 specialized agents across 5 phases: phases 1, 2, and 5 run sequentially, while phases 3 and 4 run pipelined and in parallel.

01
Phase 1 · Pre-Recon
Code Analyst

Maps the codebase end-to-end: entry points, auth flows, database access, and security sinks. Static reasoning only. No browser yet.

02
Phase 2 · Recon
Recon Specialist

Crawls the live app to confirm endpoints, forms, and auth boundaries. Adds Nmap and Subfinder for infrastructure scope.

03
Phase 3 · Vulnerability Analysis
Five analysts in parallel

Injection, XSS, Auth, SSRF, and Authz specialists each build an exploitation queue for their domain.

04
Phase 4 · Exploitation
Paired exploiters

Each analyst hands off to a paired exploiter that crafts payloads, runs them live, and confirms working PoCs. Skipped if the queue is empty.

05
Phase 5 · Reporting
Reporting Agent

Synthesizes verified exploits into the final report. Speculative findings are dropped. Only confirmed evidence ships.

Pipeline diagram:

Phase 1 · Pre-Reconnaissance
Code Analyst
Maps the codebase end to end: entry points, auth flows, database access, and security sinks. No browser yet.

Phase 2 · Reconnaissance
Recon Specialist
Crawls the live app to confirm endpoints, forms, and auth boundaries. Adds Nmap + Subfinder for infrastructure scope.

Phase 3 · Vulnerability Analysis
Five specialists work in parallel. Each builds an exploitation queue for its domain.

Injection Analyst
Reviews how untrusted input flows into queries and shells.
Hunts:
• SQL injection
• Command injection
• Template injection
• LDAP / NoSQL

XSS Analyst
Audits how user input is rendered back into the DOM.
Hunts:
• Reflected XSS
• Stored XSS
• DOM-based XSS
• Header / URL injection

Auth Analyst
Reviews login, session, and password handling logic.
Hunts:
• Session bypass
• MFA weakness
• Password reset flaws
• Token leakage

SSRF Analyst
Traces user-controlled URLs reaching outbound HTTP clients.
Hunts:
• Cloud metadata access
• Internal port probing
• URL filter bypass
• Protocol smuggling

Authz Analyst
Inspects how access decisions are enforced across resources.
Hunts:
• Broken access control
• IDOR
• Privilege escalation
• Missing checks

If vulnerabilities are found:

Phase 4 · Exploitation
Each analyst hands off to a paired exploiter. Skipped if the queue is empty.

Injection Exploiter
Crafts payloads, runs them live, confirms data leak or shell exec.

XSS Exploiter
Builds working payloads and validates execution in a real browser.

Auth Exploiter
Tries session takeover, MFA bypass, credential reuse against the app.

SSRF Exploiter
Probes cloud metadata, internal ports, and URL filter bypasses live.

Authz Exploiter
Tests IDOR, vertical and horizontal escalation, broken access live.

Phase 5 · Reporting
Reporting Agent
Synthesizes verified exploits into a pentest-grade report. Speculative findings dropped. Only confirmed evidence ships.

Pentest Report Delivered
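The phase ordering described above can be pictured with a small orchestration sketch. This is only an illustration under stated assumptions, not the product's implementation: every function name, the stubbed return values, and the use of Python's asyncio are hypothetical. It shows phases 1 and 2 running in sequence, each domain's analyst handing off to its paired exploiter as soon as its own queue is ready, the exploiter being skipped when the queue is empty, and the report keeping only confirmed results.

```python
import asyncio

DOMAINS = ["injection", "xss", "auth", "ssrf", "authz"]

async def code_analyst(repo):
    # Phase 1 (stub): static mapping of the codebase, no browser involved.
    return {"entry_points": ["/login", "/fetch"]}

async def recon_specialist(code_map):
    # Phase 2 (stub): confirm the statically discovered endpoints on the live app.
    return {"endpoints": list(code_map["entry_points"])}

async def analyst(domain, recon_map):
    # Phase 3 (stub): pretend only two domains produce an exploitation queue.
    hypothetical_queues = {"injection": ["/login"], "ssrf": ["/fetch"]}
    return hypothetical_queues.get(domain, [])

async def exploiter(domain, queue):
    # Phase 4 (stub): pretend every queued target yields a confirmed PoC.
    return [{"domain": domain, "target": t, "confirmed": True} for t in queue]

async def run_domain(domain, recon_map):
    # Pipelined hand-off: a domain's exploiter starts as soon as its own
    # analyst finishes, without waiting for the other four analysts.
    queue = await analyst(domain, recon_map)
    if not queue:
        return []  # exploitation is skipped when the queue is empty
    return await exploiter(domain, queue)

async def run_pipeline(repo):
    code_map = await code_analyst(repo)            # phase 1, sequential
    recon_map = await recon_specialist(code_map)   # phase 2, sequential
    per_domain = await asyncio.gather(             # phases 3 and 4, parallel
        *(run_domain(d, recon_map) for d in DOMAINS)
    )
    findings = [f for results in per_domain for f in results]
    # Phase 5: only confirmed evidence reaches the report.
    return [f for f in findings if f["confirmed"]]

report = asyncio.run(run_pipeline("path/to/repo"))
print(report)
```

Counting the stubs gives the 13 agents mentioned earlier: one code analyst, one recon specialist, five analysts, five exploiters, and one reporting step.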

Five parallel attack domains.

Each domain is covered by a dedicated agent pair: one analyst that combines static and dynamic signals, and one exploiter that fires real attacks. Because the pairs run in parallel, total run time is set by the slowest domain, not by the sum of all five.
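A quick worked example makes the timing claim concrete. The durations below are invented for illustration and are not measurements of the product; the point is only the difference between taking the maximum and taking the sum.

```python
# Hypothetical (analysis, exploitation) durations in minutes per domain.
durations = {
    "injection": (12, 20),
    "xss": (8, 10),
    "auth": (10, 15),
    "ssrf": (6, 0),    # empty queue, so exploitation is skipped
    "authz": (14, 22),
}

# Each domain's pair runs back to back, so its cost is analysis + exploitation.
per_domain = {name: a + e for name, (a, e) in durations.items()}

parallel_minutes = max(per_domain.values())    # domains run side by side
sequential_minutes = sum(per_domain.values())  # domains run one after another

print(parallel_minutes, sequential_minutes)
```

With these made-up numbers the parallel run takes 36 minutes (the authz pair), against 117 minutes if the five domains ran one after another.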

Injection
SQLi, NoSQLi, command, template

Traces user input from source to dangerous sink using the code graph, then fires structured payloads against each confirmed vector.

XSS
Reflected, stored, DOM-based

Finds sanitization gaps and context-escape bugs in render paths, then validates with real browser payloads executed via Playwright.

SSRF
Cloud metadata, internal services

Identifies server-side fetchers and URL parsers in code, then probes cloud metadata APIs, internal endpoints, and private services.

Auth
Broken auth, session handling

Reads session, token, and login logic directly from source, then tests bypass paths, fixation, and credential edge cases at runtime.

Authz
IDOR, privilege escalation

Maps object-ownership and role checks across controllers, then attempts horizontal and vertical access as real authenticated users.

SAST findings get their proof.

Existing SAST results are the pentester's starting line. Every static finding is enriched, mapped, queued for exploitation, and linked back to its origin in the canonical findings table. The static findings that survive become the ones worth fixing.

01
Ingest
Parse the SARIF.

Pulls the latest SAST SARIF, parses file paths, line numbers, and CWE categories into a structured queue, and tags each finding with priority, confidence, and application context.
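As a rough sketch of what this ingest step involves, the snippet below walks a SARIF 2.1.0 document and produces one queue entry per reported location. The sample document, the rule ID, and the output field names are hypothetical. SARIF has no dedicated CWE field, so the sketch assumes the scanner puts a CWE identifier in each rule's `properties.tags`, which is a common convention but not a guarantee.

```python
import re

# Minimal hypothetical SARIF 2.1.0 document with a single result.
SAMPLE_SARIF = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-sast", "rules": [{
            "id": "go/sql-injection",
            "properties": {"tags": ["security", "external/cwe/cwe-089"]},
        }]}},
        "results": [{
            "ruleId": "go/sql-injection",
            "level": "error",
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "auth.go"},
                "region": {"startLine": 142},
            }}],
        }],
    }],
}

# Assumption: CWE IDs appear in rule tags such as "CWE-89" or "external/cwe/cwe-089".
CWE_TAG = re.compile(r"cwe-?0*(\d+)", re.IGNORECASE)

def parse_sarif(doc):
    """Flatten SARIF results into a list of queue entries."""
    queue = []
    for run in doc.get("runs", []):
        driver = run.get("tool", {}).get("driver", {})
        rules = {rule["id"]: rule for rule in driver.get("rules", [])}
        for result in run.get("results", []):
            rule = rules.get(result.get("ruleId"), {})
            tags = rule.get("properties", {}).get("tags", [])
            matches = (CWE_TAG.search(tag) for tag in tags)
            cwes = sorted({int(m.group(1)) for m in matches if m})
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                queue.append({
                    "rule_id": result.get("ruleId"),
                    "file": physical.get("artifactLocation", {}).get("uri"),
                    "line": physical.get("region", {}).get("startLine"),
                    "cwes": cwes,
                    "level": result.get("level", "warning"),
                })
    return queue

queue = parse_sarif(SAMPLE_SARIF)
print(queue)
```

Priority, confidence, and application-context tagging would be layered on top of these entries; they are left out here because the source does not describe how they are computed.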

02
Map + Filter
CWE to OWASP, in-scope only.

Maps each CWE to an OWASP Top 10 category. Only secrets, data-flow findings, and point issues that land in an in-scope OWASP category enter the exploitation queue.
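The mapping and filtering step might look like the sketch below. The table is a small illustrative subset of the OWASP Top 10 (2021) CWE mappings, and the choice of in-scope categories is an assumption made for the example, not the product's actual configuration.

```python
# Illustrative subset of CWE -> OWASP Top 10 (2021) category mappings.
CWE_TO_OWASP = {
    78: "A03:2021",   # OS command injection -> Injection
    79: "A03:2021",   # Cross-site scripting -> Injection
    89: "A03:2021",   # SQL injection -> Injection
    287: "A07:2021",  # Improper authentication -> Identification and Authentication Failures
    327: "A02:2021",  # Broken crypto algorithm -> Cryptographic Failures
    639: "A01:2021",  # Authorization bypass via user-controlled key -> Broken Access Control
    918: "A10:2021",  # Server-side request forgery
}

# Assumption: only categories the five exploit domains can act on are in scope.
IN_SCOPE = {"A01:2021", "A03:2021", "A07:2021", "A10:2021"}

def build_exploitation_queue(findings):
    """Keep findings whose CWE maps to an in-scope OWASP category."""
    queued = []
    for finding in findings:
        category = CWE_TO_OWASP.get(finding["cwe"])
        if category in IN_SCOPE:
            queued.append({**finding, "owasp": category})
    return queued

findings = [
    {"file": "auth.go", "line": 142, "cwe": 89},
    {"file": "crypto.go", "line": 10, "cwe": 327},   # mapped, but out of scope
    {"file": "misc.go", "line": 5, "cwe": 9999},     # no mapping at all
    {"file": "fetch.go", "line": 77, "cwe": 918},
]

exploitation_queue = build_exploitation_queue(findings)
print(exploitation_queue)
```

Findings with an unmapped CWE, or with a category outside the in-scope set, simply never reach the exploitation queue.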

03
Exploit
Start where SAST flagged.

Injection, XSS, SSRF, Auth, and Authz agents each pull from their queue and attempt working proof-of-concept attacks against the live application, focused on locations the static scanner already flagged.

04
Link
One flaw, one record.

Cross-scanner deduplication links the dynamic exploit back to the original SAST finding via canonical ID, content hash, or LLM semantic comparison. A SAST entry at auth.go:142 and an exploit against POST /login become the same canonical finding.

False Positive Signal
SAST findings the pentester cannot reproduce are flagged for review.

A noisy static scanner becomes a calibrated one. The static findings that survive the pentester are the ones worth fixing.

Semantic deduplication, not alert fatigue.

Findings are deduplicated through a two-path system: a content hash for identical exploits, and an LLM semantic comparison for cases where different payloads target the same underlying root cause. The system is intentionally conservative: when in doubt, it reports rather than suppresses.

Path 1
Content hash.

Deterministic match for identical exploits. If two findings hash to the same value, they collapse into one canonical record.

Path 2
LLM semantic comparison.

An LLM recognizes when different payloads target the same underlying root cause, so variant exploits collapse without losing distinct vulnerabilities.

The Result
The findings that land in your dashboard are distinct, high-confidence vulnerabilities, not noise.

Conservative by default: when the deduplicator is uncertain, it surfaces both findings rather than silently merging them. Your dashboard carries fewer duplicates, and no finding is quietly suppressed.
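The two paths and the conservative default can be sketched in a few lines. This is a minimal illustration under assumptions: the finding fields, the 0.9 threshold, and the `same_root_cause` callback are hypothetical, and the callback here is a hard-coded stub standing in for an LLM comparison.

```python
import hashlib
import json

def content_hash(finding):
    # Path 1: deterministic hash over a canonical JSON encoding of the finding.
    canonical = json.dumps(finding, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def deduplicate(findings, same_root_cause, threshold=0.9):
    """Collapse identical and clearly equivalent findings; keep anything uncertain."""
    canonical, seen_hashes = [], set()
    for finding in findings:
        digest = content_hash(finding)
        if digest in seen_hashes:
            continue  # identical exploit, already represented
        seen_hashes.add(digest)
        # Path 2: merge only when the comparer is confident about a shared root cause.
        is_variant = any(
            (score := same_root_cause(kept, finding)) is not None and score >= threshold
            for kept in canonical
        )
        if not is_variant:
            canonical.append(finding)  # uncertain or distinct: report it
    return canonical

def stub_comparer(a, b):
    # Stand-in for an LLM call; returns a confidence that a and b share a root cause.
    if a["endpoint"] == b["endpoint"] and a["sink"] == b["sink"]:
        return 0.95
    if a["endpoint"] == b["endpoint"]:
        return 0.5  # plausible but uncertain, so it will not be merged
    return 0.0

findings = [
    {"endpoint": "POST /login", "sink": "auth.go:142", "payload": "' OR 1=1--"},
    {"endpoint": "POST /login", "sink": "auth.go:142", "payload": "' OR 1=1--"},
    {"endpoint": "POST /login", "sink": "auth.go:142", "payload": "admin'--"},
    {"endpoint": "POST /login", "sink": "session.go:88", "payload": "' OR 1=1--"},
]

distinct = deduplicate(findings, stub_comparer)
print(len(distinct))
```

The second finding collapses by hash, the third collapses as a payload variant of the first, and the fourth is kept because the stub's 0.5 confidence falls below the threshold.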

See it prove what's exploitable.

Watch the pipeline run end-to-end against a real codebase, on a call with our engineering team. Self-hosted, in your VPC, with your own LLM keys.