Agentic Whitebox Pentester

An autonomous pentester with full source-code access. It reads your code, attacks the running application, and reports only the vulnerabilities it can validate with a working exploit. No Exploit, No Report.

Schedule a Technical Demo →

How the pipeline works.

The Whitebox Pentester doesn't pattern-match. It forms vulnerability hypotheses, reasons about attack paths, and resolves each hypothesis by attempting a working proof-of-concept exploit. The pipeline runs 13 specialized agents across 5 phases that follow the arc of a penetration test. Phases 1, 2, and 5 run in sequence; phases 3 and 4 run as a parallel pipeline.

Phase 1 · Pre-Recon

Code Analyst

Maps the codebase end-to-end: entry points, auth flows, database access, and security sinks. Static reasoning only. No browser yet.

Phase 2 · Recon

Recon Specialist

Crawls the live app to confirm endpoints, forms, and auth boundaries. Adds Nmap and Subfinder for infrastructure scope.

Phase 3 · Vulnerability Analysis

Five analysts in parallel

Injection, XSS, SSRF, Auth, and Authz specialists each build an exploitation queue for their domain.

Phase 4 · Exploitation

Paired exploiters

Each analyst hands off to a paired exploiter that crafts payloads, runs them live, and confirms working PoCs. An exploiter is skipped when its analyst's queue is empty.

Phase 5 · Reporting

Reporting Agent

Synthesizes validated exploits into the final report, with reproduction steps and severity for each finding. Speculative findings are dropped; only confirmed findings ship. Each run stands on its own as penetration test evidence.
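The phase ordering above can be sketched in a few lines of Python. This is a minimal illustration, not Keygraph's implementation: every function name, the `poc_works` flag, and the shape of `target` are assumptions made for the example, and each phase is reduced to a stand-in.

```python
import asyncio

DOMAINS = ("injection", "xss", "ssrf", "auth", "authz")

def pre_recon(target):
    # Phase 1 stand-in: static map of the codebase (here, passed through as-is).
    return {"candidates": target["candidates"]}

def recon(code_map):
    # Phase 2 stand-in: confirm the mapped surface against the live app.
    return code_map["candidates"]

async def analyze(domain, surface):
    # Phase 3 stand-in: build the exploitation queue for one domain.
    return list(surface.get(domain, []))

async def exploit(queue):
    # Phase 4 stand-in: keep only candidates whose PoC actually worked.
    return [c for c in queue if c["poc_works"]]

async def run_domain(domain, surface):
    queue = await analyze(domain, surface)
    if not queue:
        return []  # empty queue: exploitation is skipped for this domain
    return await exploit(queue)

async def run_pipeline(target):
    # Phases 1 and 2 run in sequence.
    surface = recon(pre_recon(target))
    # Phases 3 and 4 run per domain, with the five domains in parallel.
    per_domain = await asyncio.gather(*(run_domain(d, surface) for d in DOMAINS))
    confirmed = [f for findings in per_domain for f in findings]
    # Phase 5: only confirmed findings reach the report.
    return {"findings": confirmed}

# Hypothetical input: one candidate that can be exploited, one that cannot.
target = {"candidates": {
    "injection": [{"id": "sqli-login", "poc_works": True}],
    "xss": [{"id": "xss-search", "poc_works": False}],
}}
report = asyncio.run(run_pipeline(target))
```

In this sketch the XSS candidate never reaches the report because its proof-of-concept did not work, which is the "No Exploit, No Report" rule in miniature.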

Five parallel attack domains.

Each domain is covered by a dedicated agent pair: an analyst that correlates static and dynamic signals to identify candidate vulnerabilities, and an exploiter that attempts working exploits against the running application to validate them. Because the agent pairs run in parallel, total run time is set by the slowest domain, not by the sum of all five.

Injection

SQLi, NoSQLi, command, template

Traces user input from source to dangerous sink using the code graph, then fires structured payloads against each confirmed vector.

XSS

Reflected, stored, DOM-based

Finds sanitization gaps and context-escape bugs in render paths, then validates with real browser payloads executed via Playwright.

SSRF

Cloud metadata, internal services

Identifies server-side fetchers and URL parsers in code, then probes cloud metadata APIs, internal endpoints, and private services.

Auth

Broken auth, session handling

Reads session, token, and login logic directly from source, then tests bypass paths, fixation, and credential edge cases at runtime.

Authz

IDOR, privilege escalation

Maps object-ownership and role checks across controllers, then attempts horizontal and vertical access as real authenticated users.
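To make the Injection analyst's source-to-sink tracing concrete, here is a minimal sketch that treats the code graph as a plain adjacency map and runs a breadth-first search from a user-controlled source to a dangerous sink. The graph, the node names, and `trace_to_sink` are all hypothetical; a real code graph would carry far more detail, such as sanitizers and type information.

```python
from collections import deque

def trace_to_sink(graph, source, sinks):
    """Return one path from source to any sink, or None if no sink is reachable."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical data-flow edges: each key passes its value on to the listed nodes.
graph = {
    "request.form['q']": ["search_handler"],
    "search_handler": ["build_query", "render_results"],
    "build_query": ["db.execute"],
}
path = trace_to_sink(graph, "request.form['q']", {"db.execute"})
```

A returned path is only a candidate vector. In the pipeline described above, it would still need a working exploit against the live application before it counts as a finding.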

SAST findings get their proof.

Existing SAST results are the pentester's starting line. Every static finding is enriched, queued for exploitation, and, once validated, recorded as one canonical finding. The static findings that survive become the ones worth fixing.

Ingest

Parse the SARIF.

Pulls the latest SAST SARIF, parses file paths, line numbers, and CWE categories into a structured queue, and tags each finding with priority, confidence, and application context.
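As a rough illustration of the ingest step, the sketch below flattens a SARIF 2.1.0 document into queue entries. The `runs`, `results`, `locations`, and `physicalLocation` fields are standard SARIF. Where CWE identifiers live is tool-specific, so reading them from `properties.tags` on each result is an assumption of this example, as are the sample rule ID and the output field names. Priority, confidence, and application-context tagging are left out.

```python
import json
import re

CWE_RE = re.compile(r"cwe-(\d+)", re.IGNORECASE)

def parse_sarif(doc):
    """Flatten SARIF results into queue entries: rule ID, file, line, CWE IDs."""
    queue = []
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            # Assumption: CWE IDs appear in result-level tags; tools differ on this.
            tags = result.get("properties", {}).get("tags", [])
            cwes = sorted({f"CWE-{m.group(1)}" for t in tags for m in CWE_RE.finditer(t)})
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                queue.append({
                    "rule_id": result.get("ruleId"),
                    "file": phys.get("artifactLocation", {}).get("uri"),
                    "line": phys.get("region", {}).get("startLine"),
                    "cwes": cwes,
                })
    return queue

# Hypothetical SARIF fragment with a single result.
sarif = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "results": [{
      "ruleId": "example/sql-injection",
      "properties": {"tags": ["security", "external/cwe/cwe-89"]},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "auth.go"},
          "region": {"startLine": 142}
        }
      }]
    }]
  }]
}
""")
queue = parse_sarif(sarif)
```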

Map + Filter

CWE to OWASP, in-scope only.

Maps each CWE to an OWASP Top 10 category. Only secrets, data-flow findings, and point issues that land in an in-scope OWASP category enter the exploitation queue.
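The map-and-filter step can be sketched as a lookup followed by a scope check. The five mappings below follow the OWASP Top 10 2021 category assignments, but the table is a small excerpt chosen for illustration; which OWASP edition and which scope the product actually uses is not stated here, and the function and field names are assumptions.

```python
# Excerpt of a CWE-to-OWASP map (OWASP Top 10 2021 category codes).
CWE_TO_OWASP = {
    "CWE-89": "A03:2021",   # SQL injection -> Injection
    "CWE-79": "A03:2021",   # cross-site scripting -> Injection
    "CWE-918": "A10:2021",  # server-side request forgery
    "CWE-287": "A07:2021",  # improper authentication
    "CWE-639": "A01:2021",  # authorization bypass via user-controlled key
}

# Assumed scope: the categories the five attack domains cover.
IN_SCOPE = {"A01:2021", "A03:2021", "A07:2021", "A10:2021"}

def to_exploit_queue(findings, in_scope=IN_SCOPE):
    """Keep findings whose CWEs map to at least one in-scope OWASP category."""
    queue = []
    for f in findings:
        categories = {CWE_TO_OWASP[c] for c in f["cwes"] if c in CWE_TO_OWASP}
        matched = sorted(categories & in_scope)
        if matched:
            queue.append({**f, "owasp": matched})
    return queue

# Hypothetical parsed findings: one injection issue, one code-quality issue.
findings = [
    {"file": "auth.go", "line": 142, "cwes": ["CWE-89"]},
    {"file": "util.go", "line": 10, "cwes": ["CWE-563"]},  # unused variable
]
exploit_queue = to_exploit_queue(findings)
```

Only the `auth.go` finding survives in this example, because CWE-563 has no in-scope OWASP category to land in.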

Exploit

Start where SAST flagged.

Injection, XSS, SSRF, Auth, and Authz agents each pull from their queue and attempt working proof-of-concept attacks against the live application, focused on locations the static scanner already flagged.

Link

One flaw, one record.

Cross-scanner deduplication links the dynamic exploit back to the original SAST finding via canonical ID, content hash, or LLM semantic comparison. A SAST entry at auth.go:142 and an exploit against POST /login become the same canonical finding.

False Positive Signal

SAST findings the pentester cannot reproduce are flagged for review.

A noisy static scanner becomes a calibrated one. What remains is a shortlist you can trust.
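One way to picture the link and false-positive steps together is a reconciliation pass over the SAST queue. The sketch below assumes each SAST finding and each confirmed exploit already carries a shared `canonical_id`; that field, the record layout, and the second sample finding are hypothetical. Note that unreproduced findings are set aside for review here, not deleted.

```python
def reconcile(sast_findings, exploits):
    """Split SAST findings into confirmed canonical records and review items."""
    by_id = {e["canonical_id"]: e for e in exploits}
    confirmed, needs_review = [], []
    for finding in sast_findings:
        hit = by_id.get(finding["canonical_id"])
        if hit:
            # One flaw, one record: static location and dynamic proof together.
            confirmed.append({
                "canonical_id": finding["canonical_id"],
                "sast_location": finding["location"],
                "exploit_target": hit["target"],
            })
        else:
            needs_review.append(finding)
    return confirmed, needs_review

sast_findings = [
    {"canonical_id": "F-1", "location": "auth.go:142"},
    {"canonical_id": "F-2", "location": "render.go:58"},  # hypothetical, never reproduced
]
exploits = [{"canonical_id": "F-1", "target": "POST /login"}]
confirmed, needs_review = reconcile(sast_findings, exploits)
```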

Semantic deduplication, not alert fatigue.

Findings are deduplicated through a two-path system: a content hash for identical exploits, and an LLM semantic comparison for cases where different payloads target the same root cause.

Path 1

Content hash.

Deterministic match for identical exploits. If two findings hash to the same value, they collapse into one canonical record.

Path 2

LLM semantic comparison.

An LLM recognizes when different payloads hit the same flaw, so variant exploits collapse without losing distinct vulnerabilities.

The Result

The findings that land in your dashboard are distinct, high-confidence vulnerabilities, not noise.

The deduplicator is conservative by default: when it is uncertain, it surfaces both findings rather than silently merging them. You get a dashboard with fewer duplicates and zero quietly suppressed bugs.
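The two paths and the conservative default can be sketched as follows. This is an illustration under stated assumptions: the hashed fields, the `sink` field, and the three-valued verdict are invented for the example, and `same_root_cause` is a simple stand-in for the LLM comparison, not a real model call.

```python
import hashlib
import json

def content_hash(finding):
    # Path 1: deterministic hash over the fields that define an identical exploit.
    key = json.dumps({"endpoint": finding["endpoint"], "payload": finding["payload"]},
                     sort_keys=True)
    return hashlib.sha256(key.encode()).hexdigest()

def same_root_cause(a, b):
    # Path 2 stand-in for the LLM judge: returns "same", "different", or "uncertain".
    if a.get("sink") is None or b.get("sink") is None:
        return "uncertain"
    return "same" if a["sink"] == b["sink"] else "different"

def deduplicate(findings, judge=same_root_cause):
    canonical, seen_hashes = [], set()
    for f in findings:
        h = content_hash(f)
        if h in seen_hashes:
            continue  # path 1: identical exploit, collapse
        seen_hashes.add(h)
        # Merge only on a definite "same"; "uncertain" keeps both findings.
        if any(judge(f, kept) == "same" for kept in canonical):
            continue  # path 2: variant payload, same flaw
        canonical.append(f)
    return canonical

findings = [
    {"id": "A", "endpoint": "/login", "payload": "' OR 1=1--", "sink": "auth.go:142"},
    {"id": "A2", "endpoint": "/login", "payload": "' OR 1=1--", "sink": "auth.go:142"},
    {"id": "B", "endpoint": "/login", "payload": "' UNION SELECT 1--", "sink": "auth.go:142"},
    {"id": "C", "endpoint": "/search", "payload": "<script>alert(1)</script>", "sink": None},
]
kept_ids = [f["id"] for f in deduplicate(findings)]
```

Here `A2` collapses by hash, `B` collapses by the semantic path, and `C` stays because the judge cannot tell whether it shares a root cause with `A`.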

See it prove what's exploitable.

Watch the pipeline run end-to-end against a real codebase, on a call with our engineering team. Keygraph runs self-hosted in your VPC, with your own LLM API keys and read-only repository access by default.

Schedule a Technical Demo →

Pentest with Shannon →