// exploit dev · from $20/mo

An uncensored AI
for exploit development.

Exploit development is iterative, pattern-heavy work where a senior researcher can spend a week on plumbing — porting a public PoC, writing the leak, building the ROP chain, fighting libc fingerprinting — before they get to the part that actually moves the engagement forward.

The big coding assistants (Claude, ChatGPT, Cursor, Copilot) refuse the moment your prompt mentions exploitation, vulnerability, payload, or shellcode, because their content policies treat the entire discipline as suspect. TartarusAI runs a custom-tuned model on infrastructure we operate, with no upstream content policy to inherit. The agent does the boilerplate — porting CVE PoCs across languages, ROP/JOP chain construction, primitive chaining, fuzzer harness writing — so you spend the engagement on the parts that matter: the bug, the primitive, the bypass, the impact.

  • Zero refusals on offsec work
  • Custom-tuned model — not a wrapper
  • Runtime guards on by default
  • 256K context, sub-2s TTFT

refund if it ever refuses · no card on file · crypto-only · cancel any time

exploit-dev session · live
❯ port this CVE-2024-XXXXX PoC from python to C, add ROP chain
   target: AFL-fuzzed binary, signed research
  ▎+ wrote exploit.c (286 lines)
  ▎+ wrote rop_chain.c (142 lines)
  ▎+ wrote tests/triage.py (44 lines)
  ▎$ gcc -static exploit.c -o exploit  ✓
  ✓ verification gate passed
done.
256K context · sub-2s TTFT · MoE 30B / 3B-active

// what it does

What you ship
when nothing refuses.

CVE PoC porting + hardening

Drop in a public PoC, get back the version that handles the edge cases the original missed — language port (Python ↔ C ↔ Rust ↔ Go), target version detection, address-disclosure plumbing, hardened return-oriented chains, ASLR/PIE bypass, and a one-shot reliability pass. The agent recognises the bug class, finds the relevant primitive, and writes the working exploit instead of the canned snippet.

ROP / JOP / SROP chain construction

Feed it a gadget dump (ROPgadget, ropper, or angrop output) plus an objective and the calling convention. The agent writes the chain, validates it against your binary, and iterates until your test harness pops the shell. Sigreturn-oriented programming, partial-overwrite chains, and stack-pivot chains are covered, across x86, x86_64, ARM, MIPS, and RISC-V.

Fuzzer harnesses + corpus tooling

libFuzzer, AFL++, honggfuzz, and syzkaller harnesses for the target you are working on. Coverage instrumentation (SanCov, BBCov), dictionary generation from an RFC, spec, or sample corpus, corpus minimisation scripts, and crash triage that goes from a raw crash set to deduplicated, minimised PoCs. Standard infrastructure that you stop having to write from scratch every engagement.
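The crash-triage step above can be illustrated with a minimal sketch. It assumes crash reports are sanitizer-style text with frames such as `#0 0x4f2a10 in parse_header src/p.c:42`, and it buckets crashes by hashing the first few function names it finds. The names `crash_bucket` and `deduplicate` are hypothetical and are not part of any tool named on this page.

```python
import hashlib
import re
from collections import defaultdict

# Matches sanitizer-style frames, e.g. "#0 0x4f2a10 in parse_header src/p.c:42".
# The frame layout is an assumption about the report format.
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)", re.MULTILINE)


def crash_bucket(report: str, depth: int = 3) -> str:
    """Return a short hash of the first `depth` function names in a report.

    Addresses are ignored on purpose so that the same bug hit under
    different memory layouts lands in the same bucket.
    """
    frames = FRAME_RE.findall(report)[:depth]
    return hashlib.sha1("|".join(frames).encode()).hexdigest()[:12]


def deduplicate(reports: dict, depth: int = 3) -> dict:
    """Group crash names by bucket; `reports` maps crash name to report text."""
    buckets = defaultdict(list)
    for name, text in sorted(reports.items()):
        buckets[crash_bucket(text, depth)].append(name)
    return dict(buckets)
```

Bucketing on function names rather than addresses is a common heuristic, not a guarantee: two distinct bugs that share the same top frames will collapse into one bucket, so a larger `depth` trades fewer false merges for more duplicate buckets.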

Primitive chaining + post-ex

Arbitrary read/write to RCE, info-leak to bypass, kernel primitive to user-mode escape, browser type confusion to renderer RCE to sandbox escape. The agent walks the path, validates each step against the verification gate, and hands you the working chain rather than a sketch.

Decompilation reading + reasoning

Paste IDA Pro, Ghidra, Binary Ninja, or radare2 output. The agent reasons over pseudocode like a competent reverse engineer — recognising custom calling conventions, packer idioms, anti-analysis tricks, RTTI patterns, and inline cryptographic primitives. Particularly useful when the binary fights you on identification of the relevant code path.

Reliability + portability hardening

Once the exploit lands, the agent writes the offset table for multiple target versions, tests against your fingerprinting harness, and adds the address-disclosure dance for ASLR/KASLR. Final artifact is something you can hand to a teammate without a 30-minute walkthrough of which constants need swapping.

// workflow

A typical exploit-dev session

You start with a public CVE writeup, a vendor advisory, or a hypothesis from a fuzzer crash. You drop the relevant code path — disassembly, decompiled pseudocode, or a Python PoC that someone published with a non-functional offset table — into the agent and ask it to walk the bug. The agent identifies the primitive (use-after-free, buffer overflow, type confusion, race condition), proposes the exploitation strategy, and writes the harness that confirms the primitive is reachable from the entry point you control.

From there the loop is fast: you test, the agent iterates, the verification gate keeps the build clean. When the exploit lands on the test environment, the agent writes the reliability pass — offset tables for multiple target versions, fingerprinting against /proc, ASLR/KASLR address disclosure, sleep-mask logic for the parts that need to survive a longer engagement window. The artifact you walk away with is something a teammate can use on the engagement without a 30-minute briefing.

For research pipelines (corpus-scale fuzzing, novel-vuln discovery, regression triage), the same pattern holds: you describe the target, the agent writes the harness, you point it at the corpus. The agent does not need a CVE in its training data to reason about a class of bug — hypothesis-driven exploration is the core pattern, and the verification gate catches you when the hypothesis is wrong before you waste a week chasing it.

// where it fits

In your existing exploit-dev toolchain

TartarusAI does not replace pwntools, angr, IDA, Ghidra, or your debugger. It complements them. Pwntools is your runtime; the agent writes the pwntools script that drives it. Angr is your symbolic-execution engine; the agent writes the angr harness that explores the path you care about. IDA / Ghidra / Binary Ninja are your reverse-engineering surface; the agent reads their output and reasons over it. The agent fits the role of the senior researcher pair-programming next to you — except it is available at 3am, never gets stuck on the boilerplate, and does not refuse the prompt.

For the parts that have always been the worst use of senior-researcher time — writing the offset table for the fifth target version, regenerating the gadget chain after the binary updates, porting the published PoC because the proof-of-concept author wrote it in a language you do not use — the agent shaves hours off every iteration. For the parts that genuinely require human judgment (which primitive to chain, which target version to focus on, when to pivot to a different bug class), you stay in the loop.

  • Pairs with pwntools, angr, ROPgadget, ropper, angrop, AFL++, libFuzzer, honggfuzz, syzkaller, IDA, Ghidra, Binary Ninja, radare2, gdb / pwndbg / gef.
  • Outputs are raw scripts and source files — no SaaS lock-in, no proprietary format, you commit them to your engagement repo.
  • Verification gate runs your build / your tests on every step, so the agent cannot claim "exploit working" without an artifact that actually compiles and runs.
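The verification-gate idea in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: the page does not document how the real gate is implemented, so `verification_gate`, `GateResult`, and the step names are hypothetical. The sketch runs each build or test command in order and reports success only if every command exits with status zero.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GateResult:
    passed: bool
    failed_step: Optional[str]  # name of the first failing step, if any
    output: str                 # captured stdout + stderr of the failing step


def verification_gate(steps: List[Tuple[str, List[str]]],
                      timeout: float = 120.0) -> GateResult:
    """Run (name, argv) steps in order; stop at the first failure."""
    for name, cmd in steps:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing binary or a hung step counts as a failed gate.
            return GateResult(False, name, str(exc))
        if proc.returncode != 0:
            return GateResult(False, name, proc.stdout + proc.stderr)
    return GateResult(True, None, "")


if __name__ == "__main__":
    # Placeholder steps; a real project would list its own build and test commands.
    result = verification_gate([
        ("build", [sys.executable, "-c", "pass"]),
        ("test", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ])
    print("gate passed" if result.passed else f"gate failed at {result.failed_step}")
```

Treating a timeout or a missing executable as a failure, rather than skipping the step, keeps the gate conservative: a step that could not run cannot be reported as passing.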

// what makes it different

Why a custom model matters here

Exploit development is the canonical example of work that the foundation labs (Anthropic, OpenAI, Google, Meta) treat with maximum suspicion. Their content policies are written for the median web-app developer and applied uniformly. The result, if you try to use Claude Code or Cursor on a real engagement, is that 30-50% of your prompts get refused, hedged, or rewritten into something useless. You either burn an hour rephrasing through a jailbreak template that gets patched next month, or you accept neutered output that misses the entire point.

TartarusAI is built around a custom-tuned MoE coding model (30B total parameters, 3B active per token) that runs on dedicated GPU infrastructure we operate. There is no upstream foundation lab whose policy we inherit, no API call that gets logged on a third party’s safety dashboard, no vendor three layers up who can change the rules mid-engagement. The runtime safety guards (verification gate, read-before-overwrite, loop guard, failed-path blacklist) sit at the runtime layer, not the content layer — they prevent the agent from breaking your project without preventing it from doing the offensive-security work you are paying for.
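To make two of the runtime guards named above concrete, here is a minimal sketch of read-before-overwrite and a loop guard. It is an assumption about how such guards could work, not a description of the product's implementation; the class `RuntimeGuards` and its methods are hypothetical.

```python
from collections import Counter
from pathlib import Path


class RuntimeGuards:
    """Tracks which files were read and how often an action was repeated."""

    def __init__(self, max_repeats: int = 3):
        self.read_paths = set()
        self.action_counts = Counter()
        self.max_repeats = max_repeats

    def record_read(self, path) -> None:
        """Remember that the file at `path` has been read."""
        self.read_paths.add(Path(path).resolve())

    def may_overwrite(self, path) -> bool:
        """Allow writes to new files, or to existing files already read."""
        resolved = Path(path).resolve()
        return (not resolved.exists()) or resolved in self.read_paths

    def may_repeat(self, action: str) -> bool:
        """Count one attempt of `action`; refuse once the limit is exceeded."""
        self.action_counts[action] += 1
        return self.action_counts[action] <= self.max_repeats
```

Both checks act on what the agent does to the project (file writes, repeated actions), not on what the prompt asks for, which is the distinction the paragraph above draws between the runtime layer and the content layer.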

For exploit development specifically, this means the agent treats CVE research, ROP construction, primitive chaining, and fuzzer harness writing the way a senior researcher would treat them: as engineering reference material, not as something it has to lecture you about. The discipline does not change because the tooling is now an LLM.

// guards verification gate · read-before-overwrite · loop guard · failed-path blacklist · moderation off

// questions

What people actually ask.

Will it write working exploit code for known CVEs?
Yes. CVE PoC porting, hardening, and chaining are core use cases. The agent treats published vulnerability research the way it should be treated — as engineering reference material, not as something it has to lecture you about. You drop in the published PoC, the agent writes the version that handles your target environment, the verification gate confirms it compiles and runs.
What about novel vulnerability research / 0-days?
Same answer — the agent writes the code you ask for. We do not inspect your prompts, we do not monitor your sessions, and we do not share anything with the foundation labs. Your research stays your research. Hypothesis-driven exploration is the core pattern: you describe the target and the bug class, the agent writes the harness, you iterate against the verification gate.
Can it work with disassembly / decompilation output?
Yes. Paste IDA Pro, Ghidra, Binary Ninja, or radare2 output (pseudocode or raw assembly) and the agent reasons over it. Particularly useful for primitive identification, gadget validation, and porting hand-written shellcode across architectures (x86, x86_64, ARM, MIPS, RISC-V). The agent recognises common idioms — XOR loops, custom hashing, packer stubs, anti-debug, RTTI patterns — and proposes the working interpretation.
Is the model strong enough for serious exploit work?
Pro+ tier runs the higher-context variant on dedicated capacity. 256K context window holds a multi-file exploit project in working memory; the verification gate ensures you get back code that actually compiles. Tested on full implant skeletons, multi-stage loaders, ROP chains across multiple architectures, fuzzer harnesses for syzkaller-class targets, and 200-file refactors. The MoE architecture (30B total, 3B active) gives you Claude-class quality without the per-token cost of a frontier dense model.
How does it integrate with pwntools / angr / my existing toolchain?
TartarusAI generates pwntools scripts, angr harnesses, ROPgadget queries, AFL++ harnesses, custom-fuzzer driver code — the boilerplate that drives your existing tools. It does not replace them. Outputs are raw source files you commit to your engagement repo; no proprietary format, no lock-in, no SaaS dependency for the runtime portion of your workflow.
Does it help with reliability hardening (ASLR/KASLR/PIE)?
Yes. Once the basic exploit lands, the agent writes the offset table across target versions, the address-disclosure primitive for randomised mappings, the kernel-leak harness if you need a KASLR bypass, and the fingerprinting logic that selects the right offsets at runtime. Final artifact is something a teammate can run on the engagement without a 30-minute walkthrough.
What about kernel exploitation / browser exploitation?
Both are supported. Kernel: syzkaller harnesses, slab-spray construction, KASLR/SMEP/SMAP bypass research, ROP-in-kernel chains. Browser: type confusion to renderer RCE, JIT-spray harnesses, sandbox-escape primitives. The agent reasons over the relevant subsystem (Linux kernel, V8, JavaScriptCore, SpiderMonkey, Chromium IPC) without lecturing you about why the work is being done.
Can I use it on engagements where the bug is under embargo?
Yes. We do not train on prompts and sessions auto-purge after 24 hours. For research that absolutely cannot leave your perimeter during the embargo window, Enterprise tier supports on-prem deployment — same model, same guards, your hardware. Nothing about your bug touches third-party infrastructure unless you choose to send it.

// ready

Stop fighting refusals.
Start shipping the engagement.

One tier covers most engagements at $20/month. If the agent ever refuses, hedges, or returns neutered output on legitimate engagement work, we refund — see the refund policy.
