// exploit dev · from $20/mo
An uncensored AI
for exploit development.
Exploit development is iterative, pattern-heavy work where a senior researcher can spend a week on plumbing — porting a public PoC, writing the leak, building the ROP chain, fighting libc fingerprinting — before they get to the part that actually moves the engagement forward.
The big coding assistants (Claude, ChatGPT, Cursor, Copilot) refuse the moment your prompt mentions exploitation, vulnerability, payload, or shellcode, because their content policies treat the entire discipline as suspect. TartarusAI runs a custom-tuned model on infrastructure we operate, with no upstream content policy to inherit. The agent does the boilerplate — porting CVE PoCs across languages, ROP/JOP chain construction, primitive chaining, fuzzer harness writing — so you spend the engagement on the parts that matter: the bug, the primitive, the bypass, the impact.
- ✓ Zero refusals on offsec work
- ✓ Custom-tuned model, not a wrapper
- ✓ Runtime guards on by default
- ✓ 256K context, sub-2s TTFT
refund if it ever refuses · no card on file · crypto-only · cancel any time
❯ port this CVE-2024-XXXXX PoC from python to C, add ROP chain
target: AFL-fuzzed binary, signed research
▎+ wrote exploit.c (286 lines)
▎+ wrote rop_chain.c (142 lines)
▎+ wrote tests/triage.py (44 lines)
▎$ gcc -static exploit.c -o exploit ✓
✓ verification gate passed
done.
// what it does
What you ship
when nothing refuses.
CVE PoC porting + hardening
Drop in a public PoC, get back the version that handles the edge cases the original missed — language port (Python ↔ C ↔ Rust ↔ Go), target version detection, address-disclosure plumbing, hardened return-oriented chains, ASLR/PIE bypass, and a one-shot reliability pass. The agent recognises the bug class, finds the relevant primitive, and writes the working exploit instead of the canned snippet.
ROP / JOP / SROP chain construction
Feed it a gadget dump (ROPgadget, ropper, angrop output) plus an objective and the calling convention. The agent writes the chain, validates it against your binary, iterates until your test harness pops the shell. Sigreturn-oriented programming, partial-overwrite chains, stack-pivot chains — covered. Multi-arch (x86, x86_64, ARM, MIPS, RISC-V).
Fuzzer harnesses + corpus tooling
libFuzzer, AFL++, honggfuzz, and syzkaller harnesses for the target you are working on. Coverage instrumentation (SanCov, BBCov), dictionary generation from an RFC, spec, or sample corpus, corpus minimisation scripts, and crash triage that goes from a raw crash set to deduplicated, minimised PoCs. Standard infrastructure that you no longer have to write from scratch every engagement.
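The crash-triage step can be illustrated with a minimal sketch. It assumes sanitizer-style stack traces (lines such as `#0 0x4f1a2b in parse_header /src/parse.c:42:7`) and buckets crashes by the top few function names. The names `crash_bucket` and `dedupe` and the three-frame default are hypothetical choices for illustration, not the product's actual implementation.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List

# Matches sanitizer-style frames, e.g.
#   "    #0 0x4f1a2b in parse_header /src/parse.c:42:7"
# and captures the function name (assumed to contain no spaces).
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)", re.MULTILINE)


def crash_bucket(report: str, top_frames: int = 3) -> str:
    """Hash the top N function names of a stack trace into a short bucket id."""
    frames = FRAME_RE.findall(report)[:top_frames]
    key = "|".join(frames) if frames else "no-stack"
    return hashlib.sha1(key.encode()).hexdigest()[:12]


def dedupe(reports: Dict[str, str], top_frames: int = 3) -> Dict[str, List[str]]:
    """Group crash file names by bucket id; each bucket then needs one reproducer."""
    buckets = defaultdict(list)
    for name, report in sorted(reports.items()):
        buckets[crash_bucket(report, top_frames)].append(name)
    return dict(buckets)
```

Addresses are deliberately left out of the hash so the same bug found under different memory layouts lands in one bucket. A larger `top_frames` splits buckets more finely, at the cost of more near-duplicates.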
Primitive chaining + post-ex
Arbitrary read/write to RCE, info-leak to bypass, kernel primitive to user-mode escape, browser type confusion to renderer RCE to sandbox escape. The agent walks the path, validates each step against the verification gate, and hands you the working chain, not a sketch.
Decompilation reading + reasoning
Paste IDA Pro, Ghidra, Binary Ninja, or radare2 output. The agent reasons over pseudocode like a competent reverse engineer — recognising custom calling conventions, packer idioms, anti-analysis tricks, RTTI patterns, and inline cryptographic primitives. Particularly useful when the binary fights you on identification of the relevant code path.
Reliability + portability hardening
Once the exploit lands, the agent writes the offset table for multiple target versions, tests against your fingerprinting harness, and adds the address-disclosure dance for ASLR/KASLR. Final artifact is something you can hand to a teammate without a 30-minute walkthrough of which constants need swapping.
// workflow
A typical exploit-dev session
You start with a public CVE writeup, a vendor advisory, or a hypothesis from a fuzzer crash. You drop the relevant code path — disassembly, decompiled pseudocode, or a Python PoC that someone published with a non-functional offset table — into the agent and ask it to walk the bug. The agent identifies the primitive (use-after-free, buffer overflow, type confusion, race condition), proposes the exploitation strategy, and writes the harness that confirms the primitive is reachable from the entry point you control.
From there the loop is fast: you test, the agent iterates, the verification gate keeps the build clean. When the exploit lands on the test environment, the agent writes the reliability pass — offset tables for multiple target versions, fingerprinting against /proc, ASLR/KASLR address disclosure, sleep-mask logic for the parts that need to survive a longer engagement window. The artifact you walk away with is something a teammate can use on the engagement without a 30-minute briefing.
For research pipelines (corpus-scale fuzzing, novel-vuln discovery, regression triage), the same pattern holds: you describe the target, the agent writes the harness, you point it at the corpus. The agent does not need a CVE in its training data to reason about a class of bug — hypothesis-driven exploration is the core pattern, and the verification gate catches you when the hypothesis is wrong before you waste a week chasing it.
// where it fits
In your existing exploit-dev toolchain
TartarusAI does not replace pwntools, angr, IDA, Ghidra, or your debugger. It complements them. Pwntools is your runtime; the agent writes the pwntools script that drives it. Angr is your symbolic-execution engine; the agent writes the angr harness that explores the path you care about. IDA / Ghidra / Binary Ninja are your reverse-engineering surface; the agent reads their output and reasons over it. The agent fits the role of the senior researcher pair-programming next to you — except it is available at 3am, never gets stuck on the boilerplate, and does not refuse the prompt.
For the parts that have always been the worst use of senior-researcher time — writing the offset table for the fifth target version, regenerating the gadget chain after the binary updates, porting the published PoC because the proof-of-concept author wrote it in a language you do not use — the agent shaves hours off every iteration. For the parts that genuinely require human judgment (which primitive to chain, which target version to focus on, when to pivot to a different bug class), you stay in the loop.
- ● Pairs with pwntools, angr, ROPgadget, ropper, angrop, AFL++, libFuzzer, honggfuzz, syzkaller, IDA, Ghidra, Binary Ninja, radare2, gdb / pwndbg / gef.
- ● Outputs are raw scripts and source files: no SaaS lock-in, no proprietary format. You commit them to your engagement repo.
- ● Verification gate runs your build / your tests on every step, so the agent cannot claim "exploit working" without an artifact that actually compiles and runs.
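The verification gate in the last bullet can be pictured as a small step runner. This is a sketch under assumptions: the gate is modelled as an ordered list of commands that must all exit with status 0, and `GateResult`, `run_gate`, and the example commands are hypothetical names, not the product's API.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GateResult:
    passed: bool
    failed_step: Optional[str]  # name of the first failing step, if any
    log: str                    # captured output of every step that ran


def run_gate(steps: List[Tuple[str, List[str]]], cwd: str = ".",
             timeout: int = 300) -> GateResult:
    """Run each (name, argv) step in order and stop at the first failure."""
    log_parts = []
    for name, argv in steps:
        try:
            proc = subprocess.run(argv, cwd=cwd, capture_output=True,
                                  text=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing binary or a hung step counts as a gate failure.
            log_parts.append(f"[{name}] {exc}")
            return GateResult(False, name, "\n".join(log_parts))
        log_parts.append(f"[{name}] exit={proc.returncode}\n{proc.stdout}{proc.stderr}")
        if proc.returncode != 0:
            return GateResult(False, name, "\n".join(log_parts))
    return GateResult(True, None, "\n".join(log_parts))


if __name__ == "__main__":
    # Hypothetical project commands; substitute your own build and test steps.
    result = run_gate([("build", ["make"]), ("tests", ["pytest", "-q"])])
    print("gate passed" if result.passed else f"gate failed at: {result.failed_step}")
    sys.exit(0 if result.passed else 1)
```

Stopping at the first failing step keeps the log short and makes the failing stage unambiguous, which is what lets a "done" claim be tied to an artifact that actually built and ran.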
// what makes it different
Why a custom model matters here
Exploit development is the canonical example of work that the foundation labs (Anthropic, OpenAI, Google, Meta) treat with maximum suspicion. Their content policies are written for the median web-app developer and applied uniformly. The result, if you try to use Claude Code or Cursor on a real engagement, is that 30-50% of your prompts get refused, hedged, or rewritten into something useless. You either burn an hour rephrasing through a jailbreak template that gets patched next month, or you accept neutered output that misses the entire point.
TartarusAI is built around a custom-tuned MoE coding model (30B total parameters, 3B active per token) that runs on dedicated GPU infrastructure we operate. There is no upstream foundation lab whose policy we inherit, no API call that gets logged on a third party’s safety dashboard, no vendor three layers up who can change the rules mid-engagement. The runtime safety guards (verification gate, read-before-overwrite, loop guard, failed-path blacklist) sit at the runtime layer, not the content layer — they prevent the agent from breaking your project without preventing it from doing the offensive-security work you are paying for.
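The text names the runtime guards but does not define them, so the following is only one plausible reading, sketched for illustration. It assumes read-before-overwrite means an existing file must have been read in the session before it is rewritten, the loop guard caps identical repeated actions, and the failed-path blacklist blocks retries of an action that already failed. `RuntimeGuards` and its methods are hypothetical names.

```python
import hashlib
from collections import Counter


class RuntimeGuards:
    """Bookkeeping for three guards: read-before-overwrite, loop guard, failed-path blacklist."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self._read_paths = set()          # files the agent has read this session
        self._action_counts = Counter()   # how often each identical action was attempted
        self._failed = set()              # actions that already failed once

    def note_read(self, path: str) -> None:
        self._read_paths.add(path)

    def may_overwrite(self, path: str, exists: bool) -> bool:
        # New files are always allowed; existing files must have been read first.
        return (not exists) or path in self._read_paths

    @staticmethod
    def _key(tool: str, args: str) -> str:
        return hashlib.sha256(f"{tool}\x00{args}".encode()).hexdigest()

    def may_run(self, tool: str, args: str) -> bool:
        key = self._key(tool, args)
        if key in self._failed:
            return False  # failed-path blacklist: do not retry a known-bad action
        self._action_counts[key] += 1
        return self._action_counts[key] <= self.max_repeats  # loop guard

    def note_failure(self, tool: str, args: str) -> None:
        self._failed.add(self._key(tool, args))
```

None of these checks inspects what the prompt or the output is about. They only track file and action state, which matches the text's distinction between the runtime layer and the content layer.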
For exploit development specifically, this means the agent treats CVE research, ROP construction, primitive chaining, and fuzzer harness writing the way a senior researcher would treat them: as engineering reference material, not as something it has to lecture you about. The discipline does not change because the tooling is now an LLM.
// questions
What people actually ask.
Will it write working exploit code for known CVEs?
What about novel vulnerability research / 0-days?
Can it work with disassembly / decompilation output?
Is the model strong enough for serious exploit work?
How does it integrate with pwntools / angr / my existing toolchain?
Does it help with reliability hardening (ASLR/KASLR/PIE)?
What about kernel exploitation / browser exploitation?
Can I use it on engagements where the bug is under embargo?
// ready
Stop fighting refusals.
Start shipping the engagement.
One tier covers most engagements at $20/month. If the agent ever refuses, hedges, or returns neutered output on legitimate engagement work, we refund — see the refund policy.
refund if it ever refuses · no card on file · crypto-only