This repository contains the implementation of GoAT, a method for generating jailbreak prompts for LLMs. This is part of an anonymous submission currently under peer review.
A black-box jailbreak framework for LLMs that generates adversarial prompts using graph-based reasoning. GoAT achieves high success rates with fewer queries and produces human-readable outputs.
GoAT-pydev/Graph_of_Attacks