A black-box jailbreak framework for LLMs that generates adversarial prompts using graph-based reasoning. GoAT achieves high success rates with fewer queries and produces human-readable outputs.

GoAT-pydev/Graph_of_Attacks

Graph of Attacks

This repository contains the implementation of GoAT, a method for generating jailbreak prompts for LLMs. This is part of an anonymous submission currently under peer review.

[Figure: An illustration of Graph of Attacks (GoAT)]
