lenijwp/ECLIPSE

ECLIPSE

This repository is the official implementation of the paper "Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer".

Requirements

To install requirements:

pip install -r requirements.txt

Running Code

To run our code, use this command:

python Eclipse.py --model llama2-7b-chat --dataset 1 --cuda 0 --batchsize 8 --K_round 50 --ref_history 10

📋 We support three open-source LLMs as targets: ['llama2-7b-chat', 'vicuna-7b', 'falcon-7b-instruct']. Dataset 1 is the one we used for comparison with GCG, and dataset 2 is the one we used for comparison with template-based methods. To use a specific LLM as the attacker, add the --attacker parameter.
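To cover every listed model and both datasets without retyping the command, the invocations can be generated programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `build_sweep` are hypothetical, and the flag values are simply the defaults from the example command above. It only builds and prints the command lines; it does not launch anything.

```python
import itertools
import shlex

# Model names and dataset ids as listed in this README.
MODELS = ["llama2-7b-chat", "vicuna-7b", "falcon-7b-instruct"]
DATASETS = [1, 2]


def build_command(model, dataset, cuda=0, batchsize=8, k_round=50, ref_history=10):
    """Return the argv list for one Eclipse.py run (hypothetical helper)."""
    return [
        "python", "Eclipse.py",
        "--model", model,
        "--dataset", str(dataset),
        "--cuda", str(cuda),
        "--batchsize", str(batchsize),
        "--K_round", str(k_round),
        "--ref_history", str(ref_history),
    ]


def build_sweep():
    """Return one argv list per (model, dataset) pair."""
    return [build_command(m, d) for m, d in itertools.product(MODELS, DATASETS)]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a shell.
    for argv in build_sweep():
        print(shlex.join(argv))
```

Each argv list can also be passed to `subprocess.run(argv, check=True)` to execute the runs one after another, assuming the script is started from the repository root.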

To attack GPT-3.5-Turbo, run this command:

python Eclipse-gpt.py --model gpt3.5 --dataset 1 --cuda 0 --batchsize 8 --K_round 50 --ref_history 10

Pre-trained Models

You can download pretrained models here:

After downloading, replace the model path in the code with your local model path.
