GitHub - sbamit/Exponentiated-Gradient-Descent-LLM-Attack: A novel adversarial attack on LLM based on the Exponentiated Gradient Descent technique.

Change Readme File.

This project explores the Exponentiated Gradient Descent (EGD) optimization method for producing adversarial suffixes that attack aligned Large Language Models. The method is shown to be effective on the Llama-2 chat model with 7 billion parameters.
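For readers unfamiliar with the technique, here is a minimal sketch of a generic exponentiated gradient descent step on the probability simplex. It is the textbook multiplicative update, not the code in this repository; the function name `egd_step`, the toy linear loss, and the learning rate are assumptions chosen for illustration.

```python
import math

def egd_step(weights, grads, lr):
    """One generic EGD step: w_i <- w_i * exp(-lr * g_i), then renormalize.

    Hypothetical helper for illustration; assumes all weights are > 0.
    """
    # Work in log space so the multiplicative update stays numerically stable.
    logits = [math.log(w) - lr * g for w, g in zip(weights, grads)]
    shift = max(logits)
    unnorm = [math.exp(x - shift) for x in logits]
    total = sum(unnorm)
    # Dividing by the sum keeps the result on the probability simplex.
    return [u / total for u in unnorm]

# Toy problem: a distribution over 4 candidate tokens with linear loss <cost, w>.
# The gradient of a linear loss with respect to w is simply the cost vector.
cost = [0.9, 0.1, 0.5, 0.7]
w = [0.25] * 4
for _ in range(50):
    w = egd_step(w, cost, lr=0.5)

best = max(range(len(w)), key=w.__getitem__)
print(best, [round(x, 4) for x in w])
```

Because the update is multiplicative and followed by normalization, the weights stay positive and sum to one without a separate projection step, which is the main practical difference from projected gradient descent.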

To run the PGD script on a fixed number of behaviors (e.g., 20), execute the following command in the shell. Replace the `--input_file` path with the location of `harmful_behaviors.csv` on your machine.

python run_pgd.py \
  --input_file "/home/samuel/research/llmattacks/llm-attacks/data/advbench/harmful_behaviors.csv" \
  --output_file "./JSON_Files/PGD_AdvBench_Llama2.jsonl" \
  --model "Llama2" \
  --dataset_name "AdvBench" \
  --num_behaviors 20
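The flags in the command above could be parsed along the following lines. This is a hypothetical sketch using Python's standard `argparse` module: only the flag names come from the command itself, while the types, defaults, help strings, and the `build_parser` function are assumptions, and the actual `run_pgd.py` may handle its arguments differently.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags used in the README command."""
    parser = argparse.ArgumentParser(description="Run a PGD attack over a set of behaviors.")
    parser.add_argument("--input_file", required=True, help="CSV file of behaviors")
    parser.add_argument("--output_file", required=True, help="JSONL file for results")
    parser.add_argument("--model", default="Llama2", help="model name (assumed default)")
    parser.add_argument("--dataset_name", default="AdvBench", help="dataset label (assumed default)")
    parser.add_argument("--num_behaviors", type=int, default=20, help="number of behaviors to attack")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args([
    "--input_file", "harmful_behaviors.csv",
    "--output_file", "./JSON_Files/PGD_AdvBench_Llama2.jsonl",
    "--num_behaviors", "20",
])
print(args.model, args.dataset_name, args.num_behaviors)
```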

The next goal is to build a similar pipeline for the EGD with Adam optimizer script.

Folders and files (47 commits):

CSV_Files
Multi-Prompt/CSV_Files
data
relaxed_regularization_scripts
results
seml_configs
unused_results
unused_scripts
.gitignore
README.md
Run_multi-prompt_attack-llm_using_EGD_with_Adam_Optim.py
attack-llm_using_EGD_with_Adam_Optim_and_refusal_vector.ipynb
attack-llm_using_EGD_with_Adam_Optimizer.ipynb
behavior.py
compute_refusal_vector.ipynb
config.py
helper.py
log_scale_graph.ipynb
multi-prompt_attack-llm_using_EGD_with_Adam_Optim.ipynb
parse_results.ipynb
pgd-attack_w_AdamOptim_and_CosineAnnealing.ipynb
pgd-attack_w_Adam_Cosine_switch_targets.ipynb
pgd-attack_w_Adam_only_switch_targets.ipynb
pgd.py
requirements.txt
result.py
run_all_combinations.sh
run_pgd.py
run_transfer_attacks.ipynb
softprompt-06-11multiple.ipynb
