This repository contains the official implementation of "Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders", accepted at ACL 2025.
```bash
git clone https://github.com/username/multilingual-llm-features.git
cd multilingual-llm-features
pip install -r requirements.txt
```
If you plan to work with the Llama-3.1-8B SAE, you will also need to install additional dependencies; follow the guidelines in the OpenMOSS/Language-Model-SAEs repository.
To download the pre-trained Sparse Autoencoders (SAEs), simply run:
```bash
python download.py
```
This will automatically download the following SAE models into the SAE directory:
- Llama3_1-8B-Base-LXR-8x
- gemma-scope-2b-pt-res
- gemma-scope-9b-pt-res
The directory structure after downloading will be:
```
SAE/
├── Llama3_1-8B-Base-LXR-8x/
├── gemma-scope-2b-pt-res/
└── gemma-scope-9b-pt-res/
```
Note: Make sure you have sufficient disk space and a stable internet connection before running the download script.
If you encounter network issues, you can uncomment the alternative download URLs in download.py to use mirror sites for downloading the models.
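As a quick sanity check after the download finishes, the sketch below verifies that the three expected SAE folders exist (the SAE directory name and subfolder names follow the structure shown above):

```python
from pathlib import Path

# Expected subdirectories inside the SAE download folder (see structure above).
EXPECTED = [
    "Llama3_1-8B-Base-LXR-8x",
    "gemma-scope-2b-pt-res",
    "gemma-scope-9b-pt-res",
]

sae_root = Path("SAE")
for name in EXPECTED:
    path = sae_root / name
    print(("ok      " if path.is_dir() else "MISSING ") + str(path))
```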
To identify language-specific features in LLMs, follow these steps (an example for gemma-2-2b):
```bash
# Step 1: Compute latent representations of language features
python latent_computation.py --model_name "gemma-2-2b" --model_path YOUR_MODEL_PATH
# For additional arguments, please refer to utils.py

# Step 2: Analyze and extract language-specific features
python latent_analysis.py --model_name "gemma-2-2b" --model_path YOUR_MODEL_PATH
```
The results will be saved in the sae_acts directory with the following structure:
```
sae_acts/
└── gemma-2b/
    └── layer_3/
        ├── top_index_per_lan_magnitude.pth  # Language-specific features ranked by magnitude
        ├── sae_acts.pth                     # Intermediate results
        └── ...
```
Note: top_index_per_lan_magnitude.pth has shape 10 × feature_num, where the first dimension indexes the 10 languages in the following order: ['en', 'es', 'fr', 'ja', 'ko', 'pt', 'th', 'vi', 'zh', 'ar'] (English, Spanish, French, Japanese, Korean, Portuguese, Thai, Vietnamese, Chinese, Arabic).
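For example, you can inspect the top-ranked features for one language by loading this file directly. This is a minimal sketch: the layer_3 path is just the example layer from the structure above, and the file is assumed to contain a single 10 × feature_num tensor as described in the note.

```python
import torch

# Language order of the first dimension, as documented above.
LANGS = ["en", "es", "fr", "ja", "ko", "pt", "th", "vi", "zh", "ar"]

# Example path from the directory structure above; assumed to hold one tensor
# of shape (10, feature_num).
ranked = torch.load(
    "sae_acts/gemma-2b/layer_3/top_index_per_lan_magnitude.pth",
    map_location="cpu",
)
print(ranked.shape)  # expected: (10, feature_num)

# Top-5 feature indices for Chinese, ranked by activation magnitude.
zh_row = ranked[LANGS.index("zh")]
print(zh_row[:5])
```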
To reproduce the code-switching analysis results:
- Execute the code_switch_analysis() function in inference.py
- Run the code_switch_analysis2() function in inference.py (a calling sketch is shown below)
The visualization results will be automatically saved in the plot directory.
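If you prefer to drive both steps from a single script, here is a minimal sketch; the argument-free calls are an assumption, so check the actual signatures in inference.py first:

```python
# Hypothetical driver script: the argument-free calls below are an assumption,
# so consult the function definitions in inference.py before running.
from inference import code_switch_analysis, code_switch_analysis2

code_switch_analysis()   # first part of the code-switching analysis
code_switch_analysis2()  # second part of the code-switching analysis
```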
To reproduce the analysis of language-specific feature ablation:
- Run the change_activation_print_ce_corpus_gen() function in inference.py with the following parameter combinations (see the sketch below):
  - --start_idx 0 --topk_feature_num 1
  - --start_idx 0 --topk_feature_num 2
  - --start_idx 1 --topk_feature_num 1
- Generate visualizations by running the change_activation_print_ce_corpus_different_same_feature_diff_lan_all() and change_activation_print_ce_corpus() functions in plot.py
All visualization outputs will be stored in the plot directory for further analysis.
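The sketch below shows one way to script the three ablation runs and the follow-up plotting. The keyword arguments mirror the parameter combinations listed above, but the exact signatures in inference.py and plot.py are assumptions and should be verified before use:

```python
# Hypothetical driver for the ablation experiments; keyword arguments are
# inferred from the parameter combinations above -- verify against inference.py.
from inference import change_activation_print_ce_corpus_gen
from plot import (
    change_activation_print_ce_corpus,
    change_activation_print_ce_corpus_different_same_feature_diff_lan_all,
)

for start_idx, topk_feature_num in [(0, 1), (0, 2), (1, 1)]:
    change_activation_print_ce_corpus_gen(
        start_idx=start_idx,
        topk_feature_num=topk_feature_num,
    )

# Generate the visualizations (also assumed to take no required arguments);
# outputs are written to the plot directory.
change_activation_print_ce_corpus_different_same_feature_diff_lan_all()
change_activation_print_ce_corpus()
```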
If you find this work helpful, please cite our paper:
```bibtex
@article{deng2025unveiling,
  title={Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders},
  author={Deng, Boyi and Wan, Yu and Zhang, Yidan and Yang, Baosong and Feng, Fuli},
  journal={arXiv preprint arXiv:2505.05111},
  year={2025}
}
```
For any questions or feedback, feel free to reach out to Boyi Deng at [email protected].