🚀 News
- [2025.01.22] Our paper was accepted to NAACL 2025 Main!
- [2024.10.16] The paper was uploaded to arXiv.
✨ Introduction
A survey of prompt compression methods, with insights and visualisations.
- Methods Overview: an overview of prompt compression methods, categorized into hard prompt methods and soft prompt methods.
- Insights: multiple perspectives for understanding how these methods work.
- Visualisations: illustrations of the various prompt compression methods.
👀 Examples
Illustrative examples of prompt compression methods. Hard prompt methods remove low-information tokens or paraphrase the prompt for conciseness. Soft prompt methods compress the text into a smaller number of special tokens.
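To make the filtering family concrete, here is a minimal sketch in the spirit of SelectiveContext / LLMLingua: a small language model scores each token's surprisal, and the most predictable tokens are dropped. GPT-2 as the scoring model, the `compress` helper, and the 50% keep ratio are illustrative assumptions, not the papers' exact recipes.

```python
# A sketch of surprisal-based token filtering (SelectiveContext / LLMLingua
# flavour). GPT-2 as the small scoring model and the 50% keep ratio are
# illustrative assumptions, not the papers' exact configurations.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the tokens the small LM finds least predictable (highest surprisal)."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    logits = model(ids).logits
    # Surprisal of token t given its prefix: -log p(x_t | x_<t).
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    surprisal = -log_probs.gather(1, ids[0, 1:, None]).squeeze(1)
    k = max(1, int(keep_ratio * surprisal.numel()))
    keep = torch.topk(surprisal, k).indices.sort().values  # restore token order
    # The first token has no prefix, so it is always kept.
    return tokenizer.decode(torch.cat([ids[0, :1], ids[0, 1:][keep]]))

print(compress("Please read the following long passage and answer the question."))
```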
🌳 Tree Overview
Hierarchical overview of prompt compression methods and their downstream adaptations. For downstream adaptations, compression methods that do not belong to a specific category are classified under general QA.
📖 Paper List
- Hard Prompt Methods:
  - Filtering:
    - General:
      - [SelectiveContext] Compressing Context to Enhance Inference Efficiency of Large Language Models
      - [LLMLingua] Compressing Prompts for Accelerated Inference of Large Language Models
      - [LongLLMLingua] Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
      - [AdaComp] Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models
    - Distillation Enhanced:
      - [LLMLingua-2] Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
    - RL Enhanced:
    - Embedding Enhanced:
  - Paraphrasing:
    - [Nano-Capsulator] Learning to Compress Prompt in Natural Language Formats
    - [CompAct] Compressing Retrieved Documents Actively for Question Answering
    - [FAVICOMP] Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation
- Soft Prompt Methods:
  - Decoder Only:
    - Not Finetuned:
      - [CC] Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
    - Finetuned:
      - [GIST] Learning to Compress Prompts with Gist Tokens
      - [AutoCompressor] Adapting Language Models to Compress Contexts
  - Encoder-Decoder:
    - Both Finetuned:
    - Finetuned Encoder:
      - [ICAE] In-context Autoencoder for Context Compression in a Large Language Model
      - [500xCompressor] Generalized Prompt Compression for Large Language Models
      - [QGC] Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs
    - Embedding Encoder:
      - [xRAG] Extreme Context Compression for Retrieval-augmented Generation with One Token
    - Projector:
      - [UniICL] Unifying Demonstration Selection and Compression for In-Context Learning
- Downstream Adaptations:
  - RAG:
    - [xRAG] Extreme Context Compression for Retrieval-augmented Generation with One Token
    - [RECOMP] Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation
    - [COCOM] Context Embeddings for Efficient Answer Generation in RAG
    - [CompAct] Compressing Retrieved Documents Actively for Question Answering
    - [FAVICOMP] Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation
    - [AdaComp] Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models
    - [LLoCO] Learning Long Contexts Offline
    - [TCRA-LLM] Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction
  - Agents:
  - Domain-specific tasks:
  - Others:
    - [ICL] Unifying Demonstration Selection and Compression for In-Context Learning
    - [Role Playing] Extensible Prompts for Language Models on Zero-shot Language Style Customization
    - [Functions] Function Vectors in Large Language Models
🎨 Visualisations
Architectures of the various prompt compression models using hard prompt methods. For SelectiveContext and LLMLingua, the bottom language models filter the prompt tokens without modifying them, serving as selection mechanisms. In Nano-Capsulator, the bottom LLM generates a paraphrased version of the input prompt, which then serves as input for the LLM above. "SLM" means "small language model". "Close LLM" refers to closed-source language models that only accept natural-language inputs through API calls.
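For the paraphrasing branch of this figure (the Nano-Capsulator pattern, where the bottom LM rewrites the prompt before it reaches the LLM above), a minimal sketch follows. The off-the-shelf summarizer (`facebook/bart-large-cnn`) and the `paraphrase_compress` helper are illustrative stand-ins; the actual paper trains a dedicated compressor.

```python
# A sketch of paraphrase-style compression (Nano-Capsulator flavour): a bottom
# LM rewrites the prompt into a shorter natural-language version before it is
# sent to the LLM above. The off-the-shelf summarizer here stands in for the
# dedicated compressor the paper actually trains.
from transformers import pipeline

paraphraser = pipeline("summarization", model="facebook/bart-large-cnn")

def paraphrase_compress(prompt: str, max_tokens: int = 60) -> str:
    """Return a concise natural-language rewrite of the prompt."""
    out = paraphraser(prompt, max_length=max_tokens, min_length=5, do_sample=False)
    return out[0]["summary_text"]

long_prompt = (
    "Prompt compression shortens the input to a language model while trying "
    "to preserve the information needed for the downstream task, reducing "
    "inference cost and latency for long-context applications."
)
print(paraphrase_compress(long_prompt))
```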
Architectures of the various prompt compression models using soft prompt methods. Tokens with diagonal stripes represent output tokens produced by the language models. Unlike hard prompt methods, the bottom LLMs in soft prompt methods process the input tokens, and their outputs (the tokens with diagonal stripes) serve as input for the LLMs above.
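A minimal sketch of the soft-prompt pattern this figure describes, in a gist-token style: K learned embeddings are appended to the context, their final hidden states become the compressed soft tokens, and those K vectors replace the full context when querying the LLM above. GPT-2, the helper names, and the untrained `gist` parameter are illustrative assumptions; real systems finetune these components, and `generate` with `inputs_embeds` requires a recent `transformers` release.

```python
# A gist-style soft-prompt sketch: K learned embeddings are appended to the
# context, their final hidden states become the compressed "soft tokens", and
# those K vectors replace the full context for the LLM above. GPT-2 and the
# untrained `gist` parameter are illustrative; real systems finetune them.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

K = 4  # number of soft tokens the context is compressed into
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
gist = nn.Parameter(torch.randn(1, K, model.config.n_embd) * 0.02)

@torch.no_grad()
def compress_to_soft_tokens(context: str) -> torch.Tensor:
    """Encode the context and read off the K gist positions as soft tokens."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    ctx_emb = model.transformer.wte(ids)                       # (1, T, d)
    hidden = model.transformer(
        inputs_embeds=torch.cat([ctx_emb, gist], dim=1)
    ).last_hidden_state
    return hidden[:, -K:]                                      # (1, K, d)

@torch.no_grad()
def answer_with_soft_prompt(soft: torch.Tensor, question: str) -> str:
    """Prepend the K soft tokens to the question instead of the full context."""
    q_emb = model.transformer.wte(tokenizer(question, return_tensors="pt").input_ids)
    out = model.generate(
        inputs_embeds=torch.cat([soft, q_emb], dim=1), max_new_tokens=20
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)

soft = compress_to_soft_tokens("A long document about prompt compression methods.")
print(answer_with_soft_prompt(soft, "What is the document about?"))
```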
Citation
```bibtex
@inproceedings{li-etal-2025-prompt,
    title = "Prompt Compression for Large Language Models: A Survey",
    author = "Li, Zongqian and
      Liu, Yinhong and
      Su, Yixuan and
      Collier, Nigel",
    editor = "Chiruzzo, Luis and
      Ritter, Alan and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.368/",
    pages = "7182--7195",
    ISBN = "979-8-89176-189-6",
}
```