🦉🫥 PDF Anonymizer

This application anonymizes large PDF, Markdown, or plain-text files using large language models (LLMs).

High-Quality Anonymization: Leverages LLMs to identify and replace Personally Identifiable Information (PII) with high accuracy.
Large File Support: Consistently anonymizes large files (tested up to 1 GB).
Multi-Provider & Cost-Effective: Free to use with local Ollama models. It also supports major providers like OpenAI, Anthropic, Google, Hugging Face, and OpenRouter.
Reversible: Supports deanonymization to recover original data when needed.
Multi-Format: Works with PDF, Markdown, and plain text files.
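The reversible replacement described above can be pictured with a small Python sketch. This is not the project's implementation: `anonymize`, `deanonymize`, and the `[PII_n]` placeholder format are hypothetical names chosen for illustration, and the entity list is passed in by hand, where the real tool relies on an LLM to find PII.

```python
import re


def anonymize(text: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each entity with a numbered placeholder and return the mapping.

    Hypothetical helper for illustration only; the entities would normally
    come from an LLM rather than a hand-written list.
    """
    mapping: dict[str, str] = {}
    # Deduplicate while keeping order, then handle longer entities first so a
    # short entity never overwrites part of a longer one.
    unique = list(dict.fromkeys(entities))
    for entity in sorted(unique, key=len, reverse=True):
        placeholder = f"[PII_{len(mapping) + 1}]"
        mapping[placeholder] = entity
        text = text.replace(entity, placeholder)
    return text, mapping


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values using the mapping saved by anonymize()."""
    if not mapping:
        return text
    # Match any known placeholder literally and swap the original back in.
    pattern = re.compile("|".join(re.escape(p) for p in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

Keeping the mapping separate from the anonymized text is what makes the round trip possible: the anonymized output can be shared freely, while only someone holding the mapping can recover the original.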

Project Structure

This project is a monorepo containing two main packages:

packages/pdf-anonymizer-core: The core library containing the anonymization and deanonymization logic. See the core README for more details.
packages/pdf-anonymizer-cli: A command-line interface for using the anonymizer. See the CLI README for detailed usage instructions.

Development Installation

Install uv: This project uses uv for package management. Follow the official installation instructions.

Clone the repository:

```
git clone <repository_url>
cd anonymizer
```

Install dependencies:
```
uv sync --group dev
```
Install Ollama (optional): If you want to use a local model for anonymization, install Ollama.

Set up environment variables: Create a .env file in the packages/pdf-anonymizer-cli directory and add the necessary API keys for the providers you want to use. For example:

```
# For Google models
GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"

# For OpenAI models
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

# For Anthropic models
ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY"

# For Hugging Face models
HUGGING_FACE_TOKEN="YOUR_HF_TOKEN"

# For OpenRouter models
OPENROUTER_API_KEY="YOUR_OPENROUTER_KEY"
```
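Before running the tool, it can help to check which provider keys in the .env file have real values. The following is a minimal sketch that is not part of this project: `parse_env` and `configured_providers` are hypothetical helpers, the parser handles only simple `KEY="value"` lines, and how the CLI itself loads the file is not covered here. The variable names are copied from the example above.

```python
# Provider labels are for display only; variable names come from the example .env.
PROVIDER_KEYS = {
    "Google": "GOOGLE_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Hugging Face": "HUGGING_FACE_TOKEN",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env_text: str) -> list[str]:
    """Return providers whose key is set and is not a YOUR_... placeholder."""
    values = parse_env(env_text)
    return [
        name
        for name, var in PROVIDER_KEYS.items()
        if values.get(var) and not values[var].startswith("YOUR_")
    ]
```

To use it, read the file's contents (for example with `pathlib.Path("packages/pdf-anonymizer-cli/.env").read_text()`) and pass the text to `configured_providers`. Local Ollama models need no key, so an empty result does not by itself mean the tool cannot run.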

Quick Start

To anonymize a file, use the pdf-anonymizer command:

```
pdf-anonymizer run document.pdf
```

For detailed command-line options and examples, see the CLI README.

Testing

To run the test suite:

```
uv run pytest
```
