
Llamora

Llamora is an experimental, local-first diary companion. It runs entirely offline: no API keys, no cloud, no telemetry. Just your words, your thoughts, and a model that listens quietly on your own machine.

Each day begins fresh at midnight, when Llamora opens a new page and offers a short reflection on the day before. You can write freely, think aloud, or stay in silence.


Screenshots

Screenshots: chat, tags, search, and login views.

Features

  • Fully offline and private: Everything runs on your local device. No internet access is required, and your data never leaves your machine.

  • Daily pages: A new entry starts automatically each day, carrying a soft reflection from the previous one.

  • Streaming responses: Watch the model’s reply unfold word by word through Server-Sent Events (SSE); a minimal sketch of the streaming route follows this list.

  • HTMX-based UI: The interface updates dynamically without JavaScript frameworks. The backend renders HTML snippets directly, keeping the frontend light and responsive.

  • End-to-end encryption: Each user’s messages are encrypted with a unique Data Encryption Key (DEK), wrapped by their password and recovery code. Forget both, and the data remains sealed forever.

  • Local search with embeddings: Messages are embedded locally using FlagEmbedding and stored securely. Search combines fast semantic similarity (HNSWlib) with exact phrase matching to surface meaningful results, all offline.

  • Automatic tags and metadata: Each reply streams as plain text first, then a lightweight follow-up call adds an emoji and hashtags for search, filtering, and UI accents.

  • Markdown support: The interface renders formatted text safely through Marked + DOMPurify.

  • Minimal dependencies: The app uses Quart, NaCl, HTMX, and a few small Python libraries. Asset bundling relies on a vendored esbuild binary driven by Python.
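
To make the streaming mechanics concrete, here is a minimal, illustrative Quart route that emits Server-Sent Events. It sketches the general pattern rather than Llamora's actual handler; the /stream path and the stand-in token source are assumptions.

# Illustrative SSE sketch, not Llamora's actual streaming handler.
# The route name and token source are placeholders.
from quart import Quart

app = Quart(__name__)

async def token_source():
    # Stand-in for tokens arriving from llama.cpp or llamafile.
    for token in ("Hello", " ", "world", "."):
        yield token

@app.route("/stream")
async def stream():
    async def events():
        async for token in token_source():
            # Each SSE frame is a "data:" line followed by a blank line.
            yield f"data: {token}\n\n"
        yield "event: done\ndata: \n\n"

    # Quart accepts an async generator as a streaming response body.
    return events(), {"Content-Type": "text/event-stream"}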


Getting Started

Requirements

  • uv
  • A local llama.cpp build (or prebuilt release) so you can run llama-server -hf Qwen/Qwen3-4B-Instruct-2507. Qwen3-4B-Instruct has become the baseline for Llamora because it follows instructions reliably while still fitting on consumer hardware.
  • Optionally, a llamafile binary if you prefer an all-in-one executable instead of running llama.cpp yourself.
  • A relatively fast computer (ideally with a strong GPU).
  • A relatively modern browser.

Set up a local model

Llamora connects to a running instance of llama.cpp or llamafile. For example, if you have llama.cpp installed:

llama-server -hf Qwen/Qwen3-4B-Instruct-2507 --port 8081

This downloads the model weights once and starts an HTTP endpoint at http://127.0.0.1:8081.
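
To confirm the server is reachable before launching Llamora, a quick check from Python helps. The /health route below is the default in recent llama.cpp builds, so treat the path as an assumption if you run an older build or a llamafile.

# Quick sanity check that the model server is up.
# Assumes llama-server's default /health route; adjust if needed.
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8081/health", timeout=5) as resp:
    print(resp.status, resp.read().decode())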

Create and activate a virtual environment

uv sync
source .venv/bin/activate

Run Llamora

Development (Quart with live reload):

export LLAMORA_LLM__SERVER__HOST=http://127.0.0.1:8081
uv run llamora-server dev

Production (Hypercorn with worker management):

export LLAMORA_LLM__SERVER__HOST=http://127.0.0.1:8081
uv run llamora-server --workers 4 --graceful-timeout 30 --keep-alive 5

Both commands honor configuration overrides such as LLAMORA_APP__HOST and LLAMORA_APP__PORT, or you can pass --host/--port flags directly. Development keeps Quart’s code reloader enabled by default; append --no-reload to disable it. In production, tune --workers to match your CPU cores and use --graceful-timeout/--keep-alive to align with load balancers or process managers.

Open http://localhost:5000 (or your configured port) in your browser once the server starts.

Front-end assets

Assets live under frontend/static/ and can run directly as native ES modules or be bundled for production. Use the vendored esbuild wrapper to produce minified bundles and a manifest in frontend/dist/:

uv run python scripts/build_assets.py build --mode prod

When frontend/dist/manifest.json exists, the server prefers the bundled outputs (exposed to templates as config.STATIC_MANIFEST). Remove frontend/dist/ to fall back to the unbundled files, or run with --mode dev/watch during development.
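
As a rough illustration of what manifest-backed resolution looks like, the helper below is a hypothetical sketch, not the actual resolver behind config.STATIC_MANIFEST:

# Hypothetical sketch of resolving a bundled asset from the esbuild
# manifest; the real resolver behind config.STATIC_MANIFEST may differ.
import json
from pathlib import Path

MANIFEST = Path("frontend/dist/manifest.json")

def asset_url(name: str) -> str:
    if MANIFEST.exists():
        manifest = json.loads(MANIFEST.read_text())
        # Fall back to the unbundled file when the entry is missing.
        return manifest.get(name, f"/static/{name}")
    return f"/static/{name}"

print(asset_url("app.js"))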


Privacy and Security

Llamora encrypts everything it stores. Messages and derived values are encrypted with a per-user key that only your password or recovery code can unlock. There are no analytics, external calls, or telemetry. Even embeddings and indexes are decrypted only in memory after login.
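
The key-wrapping scheme can be illustrated with PyNaCl, the same libsodium bindings listed under the Technical Overview. The snippet below is a simplified sketch of a password-wrapped data encryption key, not Llamora's actual key-management code; the Argon2id parameters are illustrative.

# Simplified sketch of a password-wrapped DEK using PyNaCl.
# Not Llamora's actual key-management code; parameters are illustrative.
from nacl import pwhash, secret, utils

password = b"correct horse battery staple"

# 1. A random Data Encryption Key (DEK) encrypts the user's messages.
dek = utils.random(secret.SecretBox.KEY_SIZE)
ciphertext = secret.SecretBox(dek).encrypt(b"Dear diary ...")

# 2. The DEK itself is wrapped with a key derived from the password.
salt = utils.random(pwhash.argon2id.SALTBYTES)
kek = pwhash.argon2id.kdf(
    secret.SecretBox.KEY_SIZE, password, salt,
    opslimit=pwhash.argon2id.OPSLIMIT_MODERATE,
    memlimit=pwhash.argon2id.MEMLIMIT_MODERATE,
)
wrapped_dek = secret.SecretBox(kek).encrypt(dek)

# 3. Unlocking reverses the wrap: derive the key again, unwrap, decrypt.
unwrapped = secret.SecretBox(kek).decrypt(wrapped_dek)
print(secret.SecretBox(unwrapped).decrypt(ciphertext))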


Configuration

Llamora’s configuration system is built on Dynaconf. Values are read in layers: defaults → settings.local.toml → .env → environment variables. Keys use double underscores to represent sections, for example:

LLAMORA_LLM__REQUEST__TEMPERATURE=0.7
LLAMORA_LLM__SERVER__HOST=http://127.0.0.1:8081
LLAMORA_APP__PORT=5050

You can override any setting this way without editing the source.
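
The double-underscore convention is Dynaconf's standard nesting for environment variables; a small illustration of how an override reaches the code:

# How a LLAMORA_* environment variable maps onto nested settings.
import os
from dynaconf import Dynaconf

# Simulate an override; in practice this is exported in the shell.
os.environ["LLAMORA_LLM__REQUEST__TEMPERATURE"] = "0.7"

settings = Dynaconf(envvar_prefix="LLAMORA")

# LLAMORA_LLM__REQUEST__TEMPERATURE -> settings.LLM.request.temperature
print(settings.LLM.request.temperature)  # 0.7, parsed as a float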

A simplified version of the structure:

Section       Purpose
APP           Host, port, and runtime settings
FEATURES      Toggle optional functionality such as registration
AUTH          Login attempt limits and timeouts
DATABASE      SQLite path and pool configuration
LLM.server    llama.cpp or llamafile connection details
LLM.request   Default generation parameters
SEARCH        Semantic search behavior and ANN limits
CRYPTO        DEK storage method (cookie or session)
COOKIES       Cookie name and encryption secret

Local overrides can go into config/settings.local.toml, e.g.:

[default.LLM.server]
host = "http://127.0.0.1:8081"
parallel = 2

[default.LLM.request]
temperature = 0.7
top_p = 0.8

Restart the app after changing settings.

Prompt templates

System prompts are rendered from Jinja2 templates stored in src/llamora/llm/templates. Each template exposes structured placeholders (context lines, vibe summaries, and recap data) so you can adjust copy without touching Python code.

To point Llamora at a different set of prompt files, update LLAMORA_PROMPTS__TEMPLATE_DIR (or the matching entry in config/settings.toml) to any directory containing replacements for:

  • system.txt.j2
  • opening_system.txt.j2
  • opening_recap.txt.j2

Swap prompt variants by editing or replacing those files—changes take effect on the next server restart.
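
Because the prompts are plain Jinja2, edited templates can be previewed outside the app. The variable names in this sketch are placeholders; the real templates define the context they expect (context lines, vibe summaries, recap data).

# Preview a prompt template outside the app (variable names are
# placeholders; the actual templates define their own context).
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("src/llamora/llm/templates"))
template = env.get_template("system.txt.j2")
print(template.render(context_lines=["Yesterday was quiet."], vibe="calm"))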


Technical Overview

  • Backend: Quart (async Python)
  • Frontend: HTMX with server-rendered templates
  • Streaming: Server-Sent Events (SSE)
  • Database: SQLite
  • Encryption: libsodium / NaCl
  • Embeddings: FlagEmbedding + HNSWlib
  • Configuration: Dynaconf
  • Package Manager: uv

All code runs locally, and dependencies are minimal. The system supports both llama.cpp servers and llamafile binaries through the same interface.
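
A rough sketch of the search stack named above: embed locally with FlagEmbedding, query an HNSWlib index for semantic neighbors, and combine the results with an exact phrase check. This is illustrative only; the model name and index parameters are assumptions, not Llamora's settings.

# Illustrative hybrid search: semantic ANN (hnswlib) plus an exact
# phrase filter. Model name and index parameters are assumptions.
import hnswlib
from FlagEmbedding import FlagModel

messages = ["Went for a long walk by the river.", "Tried a new bread recipe."]
model = FlagModel("BAAI/bge-small-en-v1.5")
vectors = model.encode(messages)

index = hnswlib.Index(space="cosine", dim=vectors.shape[1])
index.init_index(max_elements=len(messages), ef_construction=200, M=16)
index.add_items(vectors, list(range(len(messages))))

query = "walking outside"
labels, _ = index.knn_query(model.encode([query]), k=2)

# Merge semantic neighbors with a simple exact-match pass.
exact = [i for i, m in enumerate(messages) if query.lower() in m.lower()]
hits = list(dict.fromkeys([int(i) for i in labels[0]] + exact))
print([messages[i] for i in hits])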


Limitations

  • Llamora is designed for single-user use.
  • Large models may consume significant memory.
  • There is no external API or multi-user admin interface.
  • If you lose both password and recovery key, data cannot be recovered.
  • No content moderation or prompt filtering (local use assumed).

Development Notes

  • Run with: uv run llamora-server
  • Type check: uv run pyright
  • Config lives in: config/settings.toml and config/settings.local.toml
  • Debug mode: QUART_DEBUG=1

Deployment (not recommended)

Llamora is a personal experiment and not production-ready. If you still deploy it, set the required secrets and runtime vars.

# Required
export LLAMORA_SECRET_KEY=$(openssl rand -hex 32)
export LLAMORA_COOKIES__SECRET=$(openssl rand -base64 32)

# Backend and runtime
export LLAMORA_LLM__SERVER__HOST=http://127.0.0.1:8081
export LLAMORA_DATABASE__PATH=data/llamora.sqlite3
export LLAMORA_CRYPTO__DEK_STORAGE=session
export LLAMORA_SESSION__TTL=604800

Optional overrides follow the same structure, e.g.:

export LLAMORA_FEATURES__DISABLE_REGISTRATION=true

Then start:

uv run llamora-server

Use .env or config/settings.local.toml for persistent configuration.

Llamora is a personal learning experiment and is not production-ready; deploying it as-is is discouraged. Use at your own risk.

About

A diary web application based on Python, HTMX, and a local LLM
