24 changes: 24 additions & 0 deletions Dockerfile
@@ -0,0 +1,24 @@
FROM ruby:2.5-alpine3.8
LABEL description="pgsync running under cron for periodic DB synchronisation"

ADD . /app/src/
WORKDIR /app/
RUN \
  set -x && \
  apk add --no-cache postgresql-client postgresql-dev && \
  apk add --no-cache --virtual .build-deps git build-base && \
  cd src/ && \
  gem build pgsync.gemspec && \
  gem install pgsync-*.gem && \
  apk del .build-deps && \
  chmod +x docker/*.sh && \
  mv docker/*.sh ../ && \
  cd .. && \
  rm -r src/ && \
  sentryVersion=1.62.0 && \
  wget -O /usr/local/bin/sentry-cli \
    "https://downloads.sentry-cdn.com/sentry-cli/$sentryVersion/sentry-cli-Linux-x86_64" && \
  chmod a+x /usr/local/bin/sentry-cli

ENTRYPOINT [ "/bin/sh", "/app/entrypoint.sh" ]

3 changes: 3 additions & 0 deletions README.md
@@ -8,6 +8,9 @@ Sync Postgres data to your local machine. Designed for:

:tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)

## Docker usage
See the [docker/README.md](./docker/README.md) file.

## Installation

pgsync is a command line tool. To install, run:
74 changes: 74 additions & 0 deletions docker/README.md
@@ -0,0 +1,74 @@
> Periodically synchronises the data from one PG DB to another

We're using a fork of a fork of pgsync. The version we run is
https://github.com/ternandsparrow/pgsync/tree/ternandsparrow-patch-1, while the original is
https://github.com/ankane/pgsync. The `tomsaleeba-patch-1` fork doesn't add much, but we need the `arshsingh` fork
because it adds support for opting out of syncing constraints on the DB. Our DB has foreign keys, and `pg_restore`
(which `pgsync` uses under the covers) blindly restores tables in alphabetical order, which in our case causes
constraint violations. The `pgsync` tool does have options for configuring groups of tables, so we could map all our
tables to groups and sync each group in dependency order, but that's too much effort (and it's brittle). Instead, this
fork lets us sync **without constraints**, which shouldn't matter because we're just a read-only mirror of production
SWARM.

## Usage

This container is intended to be used in a docker-compose stack. If you have the target DB in your stack and the source
DB is elsewhere, you can do something like:
```yaml
version: '3'
services:
  db:
    image: postgres:10
    environment:
      POSTGRES_DB: app_db
      POSTGRES_USER: writeuser
      POSTGRES_PASSWORD: pokemon
    restart: unless-stopped
  db-sync:
    image: ternandsparrow/pgsync:dev # TODO select a tag
    links:
      - db:db
    environment:
      FROM_USER: readonlyuser
      FROM_PASS: bananas
      FROM_HOST: db.example.com
      FROM_PORT: 5432
      FROM_DB: allthedata
      TO_USER: writeuser
      TO_PASS: pokemon
      TO_HOST: db
      TO_PORT: 5432
      TO_DB: app_db
      CRON_SCHEDULE: '1 1 * * *'
    restart: unless-stopped
    depends_on:
      - db
```

The periodic command run by `cron` will **only sync data**, and it will fail if the schema doesn't already exist in the target DB. To fix this, after you've deployed the stack, run this manual, one-off step to create the schema:
```bash
docker exec -i example_db-sync_1 sh -c 'SCHEMA_ONLY=1 sh /app/run.sh'
```

If you don't want to wait for the first scheduled run to get some data, you can trigger a data sync manually too:
```bash
docker exec -i example_db-sync_1 sh -c 'sh /app/run.sh'
```

## Run example docker-compose stack

This example creates two PG DBs. The first is loaded with some data that we want to sync. For the purposes of this
example, we override the entrypoint so we can do a schema sync, then a data sync, then print the results. Don't override
the entrypoint like this when you deploy for real.

```bash
cd example/
docker-compose up --build
# when you see the output of the select statement:
# db-sync_1 | 1 | one
# db-sync_1 | 2 | two
# db-sync_1 | 3 | three
# db-sync_1 | 4 | four
# ... then ctrl+c
docker-compose down --volumes
```
11 changes: 11 additions & 0 deletions docker/entrypoint.sh
@@ -0,0 +1,11 @@
#!/bin/sh
# schedule task in cron and start daemon
set -euxo pipefail
cd "$(dirname "$0")"

# assert env vars exist with shell parameter expansion (http://wiki.bash-hackers.org/syntax/pe#display_error_if_null_or_unset)
: "${CRON_SCHEDULE:?}"

echo "$CRON_SCHEDULE sh $(pwd)/run.sh" > /var/spool/cron/crontabs/root
exec crond -f
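The `: ${VAR:?}` guard used above can look cryptic: `:` is a no-op command, and the `${VAR:?}` expansion makes the shell exit non-zero when the variable is unset or null. A minimal sketch of the behaviour (`DEMO_REQUIRED_VAR` is a made-up name for illustration):

```shell
# ":" is a no-op; "${DEMO_REQUIRED_VAR:?}" aborts the subshell with a
# non-zero status if the variable is unset or null.
if sh -c ': "${DEMO_REQUIRED_VAR:?}"' 2>/dev/null; then
  echo 'guard passed'
else
  echo 'guard failed: DEMO_REQUIRED_VAR is unset'
fi
# with the variable set, the guard is a silent no-op
DEMO_REQUIRED_VAR=x sh -c ': "${DEMO_REQUIRED_VAR:?}"' 2>/dev/null && echo 'guard passed'
```

Combined with `set -e`, a missing variable stops the entrypoint before anything is written to the crontab.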

6 changes: 6 additions & 0 deletions docker/example/.env
@@ -0,0 +1,6 @@
SRC_DB=db1
SRC_USER=user
SRC_PASS=pass
DEST_DB=db2
DEST_USER=user
DEST_PASS=pass
9 changes: 9 additions & 0 deletions docker/example/add-data.sql
@@ -0,0 +1,9 @@
create table blah (
  foo int,
  bar char(10)
);

insert into blah values (1, 'one');
insert into blah values (2, 'two');
insert into blah values (3, 'three');
insert into blah values (4, 'four');
59 changes: 59 additions & 0 deletions docker/example/docker-compose.yml
@@ -0,0 +1,59 @@
version: '3'
services:
  db1:
    image: postgres:9.5
    ports:
      - "5433:5432"
    environment:
      POSTGRES_DB: ${SRC_DB}
      POSTGRES_USER: ${SRC_USER}
      POSTGRES_PASSWORD: ${SRC_PASS}
    volumes:
      - "./add-data.sql:/docker-entrypoint-initdb.d/add-data.sql"
      - "db1-pgdata:/var/lib/postgresql/data"
  db2:
    image: postgres:10
    ports:
      - "5434:5432"
    environment:
      POSTGRES_DB: ${DEST_DB}
      POSTGRES_USER: ${DEST_USER}
      POSTGRES_PASSWORD: ${DEST_PASS}
    volumes:
      - "db2-pgdata:/var/lib/postgresql/data"
  db-sync:
    build: ../..
    links:
      - db1:db1
      - db2:db2
    environment:
      FROM_USER: ${SRC_USER}
      FROM_PASS: ${SRC_PASS}
      FROM_HOST: db1
      FROM_PORT: 5432
      FROM_DB: ${SRC_DB}
      TO_USER: ${DEST_USER}
      TO_PASS: ${DEST_PASS}
      TO_HOST: db2
      TO_PORT: 5432
      TO_DB: ${DEST_DB}
      # uncomment and add your DSN to enable Sentry.io reporting. A simple way
      # to trigger an error is to change TO_HOST to db99 (an invalid hostname).
      # SENTRY_DSN: 'https://[email protected]/3333333'
    # note: don't override the entrypoint when using this container; this is just for the demo, to run SQL after the sync
    entrypoint: |
      /bin/sh -c "
      echo 'waiting for DBs' && \
      sleep 10 && \
      SCHEMA_ONLY=1 /bin/sh /app/run.sh && \
      /bin/sh /app/run.sh && \
      sleep 1 && \
      PGPASSWORD=pass psql -h db2 -U user -d db2 -c 'select * from blah;' && \
      echo 'Now running under cron every minute, use ctrl+c to exit' && \
      CRON_SCHEDULE='* * * * *' exec /app/entrypoint.sh"
    depends_on:
      - db1
      - db2
volumes:
  db1-pgdata:
  db2-pgdata:
33 changes: 33 additions & 0 deletions docker/run.sh
@@ -0,0 +1,33 @@
#!/bin/sh
# runs pgsync to sync from the "FROM" DB to the "TO" DB
set -e
: ${FROM_USER:?}
: ${FROM_PASS:?}
: ${FROM_HOST:?}
: ${FROM_PORT:?}
: ${FROM_DB:?}
: ${TO_USER:?}
: ${TO_PASS:?}
: ${TO_HOST:?}
: ${TO_PORT:?}
: ${TO_DB:?}
EXTRA_OPTS=""

if [ -n "$SCHEMA_ONLY" ]; then
  echo '[INFO] restoring schema only'
  EXTRA_OPTS="--schema-only --no-constraints"
fi

echo "Run started at $(date)"
pgsync \
  $EXTRA_OPTS \
  --from "postgres://$FROM_USER:$FROM_PASS@$FROM_HOST:$FROM_PORT/$FROM_DB" \
  --to "postgres://$TO_USER:$TO_PASS@$TO_HOST:$TO_PORT/$TO_DB" \
  --to-safe || {
  if [ -z "${SENTRY_DSN:-}" ]; then
    echo "[WARN] No SENTRY_DSN, cannot send error report"
  else
    echo "Reporting error to Sentry.io"
    sentry-cli send-event -m 'Failed to sync DB'
  fi
}
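The `command || { … }` pattern at the end of run.sh runs the Sentry-reporting block only when `pgsync` exits non-zero, without tripping `set -e`. A minimal sketch of the pattern, with `false` standing in for the failing sync command:

```shell
# Under "set -e" an unhandled failure aborts the script, but a failure
# consumed by "|| { ... }" does not; "false" stands in for pgsync here.
set -e
false || {
  echo 'sync failed, reporting instead of aborting'
}
echo 'script reached the end'
```

This prints both lines; without the `|| { … }` handler, `set -e` would terminate the script at `false` and the final `echo` would never run.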
4 changes: 2 additions & 2 deletions lib/pgsync.rb
@@ -129,7 +129,7 @@ def perform
log_completed(start_time)
end

-if opts[:add_constraints]
+if opts["add-constraints"]
log "* Adding constraints/triggers"
dump_command = "pg_dump -Fc --verbose --section=post-data --no-owner --no-acl #{to_url(source_uri)}"
restore_command = "pg_restore --verbose --no-owner --no-acl --clean #{if_exists ? "--if-exists" : nil} -d #{to_url(destination_uri)}"
@@ -317,7 +317,7 @@ def parse_args(args)
o.boolean "--truncate", "truncate existing rows", default: false
o.boolean "--schema-only", "schema only", default: false
o.boolean "--no-constraints", "exclude constraints/triggers when syncing schema", default: false
-o.boolean "--add_constraints", "add constraints and triggers after syncing data", default: false
+o.boolean "--add-constraints", "add constraints and triggers after syncing data", default: false
o.boolean "--no-rules", "do not apply data rules", default: false
o.boolean "--setup", "setup", default: false
o.boolean "--in-batches", "in batches", default: false, help: false