
ankane / pgsync, Hacker News


Sync Postgres data between databases. Designed for:

  • speed – up to 4x faster than traditional tools on a 4-core machine
  • security – built-in methods to prevent sensitive data from ever leaving the server
  • convenience – sync partial tables, groups of tables, and related records

Battle-tested at Instacart

Installation

pgsync is a command line tool. To install, run:

This will give you the pgsync command. If installation fails, you may need to install dependencies.

In your project directory, run:

This creates .pgsync.yml for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). pgsync commands can be run from this directory or any subdirectory.

How to Use

Sync all tables

Note: pgsync assumes your schema is already set up on your local machine. See the schema section if that’s not the case.

Sync specific tables

Sync specific rows (existing rows are overwritten)

pgsync products "where store_id=1"

You can also preserve existing rows

pgsync products "where store_id=1" --preserve

Or truncate them

pgsync products "where store_id=1" --truncate

Exclude Tables

To always exclude, add to .pgsync.yml.

exclude:
  - table1
  - table2

For Rails, you probably want to exclude schema migrations and ActiveRecord metadata.

exclude:
  - schema_migrations
  - ar_internal_metadata

Groups

Define groups in .pgsync.yml:

groups:
  group1:
    - table1
    - table2

And run:

You can also use groups to sync a specific record and associated records in other tables.

To get a product with its reviews, last 10 coupons, and store, use:

groups:
  product:
    products: "where id = {1}"
    reviews: "where product_id = {1}"
    coupons: "where product_id = {1} order by created_at desc limit 10"
    stores: "where id in (select store_id from products where id = {1})"
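The {1} placeholder in each clause stands for the value supplied when the group is run. As a rough illustration of the idea (not pgsync's actual code), the substitution could be sketched in Python as below. The function name expand_group and the sample dictionary are hypothetical, and the sketch assumes the value is a trusted integer id that is inserted as plain text.

```python
def expand_group(group, value):
    """Return a copy of `group` with every {1} replaced by `value`."""
    # Assumption: the value is pasted in as plain text, so only pass trusted input.
    return {table: clause.replace("{1}", str(value)) for table, clause in group.items()}


# Hypothetical group with the same shape as the config above.
product_group = {
    "products": "where id = {1}",
    "reviews": "where product_id = {1}",
}

for table, clause in expand_group(product_group, 123).items():
    print(f"{table}: {clause}")
```

Each table in the group ends up with its own filter, all keyed on the same record id.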

And run:

Schema

Note: pgsync is designed to sync data. You should use a schema migration tool to manage schema changes. The methods in this section are provided for convenience but not recommended.

Sync schema before the data

Note: This wipes out existing data

Specify tables

pgsync table1,table2 --schema-first

Or just the schema

pgsync does not try to sync Postgres extensions.

Data Protection

Always make sure your connection is secure when connecting to a database over a network you don't fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.

Sensitive Information

Prevent sensitive information like email addresses from leaving the remote server.

Define rules in .pgsync.yml:

data_rules:
  email: unique_email
  last_name: random_letter
  …: random_date
  users.auth_token:
    value: secret
  visits_count:
    statement: "(RANDOM() * 14)::int"
  encrypted_*: null

last_name matches all columns named last_name, and users.last_name matches only the users table. Wildcards are supported, and the first matching rule is applied.
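To make the matching order concrete, here is a small Python sketch of first-match rule lookup. It is only an approximation of the behavior described above, not pgsync's implementation: the function find_rule and the sample rules are hypothetical, and shell-style wildcards from Python's fnmatch module stand in for whatever wildcard syntax pgsync actually uses.

```python
from fnmatch import fnmatchcase


def find_rule(rules, table, column):
    """Return the first rule whose pattern matches the column, or None."""
    qualified = f"{table}.{column}"
    for pattern, rule in rules.items():  # dicts preserve insertion order
        # A bare pattern such as "last_name" applies to every table;
        # a qualified one such as "users.last_name" applies to one table.
        if fnmatchcase(column, pattern) or fnmatchcase(qualified, pattern):
            return rule
    return None


# Hypothetical rules; order matters because the first match wins.
rules = {
    "users.last_name": "untouched",
    "last_name": "random_letter",
    "encrypted_*": "null",
}

print(find_rule(rules, "users", "last_name"))
print(find_rule(rules, "customers", "last_name"))
print(find_rule(rules, "users", "encrypted_password"))
```

Because the first match wins, a table-specific rule only takes effect if it is listed before a broader rule that would also match the column.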

Options for replacement are:

  • unique_email
  • unique_phone
  • unique_secret
  • random_letter
  • random_int
  • random_date
  • random_time
  • random_ip
  • value
  • statement
  • null
  • untouched
Rules starting with unique_ require the table to have a primary key. unique_phone requires a numeric primary key.

Multiple Databases

To use with multiple databases, run:

This creates .pgsync-db2.yml for you to edit. Specify a database in commands with:

Safety

To keep you from accidentally overwriting production, the destination is limited to localhost or 127.0.0.1 by default.

To use another host, add to_safe: true to your .pgsync.yml.
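The guard described here amounts to an allow-list check on the destination host. The Python sketch below illustrates that idea only and is not pgsync's code: the function name destination_allowed, the LOCAL_HOSTS set, and the example URLs are assumptions made for illustration, with the to_safe flag mirroring the config option.

```python
from urllib.parse import urlparse

# Hosts treated as safe destinations by default (assumption for this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def destination_allowed(url, to_safe=False):
    """Return True if the destination may be written to."""
    host = urlparse(url).hostname
    # An explicit opt-in overrides the local-only default.
    return to_safe or host in LOCAL_HOSTS


print(destination_allowed("postgres://localhost:5432/myapp_development"))
print(destination_allowed("postgres://user@db.example.com/myapp"))
print(destination_allowed("postgres://user@db.example.com/myapp", to_safe=True))
```

Defaulting to local-only means a mistyped or swapped connection string fails closed instead of overwriting a remote database.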

Large Tables

For extremely large tables, sync in batches.

pgsync large_table --in-batches

The script will resume where it left off when run again, making it great for backfills.
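One way resumable batching like this can work is to copy rows in primary key ranges, starting just past the highest id already present in the destination. The Python sketch below shows that approach under stated assumptions: a numeric, increasing primary key, a hypothetical batch_ranges helper, and an arbitrary default batch size. The text above does not say how pgsync itself tracks progress.

```python
def batch_ranges(dest_max_id, source_max_id, batch_size=10000):
    """Yield inclusive (start, end) id ranges that still need copying."""
    # Resume just past the last id that already exists in the destination.
    start = dest_max_id + 1
    while start <= source_max_id:
        end = min(start + batch_size - 1, source_max_id)
        yield (start, end)
        start = end + 1


# First run: empty destination, 25 source rows, batches of 10.
print(list(batch_ranges(0, 25, batch_size=10)))

# Later run after an interruption: rows up to id 20 were already copied.
print(list(batch_ranges(20, 25, batch_size=10)))
```

Since the starting point is derived from the destination itself, no separate progress file is needed in this scheme.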

Foreign Keys

By default, tables are copied in parallel. If you use foreign keys, this can cause violations. You can specify tables to be copied serially with:

Help

Version

Scripts

Use groups when possible to take advantage of parallelism.

For Ruby scripts, you may need to do:

Bundler.with_clean_env do
  system "pgsync ..."
end

Dependencies

If installation fails, your system may be missing Ruby or libpq.

On Mac, run:

On Ubuntu, run:
