PostgreSQL/BackupAndRecovery

Overview

Creating backups is an essential task for every database administrator. If the hardware crashes or there is software corruption, the DBA must ensure that a database can be restored with minimal data loss. PostgreSQL offers multiple strategies to support the DBA in achieving this goal.

Backup technology can be divided into two classes: cold backups and hot backups. A cold backup is a backup taken when no database file is open. In the case of PostgreSQL this means that the instance is stopped, the backup is taken and the instance is restarted. A hot backup is a backup taken during normal working operations. Applications can perform read and write actions in parallel with the backup creation. There are different types of hot backups. The main types are: logical, physical, and physical with PITR. These will be described in more detail in the section on hot backups.

Backup
   |
   +—— Cold Backup
   |
   +—— Hot Backup
        |
        +—— Logical Backup
        |
        +—— Physical Backup
        |
        +—— Physical Backup plus PITR

Cold Backup (Offline Backup)

A cold backup is a backup taken when the PostgreSQL instance is not running and includes a consistent copy of all files which constitute the cluster with all of its databases. Cold backups are also called offline backups.

There is only one way to create a consistent useful cold backup: the PostgreSQL instance should be stopped by issuing the pg_ctl stop command. This will disconnect all applications from all the cluster's databases. After the instance is shut down, make a backup using one of the standard operating system utilities (cp, tar, dd, rsync, etc.) to create a copy of all cluster files to a secure location, such as: on the disk of a different server, on a backup system at a SAN or intranet, a tape system or other reliable location.

To be a successful backup, the backup must include the following cluster files:

Include all files under the directory node where the cluster is installed. The environment variable $PGDATA points to this directory and usually resolves to something like, .../postgresql/9.4/main. Use echo $PGDATA on the command-line or in psql, show data_directory; to retrieve the directory node path.
Include configuration files, which may be in $PGDATA, but can also be located elsewhere, for example, as they are on the default Ubuntu install. These are the main configuration files: postgresql.conf, pg_hba.conf, and pg_ident.conf. Their locations can be found by running the following commands from the psql utility:

    show config_file;
    show hba_file;
    show ident_file;

Include all tablespace files. These files are located elsewhere on the file-system. Their locations can be found by looking at the symlinks in the $PGDATA/pg_tblspc directory:

    cd $PGDATA/pg_tblspc
    ls -lt

Caution! One may try to backup only special parts of a cluster, eg. a huge file which represents a table on a separate partition or tablespace - or the opposite: everything except the huge file. Even if the instance is shut down during the generation of such a partial copy, copies of this kind are useless. The restore of a cold backup needs all data files and meta-information files of the cluster to re-create the cluster.

Advantages

A cold backup is easy to do and to restore.

Disadvantages

A continuous 7x24 operation mode of any of the databases in the cluster is not possible.
It is not possible to backup smaller parts of a cluster like a single database or table.
Partial restores are not possible. Restores must include all the cluster files.
After a crash, any data changes that occur after the most recent cold backup are lost. Only the data in the backup will be restored.

How to Recover

It is strongly recommended to do the following steps on a test host to verify the restore before doing them on the production server.

In the case of a crash, restore the data from a cold backup by performing the following steps:

Stop the instance.
Backup the original files of the crashed cluster: the files in the $PGDATA path and the configuration files.
Delete all original files of the crashed cluster: the files in the $PGDATA path and the configuration files.
Copy the files of the backup to their original places.
Start the instance. It should start in the normal way, without any special message.

Hot Backup (Online Backup)

In contrast to cold backups, hot backups are taken while the instance is running and applications may change data as the backup is taken.

Hot backups are also known as online backups.

Logical Backup

A logical backup is a consistent copy of the data within a database or some of its parts. These backups are created with the utilities pg_dump and pg_dumpall. The instance must be running for these tools to operate. While they may be run in parallel to applications, they create consistent snapshots as of the time they are called. For example, if an application changes some data values during the creation of a backup, the backup takes the old values whereas the application sees the new values.

Logical backups run in serializable transactions. This is possible because of PostgreSQL's MVCC (Multi-version concurrency control) implementation.

pg_dump

pg_dump works at the database level and can backup specific subsets of a database such as individual tables. It is able to dump data, schema definitions or both. The parameters --data-only and --schema-only restrict the output to information respective to the given flag.

pg_dump supports two output formats; text format and binary format. The format type is chosen by pg_dumps parameter --format. The text format contains SQL commands like CREATE and INSERT. Files created in this format may be used by psql to restore the backed-up data. The binary format is also called the archive format. To restore files with this format you must use the pg_restore tool.

The following diagram visualizes the cooperation of pg_dump, psql and pg_restore.

    +<———  pg_dump text format <———  original database  ———>  pg_dump binary format  ———>+
    |                                                                                    |
script in SQL syntax                                                              binary file
    |                                                                                    |
    +——————————>  psql  ——————————>  restored database  <————————  pg_restore  <—————————+

Some Examples:

$ # dump complete database 'finance' in text format to a file
$ pg_dump --dbname=finance --username=boss --file=finance.sql
$
$ # restore database content (to a different or an empty database)
$ psql --dbname=finance_x --username=boss <finance.sql
$
$
$
$ # dump table 'person' of database 'finance' in binary format to a file
$ pg_dump --dbname=finance --username=boss --table=person --format=c --file=finance_person.archive
$
$ restore table 'person' from binary archive
$ pg_restore --dbname=finance_x --username=boss --format=c <finance_person.archive
$

pg_dumpall

The pg_dumpall command works at the cluster level and can backup up important cluster level objects like user/roles and their rights. pg_dumpall without detailed parameters will dump the complete contents of the cluster: all data, and structures of all databases plus all user definitions and definitions of their rights. The parameter --globals-only can be used to restrict its behavior to dump global objects only. pg_dumpall outputs in the text format.

Advantages

Continuous 7x24 operation mode is possible.
Small parts of cluster or database may be backup-ed or restored.
When you use the text format you can switch from one PostgreSQL version to another or from one hardware platform to another.

Disadvantages

The text format uses much space, but it compresses well.

How to Recover

As mentioned in the above diagram the recovery process depends on the format of the dump. Text files are in standard SQL syntax. To recreate objects from this format you can use SQL utilities like psql. Binary files are in the archive format. They can only be used by the utility pg_restore.

Physical Backup

A physical backup is an inconsistent copy of the files of a cluster, created with operating system utilities like cp or tar taken at a time whereas applications modify data. At first glance such a backup seems to be useless. To understand its purpose, you must know PostgreSQL's recover-from-crash strategy.

At all times and independent from any backup/recovery action, PostgreSQL maintains Write Ahead Log (WAL) files - primarily for crash-safety purposes. Such WAL files contain log records, which reflects all changes made to the data and the schema. Prior to their transfer to the data files of the database the log records are stored in the (sequentially written) WAL file. In the case of a system crash those log records are used to recover the cluster to a consistent state during restart. The recover process searches the timestamp of the last checkpoint and replays all subsequent log records in chronological order against this version of the cluster. Through that action the cluster gets recovered to a consistent state and contains all changes up to the last COMMIT.

The existence of a physical backup, which is inconsistent by definition but contains its WAL files, in combination with this recovery-from-crash technique can be used for backup/recovery purposes. To implement this, you have to restore the previous taken physical backup (including its WAL files). When the instance starts again, it uses the described recovery-from-crash technique and replays all log records in the WAL files against the database files. In the exact same way as before, the cluster comes to a consistent state and contains all changes up to the point in time when the backup-taken has started.

Please keep in mind, that physical backups work only on cluster level, not on any finer granularity like database or table.

Physical backups without PITR sometimes are called standalone hot physical backup.

Advantages

Continuous 7x24 operation mode is possible.

Disadvantages

Physical backup works only on cluster level, not on any finer granularity like database or table.
Without PITR (see below) you will lose all data changes between the time, when the physical backup is taken, and the crash.

How to Take the Backup and Recover

To use this technique it is necessary to configure some parameters in the postgres.conf file for WAL and archive actions. As the usual technique is Physical Backup plus PITR we describe it in the next chapter.

Physical Backup plus PITR

The term PITR stands for Point In Time Recovery and denotes a technique, where you can restore the cluster to any point in time between the creation of the backup and the crash.

The Physical Backup plus PITR strategy takes a physical backup plus all WAL files, which are created since the time of taking this backup. To implement it, three actions must be taken:

Define all necessary parameters in postgres.conf
Generate a physical backup
Archive all arising WAL files

If a recovery becomes necessary, you have to delete all files in the cluster, recreate the cluster by copying the physical backup to its original location, create the file recovery.conf with some recovery-information (especially to what location WAL files have been archived) and restart the instance. The instance will recreate the cluster according to its parameters in postgres.conf and recovery.conf to a consistent state including all data changes up to the last COMMIT.

Advantages

Continuous 7x24 operation mode is possible.
Recover with minimal data loss.
Generating WAL files is the basis for additional features like replication.

Disadvantages

Physical backup works only on cluster level, not on any finer granularity like database or table.
If your database is very busy and changes a lot of data, many WAL files may arise.

How to Take the Backup

Step 1

You have to define some parameters in postgres.conf so that: WAL files are on the level 'archive' or higher, archiving of WAL files is activated and a copy command is defined to transfers WAL files to a fail-safe location.

# collect enough information in WAL files
wal_level = 'archive'
# activate ARCHIVE mode
archive_mode = on
# supply a command to transfer WAL files to a failsafe location (cp, scp, rsync, ...)
# %p is the pathname including filename. %f is the filename only.
archive_command = 'scp %p dba@archive_server:/postgres/wal_archive/%f'

After the parameters are defined, you must restart the cluster pg_ctl restart. The cluster will continuously generate WAL files in its subdirectory pg_wal (pg_xlog in Postgres version 9.x and older) in concordance with data changes in the database. When it has filled a WAL file and must switch to the next one, it will copy the old one to the defined archive location.

Step 2

You must create a physical or base backup with an operating system utility during the instance is in a special 'backup' mode. In this mode the instance will perform a checkpoint and create some additional files.

$ # start psql and set the instance to 'backup' mode, where it creates a checkpoint
$ psql -c "SELECT pg_start_backup('AnyBackupLabel');"
$ 
$ # copy the cluster's files
$ scp -r $PGDATA dba@archive_server:/postgres/whole_cluster/
$
$ # start psql again and finish 'backup' mode
$ psql -c "SELECT pg_stop_backup();"
$

If you like to do so, you can replace the three steps by a single call to the utility pg_basebackup.

Step 3

That's all. All other activities are taken by the instance, especially the continuous copy of completely filled WAL files to the archive location.

How to Recover

To perform a recovery the original physical or base backup is copied back and the instance is configured to perform recovery during its start.

Stop the instance - if it is still running.
Create a copy of the crashed cluster - if you have enough disc space. Maybe, you will need it in a later stage.
Delete all files of the crashed cluster.
Recreate the cluster files from the base backup.
Create a file recovery.conf in $PGDATA. It must contain a command similar to: restore_command = 'scp dba@archive_server:/postgres/wal_archive/%f %p'. This copy command is the reverse of the command in postgres.conf, which saved the WAL files to the archive location.
Start the instance. During startup the instance will copy and process all WAL files found in the archive location.

The fact, that recovery.conf exists, signals the instance to perform a recovery. After a successful recovery this file is renamed.

If you want to recover to some previous point in time prior to the occurrence of the crash (but behind the creation of the backup), you can do so by specifying this point in time in the recovery.conf file. In this case the recovery process will stop before processing all archived WAL files. This feature is the origin of the term Point-In-Time-Recovery.

In summary the recovery.conf file may look like this:

restore_command      = 'scp dba@archive_server:/postgres/wal_archive/%f %p'
recovery_target_time = '2016-01-31 06:00:00 CET'

Additional Tools

There is an open source project Barman ^[1], which simplifies the steps of backup and recovery. The system helps you, if you have to manage a lot of servers and instances and it becomes complicate to configure and remember all the details about your server landscape.

References

↑ Barman

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] Barman