PostgreSQL, especially versions 9.4 and later, come with many system views,
functions and features that expose rich information about the internal workings
of your PostgreSQL cluster.
Traditionally, there have been scripts – like the venerable
check_postgres and others, usually passed
on from DBA to DBA and shared on public and private Wikis – that collect, or
check some of these metrics and information.
There hasn’t been a standard, easy way to collect all the information available
from your PostgreSQL server, in a way that can be used by DBAs to troubleshoot
and ops folks to use in scripting for automation and monitoring.
pgmetrics
pgmetrics aims to be the simplest way to collect comprehensive
information and statistics from a running PostgreSQL cluster. It can display
this information in a human-readable text format for troubleshooting and
diagnosis, or export it as a JSON file for scripting, automation amd
monitoring.
We built pgmetrics based on our interactions with the customers of our
server, database and service monitoring product OpsDash,
as well as from our own needs arising from using PostgreSQL in production.
pgmetrics is open-source (Apache 2.0 licensed), is available now and lives at
pgmetrics.io.
Usage
pgmetrics is a single, dependency-free, statically-linked executable that can be
easily deployed to any server or machine by just copying it there. It can be
invoked just like psql, and takes nearly the same set of command-line
parameters and environment variables that you’re used to:
For example, to collect metrics from databases called “main” and “reports” from
a server, you might use:
Have a look at the information and metrics pgmetrics can collect and display, in
the sections below:
Overall Status
Basic, overall information about the server includes last checkpoint,
transaction times, checkpoint lag, active backend count and transaction ID age.
The range of transaction IDs must be within 2 billion to prevent transaction ID
wraparound issues.
Replication Status
When run an a primary with outgoing streaming replication, pgmetrics displays
the state and progress of each replication – the lag, in bytes, between the
current state, and the state that has been received, flushed and replayed at
the destination.
Replication Slots
On the master side, a list of replication slots, along with the progress
information, is captured.
Standby Replication Status
When pgmetrics is run on a hot standby, it collects the recovery and replication
status. This shows how much data has been received and replayed at the standby
end. If replication slots are used, richer data is available as seen below:
WAL Archiving
When WAL archiving is enabled, these stats are also collected and displayed. The
number of WAL files in the pg_wal (or pg_xlog in older PostgreSQL versions)
directory, as well as the number of “ready” files are also collected.
Relevant configuration settings are also displayed in this section. pgmetrics
will collect all settings, including changes from default, it any.
BG Writer
The stats for the bgwriter process include the distribution of scheduled and
requested checkpoints, write frequency and amounts, buffer writes segregated by
cause, and other information.
Backends
The report about active backends highlights common causes for concern, like
transactions that have been open for too long, or are idling. Processes waiting
for various reasons, including locks, are also called out:
Vacuum Progress
Information of ongoing vacuum and autovacuum jobs are also captured by
pgmetrics. This is helpful in diagnosing “stuck” vacuum jobs and also trying to
make an educated guess about when ongoing jobs will finish.
Roles
All the roles (user and groups) in the cluster, including group membership
and attributes like superuser, connection limit etc. are also captured by
pgmetrics. The display is a bit too wide for this blog post, so we’re omitting
it here.
Tablespaces
Each tablespace, it’s location and the size consumed (as reported by
pg_tablespace_size) is collected by pgmetrics. If run locally, it also
examines the mounted filesystem where the tablespace is located and reports the
disk and inode usage for that filesystem.
Databases
The stats for each database includes the commit ratio, cache efficiency, the
age of oldest active transaction ID, the size in bytes etc. Also included are
the list of installed extensions, the cache efficiency of sequence objects and
the list of disabled triggers.
Tables and Indexes
For each table, the last vacuum and analyze information, as well as estimates of
live and dead rows are collected. Stats like percentage of updates that are HOT
updates, cache efficiency for table and indexes, size, bloat, the use of
sequential and index scans etc are included.
Stats for each index associated with the table, like cache efficiency and
rows fetched/scan etc. are also listed.
The bloat figure calculated by pgmetrics uses the query taken from
check_postgres.
System Information
Finally, pgmetrics also captures system-level information (if run locally, of
course).
Availability
pgmetrics is available as a single binary for Linux, Windows and macOS, for
64-bit platforms. You can download these from the GitHub
releases page. You can easily
build pgmetrics for other platforms yourself using a Go
development environment, read more here.
System metrics (disk space, memory usage etc) are currently supported only for
Linux servers.
pgmetrics currently runs on PostgreSQL versions from 9.3 to 10. Patches to make
it work on other versions are welcome.
Get Involved!
We’d be happy to hear your bug reports, suggestions and feedback; and
incorporate them to make pgmetrics even more useful. Find out more at the
pgmetrics home page at pgmetrics.io.
Join the pgmetrics-users
mailing list (Google groups) for announcements and discussions.
Report bugs and suggestions to the GitHub project’s
issue tracker.