sobota, 4 sierpnia 2018

Simplistic PostgreSQL monitoring: pg_stat_database_hist

Good day!

In this series I will show you, how to create very simple but useful monitoring - using only PostgreSQL native tools, simple shell scripts and finally Grafana to add some visual sugar and dashboards.

There are dozens of ready-made monitoring tools, but they all need data source.  Here you are, going in "bare metal" mode.

Part 1 - Data store.

Run this once, in your postgres database:

CREATE TABLE pg_stat_database_hist (
  LIKE pg_stat_database,
  time timestamptz, 
  PRIMARY KEY (datname,time)
);
Do not worry about data size - it is tiny, compared to any kind of real life application data. Something like 3 MB per month.

Note: keeping this in same database simplifies a few things, but when your database server is out of order, you will not be able to read the stats.

Part 2 - data collection

Run this INSERT in your postgres database every 10 minutes.

INSERT INTO pg_stat_database_hist 
  SELECT *, now() FROM pg_stat_database;

As you are probably running your postgres on Linux host, this can be done with a cron job. Put this into /etc/cron.d/pgmon (in a single line of text!)

*/10 * * * * postgres psql -Xqc 'INSERT INTO pg_stat_database_hist SELECT *, now() FROM pg_stat_database'

Part 3 - What is There.

I will not repeat the official documentation.


Part 4: Sample Queries

For "numbackends" which is a gauge - type metric, this is simple.
Number of client sessions, over time:

SELECT time, sum(numbackends) 
FROM pg_stat_database_hist
GROUP BY time ORDER BY time;

For the other metrics, they are counters (accumulated) so to extract deltas, you need a little of substraction & division here and there:
 

WITH level1 AS (
  SELECT time, sum(tup_inserted) AS v
  FROM pg_stat_database_hist GROUP BY 1)
SELECT time, (v - lag(v) OVER w1)

 / extract(epoch from time - lag(time) OVER w1) AS tup_inserted_per_second
FROM level1
WINDOW w1 AS (ORDER BY time);

Note: actually, you can delegate the substraction logic to Grafana but we will get to this later.


To Be Continued....