pg_stat_statements in action

• 📚 4 min read • Tweet this post

Postgres extension pg_stat_statements is a tool that allows you to monitor the activity of your database. It is a great tool to monitor the activity of your database. If you use postgres and haven’t used pg_stat_statements before, this article is a great place to start.

pg_stat_statements can be found in the contrib directory of the postgres distribution. This extension is installed by default. If not, you can load/enable it by running the following command:

CREATE EXTENSION pg_stat_statements;

After enabled, pg_stat_statements will record the activity of your database, strip out the information you don’t need, and store it in a table called pg_stat_statements. But it doesn’t save individual queries, rather it parameterizes the queries and stores the aggregated information.

For example we have simple query like this:

SELECT * FROM users where username = 'lutfi';

It would be stored in the table pg_stat_statements as:

SELECT * FROM users where username = $1;

Let’s take a look the data in the table pg_stat_statements:

SELECT * FROM pg_stat_statements;

-- the results
userid              | 12
dbid                | 1
query               | select * from users where username = $1;
calls               | 100
total_time          | 0.0073
rows                | 2
shared_blks_hit     | 16
shared_blks_read    | 0
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
...

After run some query sometimes we have no clue the performance of the query or after run the systems for a month or so. We can see the performance of the query using simple query.

SELECT
  (total_exec_time) as total,
  (mean_exec_time) as avg,
  query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 100;

We can use complex query and adapt it to our needs. Before continue, let’s take a look at the result of the query above.

   total  |   avg  |        query
  --------+--------+-------------------------
   295.06 |  10.13 | SELECT id FROM users...
   219.13 |  80.24 | SELECT * FROM consoles...
  (2 rows)

Looking at the result, we can see that the query with the highest total time is SELECT id FROM users and the average time is 10.13 milliseconds. For more information about fields, please refer to the PostgreSQL documentation.

Using simple query above we can get a glimpse of the performance of the query in the current system. Let’s move to the next step how to get 20 data with a most total_exec_time with other information like how many query it is called, how many percentage of cpu time it uses, etc.

SELECT substring(query, 1, 50) AS short_query,
              round(total_exec_time::numeric, 2) AS total_exec_time,
              calls,
              round(mean_exec_time::numeric, 2) AS mean,
              round((100 * total_exec_time /
              sum(total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_cpu
FROM    pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

What we get from those query is:

                   short_query                      | total_exec_time | calls  |  mean   | percentage_cpu
----------------------------------------------------+-----------------+--------+---------+----------------
 select "revisions".*, "revision-user"."id" as "rev |       246207.30 |    178 | 1383.19 |          46.66
 select kcu.table_name as table_name, rel_kcu.table |       115622.58 |   2608 |   44.33 |          21.91
 update "release_log" set "status" = $1, "summary"  |        62675.82 |     92 |  681.26 |          11.88
 select column_name, column_default, is_nullable, u |        20830.61 |   2608 |    7.99 |           3.95
 select "e0".* from "qore_engine_columns" as "e0" w |        19033.21 | 245508 |    0.08 |           3.61
 select "e0".* from "qore_engine_columns" as "e0" w |         8642.93 |  12160 |    0.71 |           1.64
 select "revisions".* from "revisions" as "revision |         7707.62 |     94 |   82.00 |           1.46
 select "e0".* from "qore_engine_columns" as "e0"   |         6885.04 |  12160 |    0.57 |           1.30
 select "release_log".* from "release_log" as "rele |         6668.81 |   6330 |    1.05 |           1.26
 select "e0".* from "qore_engine_views" as "e0"     |         6333.34 |  12158 |    0.52 |           1.20
 select "e0".* from "qore_engine_permissions" as "e |         5973.79 |  12180 |    0.49 |           1.13
 select "e0".* from "qore_engine_relations" as "e0" |         5751.96 | 245476 |    0.02 |           1.09
 select "e0".* from "qore_engine_permissions" as "e |         4864.06 |  12162 |    0.40 |           0.92
 select "e0".* from "qore_engine_columns" as "e0" l |         1514.96 |  12450 |    0.12 |           0.29
 insert into "revisions" ("apprevision", "data", "n |         1196.83 |   1598 |    0.75 |           0.23
 select "e0".* from "qore_engine_relations" as "e0" |          989.98 |  11861 |    0.08 |           0.19
 select "e0".* from "qore_engine_view_tables" as "e |          916.29 |  12205 |    0.08 |           0.17
 select "e0".* from "qore_engine_permissions" as "e |          723.87 |  11663 |    0.06 |           0.14
 select "e0".* from "qore_engine_tables" as "e0" wh |          687.20 |  13675 |    0.05 |           0.13
 select "e0".* from "qore_engine_relations" as "e0" |          633.02 |  12163 |    0.05 |           0.12
(20 rows)

After we find out the query with the highest total_exec_time, we can use EXPLAIN ANALYZE to get more information about the query. For example we got query select * from revisions where apprevision = $1 and we can use EXPLAIN ANALYZE select * from revisions where apprevision = $1 to get more information about the query. Here’s the result of explain analyze:

EXPLAIN ANALYZE select * from revisions where apprevision = 123;


Sort  (cost=58.28..58.28 rows=1 width=348) (actual time=0.336..0.337 rows=0 loops=1)
   Sort Key: revisions.updated_at DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Group  (cost=58.26..58.27 rows=1 width=348) (actual time=0.310..0.311 rows=0 loops=1)
         Group Key: revisions.id, "revision-user".id, "production-revision_items".id
         ->  Sort  (cost=58.26..58.26 rows=1 width=348) (actual time=0.309..0.310 rows=0 loops=1)
               Sort Key: revisions.id, "revision-user".id, "production-revision_items".id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop Left Join  (cost=0.00..58.25 rows=1 width=348) (actual time=0.288..0.288 rows=0 loops=1)
                     Join Filter: (revisions.id = "production-revision_items"."production-revision")
                     ->  Nested Loop Left Join  (cost=0.00..57.16 rows=1 width=258) (actual time=0.287..0.288 rows=0 loops=1)
                           Join Filter: (revisions."revision-user" = "revision-user".id)
                           ->  Seq Scan on revisions  (cost=0.00..56.00 rows=1 width=62) (actual time=0.287..0.287 rows=0 loops=1)
                                 Filter: (apprevision = 1300)
                                 Rows Removed by Filter: 2112
                           ->  Seq Scan on users "revision-user"  (cost=0.00..1.07 rows=7 width=196) (never executed)
                     ->  Seq Scan on applications "production-revision_items"  (cost=0.00..1.04 rows=4 width=90) (never executed)
 Planning Time: 0.311 ms
 Execution Time: 0.416 ms

How to read those explain analyze result? Luckily we can use https://explain.depesz.com/ to read explain analyze result and get hints about the query. See the result https://explain.depesz.com/s/85LD#html.

database postgresql postgres pg_stat_statements