pg_stat_statements in action

Author: Moch Lutfi (@kaptenupi)
The Postgres extension pg_stat_statements lets you monitor the query activity of your database. If you use Postgres and haven't tried pg_stat_statements before, this article is a good place to start.
Getting Started
pg_stat_statements ships in the contrib directory of the Postgres distribution, so it is usually available without installing anything extra. It is not active by default, though; you can enable it in a database by running the following command:
```sql
CREATE EXTENSION pg_stat_statements;
```
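If that command complains that the module has not been loaded, pg_stat_statements usually needs to be added to shared_preload_libraries first, which requires a server restart. A minimal sketch (the two pg_stat_statements.* settings are optional tuning knobs; adjust values to your setup):

```sql
-- Requires superuser; writes the setting to postgresql.auto.conf.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- Optional: how many distinct statements to track, and whether to
-- include statements nested inside functions.
ALTER SYSTEM SET pg_stat_statements.max = 10000;
ALTER SYSTEM SET pg_stat_statements.track = 'all';
```

After restarting Postgres, run CREATE EXTENSION pg_stat_statements; again.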
Once enabled, pg_stat_statements records the activity of your database, strips out the information you don't need, and exposes it through a view called pg_stat_statements. It doesn't save individual queries; instead it normalizes each query into a parameterized form and stores aggregated statistics per query.
For example, suppose we run a simple query like this:
```sql
SELECT * FROM users WHERE username = 'lutfi';
```
It would be stored in the pg_stat_statements view as:

```sql
SELECT * FROM users WHERE username = $1;
```
Let's take a look at the data in the pg_stat_statements view:
```sql
SELECT * FROM pg_stat_statements;

-- the results
userid              | 12
dbid                | 1
query               | select * from users where username = $1;
calls               | 100
total_exec_time     | 0.0073
rows                | 2
shared_blks_hit     | 16
shared_blks_read    | 0
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
...
```
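These counters accumulate until they are explicitly cleared, so before benchmarking or load testing it can be useful to start from zero. The extension provides a reset function for this (by default it is restricted to superusers, though EXECUTE on it can be granted):

```sql
-- Discard all statistics gathered so far.
SELECT pg_stat_statements_reset();
```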
Get insights from pg_stat_statements
After the system has been running for a while, a month or so, we often have no clue which queries perform poorly. We can get a quick overview with a simple query:
```sql
SELECT
    total_exec_time AS total,
    mean_exec_time AS avg,
    query
FROM pg_stat_statements
ORDER BY 1 DESC
LIMIT 100;
We can write more complex queries and adapt them to our needs. Before continuing, let's take a look at the result of the query above:
```
 total  | avg    | query
--------+--------+-------------------------
 295.06 | 10.13  | SELECT id FROM users...
 219.13 | 80.24  | SELECT * FROM consoles...
(2 rows)
```
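Sorting by total time surfaces the heavy hitters, but a query that is individually cheap can still dominate load if it is called very often. A variation of the query above, sorted by call count instead (column names assume Postgres 13+, as used throughout this article):

```sql
SELECT calls,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
```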
Looking at the result, we can see that the query with the highest total time is SELECT id FROM users..., with an average of 10.13 milliseconds per call. For more information about the fields, refer to the PostgreSQL documentation.
The simple query above gives a glimpse of query performance in the current system. As a next step, let's fetch the 20 queries with the highest total_exec_time, along with other information such as how many times each query was called and what percentage of total execution time it consumed.
```sql
SELECT substring(query, 1, 50) AS short_query,
       round(total_exec_time::numeric, 2) AS total_exec_time,
       calls,
       round(mean_exec_time::numeric, 2) AS mean,
       round((100 * total_exec_time /
              sum(total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_cpu
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```
What we get from that query is:
```
                    short_query                     | total_exec_time | calls  |  mean   | percentage_cpu
----------------------------------------------------+-----------------+--------+---------+----------------
 select "revisions".*, "revision-user"."id" as "rev |       246207.30 |    178 | 1383.19 |          46.66
 select kcu.table_name as table_name, rel_kcu.table |       115622.58 |   2608 |   44.33 |          21.91
 update "release_log" set "status" = $1, "summary"  |        62675.82 |     92 |  681.26 |          11.88
 select column_name, column_default, is_nullable, u |        20830.61 |   2608 |    7.99 |           3.95
 select "e0".* from "qore_engine_columns" as "e0" w |        19033.21 | 245508 |    0.08 |           3.61
 select "e0".* from "qore_engine_columns" as "e0" w |         8642.93 |  12160 |    0.71 |           1.64
 select "revisions".* from "revisions" as "revision |         7707.62 |     94 |   82.00 |           1.46
 select "e0".* from "qore_engine_columns" as "e0"   |         6885.04 |  12160 |    0.57 |           1.30
 select "release_log".* from "release_log" as "rele |         6668.81 |   6330 |    1.05 |           1.26
 select "e0".* from "qore_engine_views" as "e0"     |         6333.34 |  12158 |    0.52 |           1.20
 select "e0".* from "qore_engine_permissions" as "e |         5973.79 |  12180 |    0.49 |           1.13
 select "e0".* from "qore_engine_relations" as "e0" |         5751.96 | 245476 |    0.02 |           1.09
 select "e0".* from "qore_engine_permissions" as "e |         4864.06 |  12162 |    0.40 |           0.92
 select "e0".* from "qore_engine_columns" as "e0" l |         1514.96 |  12450 |    0.12 |           0.29
 insert into "revisions" ("apprevision", "data", "n |         1196.83 |   1598 |    0.75 |           0.23
 select "e0".* from "qore_engine_relations" as "e0" |          989.98 |  11861 |    0.08 |           0.19
 select "e0".* from "qore_engine_view_tables" as "e |          916.29 |  12205 |    0.08 |           0.17
 select "e0".* from "qore_engine_permissions" as "e |          723.87 |  11663 |    0.06 |           0.14
 select "e0".* from "qore_engine_tables" as "e0" wh |          687.20 |  13675 |    0.05 |           0.13
 select "e0".* from "qore_engine_relations" as "e0" |          633.02 |  12163 |    0.05 |           0.12
(20 rows)
```
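Execution time is not the only signal in this view. The shared_blks_hit and shared_blks_read columns tell us how often a query was served from the buffer cache versus read from disk. A sketch for spotting queries doing heavy disk reads (again assuming the Postgres 13+ column set used above):

```sql
SELECT substring(query, 1, 50) AS short_query,
       shared_blks_hit,
       shared_blks_read,
       -- cache hit percentage; NULLIF avoids division by zero
       round(100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```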
What's next?
After we find the query with the highest total_exec_time, we can use EXPLAIN ANALYZE to dig into how it is executed. For example, for the normalized query select * from revisions where apprevision = $1, we run EXPLAIN ANALYZE with a concrete parameter value. Here's the result:
```sql
EXPLAIN ANALYZE select * from revisions where apprevision = 1300;

Sort  (cost=58.28..58.28 rows=1 width=348) (actual time=0.336..0.337 rows=0 loops=1)
  Sort Key: revisions.updated_at DESC
  Sort Method: quicksort  Memory: 25kB
  ->  Group  (cost=58.26..58.27 rows=1 width=348) (actual time=0.310..0.311 rows=0 loops=1)
        Group Key: revisions.id, "revision-user".id, "production-revision_items".id
        ->  Sort  (cost=58.26..58.26 rows=1 width=348) (actual time=0.309..0.310 rows=0 loops=1)
              Sort Key: revisions.id, "revision-user".id, "production-revision_items".id
              Sort Method: quicksort  Memory: 25kB
              ->  Nested Loop Left Join  (cost=0.00..58.25 rows=1 width=348) (actual time=0.288..0.288 rows=0 loops=1)
                    Join Filter: (revisions.id = "production-revision_items"."production-revision")
                    ->  Nested Loop Left Join  (cost=0.00..57.16 rows=1 width=258) (actual time=0.287..0.288 rows=0 loops=1)
                          Join Filter: (revisions."revision-user" = "revision-user".id)
                          ->  Seq Scan on revisions  (cost=0.00..56.00 rows=1 width=62) (actual time=0.287..0.287 rows=0 loops=1)
                                Filter: (apprevision = 1300)
                                Rows Removed by Filter: 2112
                          ->  Seq Scan on users "revision-user"  (cost=0.00..1.07 rows=7 width=196) (never executed)
                    ->  Seq Scan on applications "production-revision_items"  (cost=0.00..1.04 rows=4 width=90) (never executed)
Planning Time: 0.311 ms
Execution Time: 0.416 ms
```
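The interesting line in this plan is the Seq Scan on revisions: Postgres scanned the whole table and discarded 2112 rows to find the matches for apprevision. On a table this small that is fine, but as it grows, an index on the filter column would let the planner switch to an index scan. A hypothetical fix (the index name is made up):

```sql
-- Build the index without blocking writes to the table.
CREATE INDEX CONCURRENTLY revisions_apprevision_idx
    ON revisions (apprevision);
```

After creating it, re-running EXPLAIN ANALYZE should show an Index Scan (or Bitmap Index Scan) instead of the Seq Scan once the table is large enough for the planner to prefer it.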
How do we read this EXPLAIN ANALYZE output? Luckily, we can paste it into https://explain.depesz.com/ to get a readable breakdown and hints about the query. See the result at https://explain.depesz.com/s/85LD#html.
Wrap up
- By using pg_stat_statements we can easily find out which queries perform badly.
- Use https://explain.depesz.com/ to visualize the query plan.