background

28 Tips, Lucky 13, Autovacuum Tuning, Logical Pitfalls | Scaling Postgres 129

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss 28 tips & tricks, Postgres 13 is lucky, autovacuum tuning and logical replication pitfalls.

Content Discussed

YouTube Video

Podcast Audio

Transcript

All right. I hope you, your family, and coworkers continue to do well. Our first piece of content is "Postgres Tips And Tricks". This is from pgdash.io and is a set of about 28 different tips and tricks with Postgres from things you can do with SQL commands to other types of configuration. So I'll just run through a few of these so that you get a sense of it. But there are 28 of them, so I'm not going to cover everything. First, you can insert multiple rows in one multi-row insert. Definitely a better thing to do for performance. It allows you to insert data much faster. Another tip is to insert a row, and return automatically assigned values. So this is the returning statement. Further down below, you can see how you can use it with DELETE statements or UPDATE statements as well to return data from changed data. How to auto-generate UUID primary keys. There are 25 other tips you can use for working with Postgres. So I highly encourage you to check out this blog post to see if there's a tip that could maybe help you in your work.

The next piece of content is "Why PostgreSQL 13 is a Lucky Release". This is from crunchydata.com and he's highlighting a few features that are coming with version 13. The first question is why are my indexes smaller? Basically, with the deduplication that is in version 13, you can get a lot of space savings with B-tree indexes. So he generates an index here and gives you an example where Postgres 12 had a 28-megabyte index, whereas Postgres 13 had a 9.5-megabyte index. In addition, query times were 2x faster on Postgres 13. So definitely a benefit. But of course, once you upgrade to 13, you're going to need to REINDEX CONCURRENTLY your indexes to get this deduplication and presumably some of these performance benefits. But as he mentions here, your mileage may vary based on your performance gains. 

The next thing is speeding up vacuum or autovacuum, because now you can do indexes in parallel, which is a huge advantage because that takes the longest amount of time for vacuum to run, in my experience, is going through all of the indexes. The next is something I wasn't as familiar with, but it says sort less, sort quicker that they are offering an incremental sorting. It appears like when you're sorting by more than one column, it uses some efficiency. As long as there's an index, I believe, on this column X, the first column you order by to give you faster query execution. So it looks like here in version 12, this particular query ran in 39 milliseconds, whereas in version 13, it ran in less than one millisecond, 0.82 milliseconds. So that's a huge performance change. 

Again, your mileage may vary, but if this is possible, that's definitely an interesting addition. Then, of course, he mentioned some of the other enhancements whereby more create statistics options are possible, different partitioning benefits in terms of full support for logical replication, and before triggers. Hash aggregation used with aggregate functions and grouping sets, the libpq drivers now supports SCRAM authentication with channel binding, a date time function in the JSON path query language, and generating a random UUID can now be used without an extension. That's a good one. And Postgres Foreign Data Wrapper now supports certificate authentication. So a lot of big improvements are coming for 13 that should be out sometime this fall. If you're interested in learning more about it, you can check out this blog post.

Next piece of content is "TUNING POSTGRESQL AUTOVACUUM". This is from cybertec-postgresql.com. So first they cover what are the tasks of autovacuum and they mention the cleanup of dead tuples left behind after update or delete operations. Update the free space map that keeps track of free space and table blocks. Update the visibility map that's required for index-only scans. That helps the system determine if it needs to actually go look at the heap or if it can just stay on the index freeze table rows so that the transaction ID counter can safely wrap around and then also do an ANALYZE so that your statistics are kept up to date. So the first thing that they mentioned is to make sure that nothing keeps autovacuum from reclaiming dead tuples. So basically make sure autovacuum is running. The main things that stop it from running are long-running transactions. So basically do what you can to avoid those. 

They mentioned two configuration changes that you can make are the idle_in_transaction_session_timeout. You can set that to a value such that if you have transactions that are idle, they will be canceled after a period of time. Also the general statement timeout. So any long-running statements will automatically be canceled. The next step is tuning autovacuum to run faster. What I consider the primary method of doing that is the autovacuum_vacuum_cost_limit here. So you basically put this somewhere up in the thousands to make sure more of the system is dedicated to doing vacuums, as long as it doesn't impact your query performance, of course. And then auto vacuum vacuum cost delay. In newer versions, it's at two milliseconds, but you can of course adjust that. In older versions, it was 20. So the closer you get to zero, the faster you'll autovacuum tables. 

And they do mention a number of times in this post that you can set these settings not just at the cluster-level, but also at the table-level. So here they show an example of a busy table where they're changing the vacuum cost delay to one millisecond. Now, I've heard opinions on both sides. One opinion is that you shouldn't really do table options because it starts making things more difficult to manage and just stick with the cluster-level options. Whereas others say if you have particular tables that are exceptions, go ahead and change the parameters for those tables. So I'll leave it up to you to decide which you think is best. The next area is to change the workload so that fewer dead tuples are generated. So basically, can you make changes to your application where potentially you're updating or deleting rows less? So for example, in terms of deletion, could you maybe partition and then just truncate or drop those tables? 

Or if you're doing a bunch of updates, could you batch those updates or alter your application so that you're maybe doing more inserts and fewer updates? They also mentioned here that you could reduce the number of dead tuples by using HOT or heap-only tuple updates. So you could set a fill factor to something less than 100 and then make sure when you're doing an update that the column you're updating is not indexed so it doesn't have to go to the index to do an update. In terms of tuning autovacuum for index-only scans, again, this deals with a visibility map, he suggests setting your autovacuum scale factor relatively low so that it gets vacuumed frequently. Then for INSERTS in version 13, there's the new autovacuum_vacuum_insert_scale_factor that will help that table be picked up to be vacuumed more frequently so you get the visibility map updated more frequently. 

This is for the case where you have predominantly append-only tables. There are not a lot of UPDATES and DELETES happening now on lower versions. He says you can modify the freeze_max_age so that again you get it to vacuum more frequently. In terms of tuning autovacuum to avoid transaction wraparound problems, he mentions making sure that the anti-wraparound vacuum can freeze tuples in all tables. Again, we're going back to those very long transactions. You want to minimize those and do the suggestions that were mentioned previously and then address them for UPDATES, DELETES, and INSERTS. You just want to make sure the autovacuum is running. There are no long-running transactions or things of that nature that are blocking autovacuum from doing its job. Now they also mentioned partitioning and he says if you are partitioning with a lot of tables, then you probably want to consider having more autovacuum workers. 

Because more tables, then you can work on them more frequently with more workers. However, keep in mind the cost_limit, which was the first thing mentioned here, covers all workers. So if you have three workers, they can do more work per worker. But if you go to six, this limit is for all six, or if you go to nine, this limit is for all nine. So the more workers you have, the less work that can be done by each one, but if you're partitioning with a lot of tables, that's okay to do because there's less work to be done in each partition. Lastly, they cover tuning autoanalyze and some configuration changes you can make to have it analyze your tables more frequently. So overall, if you're looking to do some tuning to PostgreSQL's autovacuum, definitely check out this blog post.

The next piece of content is "Pitfalls and quirks of logical replication in Postgres 12". This is from elephanttamer.net. Now, he's talking about logical replication. He doesn't mention the particular version, but he talks about some quirks you need to be aware of. The first thing he talks about is disk space considerations because if you have logical replication failing for some reason, you could potentially run out of disk space on the primary because logical replication uses replication slots. Now, you can also run into this if you're using replication slots for physical replication or WAL-based replication, but you pretty much have to use slots with logical replication. Therefore this disk usage risk exists. So you need to have monitoring in place to handle that and identify if you're getting too much WAL due to, say, an orphan slot being present. 

The next thing that they mentioned is that basically, you need to have a replication slot first before you create your subscription. Otherwise, the create subscription statement just hangs. So that slot needs to be present. The next thing he mentions is to be careful with publication for all tables because if you do this, it doesn't really address tables that are newly added. In fact, he says, quote "...it will cause the replication to stop as soon as you issue a CREATE TABLE statement". In addition, it'll automatically include tables created by extensions such as the spatial_ref_sys table he mentions for PostGIS and also for tables that don't have a primary key or replica identity, which poses a problem for logical replication. 

So you want to be very cautious, he says, of these FOR ALL TABLES, and perhaps just do individual tables. The next thing, consider the tables that you're replicating because, in the case of the spatial_ref_sys table, this is actually supposed to be independent per PostgreSQL instance, even if you're replicating over. So you probably don't want to replicate this particular table. Next is to review your primary keys carefully and make sure each table that you're replicating has a primary key or a replica identity. The last recommendation is after adding a new table, be sure to refresh your publication. So for example, you do an ALTER PUBLICATION... ADD TABLE on the primary server, and then you ALTER SUBSCRIPTION... REFRESH PUBLICATION on your replica. So definitely some issues to be aware of if you are using logical replication. If you want more details about this, check out this post.

The next piece of content is actually a YouTube video and it's "Webinar Business Intelligence with Window Functions in PostgreSQL 11" by Gianni Jolie. This was actually done as a webinar in August 2019 and covers PostgreSQL 11. But I'm highlighting this because anything with window functions, I always like to remind myself of their features because I don't use them that frequently. So if you want to learn more about window functions, this YouTube video was just posted from the webinar that was done in 2019.

The next piece of content is "Authentication in Pgpool-II". This is from b-peng.blogspot.com. They're talking about how you set up Pgpool II in terms of authentication, in terms of how it works. Basically, we have a client that first authenticates to Pgpool II and then Pgpool II authenticates to the PostgreSQL server. They have all these different authentication methods that are supported by Trust, MD5, SCRAM-SHA-256, certificate, PAM, and LDAP. Some of these are version-specific, of course, for Pgpool II. Then it goes into some of the different settings you can make and configuration. And of course, they definitely suggest using either some sort of directory-based authentication or using the SCRAM for security reasons. So if you're interested in understanding how Pgpool II handles authentication, definitely check out this blog post.

The next piece of content is "Introducing the Postgres Prometheus Adapter. This is from crunchydata.com. Prometheus is a systems and service monitoring system, so it tracks metrics for different servers, for different metrics you want to measure. Now it has a built-in database it already uses, but this blog post describes an adapter that allows you to send data to Postgres as opposed to the default database. So it has the code of how you get the adapter and how you can set it up and configure it to be able to use Prometheus to send all of your data to Postgres. And it looks like it sends it to partition tables. So if you use Prometheus and want to try to send the data to Postgres as opposed to its default database, definitely check out this blog post.

The next piece of content is "Who is spending wal crazily". This is from highgo.ca, and they're basically talking about trying to identify what is causing the increase in the amount or size of WAL records. Basically, this post is a tool kind of describing pg_waldump. So this helps you look at each of the different types of records that exist in a set of WAL files and tells you the record size and the percentage of each type. So this blog post goes over how you can use the command pg_waldump, some different configurations that they've set, and also describes some of the different column output and the types of records that exist. Now, they don't cover everything, but they give a basic overview. So if you're interested in using pg_waldump to analyze your WAL files, definitely check out this blog post.

The next piece of content is "Advanced partition matching for partition-wise join". This is from 2ndquadrant.com, and this is a feature that's coming in Postgres 13 where they made some enhancements to partition-wise joins. So if you want to learn more about the enhancements that are coming to version 13, definitely check out this blog post.

Next piece of content is "Waiting for PostgreSQL 14 - pg_stat_statements: track number of rows processed by some utility commands". This is from depesz.com. Basically, for statements tracked in pg_stat_statements, it retains how many rows were processed. So this is now an output option. So you can see how many rows are returned by given statements that are tracked. So if you want to learn more about that, check out this blog post.

Next piece of content is "Preventing SQL Injection Attacks in Postgres". This is from crunchydata.com. Now, this isn't necessarily Postgres specific, but they do mention a few configuration changes you can make to help audit some of the statements coming through here, like log_statement and log_min_error_statement. But a lot of the advice I give is for any database system and a lot of application frameworks kind of do all of these things already, but it was still a good post to give you an overview of SQL injection attacks. So if you're interested in learning more, definitely check out this blog post.

The last piece of content, the PostgreSQL person of the week is Renee Phillips. So if you're interested in learning more about Renee and her contributions to PostgreSQL, definitely check out this blog post.


 

episode_image