Just Upgrade, Vacuum Updates, Connection Strings, Postgres Showcase | Scaling Postgres 80
In this episode of Scaling Postgres, we discuss why you should just upgrade, vacuum updates in Postgres 12, psql connection strings and a Postgres showcase.
Content Discussed
- Just Upgrade: How PostgreSQL 12 Can Improve Your Performance
- A Primer on PostgreSQL Upgrade Methods
- PG Phriday: Postgres 12 Revs up Vacuum
- Postgres Connection Strings and psql
- The Art of PostgreSQL Blog
- Updates for the Postgres Showcase project
- postgres-showcase on Github
- Remote Backup and Restore with pgBackRest
- Cloud Vendor Deep-Dive: PostgreSQL on Google Cloud Platform (GCP)
- Creating a PostgreSQL Cluster with Kubernetes CRDs
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Just Upgrade: How PostgreSQL 12 Can Improve Your Performance". This is from crunchydata.com. Now, this blog post starts off asking what the major feature or features of this release are, and what he says here is, quote, "...many of the features and enhancements in PostgreSQL 12 will just make your applications run better without doing any work other than upgrading". So, hence the title, Just Upgrade. He talks about some of these different areas, and the first area he mentions is major improvements to indexing. Now, this is something I wasn't aware of or hadn't read about, but he says "PostgreSQL 12 makes significant improvements to how B-tree indexes work and experiments using TPCC-like tests showed a 40% reduction in space utilization on average".
So tables that get updated regularly could see noticeable improvements in disk utilization, and in turn that would mean less space required for caching those indexes, presumably. The other big benefit he also mentions here, of course, is being able to reindex indexes concurrently. Actually, you may need to do some of that to gain some of the space utilization benefits listed here, because existing indexes only pick up the new B-tree improvements once they are rebuilt. And then he also talks about how it reduces the overhead of WAL records for GiST, GIN, and SP-GiST indexes when an index is being built, so that is another potential advantage. Then he talks about partitioning improvements, the most significant being in the area of processing tables that have thousands of partitions.
So if you're doing something like storing time series data, those partitioned tables will presumably be more efficient, not just when querying a few tables out of those partition sets, but also in terms of insert speed. Along with that, he also mentions there have been some boosts to COPY. Then he mentions CTEs being inlined, which is the big boon mentioned in the section "WITH queries get a big boost", and the fact that JIT is now essentially on by default. Then of course there are all the additional features, but the ones he mentions here, mostly of a performance or space-saving nature, could get you some improvements just by upgrading, and they are all really great additions. So definitely a blog post to check out if you want to learn more.
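To make the indexing and CTE changes concrete, here is a minimal sketch of how you might exercise them, assuming a hypothetical database, table, and index; none of these names come from the post.

```
# Rebuild an existing index without blocking writes (REINDEX ... CONCURRENTLY is new in 12).
# Existing indexes only benefit from the B-tree space improvements once rebuilt.
psql -d mydb -c "REINDEX INDEX CONCURRENTLY accounts_pkey;"

# CTEs are now inlined by default when referenced once and free of side effects.
# NOT MATERIALIZED makes the intent explicit; MATERIALIZED forces the pre-12 behavior.
psql -d mydb -c "
WITH recent AS NOT MATERIALIZED (
    SELECT id, balance FROM accounts WHERE updated_at > now() - interval '1 day'
)
SELECT count(*) FROM recent;"
```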
The next post, given that PostgreSQL 12 is around the corner, is "A PRIMER ON POSTGRESQL UPGRADE METHODS". This is from cybertec-postgresql.com, and he is basically covering the three main upgrade methods that exist. The first one is a logical dump and restore: just dump all the table data and then restore it. Then he covers binary in-place upgrades using pg_upgrade, which can happen really fast using the link method, and finally he talks about using logical replication as an upgrade method. He has a convenient table here that says essentially the logical dump and restore is simple, safe, and somewhat flexible. The downside is that it's the slowest method of doing an upgrade, and doing a per-database approach has some pitfalls. Using pg_upgrade, the binary in-place method as he's calling it, is fast, very fast. If you're doing the link method you still have to have some downtime, but it's still pretty fast, and the old instance won't be affected in the default mode.
In other words, when you're not doing the link method. The downside is it's more complex than the dump and restore method. It can be risky in link mode because you're actually reusing the same data files; it's just using Linux hard links. You can possibly lose your standby servers, and you have to do some work using rsync to sync them back up, because there's no default way to upgrade the standby servers. And if you're using the default mode, essentially you're going to need double the disk space. Then the last method is using logical replication, so just transferring data from one version to a newer version of PostgreSQL.
It can have the shortest downtime possible, it's safe with the possibility of thorough live tests, and it's essentially very flexible because you're just determining how you want to move the data over. But it is the most complex method, you may need some schema changes, it can be slow performance-wise because you're having to send everything over using logical replication, and it's always per database, so you can't do a whole cluster that way; you need to do each database. So if you're looking to upgrade your version of PostgreSQL, definitely a blog post to check out to help you determine what is the best method for upgrading your database.
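For the binary in-place method, a pg_upgrade run with the link method looks roughly like the sketch below. The version numbers and paths are assumptions for illustration; both clusters need to be stopped, the new one freshly initialized, and you'd want a verified backup and a clean --check pass before the real run.

```
# Hypothetical paths for an 11 -> 12 upgrade (RHEL-style packaging layout).
OLD_BIN=/usr/pgsql-11/bin
NEW_BIN=/usr/pgsql-12/bin
OLD_DATA=/var/lib/pgsql/11/data
NEW_DATA=/var/lib/pgsql/12/data

# Dry run: checks cluster compatibility without changing anything.
$NEW_BIN/pg_upgrade -b $OLD_BIN -B $NEW_BIN -d $OLD_DATA -D $NEW_DATA --check

# Real run with --link: reuses the old data files via hard links, so it's very fast,
# but the old cluster can't safely be started again once the new one has been run.
$NEW_BIN/pg_upgrade -b $OLD_BIN -B $NEW_BIN -d $OLD_DATA -D $NEW_DATA --link
```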
The next post is "PG Phriday: Postgres 12 Revs up Vacuum". This is from 2ndquadrant.com. So basically they're talking about vacuum, they talk about some basics and they say that when you're doing a vacuum, it also needs to vacuum each of the indexes because they can bloat as well. So they did a test here and running a full vacuum using a pgbench example, which I think had about 10 million rows in it. It took about 21 seconds, but there's this feature in version 12 where you can avoid cleaning up the indexes. So when you set index_cleanup to false, it actually happens about half the time, about 12 seconds. So that's pretty great you're probably thinking but you always need to eventually vacuum the indexes, otherwise, they're going to bloat. I imagine there are issues if you are never going to vacuum those indexes. And I don't think there's a way to configure autovacuum to ignore the index_cleanup necessarily.
But they have some suggestions on how to potentially handle this a little lower down in the blog post. Here they added another index and you can see the time extended to 28 seconds, while doing INDEX_CLEANUP false was a little higher than before, but still about half the time of vacuuming the indexes. And if you have tons of indexes on a table, I imagine the benefits would be even bigger. Then they go over the disadvantage of skipping indexes. So basically you don't want to do this as a long-term thing; your indexes do need to be cleaned up. Maybe you could have some scenario where you're doing index rebuilds alongside vacuums that skip the index cleanup. Then they talk about transactions and the 2 billion XID limit; one of the primary reasons vacuum exists in the first place is to be able to freeze older rows so that they remain visible to everyone.
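Here's a minimal sketch of what that looks like in practice, assuming a pgbench-style table; the timings will of course depend on your data and hardware.

```
# Regular vacuum: scans the table and cleans up every index on it.
psql -d bench -c "VACUUM (VERBOSE) pgbench_accounts;"

# PostgreSQL 12: skip the index cleanup phase for a much faster pass.
# Dead index entries are left behind, so this is not a long-term strategy.
psql -d bench -c "VACUUM (INDEX_CLEANUP FALSE, VERBOSE) pgbench_accounts;"
```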
So they get into the vacuumdb command line tool, where you can say only do a vacuum where the min-xid-age is at a certain value. So when the transaction age is at, say, 98,000 for each of these tables, you can set the threshold at 100,000 and basically no work will be done when you're running a vacuum of the database. However, if you put the minimum at 50,000, then all of these tables will be picked up, and it will do the vacuum and finish in 37 seconds. What's interesting is when you combine this with actually doing a freeze. For example, they had a transaction age of about 100,000 for one table and close to 200,000 for the others, so they did a vacuumdb freeze with a min-xid-age of 100,000, and now it's going to freeze just those tables with a transaction age greater than 100,000. You can see this in what they talk about in the conclusion here.
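The vacuumdb invocations being described would look roughly like this; the thresholds and database name are just illustrative.

```
# Only vacuum tables whose transaction ID age exceeds 50,000 (--min-xid-age is new in 12's vacuumdb).
vacuumdb --min-xid-age=50000 bench

# Combine with --freeze to aggressively freeze only the oldest tables,
# spreading anti-wraparound work into smaller, scheduled chunks.
vacuumdb --freeze --min-xid-age=100000 bench
```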
This enables you to freeze certain tables at a certain threshold on a periodic basis, basically doing the work of vacuum in smaller bits and pieces but more frequently, maybe as part of maintenance jobs that you schedule. Now, autovacuum should still be running, but this lets you fine-tune things and do more of the work when the database is less busy. The same goes for the index cleanup flag: maybe you need to get a vacuum done on a table but want to defer the cleanup of the indexes. You'll eventually need to do it, but there are probably one-time maintenance tasks or issues you could address with this feature. The suggestion here, of course, which makes sense, is don't always run your vacuums with the index cleanup skipped; you eventually need to vacuum your indexes as well. So this is a great blog post talking about some of the new features coming in version 12 for vacuum, and I encourage you to check it out.
The next post is "Postgres Connection Strings and psql". This is from tapoueh.org. So this is talking about the psql command line to utility and how it can use convenient connection strings when connecting to it. Instead of using what you may typically see, where you're using parameters like U for the user, D for the database, H for the host, or P for the port. You can build it as a connection string, similar to how you may see it in some application development environments. Well, you can pass it in essentially as the database name and it will go ahead and connect. So essentially here there are three different connection methods being used and each of them gives you the same value. He gives an example of using Python here that uses libpq. So just a quick little post on connection strings for PostgreSQL.
The next post is actually a new blog that has been started. This is from theartofpostgresql.com, the site for the book recently released by Dimitri Fontaine, and he has started doing some blogging about SQL. Now, it's not really about PostgreSQL administration, but about SQL in general, using PostgreSQL for the examples. So if you feel this would be of interest to you, definitely check out this new blog.
The next post is "UPDATES FOR THE POSTGRES SHOWCASE PROJECT". This is from cybertec-postgresql.com. So this is a list of topics covered mostly of an SQL-related nature on how you create tables, and create databases. They go into more detail about how to connect, do transaction methods, create altering tables, partition other table modifiers constraints, view stored procedures, and triggers so a lot of information related to how you set up the table structures and objects within the database. So you can see the GitHub project here, that is the Postgres Showcase and it shows basically a very short set of code. So a lot of code that shows you how to do things like in this view, it explains views kind of what they are and shows you some examples. So if you're interested in this kind of using this as a resource, definitely a blog post and a GitHub project you may want to check out.
The next post is "REMOTE BACKUP AND RESTORE WITH PGBACKREST". This is also from cybertec-postgresql.com. So this is a follow-up to a previous blog post about pgBackRest where they just installed and did a backup on the same database server. This one actually sets up a separate pgbackup server that then connects to a database to do the backup and even tests to restore. So it goes through all the different codes that you would need to install and get it running and does the backup and restore. So if getting to learn pgBackRest is of interest to you, definitely a blog post to check out.
The next post is "Cloud Vendor Deep-Dive: PostgreSQL on Google Cloud Platform (GCP)". This is from the severalnines.com blog and this is a comprehensive examination of Google Cloud's implementation of their PostgreSQL service. So they go over that and it is a very long and comprehensive post about all the different available features. What are some advantages and disadvantages? So talking about almost every single example you can think of and using PostgreSQL on the Google Cloud platform. So if that is of interest to you, definitely a blog post to check out.
The last post is "Creating a PostgreSQL Cluster with Kubernetes CRDs". This is from crunchydata.com. CRDs are custom resource definitions. So this is a post about the PostgreSQL operator, which is a tool developed by Crunchy Data, to my knowledge, that eases the creation of PostgreSQL clusters on Kubernetes. So if you're interested in doing that, definitely a blog post to check out.