Out-Of-Cycle Release, Should You Upgrade, Postgres Survey, Automatic Indexing | Scaling Postgres 219
In this episode of Scaling Postgres, we discuss a Postgres 14 out-of-cycle release next week, whether you should upgrade, the state of Postgres survey and a tool for automatic indexing.
Content Discussed
- PostgreSQL 14 out-of-cycle release coming June 16, 2022
- Important PostgreSQL 14 update to avoid silent corruption of indexes
- Notes on updating to PostgreSQL 14.3, 13.7, 12.11, 11.16, and 10.21
- The 2022 State of PostgreSQL Survey Is Now Open!
- An automatic indexing system for Postgres: How we built the pganalyze Indexing Engine
- A balanced approach to automatic Postgres indexing: The new version of the pganalyze Index Advisor
- 5mins of Postgres E21: Server-side backup compression with LZ4 and Zstandard in Postgres 15, and the removal of exclusive backup mode
- Understand PG’s MVCC Visibility Basic Check Rules
- Update on the Trademark Actions Against the PostgreSQL Community
- Re: Update on the Trademark Actions
- Hironobu Suzuki
- Rubber Duck Dev Show Episode 45 | Reviewing the 2022 Rails Community Survey - Part 2
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 14 out-of-cycle release coming June 16, 2022". This is from postgresql.org. So that's this coming Thursday. This release resolves the CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY bug that was introduced when Postgres 14 was released. They added an optimization that unfortunately resulted in a potential index corruption issue when using the CONCURRENTLY option to manipulate indexes. We covered this issue in last week's episode of Scaling Postgres, but this post says the patch for it will be released this Thursday. They also mention that you can use the pg_amcheck command with the --heapallindexed option to see if you've had corruption, if you have run CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY. They mention that if you do have corruption, the way to get around it for now is to do a straight CREATE INDEX or REINDEX without the CONCURRENTLY option. Of course, that's going to lock the table against writes, so you have to weigh those costs and benefits. But when the next version, 14.4, comes out, I would definitely upgrade to it as soon as you can.
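For reference, a minimal sketch of that check using pg_amcheck (the database name here is a placeholder):

```bash
# Verify B-tree indexes against their heap tables; "mydb" is a placeholder
# database name. --install-missing creates the amcheck extension if needed.
pg_amcheck --heapallindexed --install-missing mydb
```

If it reports no failures, your concurrently built indexes should be in good shape.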
The next piece of content- "Important PostgreSQL 14 update to avoid silent corruption of indexes". This is from migops.com and this is basically reiterating the issue that exists for the concurrent usage of indexes. But what was interesting about this post is that at the bottom here, they show you the individual commands to run to do the amcheck. CREATE EXTENSION amcheck- this is the command to be able to check a pattern of tables or a pattern of individual indexes to make sure that they are not corrupted. So you can check this post out if you want to get that information.
The next piece of content- "NOTES ON UPDATING TO POSTGRESQL 14.3, 13.7, 12.11, 11.16, AND 10.21". This is from jkatz05.com and he's talking about an issue of whether you should potentially upgrade to some of these because there've been a couple of issues that have come from some of the patches. Now, the first one he mentions is the CONCURRENTLY issue with indexes in Postgres 14. Now I don't think there's any question about that when 14.4 comes out with this fix that you should immediately upgrade to it. But most of this post is focused on the CVE-2022-1552. So this is a security issue that was patched in these most recent releases of Postgres that are listed here. It basically allows an unprivileged user to craft malicious SQL to escalate their privileges. Now, he said there's been another bug that's been discovered as a result of the CVE, so it impacts all of the patched versions.
It relates to, quote, "Creating an expression index using an operator class from a different schema". So it seems kind of specific, but there may be other instances where something like this causes issues, particularly if it involves the gist_trgm_ops operator class. Someone saw this from a pg_dump, but you can replicate it with a few commands that are hyperlinked here. It's just something to be aware of, that this is another bug known to exist right now. And he asks the question, should you upgrade? Now personally, I think the security risk is great enough that I would go ahead and upgrade, at least for the databases I'm directly managing.
And I say that mostly because I know that gist_trgm_ops is something that's infrequently used, if at all, and I'll go ahead and check the schema to see if any indexes like that exist anyway. But it's just more information to be aware of as you plan your upgrades of various systems. Because if you're going to be upgrading to 14.4, you're going to be getting this CVE patch no matter what, and you're going to be exposed to this potential bug. And there's no patch for this bug at this time, at least according to this post. But if you want to learn more, definitely check out this article.
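As a starting point for that kind of schema check, a query like this (just a sketch) should surface any index definitions referencing gist_trgm_ops:

```sql
-- List indexes whose definition mentions the gist_trgm_ops operator
-- class from the pg_trgm extension
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE indexdef LIKE '%gist_trgm_ops%';
```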
The next piece of content- "The 2022 State of PostgreSQL Survey Is Now Open!". This is from timescale.com. They have a post announcing it, and you can just click on the link here and it looks like it's going to take you to a type form to fill the survey. So I encourage everyone to go ahead and do that.
Next piece of content- "An automatic indexing system for Postgres: How we built the pganalyze Indexing Engine". This is from pganalyze.com. Now, this is actually a part of their product, but they're kind of opening up the curtain a bit to help you understand how they actually created this process. So they built it as a way to give you index suggestions for your database, essentially. And they call it AI-assisted, so they're using some AI techniques to determine what index should be created. But it's not something that's done automatically, it's developer-driven, meaning it's kind of giving these suggestions to you. At least that's my interpretation. Basically, how it works is that it analyzes your database, it's connected to it, and it does query analysis.
So it looks at things like pg_stat_activity. Maybe it looks at pg_stat_statements- they didn't mention that, but maybe it does. And it collects all the different queries that are being run. From that, it assesses what tables are essentially being scanned. The next phase looks at the indexes: what indexes exist and what indexes could potentially be introduced or combined. And in that process, it does the AI part of it, the what-if analysis. What if we had this index? What if we combined these indexes into a multi-column index? Now, to do this, they actually rebuilt the Postgres planner, I'm assuming for different versions, within their software product itself. So basically, they're using their own product to do this what-if analysis.
So it's not impacting your production environment at all, presumably. The only impact, to my understanding, is collecting the metrics to do the query analysis and to know what indexes are already in place. From that point, it gives recommendations and runs a cost-improvement analysis. In other words, how much could an index improve performance, but also what is the write overhead, because any time you add a new index, it has to be maintained and therefore adds overhead for writes to that table. So apparently it does an analysis of that as well to give you proper recommendations. Now, this post goes over a lot of detail about how they designed this, how it's set up, and how it's working right now, and I found it pretty interesting. So I definitely suggest checking that post out.
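Now, this is pganalyze's internal engine, not something you run yourself, but if you want to experiment with this kind of what-if analysis on your own database, the hypopg extension lets the planner consider an index without actually building it. A minimal sketch, with placeholder table and column names:

```sql
-- hypopg must be installed on the server first
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Register a hypothetical index; nothing is written to disk
-- ("orders" and "customer_id" are placeholder names)
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- Plain EXPLAIN (without ANALYZE) now considers the hypothetical index
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Remove the hypothetical indexes for this session
SELECT hypopg_reset();
```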
A companion post to it, a more product-oriented one, is "A balanced approach to automatic Postgres indexing: The new version of the pganalyze Index Advisor". This one shows a few more product images. For example, they show a dashboard where you can see the index recommendations, their relative assessment of each recommendation's impact, and the write costs. This may be a little hard to see, but it basically shows these suggestions if you'd like to implement them- again, that's my understanding of how this works. They also show information about the write overhead of each of these recommendations. So if you're interested in that, I definitely encourage you to check out these two blog posts.
Next piece of content, also from pganalyze.com, is "5mins of Postgres E21: Server-side backup compression with LZ4 and Zstandard in Postgres 15, and the removal of exclusive backup mode". The first thing he discusses is the removal of the exclusive backup mode. This is where you do a pg_start_backup, then back up the files manually, and then do a pg_stop_backup. Now, most recently, I've used this for doing snapshots of databases as a backup technique. I prefer to use pg_basebackup, but there are certain database systems where clients were interested in just relying on snapshots as opposed to being able to do point-in-time recovery, and that's what I'm using some of these for. But in version 15, the exclusive option is no longer available, which I wasn't using anyway. What's interesting is they actually changed the name of the function.
So some people's backup scripts will no longer work, because it has gone from pg_start_backup to pg_backup_start. Now, they did this so people would be aware that the parameters changed. But it basically means I'm going to have to go in and change certain backup scripts, and you will as well if you are using this technique. The other part he covers is the server-side backup compression and the support for lz4 and zstd. Now, this has been covered in previous episodes of Scaling Postgres, and I must say, as I've been using it with different clients, zstd has been an amazing improvement over gzip. So even though the server-side backup compression isn't available yet, I'm already using zstd as a way to compress terabytes of data much more quickly while using much fewer CPU resources- in some cases, a quarter of the CPU resources for the same end result. So if you're using something like parallel gzip (pigz) on a Linux system, I highly suggest checking out zstd to see if it will benefit your use case. But if you want to learn more, definitely check out this episode.
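For those scripts, the rename looks roughly like this (the label is a placeholder, and the same session has to stay open across the file copy in both cases):

```sql
-- Postgres 14 and earlier, non-exclusive mode
SELECT pg_start_backup('nightly', false, false);
-- ... copy the data directory here ...
SELECT * FROM pg_stop_backup(false);

-- Postgres 15: functions renamed, exclusive parameter removed
SELECT pg_backup_start('nightly');
-- ... copy the data directory here ...
SELECT * FROM pg_backup_stop();
```

And on the compression side, a sketch of what I mean by using zstd on a dump (database and file names are placeholders):

```bash
# -T0 tells zstd to use all available cores, similar in spirit to pigz
pg_dump mydb | zstd -T0 -o mydb.sql.zst
```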
The next piece of content- "Understand PG's MVCC Visibility Basic Check Rules". This is from highgo.ca and this post is basically about how Postgres goes about determining if a row is visible and it goes through the process. So it talks about some of the information of the additional columns that are on each tuple or essential on each row. And it goes through the process. At first, it checks the hint bit and it goes through the different steps of that. Then it checks if the tuples xmin is equal to the global top transaction ID, and it goes through the process for that. As well as checking the current snapshot and finally checking the commit log. So if you want to have a better understanding of how Postgres checks the visibility of a row for a given session, definitely check out this blog post.
The next piece of content- "Update on the Trademark Actions Against the PostgreSQL Community". This is from postgresql.org. So this is an announcement that was actually made back in April 2022, and it's talking about the trademark issues that have been mentioned in some previous episodes of Scaling Postgres. I mention it here because very recently in June, the organization that they are discussing, located at postgresql.fund, is talking about an update on the trademark actions and it's essentially a reply to that. So if you're interested in that type of information, definitely check out these blog posts.
The next piece of content, the PostgreSQL person of the week is Hironobu Suzuki. If you're interested in learning more about Hironobu and his contributions to Postgres, definitely check out this blog post.
The last piece of content: we did have another episode of The Rubber Duck Dev Show this past Wednesday evening. This one was on "Reviewing the 2022 Rails Community Survey - Part 2", where we finalized our review of that survey. So if you're interested in that type of developer content, we definitely welcome you to check out our show.