5 Minutes of Postgres, Unnest, SCRAM, Multi-master Review | Scaling Postgres 200
In this episode of Scaling Postgres, we discuss the new series 5 minutes of Postgres, using unnest, switching to SCRAM authentication and reviewing multi-master solutions.
Content Discussed
- 5mins of Postgres E1: Using Postgres statistics to improve bad query plans, pg_hint_plan extension
- 5mins of Postgres E2: Using unnest(..), generate_series(), and PostGIS
- Postgres UNNEST cheat sheet for bulk operations
- PostgreSQL 14 and Recent SCRAM Authentication Changes – Should I Migrate to SCRAM?
- PG Phriday: Replication Engine Potpourri
- Accelerating VACUUM in Postgres
- Postgres SQL Lessons From Advent of Code Challenges
- Not so equal texts in PostgreSQL - and how to compare texts in a more elaborate way
- "Read-Only" Mode for PostgreSQL
- Waiting for PostgreSQL 15 – Add HEADER support to COPY text format
- Elevation Profiles and Flightlines with PostGIS
- Transaction ID and Snapshot information functions
- Flavio Gurgel
- Rubber Duck Dev Show Episode 27 | Refactoring in the Wild
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Welcome to our 200th episode. I've been doing this for about four years and I don't intend to stop anytime soon, so I hope you enjoy the content. What I do on a weekly basis is review all the Postgres content I can find and present roughly the top ten articles, in a prioritized order, to share with you.
Our first piece of content is actually the start of what looks like a new video series called "Five minutes of Postgres". This is "5mins of Postgres E1: Using Postgres statistics to improve bad query plans, pg_hint_plan extension". This is from pganalyze.com. So it looks like this is something they're doing on a weekly basis. I have the two episodes for the first and second pieces of content here. This first one talks about where someone achieved a 200 times faster improvement by adjusting the query.
Basically, the key problem that he reviews in the video, as well as talks about here, is that the planner had invalid statistics: it thought there were more rows to search through than there actually were, so it chose an inappropriate plan. Now, when I've encountered issues like this, I've had to increase the amount of sampling done for statistics, or you can create extended statistics that look at more than one column to add relationships to the data. He talks about this in the post, and he uses other posts as references, like an article that shows using Postgres extended statistics to achieve a 3,000-times speedup. So this episode goes through that process and I definitely encourage you to check it out.
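To make that concrete, here is a minimal sketch of both options; the orders table and its columns are hypothetical stand-ins, not something from the episode:

```sql
-- Increase the sampling target for a single column so ANALYZE gathers more detail:
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Create extended statistics so the planner knows these two columns are correlated:
CREATE STATISTICS orders_customer_status_stats (dependencies)
  ON customer_id, status FROM orders;

-- Refresh the statistics so the planner can actually use them:
ANALYZE orders;
```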
The next episode was "Using unnest(...), generate_series(), and PostGIS". He talks about an article on unnest that I'm actually going to cover here, the generate_series content we covered last week, as well as a PostGIS article that is also in this week's Scaling Postgres. So there is definitely some more video content if you are interested in checking that out.
Now, the unnest article discussed there, among others, is "Postgres UNNEST cheat sheet for bulk operations". This is from atdatabases.org and they're talking about using unnest to take an array and create rows out of it; that's what unnest does. He says it is actually faster to insert records when you use unnest. Now, I always like to see performance comparisons and unfortunately this post doesn't include one, but the reasoning for why it's faster is that there's less parsing required: instead of a huge multi-row INSERT statement, you just hand the server arrays, and presumably that is faster.
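As a rough illustration, assuming a hypothetical users table and arrays that would normally arrive as parameters from the client driver:

```sql
-- Bulk insert: each array becomes one column, and unnest turns them into rows.
INSERT INTO users (id, name)
SELECT *
FROM unnest(
  ARRAY[1, 2, 3],
  ARRAY['alice', 'bob', 'carol']
) AS t(id, name);
```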
He also shows how to do this for UPDATEs, again converting arrays into the rows you want to apply. He says the normal UPDATE statement only really lets you update multiple records in one go if you want to set them all to the same value. Well, you can also do that with a subquery, so you can update based on values in another table. But I suppose this technique has advantages if you want to pass that data in as arrays. So I think you can do some of the same things other ways, but there are probably use cases where unnest has advantages. Definitely an interesting blog post, and I encourage you to check it out.
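Here is a minimal sketch of the bulk-update variant, again with a hypothetical users table:

```sql
-- Bulk update: join the target table against the unnested arrays
-- so each row can get a different value in a single statement.
UPDATE users
SET name = data.name
FROM unnest(
  ARRAY[1, 2, 3],
  ARRAY['anna', 'ben', 'cara']
) AS data(id, name)
WHERE users.id = data.id;
```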
The next piece of content- "PostgreSQL 14 and Recent SCRAM Authentication Changes - Should I Migrate to SCRAM?". This is from percona.com, and what makes this post relevant is that Postgres 14 has set SCRAM as the default. So some people are running into particular issues when they upgrade to 14. Part of their environment wasn't ready, or maybe some of the libraries weren't ready for it. But basically, 14 is using SCRAM by default now. It's just a default. You can still use MD5, which was the normal way of handling passwords before, but SCRAM is the default. This post talks about some different settings to be aware of in using it, such as setting channel binding, potentially when you're connecting, as well as when you're setting password encryption. Because setting it to on doesn't work anymore, you actually need to say you're setting it to scram-sha-256, where you're setting it to MD5. So if you want to learn more about it, you can check out this blog post.
The next piece of content- "PG Phriday: Replication Engine Potpourri''. This is from enterprisedb.com and they're basically reviewing a collection of different multimaster utilities that work with Postgres. One is Bucardo, the other is SymmetricDS, and another is the EDB Replication Server, SharePlex, as well as BDR, which is bi-directional replication. They talk about each of these solutions and the different ways that they achieve a multi-master Postgres installation. So if you want to learn more about that, definitely check out this blog post.
The next piece of content- "Accelerating VACUUM in Postgres". This is from dataeget.com, and they're talking about some techniques you can use to accelerate your vacuum. Now, whereas I agree with the parameters that need to be changed to accelerate vacuum or autovacuum, I would place emphasis on other areas, like for example, the vacuum_cost_limit, which is not even in the listed values here. To me, this is the most important value to change because this sets how fast vacuum can progress. In other words, it does work until it reaches a particular cost limit and then it stops its work to allow the database system to continue doing its work. So this is a value from 1,000 and I've put server systems on it where it's 8-9,000 because they were very large multi-terabyte systems with a lot of CPUs that had the processing power to spare to run vacuum frequently.
And I would say this is the number one thing to literally make vacuum go faster. Now, apart from that, you can drop the costs that get counted against this limit, that is, reduce the different costs they list here: the page hit, page miss, and page dirty costs. Reducing those is essentially equivalent to increasing the limit; it's basically allowing vacuum to do more work. Now, their first suggestion is increasing the number of autovacuum_max_workers. Well, you have to be aware that the workers are governed by the vacuum cost limit. So if you leave that at its default and just increase the number of max workers, each worker will be slower because it's a global limit.
So if you go from three to ten workers, each individual autovacuum worker will now be roughly three times slower because they're all measured against the same limit. Now, more workers will help you vacuum more tables in parallel, but in my experience, when vacuum is too slow it's usually one table that's just taking forever to vacuum. So what you literally want is for vacuum to run faster. Generally, I keep autovacuum_max_workers at three; I've brought it up as high as six if needed, but that is usually not the area that needs adjustment. And of course, the other things they mention, such as reducing the nap time, can be important for running autovacuum frequently enough, as well as adjusting the different thresholds that define when autovacuum runs. But if you want to learn more about autovacuum configuration, you can check out this blog post.
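To illustrate what I mean about the cost limit, here is a minimal sketch; the numbers are only examples and the right values depend on your hardware and workload:

```sql
-- Let vacuum and autovacuum do more work before pausing (the default vacuum_cost_limit is 200):
ALTER SYSTEM SET vacuum_cost_limit = 2000;
-- How long a worker pauses once the limit is hit (2ms is already the default on recent versions):
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
-- Both settings take effect on a configuration reload:
SELECT pg_reload_conf();
```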
The next piece of content- "Postgres SQL Lessons From Advent of Code Challenges". This is from heap.io and they're talking about some different code challenges they used SQL for and they used particular features of SQL such as window functions to get particular ranges. They used CTEs or common table expressions to create readable application code. One of the challenges, it was beneficial to think relationally instead of iteratively to come up with a particular solution. And lastly, there was one case where the SQL was much more compact compared with a JavaScript solution. So definitely a post to explore different ways you can use SQL.
Also related to this is the post "Not so equal texts in PostgreSQL - and how to compare texts in a more elaborate way". This is from andreas.scherbaum.la and they're dealing with strings that look equal but actually aren't. This post takes a different path for how to actually do the comparison, by creating its own data types and operators. So this is definitely an example of how extensible Postgres can be. If you're interested in that, you can check out this blog post.
The next piece of content- "'READ-ONLY' MODE FOR POSTGRESQL". This is from jkatz05.com He's talking about a parameter called default transaction "read-only". If you set this, you can no longer send DML to the server. Basically, you can only send it to select queries. You can't CREATE, UPDATE, INSERT, or DELETE data. This is actually a session variable. So you can set it per session. You can set it for your system, but the thing to keep in mind is that an individual session can turn it back off. So it's not great for ensuring that a database is in read-only mode, but you may find a use case for using it. So if you want to learn more, you can check out this blog post.
The next piece of content- "Waiting for PostgreSQL 15- Add HEADER support to COPY text format". This is from depesz.com and basically, much like the CSV output for COPY, you can add a HEADER. Well, now you can do it with the text format as well, and it gives you a HEADER in that case. So that's apparently coming with Postgres 15.
The next piece of content- "Elevation Profiles and Flightlines with PostGIS". This is from blog.crunchydata.com and this is a scenario where someone was trying to project flight paths at a particular elevation. So if you're interested in geographical information systems, this is a post you'll probably be interested in checking out.
The next piece of content- "Transaction ID and Snapshot information functions". This is from highgo.ca and they're talking about the transaction function txid_current that gives you the current transaction you're on, and a few other functions. Well, they've actually had their names changed. I mean, the old ones still work, but they started creating new ones that are pg_current_xact_id. So there are new functions for these in the newer versions of Postgres, and it explains how each of these are used. So if you're interested in that, you can check out this blog post.
Next piece of content. The PostgreSQL person of the week is Flavio Gurgel. So if you're interested in Flavio and his contributions to Postgres, definitely check out this blog post.
The last piece of content, we had another episode of The Rubber Duck Dev Show this past Wednesday. This one was on "Refactoring in the Wild". Basically, in the real world, how would you refactor your code? So if you're interested in a long-form developer discussion format, you can definitely check out our show.