
77 Times Faster In Postgres 17 | Scaling Postgres 337


In this episode of Scaling Postgres, we discuss how one query runs 77 times faster in Postgres 17, a detailed Postgres 17 performance webinar, using logical replication failover slots, and a discussion on Patroni.


Transcript

As I'm getting ready to record this, Hurricane Milton is about to head into Florida and essentially go right over my house. Now, I'm not on the coast, I am in the middle of the state, so that affords some protection. But if this episode happens to be late, or maybe the next one, you know why. But I hope you, your friends, family and coworkers continue to do well. 


Our first piece of content is “As Rails developers, why we are excited about PostgreSQL 17”, this is from benoittgt.github.io and this person works at a company called Lifen where they're using Ruby on Rails, and they started seeing a performance problem with this query. I don't quite know what version they're using where they started seeing this problem, maybe it was 16 or 15, I'm not quite sure, but it's basically two IN clauses, one IN clause with a status, one IN clause with a sender reference, and then ordering by a date and limiting it. And for them it was slow, giving this type of explain output where it was using a docs_sent_at_idx, which I don't see the definition of here, but because it says docs_sent_at it sounds like it was an index on this ORDER BY column. We discussed this a few weeks ago: sometimes the planner has difficulty filtering by the WHERE clause, and they even said down here, quote, "PostgreSQL is not optimized for queries with multiple IN or ANY parameters". So that's this very case where you have two IN clauses within the WHERE clause, and it falls back to using the sent_at index. So it's basically using the ORDER BY because it says, well, I just need to find 20, so I'll just start scanning from that point. And that means that everything in the WHERE clause here is being filtered, so it's not using the index at all for filtering, which is what makes this query so slow. But there's an enhancement in Postgres 17 to optimize the case of having two IN clauses. So they did this fabricated example with another docs table and inserted 100 million rows into it, along with adding this multicolumn index, which I'm assuming was present in the example above. So even though these two columns were present, maybe it wasn't using it, and the performance results were astounding. In other words, on Postgres 16 on their Rails 7.2 app with a warm cache, the query executed in 8.45 seconds, whereas in Postgres 17 the query ran in 0.11 seconds. So that's an amazing performance improvement. And basically what it's probably doing is not filtering but using an index condition on all of these columns. So even though we've seen some blog posts talk about, you know, a 10% improvement, a 20% improvement, there are some really amazing wins that can happen for individual queries by upgrading to Postgres 17. So check this out if you're interested in learning more. 
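Just to make the shape of the problem concrete, here is a rough sketch of the kind of query and multicolumn index being discussed; the table and column names are my own placeholders, not the exact schema from the post.

```sql
-- Hypothetical table modeled loosely on the blog post's docs example
CREATE TABLE docs (
    id               bigserial PRIMARY KEY,
    status           text        NOT NULL,
    sender_reference text        NOT NULL,
    sent_at          timestamptz
);

-- Multicolumn index covering both IN filters plus the ORDER BY column
CREATE INDEX docs_status_sender_sent_at_idx
    ON docs (status, sender_reference, sent_at);

-- The problematic query shape: two IN lists, an ORDER BY, and a LIMIT.
-- On Postgres 16 the planner tended to walk a sent_at index and filter rows;
-- on Postgres 17 the IN lists can be used as index conditions on the
-- multicolumn index above, which is where the big win comes from.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM docs
WHERE status IN ('sent', 'delivered')
  AND sender_reference IN ('ref-1', 'ref-2', 'ref-3')
ORDER BY sent_at DESC
LIMIT 20;
```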


Next piece of content is actually a YouTube video, and it's a webinar recording that Lukas did at pganalyze: “Hands on Postgres 17: What's New & How It Impacts Performance”. And full disclosure, how I found the first blog post is because it was mentioned in Lukas's webinar here. This webinar is about an hour, and for most of it he covers the performance improvements. So if you want to do more of a deep dive into the enhancements, he covers streaming I/O, the improvements for IN and ANY lists, the vacuum improvements (especially with regard to memory usage), enhancements to the planner, as well as EXPLAIN, including the new option to show memory used during planning and the SERIALIZE option to show how long it takes to serialize the data to send it over the network, as well as the SLRU cache changes. He does follow up at the end and talk about some of the other features, but if you want a deep dive on some of the performance enhancements, I definitely encourage you to check out his video. 
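If you just want to see the two new EXPLAIN options mentioned here, this is roughly what they look like in Postgres 17; the table name is just a placeholder.

```sql
-- MEMORY reports how much memory the planner used for this statement
EXPLAIN (MEMORY)
SELECT * FROM some_table WHERE id = 42;

-- SERIALIZE requires ANALYZE and additionally measures the time and bytes
-- spent converting the result rows into the format sent over the network
EXPLAIN (ANALYZE, SERIALIZE)
SELECT * FROM some_table;
```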


Next piece of content “Failover Slots in PostgreSQL-17: Ensuring High Availability with Logical Replication”, this is from amitkapila16.blogspot.com and he goes through the process of setting up logical failover slots for Postgres. It's a little bit more work than I anticipated to set this up and get it working, because the release notes basically said, you know, when you create a subscription, make it a failover slot, but there's actually a lot more work you need to do to get it working. First, you need to enable the failover property on the slot. So at the point you do CREATE SUBSCRIPTION, you need to say failover = true, essentially. Next, you need to synchronize the replication slots, so there is a new sync_replication_slots parameter you need to turn on on the standby. Third, you need to configure a physical replication slot, so it seems like physical replication slots are required to use logical replication slot failover. So you need to make sure that you have that, and, as they say, turn hot_standby_feedback on to prevent removal of rows needed by the standby. Fourth, you need to prevent subscriber overrun. So basically on the primary, set synchronized_standby_slots to include that physical replication slot. And lastly, you need to modify the primary connection info on the standby to include the dbname postgres, and that's necessary for the slot sync worker to connect to the primary. This is the one I was a little bit confused about, because when I set up logical replication I usually connect to the database that has the publisher, but that doesn't seem to be the case in the example he set up here. So he sets up the primary server, which is also the publisher, as Node A, Node B is the physical standby, and Node C is the subscriber. Now, from the start it's subscribing from Node A, but after the failover it will be subscribed to Node B. So he sets up Node A, creates a table and a publication. I don't know which database he created this in; when the subscription is created, it's connecting to the postgres database, and historically I've always connected to the actual database that had the publication. But continuing on, once the publication is created, on Node C you go ahead and set up the database, create the table on the subscriber, and then you create the subscription, again connecting to the postgres database, with failover = true. Now he verifies that the logical replication is working, so he inserts on the publisher and can see the information on the subscriber. Then he sets up Node B, which is the physical standby. So he basically does a base backup from the primary. Then he creates the physical replication slot, which is required, on the primary. And on the standby he turns sync_replication_slots on, defines what the primary slot name is, and then turns hot_standby_feedback on. Then he restarts the standby server and gets everything in sync. And from pg_replication_slots you can see the logical replication slot is enabled for failover and it is being synced, both are set to true. He then tests it out by stopping the primary. He promotes the standby. And then to update the subscription you actually have to first disable the subscription, alter the subscription to point to the new primary, the one that you promoted, and then re-enable the subscription, and it should continue to replicate. So as you can tell, there's a lot of moving parts to get this working. 
And I'm definitely gonna have to spend some time testing this out to see how well it works. But if you want to learn more, definitely check out this blog post. 
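To pull those steps together in one place, here's a hedged sketch of the pieces involved, based on my reading of the post; the host names, slot names, user names and settings layout are placeholders, so treat it as a starting point rather than a recipe.

```sql
-- On the subscriber (Node C): create the subscription with the failover option
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=node_a dbname=postgres user=replicator'
    PUBLICATION my_pub
    WITH (failover = true);

-- On the primary (Node A): the physical slot the standby will stream from
SELECT pg_create_physical_replication_slot('standby_slot');

-- postgresql.conf on the primary (Node A):
--   synchronized_standby_slots = 'standby_slot'
--
-- postgresql.conf on the standby (Node B):
--   sync_replication_slots = on
--   hot_standby_feedback   = on
--   primary_slot_name      = 'standby_slot'
--   primary_conninfo       = 'host=node_a dbname=postgres user=replicator'

-- On the standby, verify the logical slot is marked for failover and synced
SELECT slot_name, failover, synced FROM pg_replication_slots;

-- After promoting Node B, repoint the subscription on Node C
ALTER SUBSCRIPTION my_sub DISABLE;
ALTER SUBSCRIPTION my_sub CONNECTION 'host=node_b dbname=postgres user=replicator';
ALTER SUBSCRIPTION my_sub ENABLE;
```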


Next piece of content: there was another episode of Postgres FM last week. This one was on “Patroni”, so Michael and Nikolai had Alexander Kukushkin join them, and he is both a Postgres contributor and the maintainer of Patroni. So if you're not familiar, Patroni is, I think, the go-to automated failover solution for Postgres. Postgres doesn't do automated failover out of the box, you have to run a command like we just saw in the previous blog post, but Patroni is a tool that sets up an infrastructure to allow Postgres to automate failover between different Postgres nodes, so that you always have a primary and replicas that are following it. Now, they talked a lot about how the tool came to be, how it's being used today, and how a lot of Postgres operators actually use Patroni internally. One notable exception is that CloudNativePG doesn't use it, because it uses essentially the consensus store that Kubernetes provides. But it was a pretty interesting discussion, and if you want to get more detail about Patroni and how it works and how the project came to be, I definitely encourage you to listen to the episode or check out the YouTube video down here. 


Next piece of content “Waiting for PostgreSQL 18 – Add temporal FOREIGN KEY contraints”, this is from depesz.com and last week we talked about a patch added to Postgres 18 for temporal tables. This adds the ability to add foreign key constraints as well. So when you define a foreign key constraint, you can specify the column, say an address ID, but then there's this new PERIOD keyword that addresses what time range that particular address ID is valid for. So if you want to get started with temporal tables in the next version of Postgres, it's a great addition that you'll be able to use foreign key constraints as well. 
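Here's a rough sketch of what a temporal foreign key could look like with this patch; the tables and columns are purely illustrative, and as far as I know the underlying temporal primary key needs the btree_gist extension.

```sql
-- Needed for the GiST index behind the temporal primary key (my understanding)
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Address versions: the same id can appear many times, but the validity
-- ranges for one id can't overlap (the temporal primary key feature).
CREATE TABLE addresses (
    id       integer,
    valid_at daterange,
    address  text,
    PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

-- Rows here must reference an address version whose validity range covers
-- their own, using the new PERIOD syntax in the foreign key.
CREATE TABLE shipments (
    id         integer,
    address_id integer,
    valid_at   daterange,
    FOREIGN KEY (address_id, PERIOD valid_at)
        REFERENCES addresses (id, PERIOD valid_at)
);
```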


Next piece of content “Announcing ParadeDB partnership: Search and Analytics for Postgres”, this is from tembo.io. Tembo offers different stacks of solutions based on Postgres. So if you want something that offers analytics, they have an analytics stack, they also have apparently a search stack, and I think there's a time series stack. And for their search stack, apparently they're including pg_search and pg_analytics from ParadeDB. We've talked about ParadeDB previously: basically pg_search takes an Elasticsearch-like search engine and embeds it in Postgres, so it should give you greater text searching capability compared to what is typically offered in Postgres. In addition, their analytics product actually embeds DuckDB into Postgres to offer better performance for analytical queries. So if you're interested in learning more about this new partnership, definitely check out this blog post. 


Next piece of content “PostgreSQL adds the login type for event triggers”. This is from fluca1978.github.io and pretty much the blog post name says it all. There are now event triggers for when people log in, so you can have different actions occur once people log in. The example here looks like they're using it to build some sort of audit system for when different people log in. 
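As a rough sketch of how that kind of audit could look in Postgres 17 (these names are mine, not the ones from the post):

```sql
-- Simple audit table recording who logged in and when
CREATE TABLE login_audit (
    username  text,
    logged_at timestamptz DEFAULT now()
);

-- The trigger function fires at connection time for every login
CREATE OR REPLACE FUNCTION record_login()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO login_audit (username) VALUES (session_user);
END;
$$;

-- 'login' is the new event type added in Postgres 17
CREATE EVENT TRIGGER audit_logins
    ON login
    EXECUTE FUNCTION record_login();
```

One thing to keep in mind is that a broken login trigger can get in the way of new connections; as I understand it, Postgres 17 also added an event_triggers setting you can turn off to recover in that situation.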


Next piece of content, also from Fluca, is “PostgreSQL 17 allow_alter_system tunable”. So basically this, as he says here, quote, allows the capability to disable the ALTER SYSTEM command. Now you may be thinking, why would you ever do that? Well, if you're running hosted Postgres, this is something you may want to do, and it actually may be something you want to do if you want to only manage Postgres through some sort of configuration management tool. I don't know if you would use Terraform for that purpose, but something like Ansible. So maybe you always want to do the configuration there, and use configuration files to keep it up to date, and not allow the ALTER SYSTEM command. So check this out if you want to learn more. 
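For reference, this is roughly what it looks like; allow_alter_system is a server setting you put in postgresql.conf (or have your config management lay down), not something ALTER SYSTEM itself can change.

```sql
-- In postgresql.conf, managed by Ansible or whatever tool you use:
--   allow_alter_system = off

-- With the setting off, this now raises an error instead of writing
-- the change to postgresql.auto.conf:
ALTER SYSTEM SET work_mem = '64MB';
```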


Next piece of content “AI-assisted Postgres experiment: number of partitions vs. planning time”, this is from postgres.ai, and they ran a benchmarking experiment with their AI system here, where they tracked the increase in planning time relative to the number of partitions on a table. And as you can see, there's pretty much a linear relationship between how long planning took and the number of partitions in the table. But there's a caveat, because it was actually an issue of, quote, "it's bad only for the first query in the session". How they ran the test was they had a new connection each time and then they destroyed the connection. But if that connection is persisted and you do multiple queries, then yes, the planning time increases, but it stabilizes quite quickly, up to, as you can see here, a thousand partitions in the table. So basically what this tells me is that if you are using partitioned tables, particularly with a high partition count, it's essential to use some type of connection pooler, be that PgBouncer or a connection pooler for your application. Basically, you want to maintain open connections as long as you can and not keep connecting and disconnecting, because then you're clearly going to hurt your performance. But check this out if you want to learn more. 
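If you want to see the effect yourself, a rough sketch might look like this; the table and partition count are arbitrary. Run the same EXPLAIN twice in one session and compare the reported planning time.

```sql
-- Build a hash-partitioned table with a lot of partitions
CREATE TABLE events (id bigint, payload text) PARTITION BY HASH (id);

DO $$
BEGIN
    FOR i IN 0..999 LOOP
        EXECUTE format(
            'CREATE TABLE events_p%s PARTITION OF events
             FOR VALUES WITH (MODULUS 1000, REMAINDER %s)', i, i);
    END LOOP;
END;
$$;

-- In a fresh connection, run this twice: the first run pays the cost of
-- loading metadata for all the partitions, so Planning Time is much higher;
-- the second run in the same session should come back far quicker.
EXPLAIN (SUMMARY) SELECT * FROM events WHERE id = 42;
EXPLAIN (SUMMARY) SELECT * FROM events WHERE id = 42;
```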


And the last piece of content “Optimizing Postgres table layout for maximum efficiency”. This is from r.ena.to and this is a quick blog post just talking about how to optimize the ordering of your columns in a table to make the table as small as possible, taking advantage of the data alignment rules that exist in Postgres. Now, I've never worried about this too much in my online transactional databases, but if you have a data mart, data lake, or data warehouse where data is frequently loaded in and you've got a massive amount of data, it can be important to get the column ordering right so you can save as much space as possible. But check this out if you want to learn more. 
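As a quick illustration of the alignment rules (my own example, not the one from the post): a bigint needs 8-byte alignment, so interleaving narrow and wide columns forces padding that a different column order avoids.

```sql
-- Poor ordering: each smallint gets padded so the following bigint
-- can start on an 8-byte boundary.
CREATE TABLE padded (
    a smallint,
    b bigint,
    c smallint,
    d bigint
);

-- Better ordering: wide, alignment-hungry columns first, narrow ones last.
CREATE TABLE compact (
    b bigint,
    d bigint,
    a smallint,
    c smallint
);

-- Load the same data into both and compare on-disk size, for example:
--   INSERT INTO padded  SELECT 1, 2, 3, 4 FROM generate_series(1, 1000000);
--   INSERT INTO compact SELECT 2, 4, 1, 3 FROM generate_series(1, 1000000);
--   SELECT pg_size_pretty(pg_total_relation_size('padded')),
--          pg_size_pretty(pg_total_relation_size('compact'));
```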
