
Features To Avoid? | Scaling Postgres 351
In this episode of Scaling Postgres, we talk about some features you may want to avoid, like commit_delay, index_cleanup off, & logon triggers, as well as a way to detect a table rewrite and a zero downtime upgrade.
Content Discussed
- commit_delay for better performance: a PostgreSQL benchmark
- VACUUM (INDEX_CLEANUP OFF) Considered Harmful
- Logon trigger in PostgreSQL
- When Does ALTER TABLE Require a Rewrite?
- A Major Postgres Upgrade with Zero Downtime
- Reads causing writes
- Reads causing writes in Postgres
- Indexing Partitioned Table Disaster
- Postgres Café: Deploying distributed PostgreSQL at scale with Citus Data
- How I got started as a developer & in Postgres with Daniel Gustafsson
- Join PostgreSQL in Google Summer of Code 2025!
- New Presentation Online
- Leveraging autovacuum in PostgreSQL to optimize performance and reduce costs
- Waiting for PostgreSQL 18 – Add OLD/NEW support to RETURNING in DML queries.
- Waiting for PostgreSQL 18 – Allow changing autovacuum_max_workers without restarting.
- Preserving replication slots across major Postgres versions - PostgreSQL high availability for major upgrades
- PgPedia Week, 2025-01-26
- Semantic Search With Ollama and PostgreSQL in 10 Minutes
- Automating Data Summarization in PostgreSQL With Claude
- Best Open-Source AI Model: Experimenting With Phi-4 and Ollama in PostgreSQL
- Building Multi-Node Django Systems for Time Series Data [Free Course]
- PostgreSQL High Availability Basics: Understanding Architecture and 3 Common Patterns
- Learn How to Use Neon with AWS RDS to Boost Development Velocity in Postgres
- Launch Postgres in Your Browser, Keep It On Neon
- Tembo Cloud now supports Postgres 17
Transcript
Postgres has a lot of great features, but there might be a few features where you want to be careful about how you implement them, or even whether you implement them at all. And the first set of blog posts this week talks about some of these features that you may want to avoid, or just be careful when you use them. Also, be sure to watch till the end of the video to get my Consulting Corner segment, where I talk about a bit of a mystery I experienced. But I hope you, your friends, family and co-workers continue to do well.
Our first piece of content is “commit_delay for better performance: a PostgreSQL benchmark”. This is from cybertec-postgresql.com and he's talking about not only commit_delay, but commit_siblings. And to be honest, I haven't seen these settings used by any of the clients that I've worked with, so I thought this was pretty interesting. Basically he was doing some training and he was showing how, when you set synchronous_commit off, how much faster Postgres can do transactions. That's because it's not having to write directly to the disk for every transaction that's being committed. Because normally, when you do a commit in a transaction with synchronous_commit on, it writes that commit to the WAL disk before returning control back to the session that sent the commit to the database. But apparently you can use commit_delay and commit_siblings to actually increase your transaction throughput by essentially batching your WAL writes. So when you set commit_delay to something greater than zero, whenever a transaction reaches the point where it would flush the WAL during a commit, it checks whether there are other transactions currently active, and if there are at least commit_siblings other transactions open and not waiting for a lock (by default commit_siblings is set to five), Postgres doesn't flush the WAL immediately, but waits commit_delay microseconds. So basically it's hoping to batch multiple transactions into one I/O write if it can. Now he says out of the gate commit_delay is not easy to tune, because this delay makes the transaction longer. So essentially you're adding latency to each transaction. But this could be beneficial if you are constrained by the write disk for your WAL. So he basically shows how you can do this. He set up an example on his notebook using pgbench. But the problem he encountered is that his NVMe drive was so fast he could never saturate WAL writing on this system. So he actually used Linux control groups to throttle the WAL writing to 1,000 IOPS. That's the only way he could show this behavior. But with 1,000 IOPS set as the absolute limit for the disk, he introduced a commit_delay going from 0 (off), to 10 microseconds, all the way up to almost 2,000 microseconds, which is about 2 milliseconds. So the delays we're talking about are very short. But he was able to get transactions per second going from 1,500 at a commit_delay of 0 to 2,700 at a commit_delay of 1,000 microseconds, or 1 millisecond. So if your WAL disk is bottlenecked, this shows how you could increase commit_delay to be able to push up your transactions per second. But the question is, should you use this? I mean, frankly, I would probably opt for just using a faster disk as opposed to having to adjust these values to try and find the sweet spot. But if you are constrained with WAL writing, these are some settings you may want to tweak to get better performance.
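As a rough sketch (not taken from the post), these are the settings involved; the 1,000-microsecond value is just the one that happened to do well in his benchmark, so treat it as something to test rather than a recommendation:

```sql
-- Current values: commit_delay defaults to 0 (off), commit_siblings to 5.
SHOW commit_delay;
SHOW commit_siblings;

-- Hypothetical tuning for a WAL-bottlenecked system; benchmark before adopting.
ALTER SYSTEM SET commit_delay = 1000;   -- wait up to 1,000 microseconds before flushing WAL
ALTER SYSTEM SET commit_siblings = 5;   -- only wait if at least 5 other transactions are active
SELECT pg_reload_conf();
```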
Next piece of content: “VACUUM (INDEX_CLEANUP OFF) Considered Harmful”. This is from thebuild.com, and when you run vacuum, there is an option to turn index cleanup off. Vacuum's job is basically to clean up dead tuples, among some other responsibilities. But one of the first things it needs to do is remove those dead tuples from the indexes before it can remove them from the heap, or the main part of the table. And the thing is, most of the time spent vacuuming is spent removing the dead tuples from all the different indexes. What index_cleanup off does is skip that phase of the vacuum. So that allows vacuum to run much faster, but it's not removing those tuples from the indexes. And he says this means it also can't remove dead tuples from the heap. So basically, setting index_cleanup off defeats the whole purpose of what vacuum is supposed to do, which is clean up dead tuples. Then of course he addresses, well, when would you use it? And it basically has to do with freezing transactions. So if you think you might be coming up on the 2 billion limit for transaction IDs, you could run a vacuum with index_cleanup off to get a fast vacuum done that freezes tuples, so basically a very fast vacuum to alleviate transaction ID wraparound pressure. But it's definitely not something you should run consistently, because it's not cleaning up the dead tuples. And he basically said, just keep in mind, this is not a “make vacuum go faster” option. So again, another example of a feature that exists that you only want to use in specific cases. And you can check out this blog post if you want to learn more.
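For reference, here is roughly what that emergency-only invocation looks like, with a hypothetical table name:

```sql
-- Emergency use only: skip index vacuuming so tuples get frozen quickly
-- ahead of transaction ID wraparound. Dead tuples are NOT cleaned up.
VACUUM (INDEX_CLEANUP OFF, FREEZE, VERBOSE) my_big_table;

-- Normal operation: leave index cleanup at its default (AUTO) so vacuum
-- actually removes dead tuples from the indexes and the heap.
VACUUM (VERBOSE) my_big_table;
```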
Next piece of content: “Logon trigger in PostgreSQL”. This is from cybertec-postgresql.com, and it covers a new feature in Postgres 17: event triggers that fire on login. So you can create an event trigger for when someone logs on to the system to do some activity, maybe logging the fact that someone logged in. So he created a database, created a table to track who's logging in. Then he created a function that returns event_trigger, so it needs to return the event_trigger type, and all it does is insert who the session user is into the table that he created. Then you create an event trigger, give it a name, make sure the event you specify is on login, and have it execute the function that was just created. Now he says you also have to alter the event trigger and enable it always. I'm surprised there's not a way to enable it when you actually create it, but this is the two-step process you presumably have to do. Now you might be thinking, well, that seems pretty simple to set up. What's wrong? Why should this particular feature be avoided? He says, well, you want to be very cautious about having any errors in your function. So he introduces an error, select 1/0;, so you're going to get a divide-by-zero error. And the problem is when anyone tries to log in, the login fails and you can't get in. And the only way to get around it is to start up Postgres in single-user mode. That's definitely something I would not want to do on a production database. So basically login triggers are a great addition, but you just need to be very cautious with how you use them, because you could lock yourself out of your server. But check out this blog post if you want to learn more.
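Here is a minimal sketch of that setup, with hypothetical table, function and trigger names following the steps described in the post:

```sql
-- Table to record logins.
CREATE TABLE login_log (
    username     text,
    logged_in_at timestamptz DEFAULT clock_timestamp()
);

-- The trigger function must return the event_trigger type. Any error raised
-- in here will abort the login, so keep it as bulletproof as possible.
CREATE FUNCTION log_login() RETURNS event_trigger
LANGUAGE plpgsql AS
$$
BEGIN
    INSERT INTO login_log (username) VALUES (session_user);
END;
$$;

-- Fire the function on every login.
CREATE EVENT TRIGGER log_login_trigger
    ON login
    EXECUTE FUNCTION log_login();

-- The second step the post mentions: enable the trigger ALWAYS so it fires
-- regardless of session_replication_role.
ALTER EVENT TRIGGER log_login_trigger ENABLE ALWAYS;
```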
Next piece of content: “When Does ALTER TABLE Require a Rewrite?”. This is from crunchydata.com, and usually when you're altering a table and, say, adding a column, it's not going to rewrite the whole table; it just changes the metadata for that table and you can move forward. Now it does require a rewrite if you're using a non-static default, like trying to set the current timestamp with a volatile function or, in the example he shows, the random function, which rewrites the whole table. But how can you tell if it was rewritten? Well, you can actually look at the filenode of a particular table to see if it has changed. If it's the same, no table rewrite was required. So he's using select pg_relation_filenode() and giving it the table name. And you can see that when you're adding a new column that happens to be an integer, the filenode does not change, so it's not rewriting the whole table. But when you add a new column with a default using this random function, it rewrites the whole table, because the filenode has changed. There are other changes that can cause a rewrite too. For example, if you want to change a column type from an int to a bigint or a bigint to an int, you'll see that the filenode changed, so it had to rewrite that whole table. But this is a very simple, quick technique to know if a particular operation you're planning will require a full table rewrite. Now, he advocates using this on a test table first. Don't do it to your real table, because then you'll actually rewrite it; do it on a test table, repeat the operation you're planning to do, and you'll see whether you get a rewrite or not. But if you want to learn more, check out this blog post.
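A minimal sketch of that check, assuming a throwaway test_table with an integer id column:

```sql
-- Record the filenode, run the ALTER TABLE, then compare: a changed
-- filenode means the table was rewritten.
SELECT pg_relation_filenode('test_table');

ALTER TABLE test_table ADD COLUMN note text;                       -- metadata-only change
SELECT pg_relation_filenode('test_table');                         -- unchanged: no rewrite

ALTER TABLE test_table ADD COLUMN score float8 DEFAULT random();   -- volatile default
SELECT pg_relation_filenode('test_table');                         -- changed: full rewrite

ALTER TABLE test_table ALTER COLUMN id TYPE bigint;                -- int to bigint
SELECT pg_relation_filenode('test_table');                         -- changed: full rewrite
```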
Next piece of content: “A Major Postgres Upgrade with Zero Downtime”. This is from instantdb.com, and they planned a major Postgres upgrade with essentially zero downtime, or on the order of seconds in their case. They are using Amazon Aurora. They used some references from other organizations that have done this kind of migration, and they looked into what was available with Aurora. You can do an in-place upgrade, but those take 15 minutes or more depending on the size of your database. They looked into blue-green deployments, but the downtime there is still around a minute or so, I think they said. And because they had active replication slots, they couldn't really use the blue-green deployment capabilities anyway. So basically they fell back to doing logical replication. They looked at some other solutions that other organizations had tried, but in the end they basically took a dump of the schema, imported it into the new database, created a publication, created the subscription, got all the data in sync, did some validations, and ran a vacuum analyze, which is definitely important. And then of course, during the transition, be sure to transfer any sequences that you have, because sequences don't get transferred with logical replication, at least not yet. And they actually developed some code to do a zero-downtime cutover because they had some specific requirements, but they were able to do it with very minimal interruption. So if you want to learn more, definitely check out this blog post.
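Their cutover tooling is more involved, but the core logical replication steps look roughly like this; the publication, subscription and connection details here are hypothetical, not taken from the post:

```sql
-- On the old (source) cluster:
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new (target) cluster, after restoring the schema with pg_dump --schema-only:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-db dbname=app user=replicator'
    PUBLICATION upgrade_pub;

-- Sequences are not replicated, so during the cutover generate setval() calls
-- on the old cluster and apply them on the new one. One way to do that:
SELECT format('SELECT setval(%L, %s);',
              quote_ident(schemaname) || '.' || quote_ident(sequencename),
              last_value)
FROM pg_sequences
WHERE last_value IS NOT NULL;
```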
There was another episode of Postgres FM last week. This one was on “Reads causing writes”. Now when you think about this, you can say, well, selects can write to disk if a query exceeds its work_mem. So if the sorting, grouping or hashing is too large for work_mem, it can spill over to disk and use the disk that way. But they talked about two kinds of writes in particular, ones that actually affect the buffer pool, so you end up with dirty buffers along with them. One of the first reasons they covered is hint bits. These are a set of bits that help store visibility state apart from the commit log. So it's a little bit like a caching mechanism, but it helps the system know which particular transactions are visible or not. Setting these hint bits can actually cause writes during selects. The other way is something called page pruning. This is a capability to actually remove dead row versions within a specific page, and any kind of query can do this intra-page pruning, even select queries. So that's another way you will see writes as a result of selects. But this was a very informative episode. I encourage you to listen to it here or watch the YouTube video down here.
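If you want to see this yourself, EXPLAIN with the BUFFERS option reports the pages a query dirtied; this example uses a hypothetical orders table that was recently touched by updates:

```sql
-- A plain SELECT over recently modified rows can set hint bits and prune pages,
-- which shows up as "dirtied" (and sometimes "written") buffers.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM orders
WHERE created_at > now() - interval '1 day';
-- Look for a line like:  Buffers: shared hit=... read=... dirtied=... written=...
```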
Now one of the main posts that they highlighted here is this post called “Reads causing writes in Postgres” by Alex Jesipow at jesipow.com. Forgive me if I'm pronouncing that incorrectly. And he goes into a lot of depth as well about when reads cause writes in Postgres. So definitely check this out if you want to learn more as well.
Next piece of content: “Indexing Partitioned Table Disaster”. This is from hdombrovskaya.wordpress.com, and I don't know if I would call this a disaster. Basically the issue is that when you create a new partition and you have indexes defined on the partitioned table, the parent, the names assigned to the indexes on the partitions are automatically generated. You don't get to choose what those index names are. And during partition maintenance they were running into name collisions because of how they had the indexes set up. So if you want to learn more about that, definitely check out this blog post.
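As a quick illustration of how those names get generated (hypothetical table names, not from the post):

```sql
CREATE TABLE events (
    id         bigint,
    created_at timestamptz
) PARTITION BY RANGE (created_at);

-- Index on the partitioned (parent) table; you name this one yourself.
CREATE INDEX events_created_at_idx ON events (created_at);

-- The matching index on the new partition is created automatically,
-- with a generated name such as events_2025_01_created_at_idx.
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Check what name Postgres actually chose:
SELECT indexname FROM pg_indexes WHERE tablename = 'events_2025_01';
```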
Next piece of content: “Postgres Café: Deploying distributed PostgreSQL at scale with Citus Data”. This is from data-bene.io, and this is the fourth episode of a new podcast from Data Bene. This is the text representation, and then they have, I think, YouTube videos as well. So check this out if you're interested.
Next piece of content. There was a new episode of Talking Postgres called “How I got started as a developer & in Postgres with Daniel Gustafsson”. So you can go ahead and listen to this episode if you're interested.
All right, now it's time for the Consulting Corner, and I had to deal with a bit of a mystery with a client on this one. We were observing that a multi-core database system would normally run at about 20, 30, 40% CPU utilization at the most. But periodically, maybe every 10 to 15 minutes, CPU usage spiked up to 80 or 90% at times. So we were thinking, what the heck is going on here? Are there certain queries being triggered at certain times that are causing this? Or is there a flood of activity happening from some source that's generating a bunch of queries? The odd thing is that the number of connections was relatively static, and the amount of disk activity was relatively static as well. Nothing significantly changed during these CPU spikes, and the spikes lasted on the order of seconds, so it was hard to determine what was going on. So we started tracking the different queries at the time of these spikes to try and correlate what was going on. And many queries that usually ran in far less than 1 second, frequently less than 10 milliseconds even, were running for four to five seconds during these spikes, and they were active. They weren't idle in transaction or anything. And the wait events were blank, so essentially they weren't waiting on anything, yet they were taking much longer to run. There was no evidence of blocking. We looked for blocking PIDs for different ones, and nothing ever came up. The only correlation that seemed related was that the vacuuming of a particular table concluded during these spikes. So we looked at the system view that shows vacuum's progress to see what stage this happened at. And it was a little hard to track, because any time we tried to measure something, the whole system locked up for a number of seconds while these events were happening. But the closest phase that we were able to see was basically vacuuming the heap. So I started looking into what could be blocking the vacuum process. Was it being blocked by any other process IDs? We couldn't find anything. So right when we said, okay, we need to add even more instrumentation to try and figure out what's going on, someone observed something else running on the system. There was another process collecting some type of metrics on the system, and once that process was stopped, the whole system normalized. There were no more CPU spikes, queries ran without significant latencies, and everything was essentially back to normal. So I'm sure you might find this a bit anticlimactic, but if you're analyzing your Postgres performance, just remember there could be other things running that impact it as well. Just something to keep in mind.
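For anyone wanting to do similar digging, these are the kinds of queries involved (a sketch, not the exact ones we ran):

```sql
-- What phase is any running (auto)vacuum in, and how far along is it?
SELECT pid, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;

-- Are any active backends blocked, and by which PIDs?
SELECT pid, state, wait_event_type, wait_event,
       pg_blocking_pids(pid) AS blocking_pids,
       query
FROM pg_stat_activity
WHERE state = 'active';
```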