Scaling Suggestions, Slot Safety, SCRAM Channel Binding, Effective Cache Size | Scaling Postgres 125
In this episode of Scaling Postgres, we discuss suggestions for scaling postgres, replication slots safety, implementing scram password authentication with channel binding and setting effective_cache_size.
Content Discussed
- Some SQL Tricks of an Application DBA
- Scaling relational SQL databases
- PostgreSQL 13: Don’t let slots kill your primary
- Waiting for PostgreSQL 14 – Rename wal_keep_segments to wal_keep_size.
- How to securely authenticate with SCRAM in Postgres 13
- Migrating to PostgreSQL Version 13: Incompatibilities You Should Be Aware Of
- effective_cache_size: A practical example
- Webinar: Being Committed – A Review of Transaction Control Statements 2/3 [Follow Up]
- SQL Style Guide
- pgFormatter
- Announcing pgBackRest for Azure - Fast, Reliable Postgres Backups
- Connection Pooling in Pgpool-II
- Backtraces in PostgreSQL
- Gilberto Castillo
YouTube Video
Podcast Audio
Transcript
All right, I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Some SQL Tricks of an Application DBA". This is from hakibenita.com and he's giving some scaling suggestions for different ways to use the database. Now, the comic that is embedded here is particularly great because you have a couple of developers who are trying to figure out the best way to optimize a query. Then a magical DBA appears, the query goes from seconds down to 0.033 seconds, and they're amazed at the magic that was done. But moving on to the post's actual content, it goes over a series of suggestions for making your use of the database more performant or more efficient. I'll just mention some of the highlights here, like UPDATE only what needs updating.
So don't, for example, run a function against every row in the table. Be sure to use a WHERE clause to reduce the number of rows you're updating. In PostgreSQL's case, this also limits the amount of bloat you're creating in the table. Next is to disable constraints and indexes during bulk loads. Very important. Next is to use UNLOGGED tables for intermediate data. UNLOGGED tables don't get written to the WAL, so they are much faster. The problem is there's no crash safety. If you have a crash in the middle of using one, the data is essentially lost: crash recovery doesn't replay anything for that table, so it's not going to try and rescue it. The next one is to implement complete processes using WITH and RETURNING. So basically, use CTEs to build a stepwise process, where the results of one step feed the next.
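To make a few of these concrete, here is a minimal sketch. The table and column names are hypothetical, not taken from the post; it just illustrates a targeted UPDATE, an UNLOGGED staging table, and a data-modifying CTE with RETURNING.

```sql
-- Update only the rows that actually need changing.
UPDATE emails
SET    email = lower(email)
WHERE  email <> lower(email);   -- skip rows that are already lowercase

-- UNLOGGED staging table: fast loads, but not crash-safe.
CREATE UNLOGGED TABLE emails_staging (LIKE emails INCLUDING ALL);

-- A stepwise process in one statement: archive inactive users, then remove them.
WITH archived AS (
    DELETE FROM users
    WHERE  last_seen < now() - interval '2 years'
    RETURNING *
)
INSERT INTO users_archive
SELECT * FROM archived;
```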
He addresses the issue of identifying duplicates and resolving them using a CTE. However, he does mention some caution when trying to execute DML in a CTE, so you need to be aware of that. But I've noticed a lot of programmers like using CTEs; they're easier to reason about compared to, say, subqueries. The next section is to avoid indexes on columns with low selectivity. So basically, in a boolean column where you only have two values, it doesn't necessarily make sense to index it, because so many of the values are the same that the index isn't efficient. However, the next section he talks about is partial indexes, and partial indexes are great to use on booleans. If you're only interested in one of the boolean values, you can create one partial index, or if you're interested in both, you can create two. The other way you can use partial indexes, of course, is if you have, say, four or five statuses.
Sometimes you get better performance by creating four or five partial indexes, one per status, versus trying to find all of those statuses with a single index. The next is to always load sorted data. So basically, when you're loading data in, do it in a sorted fashion so you have a high correlation between the order on disk and the order in the table and indexes. The next section is to index columns with high correlation using BRIN. BRIN is the block range index, and it indexes ranges of blocks. One disadvantage of BRIN is that it's not as fast at pulling out single values compared to a B-tree index, but it is still relatively fast and it gives you dramatic space savings.
He had an example here where I think the BRIN index was 46 times smaller than the equivalent B-tree index, so it's smaller and easier to cache, but pulling out a single value is not as efficient. These are good for things that have a high correlation, for example, sequential IDs or datetimes. The next section is to make indexes invisible, which basically relies on transactional DDL: you can drop an index inside a transaction, test your query, and then roll the whole thing back. And the last suggestion is don't schedule long-running processes at round hours, because nearly everybody tries to run something at a round hour and you get spikes in your database usage during that time. So basically offset it a bit. If you want to review these suggestions in detail, go ahead and check out this post.
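As a rough sketch of the partial index, BRIN, and invisible index suggestions, again with hypothetical table names:

```sql
-- Partial indexes: only index the rows you actually query for.
CREATE INDEX orders_unshipped_idx ON orders (created_at) WHERE shipped = false;
CREATE INDEX orders_pending_idx   ON orders (created_at) WHERE status = 'pending';
CREATE INDEX orders_failed_idx    ON orders (created_at) WHERE status = 'failed';

-- BRIN: works well when the column is highly correlated with physical order,
-- such as an always-increasing created_at timestamp.
CREATE INDEX events_created_at_brin ON events USING brin (created_at);

-- "Invisible" index: drop it inside a transaction to see how the plan changes,
-- then roll back so the index is never actually lost.
BEGIN;
DROP INDEX events_created_at_brin;
EXPLAIN SELECT * FROM events WHERE created_at >= now() - interval '1 day';
ROLLBACK;
```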
The next post is related to the previous one and it's "Scaling relational SQL databases". This is from stribny.name. Now, this is not Postgres specific, since he's considering MySQL, MariaDB, or PostgreSQL, but much of it applies to Postgres. It's basically a hit list of things to consider to improve your database performance. He talks about updating your database to a more recent version to get the best performance, ways you can scale vertically, and looking for ways you can potentially leverage an application cache. And he goes on with different areas you can examine to get better performance out of your database.
Now, one point that didn't seem to really apply to Postgres is where he says to make ALTER TABLE work. You do need to be aware when you're making changes to objects in the database, but I don't know if this is more of a MySQL problem, because I haven't really seen this issue with Postgres. There are definitely things to watch for, particularly in earlier versions, when you're adding a column and want to make it NOT NULL or set a default; there are certain steps you need to take when you're doing that, less so in more recent versions. But I've never had an issue where altering a table wasn't fast. So again, if you want a reference, check out this blog post.
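As a sketch of that column-with-default case on older Postgres versions (pre-11), where a single ALTER TABLE with a default rewrites the whole table, one common pattern looks like this (table and column names are made up):

```sql
-- On PostgreSQL 11+ this one statement no longer rewrites the table:
-- ALTER TABLE users ADD COLUMN is_active boolean NOT NULL DEFAULT true;

-- On older versions, split it up to avoid a long rewrite under an exclusive lock:
ALTER TABLE users ADD COLUMN is_active boolean;             -- instant, no default
ALTER TABLE users ALTER COLUMN is_active SET DEFAULT true;  -- applies to new rows only
UPDATE users SET is_active = true WHERE is_active IS NULL;  -- backfill (batch this in practice)
ALTER TABLE users ALTER COLUMN is_active SET NOT NULL;      -- requires a full table scan
```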
The next post is "PostgreSQL 13: Don't let slots kill your primary". This is from 2ndquadrant.com. This is referring to you having a replication slot on your primary database for a replica, say. Now if that replica goes down, that slot is going to maintain the WAL on the primary and save it there, and not archive it until that replica comes back. So essentially, you can run out of disk space on your primary database and bring the database down if you don't have alerting set up for your disk space. But with version 13, there's a new setting that's called max_slot_wal_keep_size. So basically define the amount of WAL to keep around if the replica goes down or stops connecting to the primary. So at some point, it's going to deactivate that slot, which means the WAL can then be archived and the primary can continue functioning. So basically it's a trade-off.
Your primary can stay up and running and not run out of disk space, but the disadvantage is that this will kill replication to that replica. Now, by default, max_slot_wal_keep_size is -1, meaning there's no limit and the feature isn't active, but you can set it to a particular amount of WAL to keep around to enable it. And he says there are also two new columns that have been added to pg_replication_slots that you can monitor. One is wal_status, which indicates whether the WAL the slot needs is reserved, extended, unreserved, or lost, lost meaning the required WAL has already been removed. The second is safe_wal_size, the amount of WAL that can still be written before the slot is in danger of losing required WAL. So this is a really great feature coming in 13 that I'm definitely looking forward to. I'll probably start using more replication slots because of this; a lot of times I haven't used replication slots because of the disk-filling risk, but this seems to mitigate that issue. So if you're interested in learning more, check out this post.
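A minimal sketch of enabling the setting and monitoring the new columns (the size here is just an example value):

```sql
-- Cap how much WAL a replication slot can force the primary to retain.
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();

-- Monitor slot health with the new Postgres 13 columns.
SELECT slot_name, wal_status, safe_wal_size
FROM   pg_replication_slots;
-- wal_status is one of: reserved, extended, unreserved, lost
-- safe_wal_size is the number of bytes that can still be written before the slot is at risk
```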
The next post is "Waiting for PostgreSQL 14 - Rename wal_keep_segments to wal_keep_size". This is from depesz.com. So this is the one coming in over a year from now. Basically, because of the previous setting where they're allowing you to define in terms of size how much WAL to keep around, it looks like they're renaming how many segments to keep around to the size of the WAL you want to keep around. So it looks like they're moving to a size parameter to retain WAL versus just an account. So if you're interested in looking at this change, you can review the patch note that is right here.
The next post is "How to securely authenticate with SCRAM in Postgres 13". This is from citusdata.com. Now, we've had a few posts in recent episodes of Scaling Postgres about SCRAM authentication in PostgreSQL. Now this is addressing 13 because in 13, the client allows channel binding to be done, which helps the client actually validate that the server is who it says it is. This post also does a really good job of describing how to set up SCRAM authentication with channel binding in Postgres 13. So it goes over in detail all the steps that you would need to use to set this up and get SCRAM working on your Postgres instance. So if you're interested in doing that, check out this post.
The next post is "EFFECTIVE_CACHE_SIZE: A PRACTICAL EXAMPLE". This is from cybertec-postgresql.com. This is talking about the parameter effective_cache_size, which essentially communicates information to the optimizer to determine how to cost out different queries. Primarily, this helps determine whether index scans should be used or not. Now, as an example, they created a random table with about a million rows and then an ordered table with about a million rows. They set the effective cache size very low to 1 MB, turned off bitmap scans, and then ran a query that got a particular cost estimate. Then they ran it setting the effective_cache_size to 1,000GB, so a lot higher than 1 MB. Here the cost estimate was smaller, so basically larger cache sizes.
The optimizer assumes that more of the indexes will be in memory, so it projects a lower cost for that compared to smaller cache sizes. But now both of this was done using the random table that they created because, with an ordered table, the cost estimates make no difference, no matter the size of the effective cache size. That's because of the correlation between the data on the disk and the index itself. So it already believes it's going to be pretty efficient to pull out that data. In terms of recommendations and in terms of effective cache size, they recommend about 70% of the memory, although I've seen estimates between 50 and 75%. So basically you want to pick one of those and then adjust to see what gives you the best performance for your queries. So if you want to learn more about this setting, check out this post.
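A rough reconstruction of that kind of experiment (the table name and values are illustrative, not copied from the post):

```sql
-- A million rows in random physical order, with an index.
CREATE TABLE t_random AS
    SELECT id FROM generate_series(1, 1000000) AS id ORDER BY random();
CREATE INDEX ON t_random (id);
ANALYZE t_random;

SET enable_bitmapscan = off;

SET effective_cache_size = '1MB';
EXPLAIN SELECT * FROM t_random WHERE id < 1000;    -- higher estimated cost

SET effective_cache_size = '1000GB';
EXPLAIN SELECT * FROM t_random WHERE id < 1000;    -- lower estimated cost for the index scan
```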
The next post is "Migrating to PostgreSQL Version 13: Incompatibilities You Should Be Aware Of". This is from percona.com. They're talking about some incompatibilities moving from 12 to 13, I believe. So for example, some of the ones they mentioned are similar to ESCAPE NULL and the substring of RETURN NULL. The next one jsonb_to_tsvector properly checks the string parameter. The next one relates to ltree. So a lot of these don't seem to be very commonly used, but definitely some issues to be aware of. They run through about twelve different changes here. So definitely before you upgrade to dip 13, you should check out this blog post so you can see if you're using any of these features.
The next post is "Webinar: Being Committed - A Review of Transaction Control Statements 2/3 [Follow Up]". This is from 2ndquadrant.com, and it's a post about transactions related to replication. So for example, how do transactions get replicated to replicas? How do your settings for synchronous_commit impact when data appears on the replica as well as for performance? So basically, the more durable you make your communication with a replica, say synchronous, replication, the higher latency you introduce as well as reducing the throughput. But if you want to go faster and have a higher throughput and less latency, you're going to sacrifice some durability. You're going to use asynchronous replication and maybe not wait as much to have the data up to date on the replica. They address a little bit of vacuum and freezing. So if you're interested in this webinar, you can just click the link here to go ahead and register for it.
The next post is "SQL Style Guide". This is from sqlstyle.guide and it's a published style guide for SQL Code. So if you're interested in adapting a style, this is one interpretation of it. I know this is very subjective, but I saw this post and thought it was interesting.
Now, related to it, Bruce Momjian's blog at momjian.us mentioned "pgFormatter". It basically takes your SQL code and formats it in a more readable way, and he has had some good luck with it. So if you're looking for a tool to help you format your SQL files, maybe check out pgFormatter.
The next post is "Connection Pooling in Pgpool-II". This is from b-peng.blogspot.com, and this explains a little bit about what Connection Pooling is and how Pgpool II does it. So if you want to learn more about Pgpool II and how to set it up, I've found this to be a really great post explaining how to do it compared to some others that I've seen. So if you're interested in that, check out this blog post.
The next post is "Announcing pgBackRest for Azure- Fast, Reliable Postgres Backups". This is from crunchydata.com. So it describes how to set up pgBackRest as well as connect it up to Azure to store your backups. So if you're interested in that, check out this post.
The next post is "Backtraces in PostgreSQL''. This is from amitdkhan-pg.blogspot.com, and he's basically describing how you can set up backtraces in PostgreSQL to diagnose errors that you receive. So check this post out if you're interested in that.
The last post, the PostgreSQL person of the week is Gilberto Castillo. So if you're interested in learning more about Gilberto and his contributions to PostgreSQL, definitely check out this blog post.