
When Select Writes! | Scaling Postgres 333


In this episode of Scaling Postgres, we discuss when select can write, Postgres RC1 is released, Tetris in SQL and copy, swap, drop.


Transcript

When you're scaling Postgres, a lot of times what you want to do is scale out. So for example, maybe you set up multiple read replicas and you want to be able to send read queries to any of those, but still, of course, send your write queries to the primary database. And there are some different solutions that do this. I think PgCat is one pooler that tries to send all the reads to the read replicas and the writes to the primary database. But our first post this week has some warnings when you're trying to determine what is a write versus a read, because you can't just rely on whether it's doing a SELECT. But I hope you, your friends, family and coworkers continue to do well. 

 

Our first piece of content is “Why SELECT * FROM table is not a read”. This is from cybertec-postgresql.com. And when I first saw this, I was immediately thinking maybe he's talking about temp tables, where a large SELECT can write to the disk on the database server if it's generating temporary files. But that's actually not what he's talking about. He has an example where a SELECT can do an INSERT. Before he gets to that, he says the wrong way to do load balancing is to have some sort of middleware determine that a read goes to a replica and a write goes to the primary, particularly if that determination is just: if it's a SELECT query, send it to a reader; otherwise, send it to a writer. So he has a demonstration here where he creates a function that actually inserts a new row as part of it. So you can just run a SELECT query with this function, and whenever you do a SELECT, it will do an INSERT. But additionally, he wanted to obfuscate it more, and he put it in a view so he could just do SELECT * FROM tab, whatever the table name is. And even though it returns nothing, it has actually inserted a row from this function. So if you query it again, you see the results of the previous execution, and now it has two rows in it. And he says even with transactions, this middleware never really knows if you're eventually going to hit a write. So, for example, you start a transaction and do some SELECTs, but ultimately you may do a write, so you definitely want to make sure that transaction always goes to the writer. He doesn't feel you can necessarily rely on some sort of middleware to make that decision. Basically his thinking is that it's the application that always needs to determine where particular statements or transactions are sent; you can't rely on a piece of middleware between the application and the database system. So maybe your application has two different connections, one for reading and one for writing, and you send whatever is appropriate, based on what the application is doing, to each of those endpoints. But I thought this was a pretty interesting warning, and if you want to learn more about it, you can definitely check out this piece of content. 
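
To make the trap concrete, here is a minimal sketch of the general trick being described; the table, function and view names are my own, not the exact example from the post:

```sql
-- Minimal sketch of a SELECT that writes (illustrative names, not the
-- exact example from the post).
CREATE TABLE hits (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    seen_at timestamptz NOT NULL DEFAULT now()
);

-- A set-returning function that inserts a row as a side effect.
CREATE FUNCTION record_hit() RETURNS SETOF hits
LANGUAGE sql AS $$
    INSERT INTO hits DEFAULT VALUES RETURNING *;
$$;

-- Hide the function behind a view so it looks like an ordinary table.
CREATE VIEW tab AS SELECT * FROM record_hit();

-- Looks like a pure read, but every execution inserts a row, so routing
-- it to a replica based on "it starts with SELECT" will fail, because
-- replicas are read-only.
SELECT * FROM tab;
```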


Next piece of content “PostgreSQL 17 RC1 is released!”. So we are about two weeks from the release of Postgres 17, which is great. This is the first release candidate. The only unfortunate news is that it looks like they reverted the ability to merge and split partitions. So basically, I think when you have range partitions, you could merge them together or split them apart. Apparently there were some issues with it and they had to pull back that feature. But now's a great time to give Postgres 17 a test if you're looking forward to implementing it relatively soon. 


Next piece of content is actually a GitHub repository, and I don't normally showcase these kinds of tools, but I thought this was cool. It's “Tetris in SQL” and it runs within Postgres. So Postgres is Turing complete, and someone developed Tetris in it. The readme goes into a lot of detail about how it was developed, so check this out if you're interested. 


Next piece of content “Shrinking big PostgreSQL tables: Copy-Swap-Drop”. This is from andyatkinson.com, and when I saw this I was immediately thinking, well, there's pg_repack and pg_squeeze, or VACUUM FULL if you can take a full table lock, in order to shrink a table. But he's actually talking about a different use case: maybe people have an event table that's been tracking events for potentially years and there's all this data, but they only ever use the last 30 days. And for whatever reason they didn't implement partitioning, which I would definitely advocate doing, but they just want to delete everything older than 30 days because they don't need that data. So his proposal is to create a new table, copy the most recent 30 days of data into it, then swap the table names and drop the old table. That is definitely more efficient than running a whole lot of deletes and still ending up with a bloated table. This is a pretty long blog post and he goes into a lot of detail, but he shows all the code he's using. Basically, he clones the table using CREATE TABLE ... LIKE the existing table, including everything but excluding the indexes, because he doesn't want indexes slowing down the table load; he'll add those later. He then copies a subset of the rows at a time, basically using a keyset pagination technique: he finds the oldest ID he wants to transfer, transfers 1,000 of them (that's just the batch size he wanted to use), and then uses the maximum ID transferred in the previous batch as the new floor ID for the next query. So he's just sending 1,000 records over at a time. Then, to prepare for the swap, you definitely want to make sure sequences have been updated in the new table, as well as adding the indexes back. And then he does the swap procedure within a transaction: basically, within one transaction, he renames the original table to a retired name and renames the new table to what the original table was named, as well as grabbing a final set of IDs if they exist, with even a final query outside of the transaction to grab more IDs, presumably to catch any transactions that may have been in flight at the time of the transition. He says it is possible to do a rollback, and he gives an example here, and then finally he drops the old table. So if you want to learn more about this, definitely check out this blog post. 
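
Here's a rough sketch of that overall flow under my reading of it; the table and column names, the batch-size placeholder and the sequence handling are illustrative assumptions, not the exact code from the post:

```sql
-- Rough sketch of the copy-swap-drop flow (illustrative names; the post
-- drives the batching from application code and includes more safeguards).

-- 1. Clone the structure, but leave out the indexes so they don't slow
--    down the data load; they get added back just before the swap.
CREATE TABLE events_new (LIKE events INCLUDING ALL EXCLUDING INDEXES);

-- 2. Copy rows in keyset-paginated batches; :last_copied_id is the maximum
--    ID copied by the previous batch (start at 0). If id is a GENERATED
--    ALWAYS identity column, add OVERRIDING SYSTEM VALUE to the INSERT.
INSERT INTO events_new
SELECT *
FROM events
WHERE id > :last_copied_id
  AND created_at >= now() - interval '30 days'
ORDER BY id
LIMIT 1000;

-- 3. Before the swap, recreate the indexes and advance the ID sequence so
--    new inserts don't collide (assumes id is backed by a sequence owned
--    by events_new).
CREATE INDEX ON events_new (created_at);
SELECT setval(pg_get_serial_sequence('events_new', 'id'),
              (SELECT max(id) FROM events_new));

-- 4. Swap the names inside one transaction, then drop the old table once
--    you're confident nothing still needs it.
BEGIN;
ALTER TABLE events RENAME TO events_retired;
ALTER TABLE events_new RENAME TO events;
COMMIT;

DROP TABLE events_retired;
```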


Next piece of content: there was another episode of Postgres FM last week. This one was on “Skip Scan”, and here Michael and Nikolay hosted Peter Geoghegan, who's been a major contributor and committer to Postgres, and they talked about skip scans. This is something that Oracle has supported for years, and I think even SQLite supports it; I'm assuming MySQL as well. But it appears it's getting into Postgres with versions 17 and 18: some initial improvements in 17, with the major skip scan feature to come in Postgres 18. What a skip scan is, basically, is skipping over particular columns in a multicolumn index; that's the most common example I've seen. Normally when you're querying a table that has multicolumn indexes on it, you'll only get efficient scans if the leading columns are in the WHERE clause of your query. So if you have an index with an account ID and then some sort of timestamp, as long as you're querying with both the account ID and a timestamp, that's a very efficient index to use. However, if your WHERE clause only has the timestamp, it's either not going to use that index or it will use it very inefficiently, basically doing a full scan of it. What a skip scan does is allow you to skip over that leading column. So imagine your account ID has only 100 distinct values. It takes your query and says, “Alright, let me look for that timestamp where the account ID equals one, and two, and three, and four...”. So basically it does 100 mini queries of that index with your timestamp of interest, and you don't have to specify the account. But it can still give you better performance compared to scanning the entire table or scanning the whole index. And it's something this new feature, presumably coming in 18, can do transparently; it's not something you have to do as part of your query, it's just a new capability of the optimizer. So I thought this episode was particularly interesting; you definitely get a lot of info about the optimizer during this episode. But if you want to learn more, you can listen to the episode here or watch the YouTube video down here. 
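 
As a sketch of the query shapes involved, assume a hypothetical events table with a two-column index (the names here are mine); the second query is the one a skip scan helps with, and the skipping itself happens transparently in the planner:

```sql
-- Sketch of the query shapes involved (hypothetical events table).
CREATE INDEX ON events (account_id, created_at);

-- Efficient today: the leading index column is constrained.
SELECT count(*) FROM events
WHERE account_id = 42
  AND created_at >= now() - interval '1 day';

-- Without a skip scan this means a full scan of the table or index,
-- because account_id is missing. A skip scan effectively repeats the
-- timestamp search once per distinct account_id value (the "mini
-- queries" described above), with no change to the query itself.
SELECT count(*) FROM events
WHERE created_at >= now() - interval '1 day';
```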


Next piece of content “We need to talk about ENUMs”. This is from notso.boringsql.com. And he talks about a subject many people have the viewpoint of, quote, “don't use enums”. And he's kind of coming around to them, because he says there have been some enhancements across different versions of Postgres that make them easier to work with. Like, Postgres 9.1 allowed you to add new ENUM values without a table rewrite; as of Postgres 10 you can rename the values in an ENUM; as of 12 you can add a value as part of a transaction block; and Postgres 17 allows you to use new values within the same transaction block. He also talks about how ENUMs are implemented: internally they're represented by a number, but you use them through a text description or label. But I still don't really like them, because you can't really remove values and you can't really reorder them if you ever need to. What many people do, and I advocate as well, is use check constraints instead of enums, because in my experience it's much easier to change things if you need to. But he also advocates using reference tables or lookup tables, because those give you even more flexibility. So I'm still definitely not a big fan of ENUMs, but he's definitely come around to using them in some cases. Check this blog post out if you're interested. 
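
For comparison, here's a quick sketch of the three approaches mentioned, an ENUM, a check constraint and a lookup table; all of the names are illustrative:

```sql
-- Quick sketch of the three approaches (illustrative names).

-- ENUM: compact and ordered, but values can't easily be removed or
-- reordered once they're in use.
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'cancelled');
CREATE TABLE orders_enum (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status order_status NOT NULL
);
ALTER TYPE order_status ADD VALUE 'refunded';  -- adding is easy; dropping is not

-- CHECK constraint: the allowed list lives on the table and can be
-- replaced with a plain ALTER TABLE when requirements change.
CREATE TABLE orders_check (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status text NOT NULL
        CHECK (status IN ('pending', 'shipped', 'cancelled'))
);

-- Lookup (reference) table: the most flexible option; values are rows
-- you can add, remove or annotate, enforced with a foreign key.
CREATE TABLE order_statuses (status text PRIMARY KEY);
CREATE TABLE orders_lookup (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status text NOT NULL REFERENCES order_statuses (status)
);
```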


Next piece of content “CNPG Recipe 13 - Configuring PostgreSQL Synchronous Replication”. This is from gabrielebartolini.it, and in this next CloudNativePG recipe he's talking about setting up synchronous replication and all the configuration to do that within your Kubernetes cluster. So if you need to do that, definitely check out this blog post. 


Next piece of content “PostgreSQL snapshots and backups with pgBackRest in Kubernetes”. This post talks a lot about the Postgres Operator from Crunchy Data. Basically they're using Kubernetes, which has a way to do snapshots of volumes, and he discusses the two best ways to do these snapshots in a Kubernetes environment. Option one is using the delta restore capability of pgBackRest, primarily to avoid the risk of corrupted blocks when you're doing a snapshot; option two is using a standby replica. And usually I've seen ones that have been shut down: they take a replica that's in sync, shut it down, and do a snapshot of that. But if you want to learn more, check out this blog post.


Next piece of content “Point In Time Recovery and Streaming Replication Environments”. This is from dataegret.com, and they basically set up streaming replication from a primary to a replica and then perform a point-in-time recovery on the primary. The question then is, how do you resync the standby? The first way, of course, is to delete the replica's database and rebuild the replica from scratch using pg_basebackup. But he uses a different technique: after restoring the primary, he grabs the LSN it recovered to and then recovers the standby's database from the backup using that LSN as the recovery target, getting it back in sync that way. But my concern is, if it's an active database, is there data missing between them somehow? So I'm not quite sure how reliably this would work on an active system, because the other way to do this is pg_rewind, which is a utility that synchronizes the data directory between the primary and replica as a way to get things back in sync. But this technique may work for you, so you can check out this blog post if you're interested. 
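
As a rough sketch of how that LSN handoff could look (the specific function and settings here are my assumptions, not the post's exact commands):

```sql
-- Rough sketch of the LSN handoff (my reading of the approach; the post's
-- exact commands may differ).

-- On the primary, once its point-in-time recovery has finished, capture
-- the LSN it recovered to:
SELECT pg_last_wal_replay_lsn();  -- e.g. 0/5D03A58

-- On the standby, restore the same base backup and aim recovery at that
-- LSN before re-attaching it as a streaming replica, e.g. in
-- postgresql.conf:
--   recovery_target_lsn    = '0/5D03A58'
--   recovery_target_action = 'shutdown'
--   restore_command        = '<however your WAL archive is fetched>'
```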


And the last piece of content “pg_auto_dw: a Postgres extension to build well-formed data warehouses with AI”. This is from tembo.io, and I hate to say it, but part of that title seems like a little bit of an oxymoron: can AI build well-formed data warehouses? I don't know. I mean, it's a cool concept. Basically, this extension inspects your transactional Postgres schemas and creates a data warehouse based upon that. So it's pretty cool, but I don't know how effective it would be. This is a new project, you know, they're saying they're just getting started, but it might turn into something, so check it out if you're interested. 


 
