
Things I Don't Do In Postgres | Scaling Postgres 346


In this episode of Scaling Postgres, we discuss things I do not do, such as use default partitions, play column Tetris, do ELT using special data types, or create 7+ million tables in a database.


Transcript

So this week we're going to do something a little different. We'll call this, I guess, one of my theme shows, and the theme is what I wouldn't do with Postgres. Normally I look for content that's the most insightful or teaches me something I didn't know, but with more than one piece of content this week I was saying to myself, I wouldn't do that. So the first four pieces of content this week are things I wouldn't necessarily do, but I hope you, your friends, family and co-workers continue to do well.

The first piece of content is “Postgres Partitioning with a Default Partition”. This is from crunchydata.com, and I know there are a lot of people that use pg_partman and it's recommended. It's not something I've really used for my own use cases. I've used it for clients, but it's not something I generally do myself. But this particular post is talking about default partitions, and he says here why default partitions are important. And I actually don't like to use default partitions, because I much prefer visible errors being presented to me. So if there's a problem in the program that I've written and it's trying to insert data in an invalid way, like 100 years into the future or too far back in the past, something that's not expected, I want it to throw an exception and fail so that I can fix it and it's not doing that anymore. Whereas what default partitions do is give you a dedicated table to catch any rows being inserted for which a matching partition doesn't exist yet. So say you have 10 years of partition tables and you try to insert a row 100 years into the future. Well, that doesn't exist in your partition tables, so it's going to be put into the default partition. If there's no partition whose bounds the row fits into, it goes into that one. And he says, quote, maybe there's just some bad data getting created, your default partition can help spot that. And this actually hasn't been my experience, because I've gone to clients who have default partitions enabled and said, do you know you have some data in here? They're like, what? Why is data in there? How did that get in there? So people think everything is fine, but they aren't checking these default partitions to make sure nothing unexpected is there. Because, as he says, you don't actually want to leave data in there. So now this is something additional you have to monitor, to make sure no data is going into your default partitions, because then you need to move it out and address it. To me, it just seems like more work to do. I much prefer the visibility of getting an exception and saying, uh oh, there's a problem with the program, let's fix it, as opposed to something that can fail silently. Now, if it's critically important that you cannot lose any data, such that a failed insert resulting in an exception is unacceptable, then okay, I can understand using default partitions. So he talks a little bit about adding a default, and if you actually look at the constraint of a default partition, you can see it's basically what he calls an anti-constraint: it lists the constraint of every other partition table, joined by a whole set of ORs and negated, because the default holds everything that does not fit into an existing partition. So this can really make it hard to clean things up if a lot of data winds up in the default partition, because again, you're not really supposed to store much data in there. It's only meant for temporary purposes and you need to move it elsewhere. And he talks about some different ways to deal with that. And because he is the main contributor to pg_partman, he talks about the tooling that's been added to pg_partman to handle default partitions. And it does create default partitions by default, pun intended. But if you want to learn more about default partitions, and specifically about pg_partman and how it can help manage those, you can definitely check out this blog post.
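Just to make the behavior concrete, here's a rough sketch of my own (not from the blog post, and the table names are made up) showing a range-partitioned table where a default partition silently catches an out-of-range row:

```sql
-- A range-partitioned table with one real partition and a default partition.
CREATE TABLE events (
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- The catch-all: any row that matches no other partition lands here.
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- Without the default partition this insert would raise an error;
-- with it, the row is stored silently in events_default.
INSERT INTO events VALUES ('2125-01-01', 'a date 100 years in the future');

-- Something extra you now have to monitor.
SELECT count(*) FROM events_default;
```

Without the default partition, that insert fails loudly, which is the behavior I prefer.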

Next piece of content: there was another episode of Postgres FM last week. This one was on “Column Tetris”, and this is another thing that I don't do, or another game I don't play. And from the episode I think Nikolai and Michael are pretty much of the same thought. What they mean by column Tetris is that, because so much of the data gets padded for alignment, you can actually save some space if you order the columns of your tables appropriately based on their data types. Like they were saying, a boolean actually takes a full byte, so it's not just a bit, and an int4 can end up padded out to 8 bytes depending on the column that follows it, etc. So because of this padding, if you arrange things efficiently, you can actually save some space in your tables. But the reason I don't play this game is because I'm constantly adding new columns to my tables. I start off a feature very small and efficient, a small table with maybe five columns in it, and then over time new columns get added for each new feature. So trying to readjust and rearrange is just not worth it for the benefit. Now, there are two scenarios where I would change the column ordering. One is if you have a super large table and you think you can get some efficiency just by ordering the columns better; well, you can go ahead and do that. You'll have to rewrite the whole table, but if those space savings are worth it to you, you can do that. The other is if you have some sort of data mart or data warehouse where maybe you're constantly refreshing the data; well then it makes sense to spend time working on the column order to make it as efficient as possible. But for typical online transaction processing apps, I personally don't worry about it. Now, the thing I thought was super interesting is that Nikolai mentioned a great feature for Postgres to have would be maintaining the physical order separately from the logical order you specify. So you specify the order of the columns the way you want, but Postgres stores them differently and automatically chooses the most appropriate layout. So if I go to add a boolean to an established table that has maybe 20 columns in it, maybe there's a padding byte somewhere it can slot it into, and now essentially I haven't grown my table at all physically, but I have another column that is being utilized logically. So personally I think that would be a great enhancement to have. But if you want to learn more, you can listen to the episode here or watch the YouTube video down here.
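To make the padding point concrete, here's a small illustration of my own (not from the episode) with the same three columns in two orders, plus how to peek at their lengths and alignment in the catalog:

```sql
-- The same three columns in two orders. In "padded", the 1-byte boolean is
-- followed by an 8-byte-aligned bigint, so 7 padding bytes are wasted per row.
CREATE TABLE padded  (a boolean, b bigint, c int);
CREATE TABLE compact (b bigint, c int, a boolean);

INSERT INTO padded  VALUES (true, 1, 1);
INSERT INTO compact VALUES (1, 1, true);

-- Compare the per-row sizes: the padded layout comes out larger.
SELECT pg_column_size(p.*) AS padded_row, pg_column_size(c.*) AS compact_row
FROM padded p, compact c;

-- Inspect declared lengths and alignment requirements from the catalog.
SELECT attname, attlen, attalign
FROM pg_attribute
WHERE attrelid = 'padded'::regclass AND attnum > 0
ORDER BY attnum;
```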

Next piece of content: “Performing ELT Operations Using DOMAINs”. This is from percona.com, and with regard to ELT, it's a twist on something you may have heard of, which is ETL: extract, transform and load. ELT is basically extract the data, load it, and then transform it. And I would say this is another thing I don't do, which is use domains for this purpose. So this blog post is arguing for domains as one way to load data into Postgres. His scenario is that you have a CSV file from another system and you want to load it into Postgres, and maybe you're trying to load a date, and the issue is it contains an empty string. But of course, if you're trying to fill a date field, you want to insert a null value for that. So the first way he says you could resolve this is to put a trigger on the date column to handle it. The second is to craft a custom PostgreSQL command that performs that operation; I've never heard of that. And the third one is creating a custom data type that permits loading empty strings into a date column, and this is basically using domains. So he goes through the process of showing you how to do that. Now, the reason I don't do this, probably because I've converted from database systems over the years that had really atrocious data quality, and trying to put that data into Postgres was a bear, is that I still do the extract, load, transform, but I load everything into text columns. Even if it's an integer, even if it's a float, everything goes into a table that has text columns. And then I have a separate process that runs to transform the data into the proper data format. So the loading process is super simple: it just takes whatever data I'm given and throws it into text columns. It's the transformation step where all the complexity is. So there's one program there that runs through the data and converts it into appropriate data types in the final destination table. That's personally how I like to do it. But if you want to look into this alternative way, you can definitely check out this blog post.
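As a rough sketch of that text-columns-first approach (my own illustration, with made-up table and column names, not from the Percona post):

```sql
-- Staging table: every column is text, so loading never fails on bad values.
CREATE TABLE staging_orders (
    order_id   text,
    ordered_on text,
    amount     text
);

-- Load step: copy the raw CSV straight in, no type parsing.
-- (Path is illustrative only.)
-- COPY staging_orders FROM '/tmp/orders.csv' WITH (FORMAT csv, HEADER true);

-- Final table with proper types.
CREATE TABLE orders (
    order_id   bigint,
    ordered_on date,
    amount     numeric(12,2)
);

-- Transform step: all the complexity lives here, e.g. empty strings become NULLs.
INSERT INTO orders (order_id, ordered_on, amount)
SELECT NULLIF(order_id, '')::bigint,
       NULLIF(ordered_on, '')::date,
       NULLIF(amount, '')::numeric(12,2)
FROM   staging_orders;
```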

Next piece of content is actually a YouTube video, ”[YouTube] 7+ million Postgres tables | Kailash Nadh | IndiaFOSS 2024 | FOSS United”, and this is about a 20 minute presentation. He's not necessarily advocating this, but he's saying they developed this solution and it is working great for them. But yes, it is using over 7 million tables, sometimes up to 20 million tables, in a single Postgres instance. So what they're doing is they have a middleware that does report processing. They don't want a user to directly contact the database, run a report, and get that query back immediately, because that's going to overload the database if suddenly thousands of users request a report. So basically they set up a queuing system, and the report request goes to the queuing system. The queue eventually runs the report for the user and places the results into a results database, which is Postgres, and every report result is its own independent table. So when those results need to be delivered to the end user, they just select star from that uniquely named table that has all the data that was queried. Now, I don't know if I would necessarily do that, but I can see some of the benefits of this implementation. Now, the reason why this works a little more easily for them is because they can flush their whole database on a daily basis. And how they flush it is that they don't go through and drop every table; they actually stop Postgres, detach the file system, attach an entirely new file system, and bring the system back up. So basically that wipes away all those tables. So I don't know if I'm necessarily advocating this, but I thought it was a pretty interesting way to handle this use case, so feel free to check it out. And that does it for things that I probably wouldn't do.
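Just to illustrate the pattern they described (my own sketch, with made-up names, not from the talk):

```sql
-- Each finished report gets its own table, named for the report request,
-- that the queue worker fills in.
CREATE TABLE report_results_42 (customer_id bigint, revenue numeric);
INSERT INTO report_results_42 VALUES (1, 100.00), (2, 250.50);

-- Delivery to the user later is just a full read of that one table.
SELECT * FROM report_results_42;
```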

So moving on, the next blog post is “Achieving a zero-downtime Postgres major version upgrade”. This is from medplum.com, and it's a really comprehensive example of how to do a migration that I really liked reading through. Now, they are on RDS, so they essentially have three options for doing the upgrade. They could do an in-place major version upgrade, but the unfortunate part of that is it can generally take 20 to 30 minutes of downtime. RDS now has blue-green deployments to do a major version upgrade; they looked into that, but it can still result in, they said, less than a minute of downtime “but it can be longer depending on your workload”. So it's an indeterminate amount of downtime; they're not quite sure how long it's going to be. And then they said, well, we could always do a self-managed blue-green deployment, and this is basically using logical replication to transfer the data over to a new RDS instance. And that's actually what they chose to do, because they felt they could minimize downtime as much as possible, if not eliminate it. Now, what's awesome about this is that they also have a link to their step-by-step Postgres upgrade runbook, and here's the link here that shows the whole process they went through, all the different things they checked, and the commands they ran. So this is awesome if you're looking to do something like this. But going back to the blog post, they covered a little bit of what was in the runbook. They did use PgBouncer as a means to pause and resume traffic. So they don't normally use PgBouncer, but they brought it into play here for doing the upgrade. They talked about setting up their logical replication, doing the switchover, and the post-switchover work that needed to be done. So again, I thought this was a pretty good blog post. Definitely check it out if you're interested.
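For a flavor of what the logical replication piece of a self-managed blue-green setup looks like (a minimal sketch with made-up names, not the commands from their runbook, which covers many more steps like sequences, cutover and validation):

```sql
-- On the old (publisher) instance:
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new instance (subscriber), after loading the schema there:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary.example.com dbname=app user=replicator'
    PUBLICATION upgrade_pub;

-- Watch the subscriber catch up before pausing traffic (e.g. via PgBouncer)
-- and switching applications over to the new instance.
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;
```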

Next piece of content: “Where are all the PostgreSQL users?”. This is from peter.eisentraut.org, and he's basically lamenting the fact that years ago, up to 20 years ago, there was a lot more activity on the mailing lists, where a lot of the developers and users of Postgres were hanging out. But now there are so many places and so many separate communities for Postgres. You know, there's the Slack channel, there's Stack Overflow, Reddit, Telegram, Discord, Mastodon, etc., and even individual YouTube channels, podcasts and conferences, so it's hard to get a sense of how big the Postgres community is. And particularly because he's a Postgres developer, getting feedback from those users is quite hard to do given how widespread the community is. So he didn't really have a lot of conclusions, but he's basically asking how we can do better in terms of understanding how people are using Postgres. So check this out if you're interested.

Next piece of content: “CNPG Recipe 15 - PostgreSQL major online upgrades with logical replication”. This is from gabrielebartolini.it, and it covers an enhancement to the CloudNativePG Postgres operator that adds declarative support for logical publications and subscriptions, so basically logical replication. And when you have that, you can then do a major version Postgres upgrade as we've just discussed. So if you want to learn how to do that, you can check out this blog post.

Next piece of content: “Explaining ABI Breakage in PostgreSQL 17.1”. This covers the change made in 17.1, where the size of a structure in the application binary interface changed between minor versions, which caused some issues for extensions and resulted in basically an out-of-band release, as well as another issue that was discovered. This post goes into a bit more detail about it, particularly from a code perspective. So if you want to learn more about that, you can definitely check out this blog post.

Next piece of content: “Smarter Postgres LLM with Retrieval Augmented Generation”. This is from crunchydata.com, and this is another post about using a local LLM such as a Llama3-8B model run via Ollama, together with Crunchy Data's new OpenAI API access extension for Postgres, and pulling in some of your own data to improve the responses coming from it. So if you want to learn more about that, definitely check out this blog post.

And the last piece of content: there was another episode of the Talking Postgres podcast. This one was on “Leading engineering for Postgres on Azure with Affan Dar”. So if you want to learn more about that, definitely check out this episode, or you can watch the YouTube version.

 
