Locking Tables, Foreign Key Issue, Slow Decimals, Memoize & Joins | Scaling Postgres 249
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we discuss issues with locking tables, foreign keys and slow decimals. We also cover when joins use memoize.
Content Discussed
- OK, sometimes you can lock tables.
- A foreign key pathology to avoid
- How slow is DECIMAL, anyway?
- Using Memoize to speed up joins between two Postgres tables
- usql: universal psql?
- How to Solve Advent of Code 2022 Using Postgres - Day 13
- Building a PostgreSQL Wire Protocol Server using Vanilla, Modern Java 21
- Debugging PostgreSQL CI failures faster: 4 tips
- Auditing
- Miroslav Šedivý
- Rubber Duck Dev Show Episode 70 | Teaching Young Developers with Melissa Amaya
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "OK, sometimes you can lock tables". This is from thebuild.com and it looks like the blog is "postgreSQL when it's not your job". And this is a very short post that's actually talking about something he wrote before where you should never lock tables. Basically, it's definitely something that's going to hurt your performance. And he actually, in this post, suggested using advisory locks instead of other types of more heavy locks. But he says there is at least one condition where you need a lock. And that's basically if you're going to modify part of the table like you want to add a column to the table. To actually make changes to that table, you need an ACCESS_EXCLUSIVE lock, even a brief one.
But the problem is that if the table is so busy, even if it just needs this brief hold, once it says it needs this other activity for this table, it's going to start queuing behind it. That's known as a lock queue and that could cause some big problems potentially. Now his recommendation on how to handle this is to use this code here where it looks like he's trying to acquire a lock. And if it's not available, try again in about a second. But frankly, what I use is just set a lock timeout and put it in a loop and maybe you set your lock timeout to 5 seconds, 10 seconds, whatever you feel is most appropriate for your database and the load on it and then just put it in a loop trying to apply that change. Or you could try using this code that he's provided. But if you want to learn more, you can check out this post.
The next post, also from thebuild.com, is "A foreign key pathology to avoid". And he came up with a scenario where you have, say, a streaming service and you have a number of streams, you have a number of users, and then you want to link those users to those streams. And he called the table stream_viewer. And because you want to maintain data integrity, a foreign key constraint is added on the Stream Viewer table to users and to the stream ID. But the problem is if some popular person joins your service, suddenly you could have millions of new people watching an individual stream. Now you have to add all of these millions of records to the Stream Viewer.
Well, you're going to have tons of foreign key relationships to that one stream that has to be maintained, which is adding a FOR_KEY_SHARE lock on that record. But you're going to get diminished performance and you're going to start spilling to disk once you have an excessive number of these locks on the system and he proposes two different ways to get around this. One is not to use the foreign key. Of course, you're going to lose referential integrity. Another way is to potentially batch up the join operations. So don't do an individual insert for each one, but batch them up in maybe 100, 1,000 per. But if you want to learn more about that, you can check out this blog post.
The next post, also from thebuild.com is "How slow is DECIMAL, anyway?". So he did a test comparing the NUMERIC data type, which is essentially a DECIMAL versus the DOUBLE PRECISION floating point type. He found doing a straight sum was twice as slow as the float and it went up to four times slower if there was a simple calculation taking the number and multiplying it by an integer. It was five times slower when you multiplied the number by itself and did a summation of that. Even just inserting records, inserting 100 million rows, took twice as long to do the NUMERIC versus the DOUBLE PRECISION. So it's just something to keep in mind if you're choosing to use NUMERICs: there is a performance penalty for using them. And if you want to learn more, you can check out this blog post.
Next piece of content- "Using Memoize to speed up joins between two Postgres tables". This is from pganalyze.com and this is the next episode of "Five minutes of Postgres". Lukas is referencing the presentation that Bruce Momjian released that we mentioned last week. And out of the 40 or so different nodes that Postgres can use in doing query planning, one of them is Memoize. We actually discussed this in Scaling Postgres about a year ago when this was introduced in Postgres 14. And it can happen where, as he says here, quote "...two tables that are being joined using a Nested Loop Join and the inner plan node uses a parameterized index scan".
For those cases, Memoize can be used. And it's a bit like a cache of sorts, like when it tries to calculate a function and it keeps getting the same value, it can use a cached value in those cases. And there's another quote here- "Effectively, what Memoize does is it speeds up the loading of the data because it doesn't have to keep going back to the original Sequential Scan in this case". The post that he covers here, which is a little bit of an older post, shows where someone was using a lateral join and they saw a thousand times faster performance. So if you want to learn more about that, definitely check out Lukas's piece of content this week.
Next piece of content- "USQL: UNIVERSAL PSQL?". This is from cybertec-postgresql.com. So usql is a new command line interface that was based upon psql, but it's designed to be used against any database, I believe Oracle, Microsoft SQL Server, MySQL, etc. And of course, you can use it against Postgres. So in this post, they try using usql to connect to a database and do some exploration of it. And not all of the familiar commands work in usql, so your mileage may vary a bit. But if you use different databases on a regular basis, perhaps you would want to explore using usql instead of psql. So I definitely encourage you to check out this blog post if you want to learn more about that.
Next piece of content- "How to solve advent of Code 2022 using Postgres- Day 13". This is from crunchydata.com and Greg explores his solution to Day 13 of the advent of code.
Next piece of content- "Building a PostgreSQL Wire Protocol Server using Vanilla, Modern Java 21". This is from gavinray97.io and here he is exploring using Java, something I'm not very familiar with, on how to talk to PostgreSQL using its Wire protocol. So this is definitely a more in-depth review of how to work with Postgres. And if you're interested in learning more about this level of detail, definitely encourage you to check out this blog post.
Next piece of content- "Debugging PostgreSQL CI failures faster: 4 tips". This is from citusdata.com and they're talking about the CI environment that is used to build Postgres on multiple operating systems. And they're expressing that when building on multiple operating systems, you can run into errors. And what are some tips on how to determine where the errors are coming from? The first tip here is "Connecting to the CI environment with a terminal" to see interactively how things are processed. Tip two is "Enabling debug build options using them on CI", so adding more options to get more debugging information.
Tip three is "Gathering Postgres' logs and other files from CI runs", so you can actually see what's in the log files. Tip number four is "Running specific commands on failure". So if something fails, are there commands you can run to gain more insight into what's going on? And of course, they go into much more depth about each of these tips and how to use it with the Cirrus build system. So if you're interested in learning about that, definitely check out this blog post.
Next piece of content, Postgres FM did have another new episode this week. This one was on "Auditing". So if you want to learn more about different ways you can audit your database for compliance purposes or even security purposes, definitely check out this piece of content. They definitely covered a lot of different solutions during this episode, as you can see here.
Next piece of content. The PostgreSQL person of the week is Miroslav Ŝedivý. If you're interested in learning more about Miroslav and his contributions to Postgres, definitely check out this blog post.
And the last piece of content, we did have another episode of The Rubber Duck Dev Show this Thursday afternoon. This one was on "Teaching Young Developers with Melissa Amaya". So we not only talked about someone who's a new developer coming onto a team, but we also discussed how to teach development to someone who's completely new to it, like a student in a secondary grade level. So if you're interested in this type of long-form developer-based discussion, we definitely welcome you to check out our show.