Adopting PgCat, Time Bins, work_mem Settings, Bad Constraints | Scaling Postgres 257
In this episode of Scaling Postgres, we discuss Instacart adopting PgCat, binning or bucketing your data by time, the best settings for work_mem and how to avoid bad check constraints.
Content Discussed
- Adopting PgCat: A Nextgen Postgres Proxy
- PgCat
- Easy PostgreSQL Time Bins
- Everything you know about setting `work_mem` is wrong.
- Breaking your PostgreSQL database with bad CHECK constraints
- Inside logical replication in PostgreSQL: How it works
- System roles – what, why, how?
- How collation works
- The Wonders of Postgres Logical Decoding Messages
- UUIDs vs Serial for Primary Keys - what's the right choice?
- Postgres 15 highlight - Custom WAL Resource Managers
- Table Maintenance after Bulk Modifications
- PostgreSQL Patch of Interest (To Me): Documentation Linking
- Building an AI-powered ChatBot using Vercel, OpenAI, and Postgres
- pg_timetable v5.3 is out!
- Upgrades
- Vincent Picavet
- Features Your Editor Should Have With Amir Rajan | Rubber Duck Dev Show 78
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Adopting PgCat: A Nextgen Postgres Proxy". This is from tech.instacart.com. And they're talking about a new connection pooler called PgCat. And it does a little bit more than connection pooling: it also does load balancing and replica failover. Now what's significant about this is that a not insignificant company, Instacart, has started using PgCat and has contributed improvements to it. So historically, I've always used PgBouncer, but I'm really interested in this post and all the information they provide.
So they were using PgBouncer, but they wanted to get that load balancing and replica failover capability. So they started doing an evaluation using a table similar to this, where they looked at PgBouncer, PgBouncer with a sidecar to handle failover, Yandex Odyssey, and PgCat. As you would expect with these types of tables, PgCat was a yes in all the categories. With regard to being multi-threaded, that's great, because one thing I don't necessarily like about PgBouncer is that it is single-threaded. So if you put it on a multi-core machine, you're only going to be using one CPU core.
And you have to use different techniques to actually use all of those cores if you want a dedicated PgBouncer box. Whereas PgCat is multi-threaded out of the box, so it'll immediately use whatever cores are available, which is awesome. The other thing it has is PgBouncer-style admin controls. So PgCat is meant as a drop-in replacement for PgBouncer: admin commands such as RELOAD or SHOW POOLS work in PgCat too. Again, that's awesome. It supports replica failover and query load balancing, which were the things they were most interested in. It also has some different behaviors for handling poorly behaved clients.
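As a rough sketch of that drop-in compatibility (connection details here are assumptions, not from the post): you connect to the pooler's admin console with psql, much as you would with PgBouncer's special "pgbouncer" database, and issue the same admin commands.

```sql
-- Run against the pooler's admin console, not against Postgres itself.
SHOW POOLS;   -- per-pool client and server connection counts
RELOAD;       -- re-read the pooler's configuration file
```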
And then they also list the language, and as you can see the other ones listed here are written in C and C++, whereas PgCat is written in Rust. So they talked about their evaluation phase, and they first looked at latency. So what kind of performance are they getting out of it? It was pretty much identical. I mean, it wasn't quite as fast as PgBouncer, but there was a very, very minimal difference; statistically there may be no difference at all for some of these. So the performance is good. Oh, the other thing I should mention is that, like PgBouncer, PgCat supports both session and transaction pooling modes.
They talk a little bit about their deployment layout and how you can use it simply as a proxy in front of a Postgres instance, as well as what they're doing here: a containerized deployment of PgCat where traffic can go to any of the databases here and be load balanced between them. And they said they used random load balancing as well as least-connection load balancing, I believe. They also cover how it can handle replica failover, so it can stop traffic to a replica until it comes back online. And in terms of misbehaving clients, they say with PgBouncer it normally just breaks the connection, whereas PgCat actually does a rollback and is able to add the connection back to the connection pool.
And they say here, quote, "PgCat has been in use for some of our unsharded production workloads for the past 5 months". So definitely a lot of production usage here. They also say they've, quote, "... migrated one of the largest databases to PgCat and has yielded significant benefits by adding load balancing services...". And they're in the process of moving more of their production workloads to PgCat as well. The last thing they say in the final paragraph is "PgCat is targeting a 1.0 release in the coming weeks!". So again, like I mentioned, PgBouncer has been my go-to, but PgCat looks pretty interesting to me now and I'm probably going to be looking at it and evaluating it in the near term. So if this is of interest to you, definitely check out this blog post as well as the GitHub page for PgCat. The tagline is "PostgreSQL at petabyte scale".
Next piece of content- "Easy PostgreSQL Time Bins". This is from crunchydata.com. They're talking about when you're analyzing data, you want to bucket it, or bin it, to do a histogram of sorts to see how many values are in a particular bucket or a particular bin. And that's what they're doing here. So they're using earthquake data published by the USGS. So they created a table to accept it. They're using PostGIS to get the geometry data type as well as they're using PL/Python because they are using their technique of using a Postgres function to actually pull all the data in and format it and place it in the table.
You don't need this to do the binning; it just pulls the data that they're going to be analyzing. At the end, they had just over 11,000 records, and they did a quick query to show the earthquake magnitudes and how many of each were reported. So that's a very quick way to do a type of binning by the magnitude of the earthquake. But then they go into more histogram-based summaries based on the date. Say you want to see how many magnitude six earthquakes happened on particular days or in particular weeks. The first simple way to do it is just to cast the timestamp to a date, and you immediately get data that looks like this.
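A minimal sketch of that first step, assuming a hypothetical quakes table with a ts timestamp column and a magnitude column (not the article's exact code):

```sql
-- Count magnitude-6+ earthquakes per day by casting the timestamp to a date
SELECT ts::date AS day,
       count(*) AS quakes
FROM quakes
WHERE magnitude >= 6
GROUP BY day
ORDER BY day;
```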
Now of course the disadvantage is that it doesn't show the zero counts. As you can see, it's skipping dates, and ideally, if you want to graph something, you're going to want to show all the dates, with zeros where there is no data. The way you can handle that is to join to a generate_series() call that generates all the dates, and then use coalesce() to turn a NULL count into a zero. Then you get consistent zeroes where no data is present and the actual data where it is present.
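Continuing the sketch above (the date range is illustrative), the zero-filled version might look like this:

```sql
-- Generate every day in the range, then LEFT JOIN the counts and zero-fill
SELECT d.day::date           AS day,
       coalesce(q.quakes, 0) AS quakes
FROM generate_series('2023-01-01'::date, '2023-03-01'::date, '1 day') AS d(day)
LEFT JOIN (
    SELECT ts::date AS day, count(*) AS quakes
    FROM quakes
    WHERE magnitude >= 6
    GROUP BY 1
) q ON q.day = d.day::date
ORDER BY 1;
```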
Then they get a little more sophisticated with the date_bin function, which requires Postgres 14. Basically, it lets you bin timestamps by an arbitrary stride, and in this case they bin their dates by 2.5 days. Of course, when they do the generate_series, they also step by 2.5 days. As you can see, each point is 2.5 days apart and the data is binned appropriately.
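A rough sketch of that date_bin approach, again with illustrative names and dates:

```sql
-- date_bin (Postgres 14+): snap each timestamp to the start of its 2.5-day bin
SELECT bins.bin              AS bin_start,
       coalesce(q.quakes, 0) AS quakes
FROM generate_series('2023-01-01'::timestamptz, '2023-03-01'::timestamptz,
                     '2.5 days') AS bins(bin)
LEFT JOIN (
    SELECT date_bin('2.5 days', ts, '2023-01-01'::timestamptz) AS bin,
           count(*) AS quakes
    FROM quakes
    WHERE magnitude >= 6
    GROUP BY 1
) q USING (bin)
ORDER BY 1;
```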
Then they say, okay, now you can do it with arbitrary bins of any size. To do this, they first create an array aggregate of a generate_series command; they're actually using one-week intervals, but you could make these one-month intervals or whatever interval you want. Then you select from the quakes table, do a cross join to that array aggregate, and use the width_bucket function. Basically, it takes each timestamp and looks up which bin's range it falls into. You get something like this: the counts land in the appropriate bins, but you're not getting the actual week or date range, nor are you showing all the bins. So in the next iteration, he does a join that unnests all of the bins, uses WITH ORDINALITY, and coalesces the count, so that every bin, or every week, shows up with its count, with zeros handled as well. So it gives you a properly formatted histogram. This was a great blog post about how to bin data in different ways using Postgres, so if you're interested in that, definitely check it out.
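Here's a rough sketch of that final width_bucket pattern under the same illustrative schema; it follows the shape described above but is not the article's exact query:

```sql
-- Arbitrary bins: aggregate the bin boundaries into an array, bucket each
-- timestamp with width_bucket(), then unnest WITH ORDINALITY to label every
-- bin and zero-fill the empty ones.
WITH bins AS (
    SELECT array_agg(b) AS boundaries
    FROM generate_series('2023-01-01'::timestamptz,
                         '2023-03-01'::timestamptz,
                         '1 week') AS b
)
SELECT w.bin_start::date     AS week,
       coalesce(q.quakes, 0) AS quakes
FROM bins
CROSS JOIN LATERAL unnest(bins.boundaries)
         WITH ORDINALITY AS w(bin_start, bin)
LEFT JOIN (
    SELECT width_bucket(ts, bins.boundaries) AS bin,
           count(*) AS quakes
    FROM quakes
    CROSS JOIN bins
    WHERE magnitude >= 6
    GROUP BY 1
) q USING (bin)
ORDER BY week;
```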
Next piece of content- "Everything you know about setting 'work_mem' is wrong". So that's the perspective of this post; every single one is wrong. This is from thebuild.com. The reason being is that he says work_mem is mostly workload-dependent. So you want to set it high enough so that you're not doing disk merges. But you don't want to set it high enough to where you're going to run out of memory. So his perspective is that the best thing to do is to use a formula similar to this. So take 50% of the memory, add your file system buffers, and divide that by the number of connections to your database. With that, the chance you're going to run out of memory is relatively low.
But the issue is that you may also be leaving memory on the table because it's not going to be utilized. So his recommendation is basically to test it empirically. Start with that formula, set log_temp_files to zero, and run the system under a production load. If everything works fine and you see no performance problems, you're pretty much done. If you are seeing a performance issue, look for temporary file creation messages in the logs. If there aren't any, temporary files aren't the problem and you're done there; if there are, his recommendation is to set work_mem to two times the size of the largest temporary file.
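A quick sketch of that empirical step (the log line is abbreviated and the sizes are made up):

```sql
-- Log every temporary file, regardless of size, then watch the logs under
-- production load. Requires appropriate privileges; a config reload applies it.
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();

-- The log then shows entries along the lines of:
--   LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp12345.0", size 140241920
-- If the largest file you see is ~140 MB, the post's rule of thumb would be
-- to try work_mem of roughly twice that, memory permitting.
```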
But again, you want to be careful with that, because you don't want to run out of memory, and you don't want something ridiculous; he mentions around two terabytes as an example. Then he has another recommendation: if you really need to use a formula, take the average freeable memory, multiply it by four, and divide by max_connections. Now, one thing I will say here, which he doesn't really cover, is that you can also leverage the fact that work_mem can be set per session or per user. So for example, if you have a web application and most of the interactions are super fast queries pulling a few records at a time, you don't need a large work_mem.
So maybe the user that the web application connects as doesn't need a very large work_mem, whereas if you have reporting infrastructure, you could use a different Postgres user for that and increase work_mem for that user, or for the session it uses to connect to the database. That way each connection type gets its own custom work_mem for the job it needs to do, to hopefully minimize some of the disk access. But if you want to learn more, definitely check out this blog post.
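A minimal sketch of that per-role and per-session approach (role names are hypothetical):

```sql
-- Per-role settings: picked up on the next login for each role
ALTER ROLE app_web       SET work_mem = '8MB';    -- small, fast OLTP queries
ALTER ROLE app_reporting SET work_mem = '256MB';  -- heavier sorts and hashes

-- Or override just for the current session or transaction
SET work_mem = '256MB';
BEGIN;
SET LOCAL work_mem = '512MB';   -- reverts automatically at commit/rollback
-- ... run the big report here ...
COMMIT;
```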
Next piece of content- "BREAKING YOUR POSTGRESQL DATABASE WITH BAD CHECK CONSTRAINTS". This is from cybertec-postgresql.com. And they're talking about a situation where you can set up a constraint that actually gets you into trouble. So ideally, you want your check constraints to be immutable because they can cause problems when it is mutable. Now Postgres has some facilities to prevent immutable functions, for example where they're not supposed to be present. But the problem with check constraints is that capability needs to be there for cases when you're doing a check constraint or checking a timestamp, for example.
Say you want to enforce some sort of constraint on a date you're inserting, and you want to use the current date to evaluate it. By definition, current_date is a mutable function, since it returns a different value every day, but it can still be part of a perfectly valid check constraint. So Postgres can't insist on immutable expressions in check constraints, but you still need to be careful how you handle them.
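As a minimal sketch of the kind of legitimately mutable check the post has in mind (hypothetical table, not the article's code):

```sql
-- current_date changes every day, so this expression is not immutable,
-- yet it's a perfectly reasonable business rule for a CHECK constraint.
CREATE TABLE reservations (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    starts_on date NOT NULL CHECK (starts_on >= current_date)
);
```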
They then show some cases of how this can break things. The first example is keeping a history of data entered into the database: whenever a row in their table called data changes, a trigger stores the old row, cast to text, in a history table. That history table has a check constraint saying the cast of that text back to the data type IS NOT NULL, with a comment that says "never mind the IS NOT NULL, the type cast is the test". So when they insert data and then delete it, you can see the history table gets updated. But when they add a column to the data table and then try to update that data, you get an ERROR: malformed record literal, with the detail "Too few columns". Basically, the stored text no longer matches the table's shape because of the additional column you've added.
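Here's a rough reconstruction of that pattern; the names and exact trigger shape are assumptions, not the article's code, but it shows how the cast-based check breaks once the parent table changes shape:

```sql
CREATE TABLE data (id bigint PRIMARY KEY, payload text NOT NULL);

CREATE TABLE history (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    deleted_at timestamptz NOT NULL DEFAULT current_timestamp,
    old_row    text
        -- "never mind the IS NOT NULL, the type cast is the test":
        -- the cast back to the composite type "data" is what validates the text
        CHECK ((old_row::data) IS NOT NULL)
);

CREATE FUNCTION log_delete() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    INSERT INTO history (old_row) VALUES (OLD::text);
    RETURN OLD;
END;
$$;

CREATE TRIGGER data_delete
    AFTER DELETE ON data
    FOR EACH ROW EXECUTE FUNCTION log_delete();

-- Works fine, until the parent table changes shape:
ALTER TABLE data ADD COLUMN note text;
-- History rows written before the ALTER hold two-column literals that no
-- longer parse as the (now) three-column type, so re-validating the CHECK on
-- them (an UPDATE of those rows, or reloading them during a dump/restore)
-- fails with: ERROR: malformed record literal ... Too few columns.
```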
So he says a better solution would have been to do this kind of validation in a trigger instead. The other bad check constraint they talk about is one where you have dependencies between tables. They have vehicles and clients, and the clients are going to be renting these vehicles. They set up an exclusion constraint on the rented table, and they created a function used in a check constraint to make sure the number of seats in the car is sufficient for the group of people renting it. They insert some data: you get an error when the check constraint is violated, and the data gets inserted correctly when it's not, so everything looks good.
But the problem comes when you do a backup and then a restore: because of the ordering of the restore, you can actually run into errors. Since you have hidden dependencies between tables, things may be restored in an order that triggers this issue. Their solution here, again, is to use triggers instead. The post goes into more detail about why these constraints fail and what the alternatives are, but it definitely reinforces the point that you need to be careful with check constraints and ideally only use immutable expressions in them; probably the only safe mutable ones are cases like the timestamp example they show. If you want to learn more, definitely check out this blog post.
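As a rough sketch of that second pattern (names and the function body are assumptions, not the article's code), the hidden dependency looks something like this:

```sql
CREATE TABLE vehicles (
    id    bigint PRIMARY KEY,
    seats integer NOT NULL
);

-- Raises an error if the vehicle doesn't exist yet (SELECT ... INTO STRICT)
CREATE FUNCTION seats_for(v bigint) RETURNS integer
LANGUAGE plpgsql AS
$$
DECLARE
    s integer;
BEGIN
    SELECT seats INTO STRICT s FROM vehicles WHERE id = v;
    RETURN s;
END;
$$;

CREATE TABLE rentals (
    vehicle_id bigint  NOT NULL REFERENCES vehicles (id),
    group_size integer NOT NULL,
    -- Works day to day, but pg_dump has no way of knowing this CHECK reads
    -- the vehicles table, so a restore (a parallel one in particular) has no
    -- guarantee of loading the vehicles data first, and validating rentals
    -- rows can then fail.
    CHECK (group_size <= seats_for(vehicle_id))
);
```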
Next piece of content- "Inside logical replication in PostgreSQL: How it works". This is from postgresql.fastware.com. And this is a great post exploring how logical replication works in Postgres 15 and all of the different features. And they go through the process of setting it up. They show you the different tables you can look at to examine the state of it. They cover the subscriber side and the publisher side, showing the replication launcher and talking about the table workers and apply workers. They talk about the new features to be able to filter things based on row filters and column lists. So this is a really comprehensive post and I definitely encourage you to check it out if you want to learn more about how logical replication works.
Next piece of content- "System roles- what, why, how?". This is from depesz.com. This is a pretty interesting post because he's listing all of the system roles and shows the version in which they were introduced. So these are essentially groups that you can add users to. They exist by default in every version of Postgres that's listed here. So for example, if you wanted to give someone read-all access to all databases on a Postgres instance, as long as you're on Postgres 14 or higher, you could just add them to the group pg_read_all_data. So he goes over each of these different roles and shows an example of their use case. So if you're interested in that, definitely check out this blog post.
Next piece of content- "How collation works". This is from petereisentraut.org And he's going into a deep dive into explaining how collations work with some specifics on Postgres. Now I know that they made some changes to Postgres 15 with regard to ICU coalitions. And I'm super interested to see what's coming in Postgres 16 now and he does promise to do that in a future blog post. So we'll have to be on the lookout for that as well.
Next piece of content- "The Wonders of Postgres Logical Decoding Messages". This is from infoq.com. And they're talking about how you can use application tools to actually read the WAL stream and logically decode it. But specifically this blog post discusses how you can actually emit your own messages from Postgres using the pg_logical_emit_message. This message doesn't have to do anything with the data contained within Postgres. It could be totally custom what you're emitting. So usually you're logically decoding data changes in Postgres, but with this emit message, you can emit anything you want. So they talk about that here and they talk about the infrastructure they set up for decoding it, primarily they're using the Debezium as their change data capture tool.
They also use a stream processing tool, Apache Flink. They talk about some of their use cases for this setup, namely audit metadata, application logging, and microservices data exchange. The key thing to keep in mind as you're looking at this is that there's no fixed schema for these pg_logical_emit_message messages; if you're going to start using this, you're going to have to define your own contract for the data you're sending out so that you know how to receive it. But this was a very comprehensive blog post, so if you want more detail, definitely check out this piece of content.
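The function itself is small; here's a sketch of emitting a custom, transactional message (the prefix and JSON payload are made up):

```sql
-- pg_logical_emit_message(transactional, prefix, content) writes an arbitrary
-- message into the WAL for logical decoding consumers and returns its LSN.
SELECT pg_logical_emit_message(
    true,                                            -- only decoded if this transaction commits
    'app.audit',                                     -- prefix that consumers filter on
    '{"event": "order_shipped", "order_id": 12345}'  -- free-form payload
);
```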
Next piece of content- "UUIDs vs. Serial for Primary Keys- what's the right choice?". This is from pganalyze.com and this is the next episode of "Five minutes of Postgres". Lukas covers the article that was done by Christophe Pettus and we covered it a few weeks ago on Scaling Postgres. Should a UUID or a bigint be a primary key? So he goes into more detail about this, talking about Christophe's perspective. He also reviews how performance can vary between random vs. sequential identifiers. It also talks about the new UUID type version seven which may be coming that has a, I believe, time-sequential component to it. So maybe that could be a really good solution and how there is a talk on the Postgres mailing list, I believe, about potential implementation as well. So if you're interested in that, check out this piece of content.
Next piece of content- "PostgreSQL Patch of Interest (To Me): Documentation Linking". This is from softwareandbooze.com. And this actually isn't a Postgres change but a change to Postgres' documentation. Where they actually show visible links for the individual segments of documentation that you can link to for discussion purposes. So I've actually had to use this a few times, so I'm super interested in this actually getting pushed live into the next version of the documentation. But if you want to learn more about this, definitely check out this blog post.
Next piece of content- "Building an AI-powered ChatBot using Vercel, OpenAI, and Postgres". This is from neon.tech. And this post covers a lot of technology but specifically with regard to Postgres, they're talking about the pgvector extension and being able to use it for storing embeds from OpenAI's API. So if you want to use AI with Postgres, maybe you'd be interested in checking out this blog post.
Next piece of content, there was another episode of Postgres FM this week. This one was on "Upgrades" where they say they discuss what they are, how often they come out and how regularly we should be upgrading. Check out their show on YouTube or the audio link if you're interested. And I see they're now embedding the YouTube videos as well, so great.
Next piece of content. The PostgreSQL person of the week is Vincent Picavet. If you're interested in learning more about Vincent and his contributions to Postgres, definitely check out this blog post.
And the last piece of content: we did have another episode of The Rubber Duck Dev Show this past Thursday afternoon. This one was on "Features Your Editor Should Have With Amir Rajan". So Amir had a very specific set of requirements for the editor he chose, and I think these are requirements we should all consider when choosing our IDE or editor. So if you want to listen to that discussion, we welcome you to check out our show.