
Sometimes It Is Slow? | Scaling Postgres 332


In this episode of Scaling Postgres, we discuss what can happen when queries get slow, backup best practices, Postgres emergencies and the state of Postgres survey.

Transcript

So imagine your database is humming along, things seem to be operating normally, and then maybe a developer comes by and says, "Hey, I've got this query that's sometimes slow." And maybe it's a pretty simple query, so it should always return fast, but for some reason, a small percentage of the time it returns very slowly, at least from the information they're providing you. Well, that's what we're going to be covering this week in the first post, but I hope you, your friends, family and coworkers continue to do well. 


Our first piece of content is “Sometimes, It’s Slow!”. This is from hdombrovskaya.wordpress.com, and she says the thing she dislikes the most when trying to find performance problems is when they're inconsistent. And that's definitely true; it's much harder to figure out. But from my experience, whenever "sometimes slow" happens, it's usually due to some kind of locking, at least if it's within Postgres. She was investigating, trying to figure out the cause of this occasional slowness, and she says there were two contributing factors. One is the tendency for people who need to load a lot of data into a table to follow the advice to drop indexes and constraints before loading the data and then recreate them afterwards. The second contributing factor is how most ORMs wrap their statements in transactions. So they do a BEGIN, they send the statement to the database, and then they do a COMMIT or ROLLBACK, and this happens even for SELECT queries. Both of these contributed to what she was seeing.

So she asks, why is it a bad idea to drop indexes and constraints on a table you're loading into? Well, if you have a case where a table already has 10 million rows and you just need to add 50,000, it makes zero sense to drop the indexes and constraints. Number one, you're losing your referential integrity with the constraints, and number two, without the indexes, any queries that are ongoing are going to perform really poorly during that time. So I would never drop constraints and indexes while bulk loading unless the table was empty. I might consider some form of partitioning to get a single empty partition that I can load a lot of data into, and then add the indexes and constraints afterwards, but I wouldn't drop indexes and constraints from a table that already has that much data in it.

The second issue, with regard to removing constraints such as foreign key constraints, is that it requires a lock on both tables involved in the relationship. And even though it's fast, it is still an exclusive lock on both tables, so basically nothing else can be going on in those tables for it to acquire that exclusive lock. The third issue is that there happened to be an open transaction reading from the parent table, so this exclusive lock couldn't be obtained until that open transaction was closed. And lastly, there was apparently more than one table with a foreign key referencing the same parent table, and the process tracking the files being loaded was doing SELECTs, which presumably were wrapped in transactions, while constraints were being dropped. So there was basically a whole lot of locking going on, and that apparently resulted in the "sometimes it's slow" behavior. Apparently she's going to be talking about this more at an upcoming presentation.

I'll also recount an issue I ran into, where a developer brought something to me and said hey, this query is sometimes slow, why is that? I went and looked at pg_stat_statements in the database, and everything seemed to be super fast; the occasional slow runs were real outliers compared to the average execution time pg_stat_statements was reporting. As I looked into it further, the traffic was actually going through PgBouncer, which allowed a whole lot of client connections but had a relatively constrained pool of server connections behind it. 
And once I looked at the PgBouncer stats, I realized there were some client waits happening. Basically, when things got busy, there weren't enough connections available in the server-side pool of PgBouncer, and those client connections had to wait, and that was introducing greater execution latency for these queries. They were using an APM to track the performance of those queries, so they were seeing the full latency, while Postgres was reporting that everything was fine, because the latency was actually happening in PgBouncer. So that's definitely another factor to consider when you're seeing things that are sometimes slow. But if you want to learn more, check out this blog post. 
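To make the kind of blocking she describes a bit more concrete, here's a minimal sketch, with made-up table and constraint names, of how an ORM-wrapped SELECT can stall a foreign key drop, plus a couple of checks I reach for when "sometimes slow" turns out to be lock waits or pooler waits:

```sql
-- Session 1: an ORM-style transaction wrapping even a plain SELECT
BEGIN;
SELECT * FROM parent_table WHERE id = 1;   -- holds ACCESS SHARE until COMMIT/ROLLBACK

-- Session 2: dropping the foreign key needs an exclusive lock on both tables
-- (as the post describes), so it queues behind session 1's open transaction,
-- and everything that arrives afterwards queues behind it.
ALTER TABLE child_table DROP CONSTRAINT child_table_parent_id_fkey;

-- Session 3: who is waiting, and on whom?
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       now() - xact_start    AS xact_age,
       left(query, 60)       AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

-- For the PgBouncer side, connect to its admin console (the special
-- "pgbouncer" database) and run:
--   SHOW POOLS;
-- A non-zero cl_waiting count means clients are queued for a server
-- connection, which is latency Postgres itself never reports.
```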


Next piece of content: “PostgreSQL Backup Best Practice”. This is from enterprisedb.com, and the first recommendation he gives is that your backup strategy should be part of a more complete disaster recovery plan. So basically, plan for the disasters you believe could potentially happen and design your backup strategy around them. But the particular best practice he highlights is to use well-known tools for managing backups, and the tools he has in mind are Barman and pgBackRest. Now, I definitely agree these are the most often used, comprehensive tool sets for backing up Postgres, but they also come with a lot of sophistication. So I think if you have a large, complicated database that isn't being managed by a third-party service that does the backups for you, these two tools are great options. But if you're just starting out with a small database and your recovery needs aren't that robust, frankly, I don't think there's necessarily anything wrong with using pg_dump. Then, when you get more sophisticated, you can move up to pg_basebackup, or when you want point-in-time recovery, you can move up to that as well. He definitely doesn't advocate going this route, because he says you don't want to end up writing essentially the equivalent functionality of these two Postgres backup tools yourself. But I would go based on what your needs and recovery requirements are, as he states in the first paragraph here. And if pg_dump gets you what you need, it's very simple to do. Because to me, the main best practice is: can you restore? Whatever tool you use, as soon as you back up your database, do a restore of it to make sure you can. That's the main best practice, restore after your backup. And if you can restore successfully using pg_dump, pg_basebackup, or any of these other tools, then that's the route I would go. But what do you think? What tool do you use to back up your database? Are you using pg_basebackup, Barman, pgBackRest? Let me know in the comments section. 


Next piece of content: there was another episode of Postgres FM last week. This one was called “Postgres Emergency Room”. Basically, Nikolai and Michael got together and discussed what to do when there's some sort of emergency going on with Postgres. It could be that the system is down, it could be that you're trying to restore from backup and it's not working, you could have things like transaction ID wraparound issues or an integer column being overrun, or, heaven forbid, some type of corruption issue. Or maybe there's a bad deploy and suddenly your performance has tanked across the board. Basically, what do you do in emergency situations? They talk about it both from a technical side and from a psychological side. So, for example, if you're having some sort of emergency, who is the best person to handle it or call the shots? Who has the least stress dealing with stressful situations? Maybe they should be the ones who make the call on what to do to resolve a particular issue. It may not always be the person who's the most technically minded, because maybe they're not the best person to make a call while under stress. They talked about the importance of monitoring the situation and then, once you've actually resolved it, thoroughly documenting what happened and how it was addressed. And they also listed some resources as guidebooks or handbooks, such as the site reliability engineering resources from Google or the GitLab SRE handbook. But if you want to learn more, you can definitely listen to the episode here or watch the YouTube video down here. 
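Since transaction ID wraparound is one of the emergencies they mention, here's a minimal monitoring query, just a sketch, that I'd keep handy so it hopefully never becomes an emergency in the first place:

```sql
-- How close is each database to transaction ID wraparound?
-- age() approaches roughly two billion at the hard limit, so alert well before that.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2000000000.0, 1) AS pct_towards_wraparound
FROM pg_database
ORDER BY xid_age DESC;
```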


Next piece of content “The 2024 State of PostgreSQL Survey Is Now Open!”. This is at timescale.com so you can definitely click here to take the survey this year, and it will be up until September 30. So basically you have almost a whole month to go ahead and take it, but probably best to do it sooner rather than later so you don't forget. 


Next piece of content: there's another survey, this one by Redgate, and it's “The 2024 State of the Database Landscape Survey”. This is at softwareandbooz.com, so you can click here to take part in the survey. This one doesn't look like it's necessarily Postgres-specific, but you can check it out if you're interested. 


Next piece of content: “VACUUM: managing and detecting bloat”. This is from cybertec-postgresql.com, and this is a pretty comprehensive post about vacuum. Basically, the purpose of vacuum comes from Postgres's MVCC, its Multi-Version Concurrency Control, which is how it allows different processes connected to the database to see different versions of the data depending on what's going on. Because it stores those versions in the actual rows of the table, it doesn't have a rollback area like Oracle or Microsoft SQL Server; it stores each of those versions as rows within the database. So once those versions of the data are no longer needed, they need to be vacuumed. That's the big point of vacuum: to clean up these dead rows. And because of how it operates, keep in mind that long transactions that keep a connection open with rows being accessed can prevent rows from getting vacuumed, so you want to avoid long transactions as much as you can. The other thing he talks about is that when you vacuum something, it doesn't actually return space to the operating system. There's a chance for some space at the end of the files to be returned, but nothing in the middle of a file will be, and he shows an example of looking at that here. So how do you actually free up that space? Well, he mentions one utility, pg_squeeze, which apparently uses logical decoding to basically copy one table to another new table and then swap the names. Another utility you can use is pg_repack, so that's available as well. You can do a VACUUM FULL, but of course that locks all access to the table, so you want to be careful about using VACUUM FULL as a means to reduce bloat. And lastly, he covers how to detect bloat, mainly using the pgstattuple extension to track how much bloat is present. He closes out by asking, hmm, how can bloat be helpful? Well, when you set up a table, you can define a fillfactor, meaning how much free space to leave in each page. And this can actually help with updates, particularly heap-only tuple (HOT) updates, to give you faster, more efficient updates to that table. So it's something to keep in mind: if you're going to be updating a table frequently, a little bit of bloat could actually be an advantage. But if you want to learn more, definitely check out this blog post. 
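As a quick illustration of the detection and fillfactor points, here's a minimal sketch; the table name is made up, and the exact fillfactor value is just an example:

```sql
-- pgstattuple ships with Postgres as a contrib extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Report live vs. dead tuples and free space for a table (name is hypothetical)
SELECT * FROM pgstattuple('my_table');

-- Leave ~10% free space in each page so updates can stay on the same page
-- (heap-only tuple, or HOT, updates)
ALTER TABLE my_table SET (fillfactor = 90);

-- Note: the new fillfactor only applies to pages written afterwards; a rewrite
-- (VACUUM FULL, pg_repack, etc.) is needed to repack existing pages.
```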


Next piece of content: “Understanding the Postgres Hackers Mailing List”. This is from crunchydata.com, and the Postgres hackers mailing list is where Postgres development is discussed and takes place. As he says here, it's very, very busy, and keeping up with it basically feels like drinking from a fire hose. So he has some guidance: don't try to read everything, just skim the subject lines. Make sure you're using a good email client to go through the mailing list. Set a routine, maybe once a day, where you check out particular portions of the messages. Be willing to let threads go, and if your backlog gets too big, just say, all right, mark all of them as read. Then he gets into the different jargon that's present there. First he talks about version control jargon, such as diffs and commit IDs, what rebasing is, and HEAD; different language with regard to building, testing, and patching, such as GCC and Clang; what a commitfest is; and then all these different acronyms, like CF for commitfest and CFM for commitfest manager, and a ton more. Then he talks about acronyms used specifically in Postgres development, and there seem to be about 50 to 60 different ones here. I'll encourage you to check out this blog post if you want to learn what those mean, as well as common Postgres acronyms you may encounter just in talking about the product itself. So this is a pretty great reference to have if you're going to be looking at the Postgres mailing list and trying to translate what some of those things mean. So go ahead and check this out if you're interested. 


Next piece of content: “What is pgvector, and how can it help you?” This is from enterprisedb.com, and this is a quick summary post covering what vectors are, what the pgvector extension is, and how you get started using it, as well as generating embeddings to be able to do semantic search with Postgres. So this is another basic introductory post about that. Check it out if you're interested. 
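If you just want to see the shape of it, here's a minimal pgvector sketch; the table, the three-dimensional vectors, and the values are purely illustrative (real embeddings usually have hundreds to a couple thousand dimensions):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- A tiny table with a vector column (dimension 3 just for illustration)
CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(3)
);

INSERT INTO items (content, embedding)
VALUES ('example document', '[0.1, 0.2, 0.3]');

-- Nearest-neighbor search by Euclidean distance using the <-> operator;
-- in a real setup the query vector would come from your embedding model
SELECT content
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.25]'
LIMIT 5;
```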


Next piece of content: “In-browser semantic AI search with PGlite and Transformers.js”. This is from supabase.com, and this is putting a bunch of things together. They're using PGlite, the embedded Postgres database that can run in a browser, combining it with the built-in pgvector, of course, as well as Transformers.js to do embedding generation, and they want to build an in-browser product search. I don't know if I would necessarily implement this, but it is a proof of concept that could use more resources on the client system instead of server resources to do AI-based searching. But check this out if you're interested. 

 

And the last piece of content is “Implementing Filtered Semantic Search Using Pgvector and JavaScript”. So again, this is using vectors to do semantic search, although this one implements it in JavaScript. So check it out if you're interested.
 
