Show Plans, WAL Monitoring, Using UUIDs, Default Logging | Scaling Postgres 109
In this episode of Scaling Postgres, we discuss how to show live plans, how Postgres 13 allows WAL monitoring, how to use UUIDs in your app and how logging is set up by default.
Content Discussed
- New in pg13: WAL monitoring
- Detailed look on the new PostgreSQL troubleshooting extension – pg_show_plans
- New in pg13: Monitoring the query planner
- Empowering a Rails Application With UUID as Default Primary Key
- Debian Default Logging Explained
- Stored Procedures in PG 11 – Better late than never
- Parallel Query Inside Function
- Optimize PostgreSQL Server Performance Through Configuration
- [YouTube] Postgres Pulse Insights: How to Solve Bloated Postgres Database | Improve Database Performance
- Michael Paquier
- Polygon Averaging in PostGIS
- How to use the Random Forest Machine Learning Model with 2UDA – PostgreSQL and Orange (Part 3)
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "DETAILED LOOK ON THE NEW POSTGRESQL TROUBLESHOOTING EXTENSION - PG_SHOW_PLANS". This is from cybertec-postgresql.com. We had discussed this before. This new extension called pg_show_plans allows you to see what plans live queries are using. So this goes into a little bit more detail about the extension. Basically, the idea is kind of like pg_stat_activity, but for each of those queries, you can see what the plan is. It says, what does this do exactly? Quote "...it copies query execution plan info from the private session/backend context to shared memory context when the query starts so that it is accessible to other users". Now, in terms of the installation, you can install from source code, but they also look to be offering RPM/DEB packages.
And it works from 9.1 up to 13 Beta. Now of course, once you have it installed, you do need to add pg_show_plans to shared_preload_libraries. It does have a few different configuration options that you can set for it. And of course, it will require a restart of your database. So here they show an example of it in action. They've set up pgbench. Here, they've created the extension and then run the query to be able to look at the output. And here you can see the plan for this query here. So the intent is, if you have really long-running queries, you could use this to be able to see, okay, what plan has this query taken? Why is this particular query taking so long? Now in the previous announcement, they mentioned something about performance.
And here they're saying that for very short statements, say less than 0.1 milliseconds, there's a penalty of about 20%, which is not insignificant. But for normal, more slowish real-life queries, the performance hit was small enough to be ignored. So based on this, it looks like your mileage may vary. But it also looks like they have a function that allows you to disable or enable pg_show_plans. So perhaps you could have the extension installed and then dynamically enable it or disable it, depending on whether you have problems you need to investigate. So again, some more details on this new extension. And if you want to learn more, definitely check out this blog post.
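To make that concrete, here is a minimal sketch of the workflow described above, assuming pg_show_plans is installed; the view and function names (pg_show_plans, pg_show_plans_enable/disable, the level column) are taken from the extension's documentation, so verify them against the version you actually install.

```sql
-- In postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_show_plans'

CREATE EXTENSION pg_show_plans;

-- Join live plans against pg_stat_activity to see which query each plan belongs to.
SELECT p.pid, a.query, p.plan
FROM pg_show_plans p
JOIN pg_stat_activity a ON a.pid = p.pid
WHERE p.level = 0;   -- only the top-level statement of each backend

-- If the overhead matters, plan collection can be toggled at runtime:
SELECT pg_show_plans_disable();
SELECT pg_show_plans_enable();
```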
The next piece of content is "New in pg12: WAL monitoring". This is from rjuju.github.io. So basically in PG 13, it looks like they are starting to track WAL generation statistics. It says, quote "...each backend will track various information about WAL generation such as the number of WAL records, the size of WAL generated and the number of full-page images generated". Now this is great because I've had to do some analysis of WAL files to determine essentially why they were so large. And being able to track this per backend is a huge benefit. Then you can do a per query WAL activity with pg_stat_statements. So essentially, some of these columns will be added to pg_stat_statements. So this is huge. This is a great feature I'm looking forward to. And he goes through how some of it is implemented. So if you want to learn more about that, definitely check out this blog post.
From the same site is "New in pg13: Monitoring the query planner". So most of the counters that are in pg_stat_statements, they just give you execution times. But with this addition, you can also get planning times. So for example, it looks like you can enable pg_stat_statements.track_planning, and that will track the planning time for your queries. So if you have queries that are quite complex and take a while to do the planning stage, you can actually collect that information now with this new feature. So if you're wanting to learn more, check out this blog post.
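A short sketch, again assuming PostgreSQL 13 with pg_stat_statements: enable the setting, then read the planning counters next to the execution counters.

```sql
-- track_planning is off by default because it adds a little overhead.
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();

-- Queries that spend the most time in the planner.
SELECT query,
       plans,
       round(total_plan_time::numeric, 2) AS total_plan_ms,
       round(mean_plan_time::numeric, 2)  AS mean_plan_ms,
       round(mean_exec_time::numeric, 2)  AS mean_exec_ms
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 10;
```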
The next piece of content is "Empowering a Rails Application With UUID as Default Primary Key". This is from medium.com/better-programming. So we talked a little bit about BDR in the last episode and how they had the concept of a shadow master. And I said, well, one thing you're going to have to do is centrally create any primary keys be they IDs UUIDs. So this is a way to use a Ruby on Rails application to set up UUIDs. They show you an example of how you would typically pull up a given article using an ID as the primary key, whereas you can switch them to UUID. So it looks like this. So one of the benefits he's mentioning is that you get to obfuscate information in the URL. Perhaps it's a little bit of a security layer, although it's security by obscurity, of course. Then the benefit of UUID is that you can actually generate them from client applications. It doesn't all have to be generated within the database.
So he shows you how to actually do what they call a database migration in Rails to be able to set up the extension. Here he's using the pgcrypto extension to do it, although there is another extension you can use as well, and he sets the primary key type to be UUID. Now, what he also mentions is that this framework, Ruby on Rails, also has the concept of being able to pull up the first record or the last record from a given set you're pulling. Generally, it orders that by the primary key. Well, that may not be what you're intending if you switch over to using UUIDs, but he actually has a way to handle this where you can specify a different implicit order within the application framework, say, the created_at date. So then the first and the last record make more intuitive sense. And of course, he makes this disclaimer: be super careful if you're going to be transitioning from, say, integers to UUIDs for existing projects. But if you're looking to make this change, here's an article on how you could potentially do it.
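For reference, here is roughly what that Rails migration boils down to at the SQL level; the table and column names are hypothetical, not taken from the article.

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;       -- provides gen_random_uuid()

CREATE TABLE articles (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    title      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- With random UUIDs, "first"/"last" by primary key is meaningless, so order
-- explicitly (which is what Rails' implicit_order_column setting does for you).
SELECT * FROM articles ORDER BY created_at LIMIT 1;
```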
The next post is "Debian Default Logging Explained". This is from mydbanotebook.org and this kind of goes into how Debian tends to set up a PostgreSQL when you're using their packages. And this would be for example, for a Ubuntu server. So when you actually do get the pg_current_logfile from Postgres, it's actually blank even though you know it's logging somewhere. And by default, it logs into var/log/postgresql. So it goes over a little bit of an investigation on how to find it. So for example, when checking the logging settings in postgresql.conf, log destination was not set, meaning was set to standard error and the logging collector was off. So according to Postgres, there's no logging to this location.
So how is it doing it? So he looks into what I believe is a Perl wrapper, pg_ctlcluster, that they set up on Debian systems. So he gets the right one and then he looks at this particular code and it looks like it launches the systemd Postgres service. Here's the service definition for it, or I should say the template unit file for it. Basically, it works because it's using the -l or the log switch of pg_ctl, because essentially this calls pg_ctl. And he said of course that means if you use the -l log file switch of pg_ctl, the pg_current_logfile function won't be able to help you find where your log file is. So if you use PostgreSQL on a Debian system, you may want to check out this blog post to kind of get a better understanding of how Postgres is set up for the Debian packages.
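If you want to reproduce the investigation, a quick sketch of the checks on a default Debian/Ubuntu install would look something like this (exact paths vary by version and cluster name).

```sql
SELECT pg_current_logfile();   -- NULL, because the logging collector is off
SHOW log_destination;          -- 'stderr' (the default)
SHOW logging_collector;        -- 'off'
-- The log still ends up under /var/log/postgresql/ because pg_ctlcluster starts
-- the server with pg_ctl -l, redirecting stderr to that file outside of Postgres.
```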
The next post is "Store Procedures in PG 11 - Better late than never". This is from highgo.ca and they're basically explaining store procedures and what they are. He goes over the difference between functions and stored procedures. Basically, it's executable code and can run in different languages. Functions cannot have transactions, whereas store procedures can. You execute functions just by running the function, whereas for a stored procedure, you actually call it. He goes into a little bit of discussion of how you can return a value from a stored procedure, how it gets invoked, its general structure which is very similar to a function and it even supports overloading. So if you want to get more insight into the differences between stored procedures and functions, definitely check out this blog post.
The next piece of content is "Parallel Query Inside Function". This is from abdulyadi.wordpress.com and he's talking about where he was running a query here that had a bunch of unions. When he just runs the Raw SQL function, it runs in parallel, something similar to this, and then appends everything together and gives the ultimate result. That takes around 1.9 seconds. But if you put this exact SQL code in a PL/pgSQL function or in a pure SQL function, it actually takes much longer, about 3.4 seconds here and again about 3.4 seconds here. So the question is why?
So he actually shows the CPU output, and you can tell the plain SQL query looks to be running in parallel, since it's using all the CPU cores, whereas the PL/pgSQL function only seems to be using one CPU core. Same thing for the SQL function. So how do you get it to run in parallel? Now, he actually did kind of a hack to get it to work, where instead of returning the query, the result set is saved into a temporary array in memory and then exported out once finished. However, in the comments, they also mentioned that you can add PARALLEL SAFE to your function to get it to be able to work in parallel, as in the sketch below. So this is a quick post, but it's just something to keep in mind: if you're using functions and you want them to run in parallel, maybe you need to mark them PARALLEL SAFE.
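Here is a hedged sketch of what that comment suggests; the function and table names are hypothetical, and whether the planner actually uses parallel workers still depends on how and where the function is invoked.

```sql
CREATE OR REPLACE FUNCTION union_report()
RETURNS SETOF bigint
LANGUAGE sql
STABLE PARALLEL SAFE            -- declare the function safe for parallel execution
AS $$
    SELECT count(*) FROM measurements WHERE bucket = 1
    UNION ALL
    SELECT count(*) FROM measurements WHERE bucket = 2;
$$;
```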
The next piece of content is "Optimize PostgreSQL Server Performance Through Configuration''. This is from crunchydata.com, and they're basically going over some of the areas that you most want to change as you're optimizing server performance as you increase the size of your hardware. Namely, the four that they've chosen are shared_buffers, wal_buffers, effective_cache_size, and maintenance_work_mem. So they go over each of the configuration options and what they do and some recommendations for settings. So if you want a refresher on that, definitely a blog post to check out.
The next piece of content is actually a YouTube video: "Postgres Pulse Insights: How to Solve Bloated Postgres Database | Improve Database Performance". This is from the EnterpriseDB YouTube channel, and it looks like, on about a weekly basis, they're going to have this live stream, I believe, where they answer questions that they find on various online forums. So if you want a live stream or a video covering questions that the community has about Postgres, it's definitely a piece of content to check out.
The next piece of content, the PostgreSQL person of the week is Michael Paquier. So if you're interested in learning more about Michael and his contributions and work in PostgreSQL, definitely check out this new post.
The next piece of content is "Polygon Averaging in PostGIS". This is from crunchydata.com and they show you how to average polygons in PostGIS. So if you're interested in that, check out this blog post.
The last piece of content is related to machine learning. "How to use the Random Forest Machine Learning Model with 2UDA - PostgreSQL and Orange (Part 3)". This is part three of this article series, so if you're interested in that, definitely a blog post to check out.