Slow Queries, pg_auto_failover, Roles, Hyperscale | Scaling Postgres 63
In this episode of Scaling Postgres, we review articles covering how to identify slow queries, the pg_auto_failover utility, working with roles and Hyperscale.
Content Discussed
- How to identify slow queries in PostgreSQL
- Introducing pg_auto_failover: Open source extension for automated failover and high-availability in PostgreSQL
- PostgreSQL: Using CREATE USER with caution
- The role of a role within another role
- Introducing Hyperscale (Citus) on Azure Database for PostgreSQL
- [YouTube] Building PostgreSQL apps at any scale with Hyperscale (Citus) - BRK3094
- [YouTube] Unleash analytics on operational data with Hyperscale (Citus) on Azure Database for PostgreSQL
- PostgreSQL v12 new feature: optimizer support for functions
- pgBackRest – A Great Backup Solution and a Wonderful Year of Growth
Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about slow queries, pg_auto_failover, roles and Hyperscale. I'm Kristen Jameson, and this is Scaling Postgres episode 63.
[00:00:21] All right, I hope everyone's having a great week. So our first piece of content this week is "How to identify slow queries in PostgreSQL", and this is from pythonmeister on dev.to. This is a very short post, a three minute read, but it tells you exactly what you need to do to be able to find slow queries.
[00:00:42] The number one thing he mentions here is adjusting log_min_duration_statement in your PostgreSQL configuration to some time period. It could be 500 milliseconds or a second; he suggests 5 seconds here, so that you see your slow queries in your logs. And of course, he says you need to reload your configuration after you do that, and he even shows you how to test it using pg_sleep. So if you're unfamiliar with some of these functions, you can see them right here. The next thing he says to do is use pg_stat_statements. This is an extension that you put in the shared_preload_libraries configuration of PostgreSQL. Then you create the extension in the database that you're interested in, and it tracks statistics for all the queries that are running. He gives some example output of what you can look at to get a good sense of all the queries that are running and their average times, so you can identify the slow ones. And then he also does something I haven't seen a lot of blog posts do: because pg_stat_statements is aggregated, he mentions that you can reset it, and he shows the function to reset it when you feel you need to, on a periodic basis or when you're diagnosing some issue. So a very short but very good post telling you exactly how to find your slow queries.
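The steps described in the post can be sketched in SQL. This is a minimal sketch; the 5-second threshold is the post's example, and note that the timing column in pg_stat_statements is named mean_time in the PostgreSQL versions current at the time of this episode (it became mean_exec_time in v13):

```sql
-- Log any statement that runs longer than 5 seconds
ALTER SYSTEM SET log_min_duration_statement = '5s';
SELECT pg_reload_conf();   -- reload the configuration without a restart

-- Test it: this sleeps past the threshold, so it should appear in the log
SELECT pg_sleep(6);

-- pg_stat_statements must be listed in shared_preload_libraries
-- (set in postgresql.conf; requires a server restart), then:
CREATE EXTENSION pg_stat_statements;

-- Find the slowest queries by average execution time
SELECT query, calls, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- Reset the aggregated statistics, e.g. before diagnosing a new issue
SELECT pg_stat_statements_reset();
```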
[00:02:01] The next post is "Introducing pg_auto_failover: Open source extension for automated failover and high-availability in PostgreSQL". Now, this is actually on the microsoft.com blog, and they've developed this open source tool that enables you to do automated failover. I'm going to look more into this, but it's pretty interesting what they're mentioning here. There are other tool sets that do this, but I'd be interested to see how this one works. Basically, this tool allows you to set up, for example, a primary and a secondary, and it automatically sets up streaming replication between the primary and the secondary. It also institutes a monitor that does state checks for each of the databases to make sure each one is up. And this is actually in the Citus Data GitHub. So the first thing they do is install the packages, assume the postgres user, and create the monitor that's going to be monitoring everything. Then, in this example, they created a small disk for their first node, node A, and they created a database, again using the pg_autoctl command, and they defined the monitor for this particular database that they created. So this would basically be their primary. Then they set up and run a keeper, which is what runs on the database node to monitor that it is up and active. Then they create their second node and run the keeper on the second node, although in this case this is all happening on one instance. In their example, you can watch replication with the command they show: you can add data to the primary and then see it on the secondary. Then they caused a failover by essentially filling up the disk on node A. After a few failed attempts, its keeper signals that the node is unhealthy, the node is put into the demoted state, and the monitor promotes node B to be the new primary.
And node B is not considered to be in the full primary state, since there is no secondary present; it is marked as wait_primary until a secondary appears. The other interesting thing they mention is something that was added in Postgres 10: with the psql client and others based upon libpq, you can list more than one host in a connection string and require the one that is read-write. So they're using this ability to seamlessly switch between node A and node B. So this is a potentially very interesting tool, and I'm actually going to be checking it out. One thing that came to my mind is pg_rewind: would pg_rewind work in a scenario like this, or is that a feature that's being added? I'm definitely going to look into that, but this is definitely pretty interesting, and I definitely encourage you to check it out.
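The walkthrough they describe can be sketched with pg_autoctl commands. This is a minimal sketch under the assumption of a fresh install; the data directories and hostnames here are example values, so consult the project's documentation for the exact flags on your version:

```shell
# 1. Create the monitor node, which tracks the state of every Postgres node
pg_autoctl create monitor --pgdata ./monitor

# 2. Create the first Postgres node (node A), pointing it at the monitor
pg_autoctl create postgres --pgdata ./nodeA \
    --monitor postgres://autoctl_node@monitor-host/pg_auto_failover

# 3. Run the keeper, which reports this node's health to the monitor
pg_autoctl run --pgdata ./nodeA

# 4. Repeat steps 2-3 for node B; it is initialized as a streaming secondary

# 5. Observe the cluster state (single, primary, secondary, wait_primary...)
pg_autoctl show state

# Clients use libpq's multi-host support (Postgres 10+) to always reach
# whichever node is currently the read-write primary:
psql "postgres://nodeA,nodeB/appdb?target_session_attrs=read-write"
```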
[00:04:56] The next post is "PostgreSQL: Using CREATE USER with caution". This is again another very short post, but it's talking about what they call the golden rule of distinguishing between users and roles. Basically, PostgreSQL just has the concept of a role. You can make it a user, or you can essentially make it a group role and place other roles or users inside of it. But the issue they say you can run into is when you grant privileges directly to a user. Let's say you created a table a and granted the user joe access to it: GRANT SELECT ON a TO joe. If you then need to drop that user, you're going to get an error saying role "joe" cannot be dropped because some objects depend on it. In other words, there are privileges on table a, and there is no such thing as DROP USER ... CASCADE. The reason is that users are created at the instance level, not in each database. The way you get around this is to grant privileges to roles and then put users in those roles. For example, they use cleaning staff: they create a role cleaning_staff with NOLOGIN, so it's not meant to be a user, it's meant to be a role that others are placed into, and you grant the SELECT to that role. Then you can add joe to the cleaning_staff role using this command: GRANT cleaning_staff TO joe. And lastly, they mention a utility they developed called pg_permission that allows you to look at all the permissions that exist on your system. Now, related to this post, there's also a follow-up called "The role of a role within another role". This is from fluca1978.github.io, that is, Luca Ferrari, and he's asking about roles: are they users or groups? Really, they are both. A role is an abstraction; it's a collection of permissions to do some stuff.
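A minimal sketch of the pattern the post describes, using its cleaning-staff example:

```sql
-- The problem: privileges granted directly to a user block DROP USER
CREATE TABLE a (id int);
CREATE USER joe;
GRANT SELECT ON a TO joe;
DROP USER joe;   -- ERROR: role "joe" cannot be dropped because
                 --        some objects depend on it

-- The fix: grant privileges to a NOLOGIN role, then add users to it
CREATE ROLE cleaning_staff NOLOGIN;
GRANT SELECT ON a TO cleaning_staff;
GRANT cleaning_staff TO joe;

-- Now joe can be removed without touching the table's privileges
REVOKE cleaning_staff FROM joe;
DROP USER joe;
```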
[00:07:03] A role can be allowed to log in or not. And this post goes into more depth about inheritance and non-inheritance and how things work. He goes over a lot of different examples and mentions how PostgreSQL by default grants INHERIT to users added to a role, so they have the permissions of that role, but you can also use NOINHERIT if that's something you desire. So if you want more in-depth knowledge about roles and how they work with users, these are definitely two blog posts to check out. And I should mention that the first blog post about roles is actually from cybertec-postgresql.com.
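The inheritance behavior he walks through can be sketched like this (the role and user names are hypothetical examples, not from the posts):

```sql
-- By default a member INHERITs, so it can use the role's privileges directly
CREATE ROLE reporting NOLOGIN;
CREATE USER alice IN ROLE reporting;   -- alice inherits reporting's grants

-- With NOINHERIT, the member must explicitly switch into the role
CREATE USER bob NOINHERIT IN ROLE reporting;
-- connected as bob:
SET ROLE reporting;   -- only now does bob exercise reporting's privileges
RESET ROLE;
```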
[00:07:43] The next post is "Introducing Hyperscale (Citus) on Azure Database for PostgreSQL". This is a post on the citusdata.com blog. A number of months ago, Microsoft acquired Citus Data, and this looks like one of the first big things they're introducing as a result of that purchase: a new product called Hyperscale (Citus) that's part of their Azure Database platform. It essentially takes Citus and makes it more convenient and easy to use with Azure. So this was basically an announcement post to that fact, talking a little bit about how Citus works: it allows scale out to hundreds or more nodes of PostgreSQL and executes queries in parallel. Now, they go into more depth in the YouTube video called "Building PostgreSQL apps at any scale with Hyperscale (Citus)" on the Microsoft Developer channel. This is actually a variation of a presentation Citus has done before, called "PostgreSQL at any scale", but incorporating Hyperscale (Citus) as the final step. It talks a little bit about Citus and how it does scale out, about the Hyperscale offering that's part of Azure, as well as some additional features and things they've built into it, with a little bit of general advice on how to get started with PostgreSQL. So if you're interested in learning more, this is a presentation to check out. There's also a shorter video, only about four minutes in length, also related to Hyperscale and Citus, called "Unleash analytics on operational data with Hyperscale (Citus) on Azure Database for PostgreSQL". So again, another piece of content if you're interested in checking out these new offerings.
[00:09:40] The next post is "PostgreSQL v12 new feature: optimizer support for functions". This is from cybertec-postgresql.com, and they're talking about functions as black boxes, meaning, as they say, that the PostgreSQL optimizer can't really do a lot about functions; it basically can't understand what they're doing. For example, if you do an EXPLAIN SELECT * FROM unnest of an array, the planner can't look inside the array to see how big it is, so it just gives you an arbitrary estimate of 100 rows. But version 12 has added support functions. Now, I believe, if I'm interpreting this correctly, you actually have to write some C code to create such a support function, so at this stage it seems like this may mostly benefit PostgreSQL's internal functions. You could probably build your own, but it looks like the early advantages come from the built-ins. For example, with that same query in version 12, you actually get the correct estimate: there are three elements in the array being unnested, and you can see the estimate is three rows. So these support functions, which let the planner look inside what functions are doing, could be of further benefit for performance in version 12 and onward as different functions are looked at and optimized. So if you're interested in that, definitely a blog post to check out.
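The before-and-after they show looks roughly like this (plan costs elided; the row counts are the point):

```sql
EXPLAIN SELECT * FROM unnest(ARRAY[1, 2, 3]);

-- v11 and earlier: the planner falls back to unnest's declared estimate
--   Function Scan on unnest  (... rows=100 ...)
-- v12: a support function lets the planner inspect the array argument
--   Function Scan on unnest  (... rows=3 ...)
```

For user-defined functions, the hook is the SUPPORT clause of CREATE FUNCTION, and the support function itself has to be written in C.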
[00:11:16] The last post is "pgBackRest – A Great Backup Solution and a Wonderful Year of Growth". This is talking about pgBackRest, essentially a backup tool for PostgreSQL. They go over the installation, from packages or from source if you want, how to configure it, how to do a backup, and then how to do a restore of that backup. So if you're wanting to get started with pgBackRest, definitely a blog post to check out.
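The configure/backup/restore cycle they cover can be sketched like this. The stanza name and paths are example values; see the pgBackRest user guide for your platform:

```shell
# /etc/pgbackrest.conf:
#   [global]
#   repo1-path=/var/lib/pgbackrest
#   [main]
#   pg1-path=/var/lib/postgresql/11/main

# postgresql.conf must ship WAL to the repository:
#   archive_mode = on
#   archive_command = 'pgbackrest --stanza=main archive-push %p'

pgbackrest --stanza=main stanza-create    # initialize the repository
pgbackrest --stanza=main check            # verify archiving is working
pgbackrest --stanza=main backup           # take a backup (first one is full)
pgbackrest --stanza=main --delta restore  # restore over an existing data dir
```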
[00:11:43] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.