Showing Plans, Atomic, Configuration, Migrations | Scaling Postgres 107
In this episode of Scaling Postgres, we discuss how to show live query plans, the importance of atomic operations for scaling out, configuration options and DBaaS migrations.
Content Discussed
- pg_show_plans: Watching execution plans in PostgreSQL LIVE
- Optimizing SQL – Step 1: EXPLAIN in PostgreSQL – Part 1
- Atomic commit and Atomic visibility for PostgreSQL. Explained!
- What's wrong with Postgres?
- [YouTube] How to use postgresql.conf to configure and tune the PostgreSQL server
- A Tale About Migrating a PostgreSQL Database Between Different DBaaS Providers
- Embedded SQL in C for PostgreSQL with ecpg
- Logical Replication Between PostgreSQL and MongoDB
- Developing PostgreSQL for Windows, Part 3
- PostgreSQL GSSAPI Authentication with Kerberos part-2: PostgreSQL Configuration
- Dave Cramer
- Tile Serving with Dynamic Geometry
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "pg_show_plans: Watching execution plans in PostgreSQL LIVE". This is from cybertec-postgresql.com, and they're talking about a new extension they've released called pg_show_plans. It allows you to see active query plans. They mention there will be some overhead, and that the performance impact of pg_show_plans will be discussed in a future blog post. But this post tells you how to get started with the extension. You clone the repo from cybertec-postgresql.com; it shows you how to set an environment variable and run make and make install. Basically, they give you all the commands. Then you need to add it to shared_preload_libraries, so it will require a restart. But once you have that set up, you can look at live query plans.
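Roughly, the installation steps they describe look like this (the repository URL, version paths, and configuration lines here are assumptions based on how PGXS extensions usually install; check the post for specifics):

```shell
git clone https://github.com/cybertec-postgresql/pg_show_plans.git
cd pg_show_plans
export PATH=/usr/lib/postgresql/12/bin:$PATH   # so make can find pg_config
make && sudo make install

# Preload the library (requires a server restart), then create the extension:
#   postgresql.conf:  shared_preload_libraries = 'pg_show_plans'
#   psql:             CREATE EXTENSION pg_show_plans;
```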
They set up an example with a long-running query using pg_sleep, and then queried the pg_show_plans view. So you see the actual queries that are running: one record is the `SELECT * FROM pg_show_plans` itself, which is why it appears, and the next is a `SELECT * FROM pg_stats`. So for a long-running query, you can get its plan while it executes using this extension. This is really great, and I can see a lot of use cases where it could be beneficial. The only question is what the performance hit is like, which they're going to cover in the next blog post. What would also be interesting, since adding it to shared_preload_libraries requires a restart, is whether there's a way to activate or deactivate it with just a reload. But it's a very interesting new extension. So if you want to learn more, go ahead and check out this blog post.
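As a sketch of that demo (assuming the extension is installed and preloaded; the sleep duration is illustrative, and the column list should be checked against the installed version of the extension):

```sql
-- Session 1: a deliberately long-running query
SELECT *, pg_sleep(60) FROM pg_stats;

-- Session 2: inspect the live plan of every active backend,
-- including session 1's, while it is still running
SELECT pid, level, plan FROM pg_show_plans;
```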
The next article is "Optimizing SQL – Step 1: EXPLAIN in PostgreSQL – Part 1". This is from highgo.ca. This is a basic post explaining how EXPLAIN works, so if you're using the extension from the previous post, it can help you interpret the results. They start from the basics, explaining how query processing works and running through each of the stages. It shows a basic EXPLAIN command, then goes over the different outputs and what they mean. They also cover things like making sure your table is analyzed so you get the most accurate estimates when doing an EXPLAIN.
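A quick illustration of those points (the table name is a stand-in; any analyzed table with an index works):

```sql
-- ANALYZE first so the planner's row estimates are reasonably accurate
ANALYZE accounts;

-- Shows the chosen plan with estimated costs and row counts, without running it
EXPLAIN SELECT * FROM accounts WHERE id = 42;

-- EXPLAIN ANALYZE actually executes the query and adds real timings and rows
EXPLAIN ANALYZE SELECT * FROM accounts WHERE id = 42;
```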
And also that EXPLAIN ANALYZE actually runs the query for you. What's also interesting in the last part of this blog post is that they build a query from a given plan output: they look at the plan and reverse engineer what the query is, showing you how to go through and do it. I've never seen this done before, but it's an interesting technique to really get to understand EXPLAIN output. So if you're interested in learning more about this, definitely check out this blog post.
The next piece of content also from highgo.ca is "Atomic commit and Atomic visibility for PostgreSQL. Explained!". So this is talking about the future where we want to move towards a scale-out scenario where there are multiple PostgreSQL instances that essentially are acting as a common database and you're using Foreign Data Wrappers and storing part of the data on each of those PostgreSQL instances. Well, in that scenario, you're going to want to have an atomic commit. So basically you have a commit that occurs across multiple foreign servers, essentially.
So they go through the process and explain why that's important. You first do a prepared transaction on each of them; they call them participants, but here they would be the foreign servers. Each returns a result, and then, based on those results, the coordinator decides to either commit or roll back the transaction everywhere. So essentially a two-phase commit, and it needs to happen atomically: all or nothing. Then they talk about atomic visibility, meaning when you run a query, you want a consistent view across the different foreign servers, and how this is a feature that's going to be important.
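The two-phase mechanics PostgreSQL already exposes look like this (the table and transaction identifiers are hypothetical, and `max_prepared_transactions` must be set above zero for this to work):

```sql
-- Phase 1, on each participant: do the work, then prepare instead of commit
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'tx_shard_1';

-- Phase 2: once every participant has prepared successfully,
-- the coordinator finalizes the same outcome everywhere
COMMIT PREPARED 'tx_shard_1';
-- or, if any participant failed to prepare:
-- ROLLBACK PREPARED 'tx_shard_1';
```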
Again, they graphically demonstrate it: you have one DB node here, a second here, and depending on where different things happen, a commit could become visible at different times on each node. You want this visibility to be atomic, so you need to know whether something has been committed or not across the whole cluster of machines. Then they go into a bit about global snapshots. So again, this is forward-looking planning for this feature. If you're interested in that, definitely check out this blog post.
The next piece of content is "What's wrong with Postgres?". This is from mydbanotebook.org, and it's a follow-up to the post mentioned last time, "Where is Postgres running?". This one is basically about determining what's wrong with Postgres by finding and reading the log file. He says on Debian and friends, like Ubuntu, it's quite easy: you run pg_lsclusters and its output tells you exactly where the log file is. However, on Red Hat and friends it's a little harder, and he goes through different techniques you can use to determine where the log file is. He even has a flowchart to help you work it out. So it definitely seems a lot easier to do this on Debian. If you're interested in learning more, check out this blog post.
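On Debian-family systems the command in question is `pg_lsclusters`; elsewhere, a running server can be asked directly (a sketch, assuming you can connect as a superuser):

```shell
# Debian/Ubuntu: one line per cluster, with the log file path in the last column
pg_lsclusters

# Any platform with a running server: ask Postgres itself where it logs
psql -c "SELECT name, setting FROM pg_settings
         WHERE name IN ('data_directory', 'log_directory', 'log_filename');"
```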
The next piece of content is actually a YouTube video posted on the EnterpriseDB YouTube channel: "How to use postgresql.conf to configure and tune the PostgreSQL server". It covers postgresql.conf as well as a great many of the configuration options that are available. Then it goes into actual tuning recommendations and some tuning tools you can use. The tuning recommendations start at around the 38-minute mark; prior to that, they go through the other topics in this webinar. So if you're interested in that, go ahead and check out this YouTube video.
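For orientation, here are a handful of the settings such tuning webinars typically cover, with illustrative values for a hypothetical 16 GB server; these are common rules of thumb, not the webinar's recommendations:

```
# postgresql.conf -- illustrative starting points for a 16 GB machine
shared_buffers = 4GB              # often ~25% of RAM
effective_cache_size = 12GB       # planner hint: OS cache + shared_buffers
work_mem = 32MB                   # per sort/hash node, per backend; keep modest
maintenance_work_mem = 512MB      # VACUUM, CREATE INDEX
checkpoint_completion_target = 0.9
```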
The next piece of content is "A Tale About Migrating a PostgreSQL Database Between Different DBaaS Providers". This is from percona.com. So apparently a customer wanted to migrate from one database-as-a-service provider to another, and because these are essentially walled gardens or platforms, they couldn't do a direct replication. What they did instead was create an intermediate database instance: starting from the source DBaaS database, they replicated to an intermediate server, basically using the archived WAL files to keep replaying changes from the source, and then used that intermediate server to finally load the data into the target DBaaS provider. So it's an interesting discussion of the process they used to do that. If you have a potential need for this, it's definitely a blog post you may want to check out.
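One plausible way to sketch that intermediate-server flow on PostgreSQL 12+ (all paths, hostnames, and database names here are assumptions; the article's exact mechanics may differ):

```shell
# Intermediate server: restore the provider's base backup, then keep
# replaying archived WAL by setting in postgresql.conf:
#   restore_command = 'cp /archive/wal/%f %p'
# and creating an empty standby.signal file in the data directory.

# Final cutover: dump from the intermediate and load into the target DBaaS
pg_dump -Fd -j 4 -f /backup/mydb.dump mydb
pg_restore -h target.dbaas.example.com -U admin -d mydb -j 4 /backup/mydb.dump
```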
The next piece of content is "Embedded SQL in C for PostgreSQL with ecpg". This is from cybertec-postgresql.com. Within the C language, you can embed SQL, and they give an example of a sample embedded SQL program here. This bypasses writing libpq calls directly and uses the embedded SQL technique via the ecpg capabilities within Postgres. As it says down here, the preprocessor is called ecpg and is part of the PostgreSQL core distribution; I assume the name means embedded C for Postgres of sorts. Then in the conclusion, they talk about some of the disadvantages of this technique versus the advantages. So if you have a desire to potentially do this, definitely a blog post to check out.
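A minimal sketch of what such a program looks like (not the article's example; the connection target is an assumption, and the file must be run through the ecpg preprocessor and linked against libecpg, with a server available):

```c
/* hello.pgc -- preprocess and build with:
 *   ecpg hello.pgc && cc hello.c -lecpg -o hello
 */
#include <stdio.h>

EXEC SQL INCLUDE sqlca;

int main(void)
{
    EXEC SQL BEGIN DECLARE SECTION;
    int n;                               /* host variable shared with SQL */
    EXEC SQL END DECLARE SECTION;

    EXEC SQL CONNECT TO postgres@localhost;
    EXEC SQL SELECT count(*) INTO :n FROM pg_database;
    printf("databases: %d\n", n);
    EXEC SQL DISCONNECT ALL;
    return 0;
}
```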
The next piece of content is "Logical Replication Between PostgreSQL and MongoDB". This is from highgo.ca. It looks like they're using a change data capture method and a MongoDB decoder plugin they developed to be able to stream data from PostgreSQL servers into MongoDB. So if you have this particular use case, you may want to check out this blog post.
The next post is "Developing PostgreSQL for Windows, Part 3". This is from 2ndquadrant.com. This is part three of the post about developing for PostgreSQL on Windows.
The next post is "PostgreSQL GSSAPI Authentication with Kerberos part-2: PostgreSQL Configuration". This is from highgo.ca. Last week we presented part one, which described setting up Kerberos on Ubuntu. This second one actually sets up the PostgreSQL configuration. They talk about setting up the key file, PostgreSQL client authentication, the username maps, and user authentication. So if you have a desire to use Kerberos with PostgreSQL, it's definitely a blog post to check out.
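Those pieces map to a couple of configuration fragments along these lines (the keytab path, realm, and address range are assumptions, not values from the article):

```
# postgresql.conf -- where the server finds its Kerberos keytab
krb_server_keyfile = '/etc/postgresql/postgres.keytab'

# pg_hba.conf -- require GSSAPI authentication for TCP connections
host  all  all  0.0.0.0/0  gss  include_realm=0  krb_realm=EXAMPLE.COM

# pg_ident.conf can additionally map principals to database roles
# when the gss line uses a map=... option instead of include_realm=0
```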
The next piece of content, the PostgreSQL person of the week is Dave Cramer. So if you're interested in learning more about his work and contributions to PostgreSQL, definitely check out this blog post.
The last piece of content is "Tile Serving with Dynamic Geometry". This is from crunchydata.com. Again, this topic covers pg_tileserv, a microservice for serving tiles out of Postgres. This one talks about building dynamic geometry, such as the series of hexagons here, and how you can do it using this tool. So if you're interested in this content, definitely a blog post to check out.