
Picturing Vacuum, Recovery Changes, Cert Authentication, Function Replication | Scaling Postgres 74


In this episode of Scaling Postgres, we discuss vacuum through pictures, upcoming recovery changes, certificate authentication and how functions get replicated.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is a PostgreSQL award, and this was listed on 2ndquadrant.com. Basically, PostgreSQL was presented the 2019 O'Reilly Open Source Award for lifetime achievement, and it looks like Mark Wong, Bruce Momjian and Christophe Pettus accepted the award on behalf of PostgreSQL. So definitely quite a recognition.

The next post is actually a webinar that you can sign up for called "Webinar: Postgres Vacuuming Through Pictures [Follow Up]". This is from 2ndquadrant.com. You can click the link to get access to the webinar; you just need to submit some contact information. It gives a good overview of vacuum, talking about its purpose: because Postgres uses MVCC, it must be able to see multiple versions of rows, and once those old row versions are no longer needed, vacuum cleans them out. That's why they call it a vacuum. He also talks about the importance of configuring autovacuum to keep bloat under control and the numerous settings you can use to tune autovacuum or vacuum for your environment. Now, there weren't quite as many pictures as I thought there might be, but it does give a pretty good overview of vacuum and why and how it works. So definitely a piece of content I encourage you to check out.
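
As a side note, here is a minimal sketch of the kind of autovacuum tuning he covers. These values are purely illustrative, not taken from the webinar, so you'd tune them for your own workload:

    -- Illustrative autovacuum tuning (values are examples, not recommendations)
    ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- vacuum after roughly 5% of a table's rows are dead
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;    -- let autovacuum do more work per cycle
    ALTER SYSTEM SET autovacuum_naptime = '30s';             -- check for work more frequently
    SELECT pg_reload_conf();                                 -- these particular settings apply on reload, no restart needed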

The next post is "Postgres 12 highlight - Recovery changes". This is from paquier.xyz, and he's talking about the changes coming to how you handle recovery or standby databases with Postgres 12. Basically, there will no longer be a recovery.conf file; all of that configuration now takes place within the postgresql.conf file. We've actually mentioned some of the benefits before, but what he highlights here is the ability to reload some of these parameters, to monitor their values with SHOW, and to apply changes with ALTER SYSTEM. But when you move to 12, you definitely need to be ready for these changes because, again, if a recovery.conf file is present, PostgreSQL will not start. So you definitely need to review these changes and adjust your environment when you're planning the move to Postgres 12. So definitely another piece of content to make sure you're prepared.
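
To give a rough idea of what that looks like (the connection string and archive path here are placeholders, not from his post), the parameters that used to live in recovery.conf can now be set like any other setting:

    -- Settings that previously lived in recovery.conf are now normal parameters
    ALTER SYSTEM SET primary_conninfo = 'host=primary.example.com user=replicator';
    ALTER SYSTEM SET restore_command = 'cp /mnt/archive/%f %p';
    -- Instead of standby_mode = on, an empty standby.signal file in the data
    -- directory now tells Postgres 12 to run as a standby.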

The next post is "Certificate Authentication Recipe for PostgreSQL Docker Containers". This is from crunchydata.com. Now, this isn't only for Docker containers. They're describing how to do it within a Docker container, but a lot of the information is beneficial even if you're not using Docker. Basically, how do you set up certificate authentication for PostgreSQL? They have a lot of code here that you can directly use, but here's pretty much the process. Step one is to create a root certificate authority. You need some common certificate authority to sign the certificates used by the server side and the client side, so usually you'll create your own certificate authority to do this, although you can choose to purchase certificates if you want.

The next step is to generate the PostgreSQL server key and its certificate, then configure PostgreSQL to enable TLS and certificate-based authentication. Again, this is done in the PostgreSQL configuration as well as the pg_hba.conf file. They also mention a few other Docker commands here, but you can apply the same steps to any environment. Step four is generating the client key and certificate. These are used by the clients to be able to connect, and then they show you how to connect: basically, they're exporting a couple of different parameters that your client uses to connect to a server secured with certificate authentication.
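
To sketch what the server side of that looks like (the file names here are placeholders, with paths relative to the data directory), the TLS settings and the pg_hba.conf rule would be along these lines:

    -- Enable TLS and point the server at its certificate, key, and the root CA
    ALTER SYSTEM SET ssl = on;
    ALTER SYSTEM SET ssl_ca_file = 'root.crt';      -- CA that signed the client certificates
    ALTER SYSTEM SET ssl_cert_file = 'server.crt';  -- server certificate
    ALTER SYSTEM SET ssl_key_file = 'server.key';   -- server private key
    -- Then pg_hba.conf needs a rule such as:
    --   hostssl  all  all  0.0.0.0/0  cert
    -- which only accepts clients presenting a certificate whose CN matches the database user.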

So if you're interested in using certificate authentication, even if you're not using Docker, this is definitely a great post to check out. There was also a video linked here called "how to set up an enterprise certificate-based authentication system". This link goes to a YouTube video talking about advanced replication, and they cover all sorts of other authentication methods, but during the last part, they do describe setting up certificate authentication. So this video is another piece of content you can check out to help you set up certificate authentication.

The next post is "POSTGRESQL: HOW ARE FUNCTIONS AND STORED PROCEDURES REPLICATED?". This is from cybertec-postgresql.com. The first thing they cover is how functions and procedures are stored: basically, in system tables, and there's a system catalog, pg_proc, that you can take a look at. So he created just a basic function here, and you can see that when the procedure language is SQL (language 14 in the catalog), the code is stored right in the source field. But functions and procedures written in other languages are represented a bit differently: if your code relies on a shared library, a reference to that library is stored instead of the code itself.

So, for example, he installed the pg_trgm extension and then looked its functions up in the pg_proc catalog, and you see the source is actually just a name, with the library referenced separately. Now, in terms of replication, the system tables of course get replicated over to replicas, so you don't need to do anything about that. But if you have a library dependency, those libraries need to be placed on every replica you're going to use. And he says here, quote "If you create a function on the master written in C and your slave does not have the desired library, the changes in the system tables will be present - there's just no way to call these functions on the slave" or replica. So basically, the system tables get replicated, but you need to make sure any required libraries are present on each instance you're replicating to. So definitely some great information to be aware of.
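
Here's a small illustration of what he's describing; the first function name is made up, but the catalog columns are the real ones (prosrc for the source, probin for the library):

    -- A SQL-language function stores its full source in pg_proc.prosrc
    CREATE FUNCTION add_one(i int) RETURNS int AS 'SELECT $1 + 1' LANGUAGE sql;
    SELECT proname, prosrc, probin FROM pg_proc WHERE proname = 'add_one';
    -- prosrc contains "SELECT $1 + 1" and probin is NULL

    -- A C-language function from an extension stores only a symbol name and a library reference
    CREATE EXTENSION pg_trgm;
    SELECT proname, prosrc, probin FROM pg_proc WHERE proname = 'similarity';
    -- prosrc is the C symbol and probin points at $libdir/pg_trgm, which is why the
    -- library also has to exist on every replica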

The next post is "yum upgrade postgresql11 panic!". This is from fluca1978.github.io. He wanted to upgrade a database system, so he just ran sudo yum upgrade for PostgreSQL. But unfortunately, the database server did not start afterward and he couldn't connect to it. Looking at it, when trying to start, it basically said the data directory is missing or empty. So he was like, what? It turned out systemd was trying to launch PostgreSQL with an empty PGDATA directory. He says, apparently, quote "...yum upgraded my systemd configuration for PostgreSQL to the CentOS default". So basically the service file was overwritten without any warning. What he eventually did was use the command systemctl edit postgresql-11 and add an Environment entry for PGDATA pointing at the data directory, which let the service find it. Now, I haven't run into this when doing upgrades since I tend to use Ubuntu, but this is definitely something to be aware of. So definitely a blog post to check out if you ever run into this particular type of situation.

The next post is "Automatic Index Recommendations in PostgreSQL using pg_qualstats and hypopg". This is from percona.com, and they had done a previous post that we covered on Scaling Postgres talking about hypopg, which sets up hypothetical indexes. Basically, this post walks through a process of using hypopg and pg_qualstats to predict or recommend what indexes should be created based upon the queries running against the system. So ideally you would look at some queries, and it would say, okay, we recommend creating this index and it should give this percentage improvement. They say pg_qualstats is an extension developed by the POWA team that stores predicates, or quals, to help uncover the need for indexes.

This is important because pg_stat_statements doesn't store the predicate values; you only get placeholder variables. So they basically walk through how to set up pg_qualstats. To do these types of index predictions, they needed to install three extensions, hypopg, pg_stat_statements, and pg_qualstats, and add pg_stat_statements as well as pg_qualstats to shared_preload_libraries. Then they created a quite complex function that they call find_usable_indexes to consult the different tables and make the recommendations. In the end, they get output that basically gives them the index recommendation for particular queries. So if you want to use this potentially automated way of determining what additional indexes could help your application or system, definitely a blog post to check out.
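
As a rough sketch of that setup (package installation is distribution-specific and omitted, and the table and column in the hypopg call are placeholders), it looks something like this:

    -- Preload the extensions that need it, then restart PostgreSQL
    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements, pg_qualstats';
    -- After the restart, create the extensions
    CREATE EXTENSION pg_stat_statements;
    CREATE EXTENSION pg_qualstats;
    CREATE EXTENSION hypopg;
    -- hypopg can then test a recommended index without actually building it
    SELECT * FROM hypopg_create_index('CREATE INDEX ON my_table (my_column)');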

The next post is "jsquery versus SQL/JSON". This is from thombrown.blogspot.com; the name of the blog is Thom's Memory Leak. Now, I wasn't necessarily aware of jsquery, but it's actually an extension that you can install, and he says, quote, it "was introduced as a non-core extension in PostgreSQL 9.4 and higher...". With PostgreSQL 12 coming, SQL/JSON provides a native way to query JSON data, and he compares and contrasts both of these. The reason I'm highlighting this post is that, with the SQL/JSON changes coming in Postgres 12, it gives you another opportunity to learn the JSON path syntax by seeing it alongside jsquery. So if you're interested in doing that, definitely a blog post to check out.
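
To give a quick taste of that JSON path syntax (the data here is made up, not from his post), Postgres 12 lets you do things like this:

    -- Return array elements matching a jsonpath filter
    SELECT jsonb_path_query(
      '{"items": [{"price": 5}, {"price": 15}]}',
      '$.items[*] ? (@.price > 10)'
    );
    -- returns {"price": 15}

    -- Or just test whether a path matches anything, using the new @? operator
    SELECT '{"items": [{"price": 5}]}'::jsonb @? '$.items[*] ? (@.price > 10)';
    -- returns false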

The next two posts are related to PostGIS, and as I've said, I'm not an expert on that, but I wanted to make you aware of them. Both of these are from crunchydata.com. The first post is "Waiting for PostGIS 3: ST_AsGeoJSON(record)". So apparently this is a new function that will be available. If you're potentially interested in that, definitely a blog post to check out.
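
If I understand it correctly, passing a whole row instead of just a geometry column produces a GeoJSON Feature with the other columns as properties, so something along these lines (the table name is made up):

    -- Generate a GeoJSON Feature from an entire row, not just the geometry column
    SELECT ST_AsGeoJSON(c.*) FROM cities c LIMIT 1;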

The second post is "Waiting for PostGIS 3: Hilbert Geometry Sorting". They say that ordering by geometry columns will return rows using a Hilbert curve ordering and do so about twice as fast. So if you're interested in that, definitely another blog post to check out.


 
