pg_receivewal, Application Failover, pg_checksums, pgBouncer | Scaling Postgres 87
In this episode of Scaling Postgres, we discuss the benefits of pg_receivewal, how to set up simple application failover, the new pg_checksums, and setting up PgBouncer.
Content Discussed
- Never lose a PostgreSQL transaction with pg_receivewal
- Seamless Application Failover using libpq Features in PostgreSQL
- Auto Failover with PostgreSQL 12
- pg_checksums 1.0 released
- PgBouncer Tutorial: Installing, configuring and testing persistent PostgreSQL connection pooling
- Monitoring PostgreSQL Clusters in Kubernetes
- A Beginner’s Guide To Formatting Dates In SQL
- PostGIS 3.0.0
- Using PostgreSQL to Create an Efficient Search Engine
- The RUM index and Full-Text Search Using PostgreSQL
- Using the Built-In and Custom Ranking Algorithms of PostgreSQL for Full-Text Searches
- Term Frequency Statistics in PostgreSQL Full-Text Searches
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "NEVER LOSE A POSTGRESQL TRANSACTION WITH PG_RECEIVEWAL". This is from cybertec-postgresql.com, and this article talks about how your PostgreSQL database is constantly producing WAL files, or write-ahead log files. You need to keep those around so that you can restore your database up to a point in time if you have enabled archiving. Now, the typical way to do that is to archive them with the archive_command. And that command, as they mention here, can copy to some local drive and then send it somewhere else, send it to some network-attached storage like NFS, secure copy it somewhere, rsync it somewhere, or even hand it to some backup software. Basically, you eventually want to get it off the database server into some other storage medium, so that it's safe in case something happens to the database it was generated on.
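To make that concrete, here's a minimal sketch of WAL archiving in postgresql.conf (the archive directory is illustrative, not from the article):

```
# postgresql.conf -- copy each completed WAL file to an archive directory,
# refusing to overwrite a file that is already there
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
```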
Now, one thing they mention here, under the section on when archive_command isn't good enough, is that you can still lose some committed transactions, because PostgreSQL waits to fill up the whole WAL file, which by default is 16 megabytes, before it archives it. So you could have a partially filled file, meaning you could be missing transactions if your system goes down and you only have what the archive_command has produced. That's a gap the pg_receivewal utility can close. It's designed to stream the WAL to a separate location. It is not in and of itself a database, but it uses streaming replication to read the WAL stream and produce WAL files where it's running. An added benefit is that it also writes partial files.
So a file has a .partial extension while it's still being filled as the WAL streams through replication. And you should also note that it's called pg_receivewal as of version 10; prior to version 10, it was called pg_receivexlog. On this ability to write partial files, the post notes that pg_receivewal is an alternative to WAL archiving that avoids the gap between the current and the archived WAL location. So that is one extra benefit. I've used it to easily get WAL files off of the primary database server, because you can set up a dedicated system to run pg_receivewal that automatically pulls the WAL files off of, say, your primary database system. It leaves your primary database with more resources to use for other purposes.
So WAL archiving is handled by a separate system. They say that by default, replication with pg_receivewal is asynchronous. You can make it synchronous, but if you do, the WAL has to be written to the pg_receivewal location before a transaction can commit, which can slow down your write performance. And if anything happens to that pg_receivewal process, your system won't accept any commits at all, which essentially means you're down. So what they suggest to avoid this type of problem is to run at least two synchronous pg_receivewal processes. This is a great post that talks about pg_receivewal and the benefits it has over the archive_command, so if you're not yet using it, definitely a blog post to check out.
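To give a rough idea of what running it looks like, here's a minimal sketch of a pg_receivewal invocation (host, user, slot name, and directory are illustrative, not from the post):

```
# Stream WAL from the primary into a local archive directory,
# flushing each write to disk as it arrives
pg_receivewal -h 10.0.0.10 -U replicator -D /var/lib/wal_archive \
    --slot=archiver --synchronous
```

For the synchronous setup they describe, the primary's synchronous_standby_names would also need to list this process.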
The next post is "Seamless Application Failover using libpq Features in PostgreSQL". This is from percona.com. This is talking about the ability of libpq and, by extension, the psql clients for you to insert multiple hosts that you can connect to as a part of the connection string. And it will actually try each one in turn. So it kind of gives you a way to do application failover using these because a lot of application environments use libpq to be able to connect to Postgres. You can also specify whether those connections you're making should be in a read-write state or a read-only state. So they're giving you an example here of where they have three servers, a primary, and two standbys. You can actually make a connection string like this, where you put a comma between each host you want to contact.
Then you can specify target_session_attrs=read-write. What it's going to do is run this command on the first read-write server it finds, so you can see it's connecting to .10. The first host listed in this string is .20 and the second is .10, which is the read-write server, the primary, so that's how it knows to connect to it. Secondly, you can connect to any server for reads. That example uses target_session_attrs=any, and it simply connects to the first available host in the list. So if something happened to go wrong with one of these servers, it would of course connect to the second one, and they show how they stop PostgreSQL on the .10 server.
And with this connection string, you can see it will automatically connect to .20. Now, they note you're going to have some sort of delay as it tries to connect to that first host, but it'll eventually connect to the second one. Then they give examples in Python and PHP of how you can use these connection strings from those application environments, and of course, this exists for others as well. In these, you can see they're specifying target_session_attrs as read-write as well. This is native libpq functionality, introduced I believe in PostgreSQL 10, so it gives you a bare-bones way to do application failover. If this is of interest to you, definitely a blog post to check out.
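As a sketch, a multi-host connection string looks something like this (hosts and database name are illustrative, not the ones from the post):

```
# Connect to the first host in the list that accepts writes
psql "postgresql://10.0.0.20,10.0.0.10/mydb?target_session_attrs=read-write"

# Connect to the first reachable host, primary or standby
psql "postgresql://10.0.0.20,10.0.0.10/mydb?target_session_attrs=any"
```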
The next post is "Auto Failover with PostgreSQL 12". This is from techcommunity.microsoft.com and they're talking about pg_auto_failover. And so with the release of PostgreSQL 12 and the changes to the recovery.conf file, utilities such as these need to be updated. So basically they've said they have updated it and their version 1.0.5 will work with PostgreSQL 12. So this seems to be a relatively simple solution for doing auto_failover. They talk a little bit about what it can do currently, as well as what they have planned for the future for pg_auto_failover. So if you're looking for a failover solution, definitely a solution to check out that is compatible with PostgreSQL 12.
The next post is "pg_checksums 1.0 released". This is from credativ.com and this is a utility that verifies, activates, or deactivates data checksums in PostgreSQL instances. So there is already a pg_checksums utility in PostgreSQL, but this one allows online verification of checksums. That's pretty much its primary feature, so you don't have to bring down the database to do the checksum process. Of course, it's compatible with all supported versions of PostgreSQL as opposed to just the more recent one where they've made some changes to the pg_checksums as well as activation of progress reporting during operation and IO rate limiting. So if you want a more robust version of doing checksums with PostgreSQL, definitely a utility to check out.
The next post is "PgBouncer Tutorial: Installing configuring and testing persistent PostgreSQL connection pooling". So this is from enterprisedb.com and this is a first, what they say is going to be a multi-part series on using PgBouncer and how to handle failovers of the database and things of that nature. So this first post is relatively simple, it's just installing a PgBouncer, connecting up to a Postgres instance, and then seeing if you bounce the network connection. What impact does that have? Can the PgBouncer still reconnect to it without losing its connection? So, a relatively simple getting started post, but I'm interested in seeing what's going to be coming in the future with the second and third editions of this blog post series.
The next post is "Monitoring PostgreSQL Clusters in Kubernetes". This is from crunchydata.com. They're talking about using the PostgreSQL operator, which manages PostgreSQL clusters on Kubernetes developed by Crunchy Data, and how they tell you how you can set it up using pgMonitor, which is an open source solution using Grafana and Prometheus as a data store to be able to track reporting information of PostgreSQL. This post goes into how to set this up for a set of PostgreSQL clusters in Kubernetes, so if that's of interest to you, definitely a blog post to check out. If you're looking at monitoring solutions in general, they do have a link to their pgMonitor as a way to build something up relatively quickly, and I believe by default it works more easily with CentOS and Red Hat Linux.
The next post is "A Beginner's Guide To Formatting Dates In SQL". This is from the blog.panoply.io. It's a very simple post, talking about timestamps, years, how to store certain dates, converting dates into a string, extracting certain parts of a date as well as truncating to certain parts of a date. So if you're interested in learning more about the date-time functions in PostgreSQL, definitely a blog post to check out.
The next post is "PostGIS 3.0.0". This is from postgis.net So now that PostgreSQL 12 has been fully released, they have released PostGIS 3.0.0. So if you're interested in this, definitely a blog post and a project to check out.
The next series of posts is a little bit older, but they're talking about full-text search in Postgres. The first post is "Using PostgreSQL to Create an Efficient Search Engine". They compare the different ways you might want to do a search. For example, if you're looking for a prefix in a fuzzy query, you can use a B-tree index, and they show how to specify those searches. If you're looking for a suffix in a fuzzy query, you can do your searches using a reverse() column expression and a B-tree index. If you want a prefix and/or suffix fuzzy query, you can use the pg_trgm extension and GIN indexes, or you can use GIN and RUM indexes for full-text search. The post goes into a very comprehensive discussion of full-text searching in general: how to break apart different words, defining search syntax, sorting algorithms, and other types of information. So if you want more information about text searching in Postgres, definitely a post to check out.
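As a sketch of the indexing techniques being compared (table and column names are illustrative, not from the post):

```sql
-- Prefix search with a plain B-tree index
CREATE INDEX ON items (name text_pattern_ops);
SELECT * FROM items WHERE name LIKE 'post%';

-- Suffix search via a reverse() expression index
CREATE INDEX ON items (reverse(name) text_pattern_ops);
SELECT * FROM items WHERE reverse(name) LIKE reverse('%gres');

-- Fuzzy substring search with pg_trgm and a GIN index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON items USING gin (name gin_trgm_ops);
SELECT * FROM items WHERE name LIKE '%gres%';
```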
A follow-up post related to it is "The RUM index and Full-Text Search Using PostgreSQL". This one compares storing data as an array versus in a text field, using a GIN index or a RUM index, and sees how performance changes with different types of queries.
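For a flavor, a RUM index is created much like a GIN one, and it can order results by rank distance directly from the index (this assumes the rum extension is installed; table and column names are illustrative):

```sql
CREATE EXTENSION rum;
CREATE INDEX ON docs USING rum (tsv rum_tsvector_ops);

-- The <=> operator orders matches by rank distance
SELECT title FROM docs
WHERE tsv @@ to_tsquery('postgres')
ORDER BY tsv <=> to_tsquery('postgres');
```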
The next post in the series is "Using the Built-In and Custom Ranking Algorithms of PostgreSQL for Full-Text Searches". This goes into ranking algorithms using a tsvector or using a multidimensional array.
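The built-in side of that looks like this (table and column names are illustrative):

```sql
-- Rank matches with ts_rank (ts_rank_cd is the cover-density variant)
SELECT title, ts_rank(tsv, q) AS rank
FROM docs, to_tsquery('postgres & scaling') AS q
WHERE tsv @@ q
ORDER BY rank DESC;
```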
Then the last post is "Term Frequency Statistics in PostgreSQL Full-Text Searches", which looks at how frequently or infrequently a particular term appears in a series of texts; a small sketch of the relevant function follows below. So if you're interested in these aspects of full-text search in Postgres, definitely ones to check out.
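For reference, Postgres exposes term frequency statistics through ts_stat (table and column names are illustrative):

```sql
-- The ten most frequent lexemes across all documents
SELECT word, ndoc, nentry
FROM ts_stat('SELECT tsv FROM docs')
ORDER BY nentry DESC, ndoc DESC
LIMIT 10;
```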