Replication Conflicts, Postgres Governance, pg_crash, MobilityDB | Scaling Postgres 140
In this episode of Scaling Postgres, we discuss replication conflicts, how Postgres is governed, a pg_crash utility and tracking spatiotemporal data with MobilityDB.
Content Discussed
- [YouTube] Percona Channel
- [YouTube] Transaction Isolation Levels With PostgreSQL as an example
- PostgreSQL 13.1, 12.5, 11.10, 10.15, 9.6.20, and 9.5.24 Released!
- Dealing with streaming replication conflicts in PostgreSQL
- Is it time to modernize the processes, structure and governance of the PostgreSQL Core Team?
- pg_crash: Crashing PostgreSQL automatically
- Analyzing GPS trajectories at scale with Postgres, MobilityDB, & Citus
- Postgres Constraints
- Postgres Streaming Replication on Windows: A Quick Guide
- Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up]
- Kaarel Moppel
- PostgreSQL Weekly News - November 8, 2020
YouTube Video
Podcast Audio
Transcript
All right, I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is that new versions of PostgreSQL have been released, from 9.5.24 up to 13.1. This post is from postgresql.org. Now, it seems like the main reason these updates were published was three security issues that are mentioned at the top of the news post, but they also include a number of bug fixes and improvements that are listed here as well, along with links to the release notes. So, given the security issues, you should probably upgrade sooner rather than later.
The next piece of content is "DEALING WITH STREAMING REPLICATION CONFLICTS IN POSTGRESQL". This is from cybertec-postgresql.com and they're talking about replication conflicts. Some conflicts that can occur when you have replication set up are snapshot replication conflicts. An example that they mentioned here is that you have a vacuum process on the primary and it removes dead tuples and that action is replayed on the standby. Now any queries will have to be canceled. There are also lock replication conflicts. For example, the primary takes an access exclusive that interferes with other queries that are ongoing on the standby. Therefore they have to be canceled for that to take place. Then they also mention buffer pin replication conflicts. So those are the three most common that would cause query cancellations on a standby.
But they also mention a few rarer ones such as deadlock, tablespace, and database replication conflicts. Now, they do mention there is a view on the standby called pg_stat_database_conflicts that you can query to determine what kinds of conflicts are happening on your standby. In terms of controlling these query cancellations, one parameter you can adjust is max_standby_streaming_delay. This is how long, in milliseconds, WAL replay on the standby is allowed to wait on a conflicting query before canceling it, which effectively determines how far the standby may fall behind the primary. They show an example where a query gets canceled with a detail message saying the user query might have needed to see row versions that must be removed, which is an example of a snapshot replication conflict. Now, max_standby_streaming_delay has a default value of 30 seconds.
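To see those conflict counters yourself, you can query that view on the standby and also check the current delay setting; a minimal example:

```sql
-- Run on the standby: per-database counts of canceled queries by conflict type.
SELECT datname,
       confl_snapshot,    -- snapshot conflicts (removed row versions)
       confl_lock,        -- lock conflicts (e.g. ACCESS EXCLUSIVE on the primary)
       confl_bufferpin,   -- buffer pin conflicts
       confl_deadlock,    -- deadlock conflicts
       confl_tablespace   -- tablespace conflicts
FROM pg_stat_database_conflicts
WHERE datname = current_database();

-- Check how long replay may wait before canceling queries (default 30s).
SHOW max_standby_streaming_delay;
```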
So the standby will wait up to 30 seconds on a conflict before canceling queries so it can apply those changes from the primary. By adjusting this value, you can determine whether you want the standby to stay in close proximity to the primary or let it lag a little further behind so it can handle longer queries. They talk about some of the use cases, such as high availability. That means you have a standby that is running very close to the primary, and in that case, you want to keep the streaming delay low. Another scenario is that you just want to offload big queries that may take a long time to run. Then you probably want a higher max_standby_streaming_delay for that replica or standby. Another scenario they mention is horizontal scaling. This is where you have one primary that handles read-write traffic and multiple standbys that you can run read-only queries against.
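As a sketch of how you might tune this differently per standby (the values are only illustrative):

```sql
-- On a reporting standby that runs long queries: let replay wait longer.
ALTER SYSTEM SET max_standby_streaming_delay = '10min';
SELECT pg_reload_conf();

-- On a high-availability standby that must stay current, keep it short instead:
-- ALTER SYSTEM SET max_standby_streaming_delay = '30s';
```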
Now, unfortunately, they mention that for that last scenario there is really no good setting for this parameter, and if you want to do this, perhaps you should look into using synchronous replication with synchronous_commit set to remote_apply. That way you can be confident that as soon as data is committed on the primary, it is reflected in all of the replicas. The downside is that it's a performance hit, because the primary waits for everything to be applied across multiple databases before it returns a commit confirmation to the client. So using synchronous commit with remote_apply definitely hits your performance, but it assures that the data is visible in multiple places at one time. Now, they don't cover that too much here, but that is an option.
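A minimal sketch of that option, assuming a single standby whose application_name is 'standby1' (a placeholder):

```sql
-- On the primary: require confirmation from the named standby before COMMIT returns.
ALTER SYSTEM SET synchronous_standby_names = 'standby1';

-- Wait until the standby has applied the changes (not just flushed them to disk),
-- so read-only queries there immediately see the committed data.
ALTER SYSTEM SET synchronous_commit = 'remote_apply';
SELECT pg_reload_conf();
```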
Now, they say one way to avoid some of these replication conflicts is by disabling hot_standby, which essentially means you don't have any queries hitting the standby at all. Another way is avoiding lock conflicts. So basically, avoid these locking operations on the primary when you don't want any queries to be canceled on the standby. They also mention, for completeness, that vacuum will take an ACCESS EXCLUSIVE lock to see if there are any pages at the end of the table that can be given back to the operating system. You may want to set the vacuum_truncate storage parameter to off for tables, because that avoids the truncation step, which requires that ACCESS EXCLUSIVE lock, albeit only briefly. So that's something you may want to consider doing.
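Turning that truncation step off is a per-table storage parameter (available since Postgres 12); the table name here is just a placeholder:

```sql
-- Skip the end-of-table truncation step of VACUUM for this table,
-- avoiding the brief ACCESS EXCLUSIVE lock it requires.
ALTER TABLE my_large_table SET (vacuum_truncate = off);
```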
One thing you can do to avoid snapshot conflicts is to turn on hot_standby_feedback. The disadvantage, of course, is that this can cause bloat on the primary, because you're essentially having the standby tell the primary 'Hey, don't get rid of this yet, I'm still using it for a query'. It's also possible to increase vacuum_defer_cleanup_age, but again, this can also lead to table bloat, so there are disadvantages to setting either of these. Then they conclude that one of the best ways to avoid these replication conflicts is to have dedicated standby servers: one for high availability purposes and one for offloading queries and/or backups. So if you're interested in learning more about some of these features and settings, go ahead and check out this post.
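For reference, both of those knobs are ordinary configuration settings; the vacuum_defer_cleanup_age value below is purely illustrative:

```sql
-- On the standby: report the oldest snapshot still in use back to the primary,
-- so vacuum there does not remove rows the standby still needs.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

-- On the primary: alternatively, defer cleanup of dead rows by this many
-- transactions. Both options can increase bloat on the primary.
ALTER SYSTEM SET vacuum_defer_cleanup_age = 100000;
```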
The next piece of content is "IS IT TIME TO MODERNIZE THE PROCESSES, STRUCTURE AND GOVERNANCE OF THE POSTGRESQL CORE TEAM?". This is from postgresql.fund and in light of the new members that joined the core team and as a reflection of EDB acquiring 2nQuadrant, this article goes into more of a discussion about the governance of PostgreSQL as a whole. Now, it mentioned some things that I found beneficial, and part of the reasons that they want to potentially examine this is looking at how Postgres is being governed from a company influence perspective, from a diversity perspective, a democracy perspective, and a transparency perspective. So I definitely felt the article is a good read and I think examining how the community is structured could always be beneficial and how we can make improvements. So I definitely encourage you to read this blog post and think about how maybe the PostgreSQL community could be improved from a governance perspective.
The next piece of content is "PG_CRASH: CRASHING POSTGRESQL AUTOMATICALLY". This is from cybertec-postgresql.com and is talking about an extension that purposefully crashes the database. So you load it into your shared preload libraries and you emit signals to the background worker to set a crash with a particular delay. Now, of course, they advocate this for QA environments, but it's basically a way to test out your, say, high-availability infrastructure to make sure that you can handle downtime gracefully. So, this is definitely an interesting tool I had not heard of. So if you want to learn more, definitely check out this blog post.
The next piece of content is "Analyzing GPS trajectories at scale with Postgres, MobilityDB, & Citus". So this is from citusdata.com and they're talking about MobilityDB, which is a software tool that they say supports temporal and spatiotemporal objects and works with the Postgres database and its spatial extension Postgres. So it kind of works with Postgres and PostGIS and essentially, what it does is it not only judges distances from one point to another but also at a point in time making those calculations.
Now, they make the point that you can answer the same types of questions with pure PostGIS. For example, a bus is traveling along a route: at what point will it see, say, this billboard or that billboard, during what period of time, and for how long? But doing it that way requires a very complex query to get the answer. With MobilityDB added, the queries become much easier to work with and end up far simpler than the string of CTEs presented in the pure PostGIS version; a flavor of that simpler style is sketched below. The article goes through their use case and how they track spatiotemporal visibility using MobilityDB. So if you're interested in that, definitely check out this blog post.
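Here is a minimal, hypothetical sketch of that MobilityDB style; the table and column are made up, and the function names are taken from the MobilityDB documentation as best I recall, so verify them against the docs:

```sql
-- A hypothetical table of vehicle trips stored as MobilityDB temporal points.
CREATE TABLE trips (
    trip_id bigint PRIMARY KEY,
    trip    tgeompoint          -- a point geometry that varies over time
);

-- How long and how far did each trip run?
SELECT trip_id,
       duration(trip) AS travel_time,       -- time span covered by the trip
       length(trip)   AS distance_travelled -- total distance along the trajectory
FROM trips;
```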
The next piece of content is "Postgres Constraints". This is from mokitkhare.com and this is a concise post, talking about the different constraints from a primary key constraint, foreign key constraint, check constraints, unique constraints, NOT NULL constraints, and exclusion constraints. For each one, he gives an example with code about how it is and what it does. So if you're interested in that, definitely check out this blog post.
The next piece of content is actually on the Percona YouTube channel. They continue to post videos from Percona Live Online, and there are a number of PostgreSQL presentations you may want to check out, such as "WAL Commit and Synchronization Optimization Opportunities" and "pg_stat_monitor", as well as others. So if you're interested in more video content, you can check out this link.
The next piece of content is another YouTube video called "Transaction Isolation Levels with PostgreSQL as an example". This is from the mkdev channel. So it goes through the different transaction isolation levels using Postgres as an example. So this is a very short video at only eight minutes, so if you want to learn more about that you can check out this video.
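If you want to try the isolation levels out yourself, they are set per transaction; 'accounts' below is just a placeholder table:

```sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM accounts;  -- the first query fixes the snapshot for the transaction
-- Rows committed by other sessions after that first query stay invisible here.
COMMIT;

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Serialization failures (SQLSTATE 40001) can occur and must be retried.
COMMIT;
```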
The next piece of content is "Postgres Streaming Replication on Windows: A Quick Guide". This is from blog.crunchydata.com. So most of the examples are done in some form of Linux, but this one talks about how to do it on Windows. So if you want to learn how to do that, you can check out this post.
The next piece of content is a webinar from 2ndquadrant.com called "COMMIT Without Fear - The Beauty of CAMO". CAMO refers to Commit At Most Once. This webinar talks about Postgres synchronous commit, how it works, and working with it, and compares and contrasts it with 2ndQuadrant's BDR (bi-directional replication) product, basically their multimaster product, which implements Commit At Most Once. So if you're interested in learning more about this product, you can definitely check out this webinar by clicking the link here.
The next piece of content, the PostgreSQL person of the week is Kaarel Moppel. So if you're interested in learning more about Kaarel and his contributions to Postgres, definitely check out this blog post.
The last piece of content is a link to the PostgreSQL weekly news posted on the postgresql.org website. So in addition to listing some product news, this is definitely the place to get all the different patches that have happened with Postgres over the past week. So if you're interested in that, definitely check out this blog post.