Logical Replication, Prepared Transactions, Ansible, Partitions

In this episode of Scaling Postgres, we review articles covering logical replication, prepared transactions, Ansible Postgres deployment Ansible and tidy partitions.

Content Discussed

YouTube Video

Podcast Audio

Transcript

[00:00:00] In this episode of Scaling Postgres we talk about logical replication, preparedtransactions, ansible and tidy partitioning. I'm Kristen Jameson and this is Scaling Postgres,episode 43 Postgres.

[00:00:21] I wanted to make an announcement first that Scaling Postgres will not be presentednext week due to the holidays. So I hope everyone has a great holiday next week if you arecelebrating it and we'll catch up on the content the week after. In terms of this week, our firstpost is using PostgreSQL logical replication to maintain an always UpToDate read, write testserver. And this is from the Several nines.com blog and the first part they talk about is justlogical replication, what it is, how to set it up. They go into a little bit of the logical replicationconcepts and then they talk a little bit about caveats. And this is the part that I found interestingbecause if you set up logical replication, the subscriber for that logical replication can essentiallybe read, write and you run the danger of running into conflicts. Maybe you are using sequencesbut the problem is that one of the restrictions they show here is that sequences are notreplicated. So you could run into say a primary key conflict and it talks about the ramifications ofthat is that actually logical replication stops and you could start building up wall files in thepublisher to the point where you run out of disk space. So logical replication is still relatively newas of version ten and improvements have been made to eleven. But there's still some care youneed to take, particularly if you're going to be trying to use one of the subscribers for writing tothe exact tables that it's replicating. So this post talks about some of those issues to be aware ofand it goes through the process of setting up logical replication and how you want to be aware ofpotential errors that happen and fix them as soon as you're able to. To avoid problems. It goesinto a little bit how to deal with primary keys with sequences and also discusses some goodpractices including how to monitor the solution. So if you're just getting into logical replicationthis would be a good blog post to check out.

[00:02:27] The next post is Be prepared for prepared Transactions. And this is from theCybertechgresql.com blog. Now they're talking about prepared transactions. These are notprepared statements where essentially you parse a statement and then you can just execute itlater to make queries or statements faster. This is talking about prepared transactions which areactually two phase commit transactions. Now according to the PostgreSQL documentation theyhave a note here prepare Transaction is not intended for use in applications or interactivesessions. Its purpose is to allow an external transaction manager to perform atomic globaltransactions across multiple databases or other transactional resources. Unless you're writing atransaction manager you probably shouldn't be using Prepare transaction. So a lot of warningswith this. And it seems like 99% of us will never use this particular feature, only if you're doing adistributed transaction manager. So that's just something to keep in mind. Now, this particularblog post is talking about being prepared for prepared transactions. If those happen to be turnedon and enabled, they are disabled by default. How can you get around it? So this blog post talksabout, okay, what are prepared transactions, what's the use case, which I kind of mentionedsome of it here the problems that can happen, and talks a little bit about the implementationdetails because actually some things for these transactions get written to disks. So therefore it'shard to get rid of these orphaned prepared transactions. And they give an example. Here howyou can lock up your database with prepared transactions. So basically, you can start atransaction, create a lock on the PG Auth ID table, and then prepare transaction locked and thendisconnect from the database. Now, PG underscore Authid contains database users, and sinceit's locked, all future connection attempts will hang and restarting the database won't helpbecause the prepared transaction is still retained. They even tried starting single user mode, andthat won't work either. And basically the solution that he came to is actually Deleting, where itgets committed in the database files in PG data to remove this prepared transaction. So again,this should be a super rarely used --feature because getting out of problems with it seems like a big issue. But if this is somethingof interest to you, definitely a blog post to check out.

[00:05:09] The next post is Video ansible and PostgreSQL. And this is from the secondquadrant.com blog. And this is a presentation they put up about 30, 40 minutes in length talkingabout using Ansible to deploy PostgreSQL. Ansible, if you're not familiar with it, is aconfiguration management tool similar to Chef or Puppet, but I find it a lot easier to use. Andactually, I've been using it for a number of years, and it's how I deploy my systems, includingconfiguring PostgreSQL. Now, this presentation is a little bit on the basic side, but it goes overusing ansible and how you would potentially deploy a set of, say, five different servers, a primary,a Replica with a backup server, and then another Replica with a backup server, and potentially,how you can distribute that across multiple data centers or availability zones. So if you're lookingfor a potentially different deployment solution for your PostgreSQL database instances, maybecheck out Ansible and check out this presentation. The next post is keeping postgres tidy withpartitioning, and this is from the Dataegrit.com blog. And he's basically talking about howpartitionings can help you with data management. So if you have say a history log or an eventlog and you periodically want to remove or delete that data or it has some sort of telemetry to itand in this case they have a looks like a 2 billion rows maybe and you want to clean out the oldevents using a delete by a certain date. It's going to take forever to run. And he says, quote, thequery would take twelve minutes to complete and it would generate a lot of write ahead log files.Now, even after you delete it, you still have a huge table on your hand. So all those deletestatements really haven't freed any space. And you'd have to use one of the tools that enablesyou to compact the table, such as you could use vacuum full but that locks the table, or one ofthe other third party tools that enable you to do it while the table is still accessible. But if youwould use partitioning and partition by month or by year in this case month would probably bethe best thing, then you could simply drop that partition and multiple partitions and all that datawould just go away and you would reclaim all of your space. So definitely something to keep inmind on how partitioning or partitions can help you keep postgres tidy. The next post ismonitoring PostgreSQL wall files and this is from the PG IO blog and they talk about what wallfiles are. They're the write ahead log. It's basically a log of every activity occurring, talking aboutwhy you should you monitor them. And basically the greatest risk is running out of disk space.So basically I don't really monitor the number of wall files like they are talking about here.Generally what I monitor is the disk space. So where are those wall files getting created, whereare they being archived to? And that is what I monitor to notice things. Some of the problemsthey're talking about here, such as archival failures, replication failures, or even a lot of longrunning transactions can all cause wall files to build up and use a lot of disk space. But they dohave some interesting things here where you can get a query to get account of wall files. So Ithought this was particularly interesting. So if you're interested in that, definitely a blog post tocheck out. The last post is getting started. Running PostgreSQL on Kubernetes. And this is fromthe Crunchydata.com blog. Now last week in the previous episode of Scaling Postgres, wetalked about using Kubernetes to build your own database as a service essentially forPostgreSQL where you can run and monitor multiple PostgreSQL database systems. And therethey were basically building it out themselves. But Crunchy Data here has a few tools thatthey've come up with to help with this process. So again, if you're interested in using Kubernetesto potentially manage and deploy multiple PostgreSQL database servers, then definitely a blogpost to check out.

[00:09:25] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show notes. Be sure to head over to Scalingposgres.com where you can signup to receive weekly notifications of each episode. Or you could subscribe via YouTube oritunes. Thanks. --

Logical Replication, Prepared Transactions, Ansible, Partitions | Scaling Postgres 43

Join Over 1,000 Engineers & Get New Episodes Weekly!

Content Discussed

YouTube Video

Podcast Audio

Transcript