Logical Replication, Monitoring Queries, SQL Beautifier, Incomplete Startup Packet | Scaling Postgres 93
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we discuss logical replication aspects, how to monitor queries, investigating a SQL beautifier and removing incomplete startup packet.
Content Discussed
- Additional Advisory to 2019-11-14 Cumulative Update Release for Debian and Ubuntu Users
- Interesting aspects of logical replication at a glance
- Understanding PostgreSQL Query Performance
- Abusing PostgreSQL as an SQL beautifier
- PostgreSQL 12 Improvement: Benign Log Entries “Incomplete Startup Packet”
- PostgreSQL 12 EXPLAIN SETTINGS
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is actually an advisory that was released on the postgresql.org website titled "Additional Advisory to 2019-11-14 Cumulative Update Release for Debian and Ubuntu Users". There is a disclosed vulnerability and they suggest updating your PostgreSQL common packages as soon as possible. The vulnerability is that the PostgreSQL superuser could escalate to root using a deficiency in the pg_ctlcluster command. So if you run PostgreSQL on Debian or Ubuntu, go ahead and update that PostgreSQL common package as soon as you're able.
The next post is "Interesting aspects of logical replication at a glance". This is from rafiasabih.blogspot.com and basically, he's talking about a few things to be aware of when you're using logical replication versus physical streaming replication. One of the first things to remember is you can do writes at the secondary. So not only can you replicate to it, unlike a read-only database doing streaming replication, with logical replication you can have rights going to where you could only be copying over one table or a subset of tables, or even copy over data from one database and also accept writes to that same database you're replicating to. The second thing to keep in mind is with regard to the schema in that it is not automatically all copied over once you start the replication. So basically you have to create the tables you want at the secondary before you start the subscription.
In addition, any schema changes aren't going to be replicated over either, so you need to handle those separately. So for example create them on the destination database and then make those changes to the primary database. Another thing mentioned is attribute comparison. So basically the attributes of a table are matched by name, so the column order does not have to be the same, it can be different. In addition, they mentioned here the data type can also be different as long as the text representation of the type is the same as the secondary. Plus the destination target can also have more columns than exist in the source target. So just something to keep in mind as you're thinking through how you're using logical replication sequences.
So sequences do not get copied over to the destination database. So this is something to be aware of if you're doing writes to the destination database separate from what's being replicated because you could run into an ID collision if you're using sequences. So it's just something to keep in mind. The next thing mentioned is privileges. So he says the permissions for the tables are checked only at the time of the creation of the subscription and never afterward. So something to keep in mind. Lastly, partition tables. So partition tables, the root or the parent table, are not going to be able to be replicated over, but you can create a publication for each child table. Now hopefully in future versions of PostgreSQL, they will be able to handle that as well. But overall a great list of things to be aware of if you're starting to use logical replication.
The next post is "Understanding PostgreSQL Query Performance". This is from pgdash.io and this covers a great review of things you can do to monitor your queries. The first one is you can use pg_stat_activity to see what all the current queries are running against your system, particularly if you're trying to find some slow queries. You can even see if there's certain waits that are happening or what's being locked or delayed by looking at the wait event type and the wait event in pg_stat_activity. Now that gives you a current view of what's going on in the database. If you want to look at more historical activities, there are two different things you can do. The first one is using the extension pg_stat_statements and they go through the process of how you would be able to install this extension and get working for your database. But after you do that, it does CREATE STATISTICS of queries against the system and can see how many times a particular query was called, what the query was, the total time to execute, and the minimum amount of time.
The maximum amount of time, and the mean time, and also gives you statistics with regard to how memory and disk were potentially used for those queries. So you can see how resource-intensive they were. This blog post goes through and does a good job of explaining some of these different metrics that are collected on pg_stat_statements. Now another option for capturing historical queries is turning on the slow query log by setting the configuration parameter log_min_duration_statement and putting an amount greater than -1. So in this case, if you set it to 1,000 because it's in milliseconds, any query longer than a second will be logged to the log file. The last thing they mentioned is if you want to look at execution planes, you can enable the auto_explain extension. Because then if something is triggered longer than a particular period of time, for example, 1,000 seconds, it will automatically do an auto_explain in the query log so you can look at the EXPLAIN plan for why a particular query was slow. So overall, a very good blog post explaining how you can monitor your queries.
The next post is "ABUSING POSTGRESQL AS AN SQL BEAUTIFIER". This is from cybertec-postgresql.com. They're talking about formatting a query that looks like this to something more friendly and how you can do this in PostgreSQL by designing a function that creates a temporary view. Then use the built-in function pg_get_view definition to get a friendly printout of that query. So for example, they created this function and what they do is they create a temporary view and then they print out the view and then they drop that temporary view. So for example, when they input this type of SQL, it will output it in this friendly format and it prints out all of the column names from what the asterisk is represented here. So if you're interested in potentially doing that, definitely a blog post to check out.
The next post is "PostgreSQL 12 Improvement: Benign Log Entries 'Incomplete Startup Packet'". This is from percona.com. So what they say is that there are some monitoring tools that query PostgreSQL to ensure it's running, but they're not actually doing any work. This can cause a lot of incomplete startup packets in the log. So this is an example of what it looks like in version 11, but in version 12 they say it detects that it is a zero size packet and just ignores it. So that way you can avoid these entries in your log. Now, if a nonzero packet is written, it still logs this invalid length of the startup packet, but as long as it's a zero packet size, it will still remove that log. So for example, in version 12, you're still probably going to see information like this for some tools where it says connection reset by peer. Whereas in version 11, you'll get that error, but you'll also get the incomplete startup packet, but in version 12 that will go away. So another reason to upgrade to PostgreSQL version 12.
The last post is "PostgreSQL 12 EXPLAIN SETTINGS". This is from fluca1978.github.io. So again, this is a new feature that has been added called SETTINGS. When you do an EXPLAIN plan, anything that you have set at the session level will print out that's different from what the default is. So for example, this is running an EXPLAIN plan here, not using the SETTINGS, and you see this output. He happened to print it out in YAML format, but when he added SETTINGS ON, it produced the same output. But if you change the session SETTING by setting seq_page_cost TO 3, for example, or setting the random_page_cost TO 1, and now you run that query again, these two additional entries will appear. So it shows you that something has changed at the session level. So, yet another great addition to PostgreSQL 12. If you're interested in learning more about that, definitely check out this blog post.