
Postgres 13 Released, What's New In Postgres 13, Debugging PL/PGSQL, Monitoring | Scaling Postgres 133


In this episode of Scaling Postgres, we discuss the release of Postgres 13, what is new in Postgres 13, how to debug PL/PGSQL and Postgres monitoring best practices.


Transcript

All right. I hope you, your family, friends, and coworkers continue to do well. Our first piece of content is "PostgreSQL 13 Released". This is from the postgresql.org website. So it was released on the 24th. This news announcement goes over some of the new features, such as deduplication in B-tree indexes and an extended statistics system that better tracks dependencies between columns, improving planning for OR clauses and IN/ANY lookups. Improvements to hash aggregate functionality allow large aggregates and grouping sets to spill to disk instead of being limited to memory. There are partitioned table enhancements with regard to improved pruning, as well as being able to do more partition-wise joins, and a parallelized vacuum for indexes. Again, this is not for autovacuum, but for when you manually run vacuum. Replication slots now allow you to define how much of the WAL to retain so you don't get out-of-disk errors on your primary.
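As a rough sketch of two of those features (the table name here is hypothetical; the PARALLEL vacuum option, max_slot_wal_keep_size, and the pg_replication_slots columns are real Postgres 13 features):

```sql
-- Vacuum a table's indexes with up to 4 parallel workers (manual vacuum only;
-- "orders" is a hypothetical table name):
VACUUM (PARALLEL 4) orders;

-- In postgresql.conf, cap how much WAL a replication slot may retain so an
-- abandoned slot can't fill the primary's disk:
--   max_slot_wal_keep_size = '10GB'

-- Slots that exceeded the limit show wal_status = 'lost':
SELECT slot_name, wal_status, safe_wal_size FROM pg_replication_slots;
```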

Then they mention some application development conveniences, such as the datetime() function in SQL/JSON path support and the included gen_random_uuid() function, so you don't need an extension for UUIDs. Partitioned tables now support logical replication as well as BEFORE row-level triggers, and there's FETCH FIRST WITH TIES. There are also some security enhancements, including trusted extensions and being able to require channel binding with SCRAM authentication. So I encourage you to check out this post as well as the release notes link here, which of course goes into detail on all the different features that have been released with PostgreSQL 13. Now, there are a number of other posts I'm going to follow up with that cover essentially what's new in PostgreSQL 13.
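A quick sketch of two of those conveniences (the "players" table is hypothetical; gen_random_uuid() and WITH TIES are real Postgres 13 features):

```sql
-- gen_random_uuid() is now built in, no pgcrypto extension required:
SELECT gen_random_uuid();

-- FETCH FIRST ... WITH TIES also returns any rows tied with the last row,
-- so a three-way tie for third place yields more than 3 rows:
SELECT name, score
FROM players
ORDER BY score DESC
FETCH FIRST 3 ROWS WITH TIES;
```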

The first one is "Lessons Learned from Running Postgres 13: Better Performance, Monitoring & More". This is from pganalyze.com. So they ran the beta and, I believe, the release candidate in their staging environment. With regard to index deduplication, they found about a threefold space savings, which could be a huge win for performance, because smaller indexes have a higher probability of being cached and are of course much easier to search through. Then they talked about the extended statistics improvements and some of the benefits 13 brings with them. They talk about the parallel vacuum and again how this doesn't apply to autovacuum, but it does work when you run a manual vacuum, because they assume you want to get the maximum performance by running that vacuum fast.

Although I wonder if they're going to make autovacuum support parallel vacuum in the next version or the version after. They talked about the incremental sorting improvement, where if you presort something and then limit it, or do another sort with subqueries, the planner will detect the data is already partially sorted and won't try to re-sort the whole thing again. So it supports incremental sorting to give you better performance. They mentioned some of the monitoring changes where you can actually track WAL usage to see how much WAL a given query generates. So for example, here they did a DELETE FROM users and asked for the WAL details, and you can see how many WAL records were generated, how many were full-page images, and the number of bytes involved. Then, in particular, pg_stat_statements can track this information as well.
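A minimal sketch of the WAL tracking they describe (the "users" table and WHERE clause are hypothetical; the EXPLAIN WAL option and the pg_stat_statements wal_* columns are real in Postgres 13):

```sql
-- EXPLAIN's new WAL option reports WAL records, full-page images (fpi),
-- and bytes generated by the statement:
EXPLAIN (ANALYZE, WAL) DELETE FROM users WHERE id < 1000;

-- pg_stat_statements exposes the same counters per normalized query:
SELECT query, wal_records, wal_fpi, wal_bytes
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```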

So now you can get insight into what's being written to the WAL in all of these cases, and with Postgres 13 it should be much easier to find out this information. Then they talk about the sampled statement logging that is now supported. So rather than logging every slow statement, you can take a sampling, and you can even log the parameters of failed statements so you get more insight into why a statement failed. Then when you're analyzing, you can also include buffers to see, for example, what has been hit in shared buffers, and this is also reflected in pg_stat_statements as well. Then they cover the inclusion of gen_random_uuid() for UUID support without an extension, and some psql improvements. So again, another great post talking about the new features in Postgres 13.
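A configuration sketch of the sampled logging they mention (the parameter names are real Postgres 13 settings; the values are illustrative, and the "users" table is hypothetical):

```sql
-- postgresql.conf: log only a sample of slower statements instead of all
-- of them, and capture parameters when a statement errors out:
--   log_min_duration_sample = '100ms'          -- candidates for sampled logging
--   log_statement_sample_rate = 0.1            -- log roughly 10% of them
--   log_parameter_max_length_on_error = 1024   -- parameters of failed statements

-- BUFFERS shows shared-buffer hits and reads alongside the plan:
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM users;
```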

The next one is a post, "What's New in PostgreSQL 13", from enterprisedb.com. They talk about the faster vacuum from being able to do the indexes in parallel. They talk about the security feature of avoiding man-in-the-middle attacks, now that the client can require channel binding with SCRAM authentication. For reducing errors in production, they mention the pg_catcheck utility, which lets you check your catalog and make sure everything is good. You can verify backups with the new pg_verifybackup utility, which confirms you have all the information required to restore a full backup. Going back to joins of partitioned tables: partition-wise joins can happen more frequently now, and there's logical replication for partitioned tables. Then they also mention the deduplication of B-tree indexes as well as other security enhancements.

So the next post also covering new features is "New Features in PostgreSQL 13 [Full Webinar Video]". This is from 2ndquadrant.com. There's a full webinar video; I included the YouTube link. It's about an hour and 18 minutes in length. It covers things like partitioning enhancements, B-tree deduplication, incremental sorting, hash aggregation enhancements, the backup manifests that enable backup verification, trusted extensions, replication slot size limits to avoid running out of disk space on your primary, the parallel vacuum capabilities, security enhancements, as well as general improvements for large databases. So a lot of content related to all the new features in Postgres 13 that you can check out and analyze to see which ones you want to use when you decide to upgrade to Postgres 13.

The next piece of content is "DEBUGGING PL/PGSQL: GET STACKED DIAGNOSTICS". This is from cybertec-postgresql.com, and they're talking about having a function, which they called a broken function, that does a divide by zero, which results in an error. This is called from within another function, but you can't really tell why it crashed, and you don't get a backtrace or debug output. He says one way to get this is to create another function, which he called get_stack (although you can name it whatever you want), that runs the function of interest and has an exception handler where you run GET STACKED DIAGNOSTICS and print out the related information. So when you wrap your function like he has done here, you get this type of output: you can see the message was a division by zero, the context related to that division by zero, and the line numbers in the stack trace where things occurred. So this is a great way to help you debug your PL/pgSQL code. So if you're interested in that, check out this post.
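A minimal sketch of the pattern described (function names follow the post's description; the exact bodies here are illustrative):

```sql
-- A deliberately failing function:
CREATE OR REPLACE FUNCTION broken_function() RETURNS void AS $$
BEGIN
    PERFORM 1 / 0;  -- raises division_by_zero
END;
$$ LANGUAGE plpgsql;

-- Wrapper that captures the error message and call stack:
CREATE OR REPLACE FUNCTION get_stack() RETURNS void AS $$
DECLARE
    v_message text;
    v_context text;
BEGIN
    PERFORM broken_function();
EXCEPTION WHEN OTHERS THEN
    GET STACKED DIAGNOSTICS
        v_message = MESSAGE_TEXT,
        v_context = PG_EXCEPTION_CONTEXT;
    RAISE NOTICE 'message: %', v_message;
    RAISE NOTICE 'context: %', v_context;
END;
$$ LANGUAGE plpgsql;

SELECT get_stack();  -- NOTICEs report "division by zero" plus the line numbers
```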

The next piece of content is actually a YouTube video, "Monitoring PostgreSQL - PostgreSQL Talk". This is from the Percona YouTube channel, and it's given by Jason Yee of Datadog. It covers different aspects of monitoring your PostgreSQL installation, iterates through some things you should be monitoring, and discusses best practices for doing it. So if you're interested in learning more about monitoring, you can check out this YouTube video.

The next piece of content is "Simple Anomaly Detection Using Plain SQL". This is from hakibenita.com, and he's talking about a technique he used to detect anomalies in a set of log entries. For example, they were looking at status codes and wanted to be alerted to unusual ones. So he first goes through the process of detecting anomalies: he does a calculation in SQL to get a standard deviation, and from the standard deviation he gets a z-score. A z-score tells you how many standard deviations each data point is from the mean. As you can see in the post, the only value whose z-score puts it more than one standard deviation from the mean is this 12 value, so clearly it is the outlier. So essentially he's using "outlier" as the term for an anomaly.
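A sketch of the z-score idea in plain SQL (the samples table and value column are hypothetical stand-ins for his log data):

```sql
-- Compute each value's distance from the mean in standard deviations:
WITH stats AS (
    SELECT avg(value) AS mean, stddev(value) AS sd
    FROM samples
)
SELECT value,
       (value - mean) / sd AS z_score
FROM samples, stats
WHERE sd > 0                                  -- avoid division by zero
ORDER BY abs((value - mean) / sd) DESC;       -- largest outliers first
```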

Then he goes into analyzing the server log and the different status codes, and he goes through the process of getting those standard deviations and calculating a z-score from them so you can identify anomalous status codes. Then the question becomes: what z-score threshold should you use? One standard deviation from the mean? Two, three, or four? What makes sense? He advocates backtesting. So backtest previous log information that you've been collecting to determine what you're seeing, what you want to alert on, and what you do or don't call an anomaly. He goes through the process of determining that for his set of data and how to eliminate essentially repeat alerts. Then he talks about increasing accuracy through a weighted mean, or potentially using a median or the median absolute deviation. So this does involve a little bit of statistics, but also a lot of SQL code for doing this type of anomaly detection. So if you're interested in that, you can check out this post.

The next piece of content is "In-Memory Table with Pluggable Storage API". This is from highgo.ca, and this is something they're developing that I assume might become an extension or a pluggable storage solution for Postgres that only uses memory to store tables and their data. They basically want the buffer to become that memory storage mechanism, or at least intercept its writes to disk and store them in the memory structure they're creating. They've developed an initial prototype, and for 20 million rows they're seeing a 50% performance increase for INSERTs, a 70% increase for UPDATEs, a 60% increase for DELETEs, and a 30% increase for vacuum operations. So that's a pretty good improvement, but it's still a work in progress. If you're interested in learning more, check out this post.

The next piece of content is "Announcing Crunchy Bridge: A modern Postgres as a service". blog.crunchydata.com has released a service called Crunchy Bridge, a Postgres-as-a-service offering that provides production-ready Postgres out of the box, with automated backups, high availability, point-in-time recovery, and encryption in transit and at rest. This is supported on AWS and Azure right now, and they can basically work across different clouds, or with on-premises systems, if that's something of interest to you. So if you're interested in a managed solution that is cloud-agnostic, maybe you want to check out this service.

The next piece of content is "Postgres Data Types in PL/Python". This is also from blog.crunchydata.com. They're talking about how Python data types translate to Postgres data types when you use PL/Python. So if you're interested in that, you can check out this post.
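A minimal PL/Python sketch of that data type mapping (the function name is hypothetical, and this assumes the plpython3u extension is installed): Postgres passes the int[] argument to Python as a list, and the float return value is converted back to double precision.

```sql
CREATE OR REPLACE FUNCTION py_mean(vals int[]) RETURNS float8 AS $$
    # "vals" arrives as a Python list; None maps to SQL NULL.
    if not vals:
        return None
    return sum(vals) / len(vals)
$$ LANGUAGE plpython3u;

SELECT py_mean(ARRAY[1, 2, 3, 4]);  -- 2.5
```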

Next piece of content is "Using Postgres and pgRouting To Explore the Smooth Waves of Yacht Rock". This is from blog.crunchydata.com. Basically, pgRouting is typically used for geographic information system calculations, but you can use it whenever you have interrelationships between pieces of data, so anything that defines relationships between items. So you can look at one artist here at one hop, then go to two hops, three hops, four hops. They're using the pgRouting extension to do that. So if you're interested in doing this type of analysis, check out this post.
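A hedged sketch of the idea, not the post's actual queries: pgRouting's pgr_dijkstra can find the shortest "hop" path over a hypothetical artist_edges table (id, source, target, cost), where each edge links two related artists.

```sql
-- Shortest path (fewest hops if every cost is 1) from artist 1 to artist 42:
SELECT seq, node, edge, agg_cost
FROM pgr_dijkstra(
    'SELECT id, source, target, cost FROM artist_edges',
    1, 42,
    directed := false
);
```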

The next piece of content is "Diary of an Engineer: Delivering 45x faster percentiles using Postgres, Citus, & t-digest". This is from citusdata.com. Now, this particular post is talking about a use case with Citus, an extension that enables you to scale out Postgres. But they're talking about an extension called t-digest for estimating percentile calculations, and how with it they were able to achieve a 45x improvement in performance. So if you calculate these types of percentiles, maybe you want to check out this post.

The next piece of content is "How we are building a self-sustaining open-source business in the cloud era (version 2)". This is from blog.timescale.com, and it's about TimescaleDB, an extension for time-series data in Postgres. They've basically altered their license to make it applicable to more use cases. So if you're interested in a time-series database on Postgres, maybe you want to check out this post to see what they're planning.

The last piece of content, the PostgreSQL person of the week is Dan Langille. So if you're interested in learning more about Dan and his contributions to Postgres, definitely check out this blog post.
