arm64 with apt, Contributors, Backup Manifest, Now Functions | Scaling Postgres 113

In this episode of Scaling Postgres, we discuss arm64 package support for apt, the annual Postgres contributors post, backup manifests & verification, and the different now functions.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "arm64 on apt.postgresql.org". This is from Christoph Berg's blog, and basically, the apt repository for PostgreSQL now covers the arm64 architecture. So these are for ARM processors, and the Raspberry Pi is a popular one. They're not supporting all ARM processors, but they are doing arm64, which is the 64-bit version. These packages are now available and target Debian buster, bullseye, and sid, as well as bionic and focal for Ubuntu. They also mention that the focal repositories for Ubuntu 20.04 are now available, and there's a note about old PostgreSQL versions. So if you potentially want to use an ARM processor, maybe you want to check this out now.
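
For reference, the repository setup on arm64 is the same as on other architectures. Here's a minimal sketch of the standard PGDG apt setup, assuming a Debian or Ubuntu system; the postgresql-12 package name is an assumption about which major version you want:

```sh
# Prerequisites for fetching and trusting the PGDG signing key.
sudo apt-get install -y curl ca-certificates gnupg lsb-release

# Add the PGDG signing key and repository for this release's codename.
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
  | sudo tee /etc/apt/sources.list.d/pgdg.list

# Install PostgreSQL from the PGDG repository.
sudo apt-get update
sudo apt-get install -y postgresql-12
```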

Now, normally ARM has been used for smaller devices, but Amazon EC2, not that I'm necessarily advocating them, has had A1 instances for a while that use a custom-built Graviton processor based on the Arm architecture, and they also have the newer M6g instances for general-purpose workloads that go up to 256 GB of memory. These use the second version of the Graviton processor, and Amazon states they deliver up to 40% better price performance over the current-generation M5 instances, which generally use Intel chips. Now, it looks like there's a greater-than-20% cost savings, so maybe there's more performance with these, I'm not really sure, but having another processor architecture available is definitely interesting.

The next post is "Who Contributed to PostgreSQL Development in 2019?". This is from the Robert Haas blog. This is as he says, the fourth annual post on who contributes to PostgreSQL development. They are ranked here by looks like the number of lines contributed, the percent lines are also represented, and the number of commits. So you can see who's taking the top spots here as well as committers of non-self authored code and those people with at least 80 emails to the pgSQL hackers list. So definitely thank you to everyone who contributes to PostgreSQL and makes it what it is today.

The next post is "Backup manifests and pg_verifybackup in PostgreSQL 13". This is from 2ndquadrant.com. This is something to look forward to in 13 where they are now providing a backup manifest to track what files have been backed up as well as a means to verify the backup. So this post walks through this new feature set. So basically what is the content of the backup manifest file here? Basically, it's a JSON object with the following keys: manifest version of all the different. Files that are included in the backup. WAL ranges that include information such as the timeline, the LSN of the backup start and backup end as well as a checksum. So he goes through and runs through the process. 

He does a pg_basebackup, not streaming the WAL here, and specifies the target directory. He looks at the content of the backup directory and there's a new file called backup_manifest, and here's what the JSON file looks like. And then of course with this file you can verify the backup with pg_verifybackup, and he goes through and does that validation. Now, he took this backup without the WAL files because he was handling those separately; that's what -X none means: do not back up the WAL files. So verification would result in an error, and he actually skips the validation of the WAL files using the -n option, and you can see the backup was verified. My assumption is that you could specify the location of the WAL files if you stored them elsewhere, but that wasn't mentioned in this post. Then he actually goes in and modifies a WAL file to essentially corrupt it, and you can see it does result in an error when doing a pg_verifybackup. So, definitely an interesting feature coming in Postgres 13, and if you're interested, check out this blog post.
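
To make that workflow concrete, here's a minimal sketch of the steps he describes, with a hypothetical backup directory path:

```sh
# Take a base backup without WAL (-X none); in Postgres 13 a
# backup_manifest file is written into the target directory automatically.
pg_basebackup -D /backups/base1 -X none

# Verify the backup against its manifest; -n skips WAL parsing, which is
# needed here because WAL was not included in this backup.
pg_verifybackup -n /backups/base1
```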

The next piece of content is "POSTGRESQL: NOW() VS. 'NOW'::TIMESTAMP VS. CLOCK_TIMESTAMP()". This is from cybertec-postgresql.com. So NOW() is a function that returns what the current time is. Now what the 'NOW'::TIMESTAMP is actually a constant. So whenever you execute now it will return the timestamp. But this is a constant that will store the value of what the current timestamp is. The first example you show here is when it's part of a transaction, it will still return the same value for both of these methods. The function and the constant returns the exact same timestamp. Then still within this transaction, if you sleep for 10 seconds, it will still return that exact same timestamp. So essentially, time according to NOW() is frozen within this transaction. So where do you see differences in the constant? 

Well, if you use it as part of creating a table with a default. For example, if you say DEFAULT NOW(), the default will always evaluate to the current timestamp, and if you examine the table you can see the default is NOW(). Whereas if you specify the constant, it takes the time at table creation and stores that fixed value as the default. So I think this is the more limited use case for 'NOW'::TIMESTAMP. Now, the difference between NOW() and CLOCK_TIMESTAMP() is that CLOCK_TIMESTAMP() gets constantly updated. So in the previous transaction example, CLOCK_TIMESTAMP() would continue to advance with the current clock time, whereas NOW() maintains the same value. So this is a pretty good review of these different timing functions, and if you want to learn more and look more in-depth at his examples, definitely check out this blog post.
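
Here's a minimal sketch of that behavior; the table names are hypothetical:

```sql
BEGIN;
SELECT now(), clock_timestamp();  -- both return roughly the current time
SELECT pg_sleep(10);
SELECT now(), clock_timestamp();  -- now() is frozen at transaction start;
                                  -- clock_timestamp() has advanced ~10s
COMMIT;

-- DEFAULT now() is re-evaluated for every insert, while the constant
-- form is frozen at the moment the table is created.
CREATE TABLE t_func  (ts timestamp DEFAULT now());
CREATE TABLE t_const (ts timestamp DEFAULT 'now'::timestamp);
```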

The next piece of content is actually Bruce Momjian's site at momjian.us, where he posts nuggets of wisdom from time to time. They're very short, but one was particularly noteworthy this week that I wanted to mention. With PostgreSQL 12, CTEs or WITH clauses are now inlined by default, and there are keywords, MATERIALIZED and NOT MATERIALIZED, you can use to control whether they are inlined or not. This is one of the few optimizations you can add to a SQL query within Postgres. Normally, there are no optimizer hints, but with your CTEs you can specify whether they should be materialized. So that's just something to keep in mind: once you eventually upgrade to Postgres 12, if you have some CTEs or WITH queries that aren't as performant as you think they should be, check out the MATERIALIZED keyword to see whether materializing them or not gets better performance for your queries.
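
Here's a minimal sketch of the Postgres 12 syntax, using a hypothetical orders table:

```sql
-- In Postgres 12+, this CTE may be inlined into the outer query:
WITH recent AS (
    SELECT * FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT count(*) FROM recent;

-- Restore the pre-12 behavior (compute the CTE once, acting as an
-- optimization fence) with the MATERIALIZED keyword:
WITH recent AS MATERIALIZED (
    SELECT * FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT count(*) FROM recent;
```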

The next post is "Index Corruption in PostgreSQL: The Hidden Cost of Your Queries". This is from enterprisedb.com. So basically it's talking about index corruption, how to potentially identify it and see it, and then of course, what to do about it. Now, I personally haven't encountered this, but the main way to see it is that basically queries aren't using the index for whatever reason, and a way to resolve it is to rebuild the index. And if you're on version 12, you can reindex concurrently or previous versions, just create the index again and then you can later drop it once that index is active. So if you're interested in learning more about this, definitely check out this post. 

The next post is "The Best Medium-Hard Data Analyst SQL Interview Questions". This is from a document that's on quip.com. I don't know if there's a related post somewhere, but it goes over some different interview questions and how to approach those from SQL. So for example, number one is a month-over-month percentage change. Number two is tree structure labeling. Number three is retained users per month. Number four is cumulative sums. Number five is rolling averages. Number six is multiple join conditions and then several window function practice problems as well. So if you really want to understand more about SQL and again these medium hard problems, definitely check out this piece of content.

The next post is "Local Persistent Volumes and PostgreSQL usage in Kubernetes". This is from 2ndquadrant.com and they're talking about using Kubernetes and the type of storage. So they're talking about setting up a local persistent volume to check the performance of that versus the network storage. They believed that direct storage would result in better performance. So they wanted to check this. They have their test system. They set up here and then ran different performance benchmarks. Here are the results. So, looking at sequential reads and writes, these are the disks as declared by the manufacturer. And then this is their bare metal results. 

And I believe they configured these disks in a RAID 1, which explains why the reads are about twice as fast as a single disk normally is. Using local persistent volumes, you can tell performance is within a few percentage points, just a hair slower than bare metal, and the same holds for the writes. Whereas if you look at the OpenEBS cStor pool volume, a more network-oriented solution, performance dropped dramatically: roughly 1/20th of the read performance and 1/10th of the write performance. So basically this communicates to me that if you want to use PostgreSQL on Kubernetes, you should focus on using local persistent volumes rather than the cStor pool volume. But if you want to learn more about that and the methodologies they used for testing, definitely check out this post.

The next post is "Multi-Kubernetes Cluster PostgreSQL Deployments". This is from crunchydata.com. So this is talking about deploying PostgreSQL in multiple regions or data centers using Kubernetes and using Federation and it runs through the process of doing it for two Kubernetes clusters. Now they're having to use an active standby scenario, but it goes through and describes how you can do this using their Crunchy data operator. So if you're interested in learning how to do that, check out this post.

The next post, also from crunchydata.com, is "Deploy pgAdmin4 with PostgreSQL on Kubernetes". So if you want to deploy pgAdmin4 to help manage the multiple Postgres instances you're going to be running on Kubernetes, you can check out this blog post.

The next piece of content, the PostgreSQL person of the week is Jimmy Angelakos. So if you want to learn more about Jimmy and his contributions to Postgres, definitely check out this post.

The next piece of content is "A Review of 4 Top PostgreSQL Books". This is from the enterprisedb.com blog, and I mentioned a YouTube video previously. Well, this is basically just a text version of it. It mentions the different books that were covered as well as synopsis. So if you want a more text version of that content, you can check out this blog post.

The last piece of content is "Routing with PostgreSQL and Crunchy Spatial". This is from crunchydata.com, and they're discussing the pgRouting extension for doing routing on dynamically generated graphs. So if you want to calculate routes using this in conjunction with PostGIS, definitely check out this blog post.
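
As a taste of what pgRouting queries look like, here's a hedged sketch of a shortest-path query with pgr_dijkstra, assuming a hypothetical edges table with id, source, target, and cost columns:

```sql
-- Find the cheapest path between two vertices of the graph.
SELECT * FROM pgr_dijkstra(
    'SELECT id, source, target, cost FROM edges',
    1,    -- start vertex id (hypothetical)
    42    -- end vertex id (hypothetical)
);
```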
