Insert vs. Update, Select * Performance, Debezium Set Up, Standardizing Data | Scaling Postgres 142
In this episode of Scaling Postgres, we discuss inserts vs. updates, select * performance, how to set up Debezium and methods to standardize data.
Content Discussed
- Insert-only data modelling to smooth peaks on slow disks
- Reasons why SELECT * is bad for SQL performance
- Setting up PostgreSQL for Debezium
- Using Postgres for Statistics: Centering and Standardizing Data
- Postgres: The batteries included database
- PostgreSQL High Availability: Setup and Uptime Considerations
- Making Postgres stored procedures 9X faster in Citus
- Combining pgBackRest dedicated repository host and Streaming Replication
- PgBouncer 1.15.0
- Waiting for PostGIS 3.1: Grid Generators
- Pavel Stehule
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Unattended Upgrades, Ubuntu 18.04, and PostgreSQL 10: The Perfect Storm". So this is from seiler.us, and he's talking about an issue where his production Postgres database decided to upgrade itself one evening. Now, I've actually seen this occur, and in this post, he describes exactly how it happens and why. So if you use Ubuntu, I highly suggest you check out this post. Now, normally when I do an installation of Postgres, I'm using the package repositories at apt.postgresql.org, but depending upon the version of Ubuntu, it includes certain versions of Postgres. So for example, 18.04 included Postgres 10 in their package repository, and in Ubuntu 20.04, they included Postgres 12.
So what this means is that if you are on Ubuntu 18.04 you could be automatically upgraded to a dot release of version 10, or if you're on Ubuntu 20.04, you could be automatically upgraded to a dot version of Postgres 12. The reason is that unattended upgrades trust the Ubuntu package repository and they contain those versions based upon your versions of Ubuntu. So if you look in this file, you can see that the Ubuntu package repositories are trusted for unattended upgrades and they happen automatically, but the Postgres repositories are not. So you're relatively safe if you're using one of these repositories as long as Ubuntu does not have the same version package as the one you're using.
So for example, if you're using version 11 of Postgres, it doesn't matter if you're using 18.04 or 20.04, it's not going to automatically upgrade that to the next point release. However, if you're on 18.04 and version 10, you could be upgraded to the next point release during an unattended upgrade. Or if you're on 20.04 and you're on Postgres 12, it could automatically upgrade you to the next point release. So how he advocates getting around it is to add the Postgres packages as a blacklist, so they are not going to install them using an unattended upgrade method from the Ubuntu package repository. Basically, you blacklist all the different packages that you're using that are Postgres related. So definitely if you use Postgres on Ubuntu, I would check out this blog post and follow the practice that he highlights here to make sure that your Postgres versions don't suddenly upgrade themselves one evening.
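The approach he describes can be sketched as an apt configuration fragment like the following. This is a sketch, not his exact file: the specific package names to blacklist depend on which Postgres packages you actually have installed, so treat these entries as placeholders.

```
// /etc/apt/apt.conf.d/50unattended-upgrades
// Ubuntu's own repositories are trusted for unattended upgrades by default:
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}";
    "${distro_id}:${distro_codename}-security";
};
// Blacklist your Postgres packages so Ubuntu's copies are never auto-installed:
Unattended-Upgrade::Package-Blacklist {
    "postgresql-10";
    "postgresql-client-10";
    "postgresql-contrib-10";
};
```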
The next piece of content is "PostgreSQL Benchmarks: Apple ARM M1 MacBook Pro 2020". This is from blog.crunchydata.com. Now, in a previous post, they did a pgbench tools analysis of different MacBooks across the years, and it looks like they included the data here from 2011 to 2019 and then used one of the new M1 MacBooks containing these new ARM processors to look at the performance. As you can tell, it's dramatically higher and it looks to be double some of the 2019 numbers in terms of performance with regard to pgbench. So that's a pretty great performance. Then he has the raw numbers here, and then he wanted to say, okay, how does this compare against some desktop processors?
So he also included a Ryzen 2700x and a Ryzen 3950x, one from 2018 and one from 2019. Now, you can tell this processor actually has a lot of CPU cores. It may be a 16-core processor, I'm not sure about that, or maybe it's a 12, but you can see the performance of the M1 is still pretty significant. Now, having this on a laptop is not of too much interest to me, but it makes me wonder what kind of performance you could get on ARM servers, such as the Graviton servers at AWS. I haven't seen any benchmarks on those particular servers, but if these results are indicative of the performance you can expect, I might check out Postgres performance on some of these ARM processors. So definitely an interesting blog post about ARM performance with PostgreSQL.
The next piece of content is "Exploring a new Postgres database". This is from craigkerstiens.com, and this post describes when he steps into a new database. What does he typically use to check it out and analyze it? Basically, he uses psql, the Postgres client interface. He goes over some of the settings he likes to make to his psqlrc file, such as automatically formatting the output with \x auto, defining some prettier NULLS, saving the history file based upon the database name, and turning on query timing of course. Then in terms of analyzing what objects exist, using the \d command or \dt to only look at the table relations in the database, as well as examining particular objects by just doing a \d to describe, say, the user's table. You can get a sense of the table as well as the indexes, and then just select one record from the particular table to analyze what the data looks like. So this is a quick way to explore any new databases that you are introduced to.
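The psqlrc settings he mentions can be sketched like this. The history-file naming uses psql's :DBNAME variable; the NULL marker string is arbitrary, so pick whatever stands out to you.

```
-- ~/.psqlrc
-- use expanded output only when rows are too wide for the screen
\x auto
-- make NULLs visibly distinct from empty strings
\pset null '[NULL]'
-- keep a separate history file per database
\set HISTFILE ~/.psql_history- :DBNAME
-- report execution time for every query
\timing on
```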
The next piece of content is "Waiting for PostGIS 3.1: Performance". This is from blog.crunchydata.com. He's talking about some performance improvements that have been made in the upcoming PostGIS 3.1: large geometry caching, header-only geometry reads, and faster text generation. Then he asks, how much faster? He did an analysis using the example here, and as he says, over five runs, PostGIS 3.0 ran in 23 seconds, whereas PostGIS 3.1 ran in less than a second. Now, he says this may be a bit of an outlier, but that's a pretty large performance improvement, over 20 times. So it looks like 3.1 is going to bring more performance to PostGIS.
Now, a related post from cleverelephant.ca talks about "Waiting for PostGIS 3.1: Vector tile improvements". Again, it covers more performance improvements: some of these vector tile changes have resulted in 30% to 40% faster performance in 3.1, as well as reducing memory use to approximately a third of what was used before. So definitely some improvements coming in 3.1.
The next piece of content is "Solving the '3rd arg isn't within [0,1]' problem in pgRouting". This is from elephanttamer.net, and it's a brief post describing how to get around this issue. So if you're interested in that, you can check out this blog post.
The next piece of content is "IS OF". This is from commandprompt.com. They're talking about the IS OF operator, which helps you determine an expression's type. So, for example, you can do SELECT 1 IS OF (text) and it will return false, or SELECT 2 IS OF (integer) and it will return true. They have a bunch of examples, including in the comments, of how to use this. So, just a brief post about using IS OF to determine data types within Postgres.
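As a quick sketch of the syntax described above (note that IS OF takes a parenthesized type name, and is an undocumented Postgres construct, so availability may vary by version):

```sql
SELECT 1.5 IS OF (text);         -- false: a numeric literal is not text
SELECT 2 IS OF (integer);        -- true
SELECT 'two'::text IS OF (text); -- true once the literal is cast to text
```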
The next piece of content is actually a YouTube video: "Webinar: Using SSL with PostgreSQL and pgbouncer by Andrew Dunstan". This is on the 2ndQuadrant YouTube channel. Now, this is a bit of an older presentation, I believe from November of 2019, but it's still relatively relevant. Some of the software has been upgraded, of course, with new versions of Postgres and PgBouncer. But if you're interested in setting up SSL authentication with these, you can check out this webinar.
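For reference, PgBouncer's TLS behavior is controlled by settings along these lines in pgbouncer.ini. This is a sketch based on PgBouncer's configuration parameters, not the webinar's exact setup, and the file paths are placeholders:

```
; pgbouncer.ini (TLS-related settings; paths are placeholders)
client_tls_sslmode = require        ; require TLS from connecting clients
client_tls_key_file = /etc/pgbouncer/pgbouncer.key
client_tls_cert_file = /etc/pgbouncer/pgbouncer.crt
server_tls_sslmode = verify-full    ; verify the Postgres server's certificate
server_tls_ca_file = /etc/pgbouncer/root.crt
```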
The next piece of content is "Writing a Postgres Foreign Data Wrapper for ClickHouse in Go". This is from arunsori.me. Basically, he describes writing a Foreign Data Wrapper in Go, as opposed to C, in order to pull data from an OLAP database called ClickHouse, along with some of the issues he had to work around. So if you're interested in content such as this, you can check out this blog post.
The next piece of content is an announcement that PostGIS 3.0.3 has been released, as well as 3.1.0alpha3. So if you're looking for those performance improvements, you may want to check this out on postgis.net.
The next piece of content is "What's new in the Citus 9.5 extension to Postgres". This is from citusdata.com. So this is the Citus extension that enables you to scale out your Postgres instances. This is the open-source version, and the post lists a number of changes: Postgres 13 support; adaptive connection management for COPY, so you don't utilize too many connections across your cluster; functions to change a distributed table back to a local table; and a number of other enhancements. So if you're interested in the Citus extension, definitely check out this blog post.
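For example, turning a distributed table back into a regular local table looks roughly like this; the table name here is hypothetical, and the function name is taken from the Citus 9.5 release notes:

```sql
-- Convert a previously distributed table back into a local Postgres table
SELECT undistribute_table('events');
```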
The next piece of content is "Queryid reporting in plpgsql_check". This is from rjuju.github.io. Plpgsql_check is a static code analysis tool for analyzing your procedures and functions. They've recently added a new feature: the ability to determine the queryid, to help you further optimize your code. So if you're interested in that, you can check out this blog post.
The next piece of content is "PG_TIMETABLE V3 IS OUT!". This is from cybertec-postgresql.com. This tool gives you a cron-like interface for scheduling tasks within Postgres.
The last piece of content, the PostgreSQL person of the week is Markus Wanner. So if you're interested in learning more about Markus and his contributions to Postgres, definitely check out this blog post.