Popularity, Load Testing, Checksums, pg_hba | Scaling Postgres 59
In this episode of Scaling Postgres, we review articles covering Postgres' popularity, conducting load testing, checksums in PG12 and pg_hba configuration.
Content Discussed
- Postgres is the coolest database – Reason #1: Developers love it!
- [YouTube] PostgreSQL Hyperconverged DBMS that Fully-Virtualizes Database Management
- Building a PostgreSQL load tester
- Postgres 12 highlight - pg_checksums
- Optimizing storage of small tables in PostgreSQL 12
- PostgreSQL security: a quick look at authentication best practices [Tutorial]
- Continuous Replication From a Legacy PostgreSQL Version to a Newer Version Using Slony
- Fast Upgrade of Legacy PostgreSQL with Minimum Downtime Using pg_upgrade
YouTube Video
Podcast Audio
Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about Postgres's popularity, load testing, checksums, and host-based access. I'm Creston Jamison and this is Scaling Postgres, episode 59.
[00:00:20] All right, I hope everyone's having a great week. Not a lot of content this week, so this week's episode may be pretty short. But the first piece of content we have is Postgres is the coolest database – Reason #1: Developers love it! And this is from the 2ndQuadrant.com blog. It's a very short post, but it just reiterates some of the things that have been mentioned previously. Like DB-Engines, which for two years in a row has awarded PostgreSQL their database management system of the year, so 2017 and 2018. And they do this, I believe, at the end of the year, so it was just recently given this award. And this was out of a total of 343 databases that they track, so it's a pretty good indication of its popularity. They also mention the Stack Overflow survey, where PostgreSQL came in second to Redis, so it's definitely the most popular relational database system there. And as we'll see in a presentation here, they also classify it as the most popular comprehensive database. In other words, it's very good at doing a lot of different things. The next closest relational database I see is MariaDB. Now, I'm not including where it says Amazon RDS, because is that Postgres on RDS or is it only Aurora? So I'm not quite sure about that. And they list Azure and Google Cloud, where you can run PostgreSQL on these services. So for a relational database, I'd consider MariaDB the next most popular one. And then looking at trends over the last eight years on Hacker News, you can see the popularity of PostgreSQL increasing. These are four different relational databases, except for MongoDB, but even its trend is slowly going down. So definitely a couple of different indications of the current popularity of PostgreSQL.
[00:02:20] The next post is actually a presentation that was done recently, called PostgreSQL Hyperconverged DBMS that Fully-Virtualizes Database Management. And this is by Simon Riggs, the CTO at 2ndQuadrant. Now, I did watch this presentation. By "fully virtualizes" I'm not quite sure what was meant in the title, but by hyperconverged he's saying that if there's some task you want to get done, pretty much PostgreSQL can probably meet that need. Now, there may be more specific databases that excel in particular areas, but because of all its features it's super comprehensive, able to handle all sorts of data and analytical needs, from handling JSON data types, to full text search, to even doing some GraphQL-style queries using WITH RECURSIVE. And there's a whole extension ecosystem that adds additional features to PostgreSQL apart from the core. So this presentation didn't have anything necessarily new, but it does give you a perspective on how many different areas PostgreSQL can help fill your needs in. So if you have an interest in that, definitely a YouTube video to watch.
[00:03:36] The next post is Building a PostgreSQL load tester. And this is from Lawrence Jones, and I believe he's at GoCardless. Basically he had a need to simulate production load on, I believe, a test environment or a stress test environment. And he had previously used a tool called pgreplay, which basically reads a PostgreSQL log. Now, not the WAL, not the write-ahead log, but the actual text file you're logging to that records what queries or statements are happening. It can read that log, parse it and then replay it against a database system. Now he had some problems when he was trying to do that, where the database, the stress test environment, would just stall because some queries weren't executing the same as they were in production. And he says here, "the new cluster was sufficiently different that several queries were now much slower than they had been in the original log capture. pgreplay will ensure queries execute in the original order. Any queries that take longer to execute in the replay will block subsequent queries." So basically he had stalls when trying to use it. And his idea was, well, since this tool doesn't quite do it, maybe I'll make my own. And the most important change he sought to make is to not have all queries back up behind a stalled query, but to do it by connection. So for each connection, if one of the queries in a connection gets stalled, all the other connections' queries can still happen when they were supposed to; it's just that one connection that gets delayed if one of its queries is slower than it was when it ran in production. Now he calls this new tool pgreplay-go because he used the Go language. So this post goes over how he basically wrote the first part in five days. He talks about parsing the logs to be able to interpret them, streaming from the log to be able to replay it, the debugging process he went through, and then having to go in and analyze some issues he was dealing with. So if you are interested in getting a little more insight into PostgreSQL, or into Go in general and how he developed this, this might be of interest. But the other part I felt was of interest is that if you're wanting to use a tool like this to load test your PostgreSQL instance for stress testing purposes, maybe you can check out both pgreplay, the tool he had used previously, as well as his new iteration, pgreplay-go.
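As a rough idea of what capturing a replayable workload involves, here is a minimal sketch of postgresql.conf logging settings that a log-replay tool such as pgreplay generally relies on. This is an assumption-laden example rather than the tool's documented requirements; check the pgreplay README for the exact log format and log_line_prefix it expects.

```
# postgresql.conf -- sketch of logging settings for capturing a replayable workload
# (assumption: exact requirements vary by tool; consult the replay tool's docs)
log_destination = 'csvlog'     # CSV logs are generally the easiest for tools to parse
logging_collector = on
log_statement = 'all'          # capture every statement, not just slow ones
log_connections = on           # replay tools need to know when sessions start...
log_disconnections = on        # ...and end, to reproduce per-connection behavior
```

The trade-off is that logging every statement adds I/O overhead on the production system, so capture windows are usually kept fairly short.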
[00:06:16] The next post is Postgres 12 highlight - pg_checksums. And this is from Michael Paquier's blog. So in Postgres version 9.3, I believe, they introduced the concept of checksums, where you can enable checksums when you create a cluster to be able to check for things like file or disk corruption. And then in Postgres version 11 they introduced a tool called pg_verify_checksums. So this is something that can run on a database that has been shut down, to verify that there's no corruption in that stopped database system.
[00:06:54] But apparently for version 12 they're renaming it to pg_checksums, because it actually has three different modes of working. The first mode is check, which is basically what it does now; it does what pg_verify_checksums already does. It scans a stopped database to ensure that file integrity is where it needs to be. But it also adds an enable and a disable of checksums, because historically, when the feature was introduced in 9.3, it could only be set at cluster creation time. With these new switches you should be able to enable or disable it on an existing PostgreSQL cluster. So definitely an interesting feature to add.
[00:07:39] Now you still have to stop the cluster, but he also mentions a way you could potentially do it if you have a primary and replica setup, a process to go through to be able to enable checksums for your primary and replicas. So if that's of interest to you, definitely a feature to watch for in PostgreSQL 12.
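To make the three modes concrete, here is a hedged sketch of how pg_checksums could be run against a stopped PostgreSQL 12 cluster. The data directory path is a placeholder, and the exact option names are worth confirming against the PostgreSQL 12 documentation.

```
# The cluster must be cleanly shut down first
pg_ctl stop -D /var/lib/postgresql/12/main

# Verify existing checksums (the default mode; what pg_verify_checksums did in v11)
pg_checksums --check -D /var/lib/postgresql/12/main

# Enable checksums on a cluster that was initialized without them
pg_checksums --enable -D /var/lib/postgresql/12/main

# Or turn them back off
pg_checksums --disable -D /var/lib/postgresql/12/main

pg_ctl start -D /var/lib/postgresql/12/main
```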
[00:07:59] The next article is related to PostgreSQL 12 as well, and it's Optimizing storage of small tables in PostgreSQL 12. Here they did an example of creating a table and just inserting a single text value into it, and they looked at the different files that created: the free space map, the visibility map, some files related to the TOAST table and the TOAST index, as well as the heap itself. And they say that with this one table with one value in it, you need 48 kB of space due to all of these supporting files for PostgreSQL. Now, many people's tables are huge, so you're not going to be feeling this. But the scenario that apparently they have seen is the situation where an application uses schemas for multitenancy. There you could have tons of small tables, and perhaps this could be an opportunity for space savings. And what they said here, if you look at it, is that the largest area that takes the most space is the free space map. And they say that with PostgreSQL 12 tables only get a free space map if they exceed four pages, or 32 kB with the default page size of 8 kB. So if you have a lot of small tables, this could potentially be an opportunity to save some disk space with this new feature in PostgreSQL 12.

The next post is PostgreSQL security: a quick look at authentication best practices [Tutorial]. This is from Packtpub.com, and it is an excerpt from a book called Learning PostgreSQL 11, Third Edition, by Andrey Volkov and Salahaldin Juba. Now, this post basically talks about the pg_hba.conf file, the host-based access file for allowing network traffic to access the PostgreSQL database. So it goes over the different settings that you can make in it, and some suggested best practices on how to set up your pg_hba.conf file. They also cover a little bit about listen_addresses, as well as some best practices there. So if you're wanting more guidance with setting up your pg_hba.conf file, definitely a blog post to check out.
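As a quick illustration of the kinds of entries the tutorial discusses, here is a hypothetical pg_hba.conf fragment. The addresses, database and role names are made up for the example; tailor them, and the authentication methods, to your own environment.

```
# TYPE  DATABASE   USER       ADDRESS         METHOD

# Local administrative access over the Unix socket only
local   all        postgres                   peer

# Application servers on a private subnet, password authentication
host    app_db     app_user   10.0.1.0/24     scram-sha-256

# Explicitly reject everything else rather than relying on fall-through
host    all        all        0.0.0.0/0       reject
```

Pairing entries like these with a restrictive listen_addresses setting is the kind of best practice the post touches on.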
[00:10:15] The next post is Continuous replication from a legacy PostgreSQL version to a newer version using Slony. And this is from the Percona.com blog. Now, this is part of their series where they're using different techniques to upgrade PostgreSQL. And here they're saying that if you have an existing PostgreSQL database that is older than 9.4, this is a technique you can use to do the upgrade with zero, or essentially near zero, downtime. And it's using Slony to do it, so it goes over what Slony is. It basically uses triggers to replicate and send the data over to other database systems. So again, this is potentially a good scenario if you have a PostgreSQL instance older than 9.4. If you have one that's at least 9.4, generally the recommendation is to use pglogical to do logical replication if you want to try a zero downtime upgrade. But if you're interested in this, definitely a blog post to check out. Related to it is the next blog post, called Fast upgrade of legacy PostgreSQL with minimum downtime using pg_upgrade. So unless you're using pg_dump, pg_upgrade is generally the recommendation for doing an upgrade, and you can basically use pg_upgrade to move up to different versions. And this does require some downtime, but you can drastically minimize that amount of downtime if you're using hard links when using pg_upgrade. So if you're wanting to learn more about pg_upgrade, here's a recent blog post you can check out.
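To give a sense of what the hard-link approach looks like, here is a minimal sketch of a pg_upgrade run using --link. The binary and data directory paths are placeholders for a hypothetical 9.6-to-11 upgrade; always run with --check first and keep a backup, because once a linked upgrade completes the old cluster should not be started again.

```
# Dry run: verify the two clusters are compatible without changing anything
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/9.6/bin \
  --new-bindir=/usr/lib/postgresql/11/bin \
  --old-datadir=/var/lib/postgresql/9.6/main \
  --new-datadir=/var/lib/postgresql/11/main \
  --link --check

# Real upgrade: --link hard-links data files instead of copying them,
# which keeps the downtime short even for large clusters
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/9.6/bin \
  --new-bindir=/usr/lib/postgresql/11/bin \
  --old-datadir=/var/lib/postgresql/9.6/main \
  --new-datadir=/var/lib/postgresql/11/main \
  --link
```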
[00:11:57] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.