psql Tips, Postgres 13 Upgrade, Roles Hierarchy, Replica Scaling | Scaling Postgres 160
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we discuss tips for using psql, stories from a Postgres 13 upgrade, exploring the role hierarchy and replica scaling considerations.
Content Discussed
- psql Tips
- Standing on the Shoulders of a Giant Elephant: Upgrading Discourse to PostgreSQL 13
- PostgreSQL: Get member roles and permissions
- Replica Scaling by the Numbers
- pgBackRest – The Best Postgres Backup Tool with a very active community
- Announcing Google Cloud Storage (GCS) Support for pgBackRest
- Introducing pgBackRest Multiple Repository Support
- Deploying PostgreSQL for High Availability with Patroni, etcd and HAProxy – Part 1
- Deploying PostgreSQL for High Availability with Patroni, etcd and HAProxy – Part 2
- How to run Hierarchical Queries with PostgreSQL
- Getting started with QGIS, PostgreSQL and PostGIS
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and workers continue to do well. Our first piece of content is "Psql Tips". This is from mydbanotebook.org. At the top of that site, they have a tab called "Psql Tips" and when you click on it, you get a random tip out of 100. Now you can also go to the all tips link that I'll provide and it shows you all 100 psql tips. Now that's a lot of tips, however, each of them are slight variations. So for example, tips one through ten show different ways you can execute SQL commands with psql. So you can use the -c or the command flag to send a single command, or you can do multiple commands, or you can send in a file, or you could pipe it or do different types of variations of just executing SQL. Now of course that's not what all the tips are. There are tons of other tips about connecting and whatnot. But if you want to improve your psql knowledge, I highly suggest you check out this resource to see if there's more that you can learn about psql.
Next piece of content is "Standing on the Shoulders of a Giant Elephant: Upgrading Discourse to PostgreSQL 13". This is from a blog.discourse.org and they're talking about their upgrade to Postgres 13 and the main area that they are highlighting is the B-tree deduplication. So for example, they looked at their largest table and when they looked at a version 12 table, with all the data in it, it was 114 GB. Looking at version 13, it was 85 GB. So that's a 25% reduction in size. Now I should mention this is the total table size, so it's indexes plus the actual table. Now when they split it off, they can see, for version 12, what the table size was versus the indexes, and the same for 13. Here you can see the big reduction in the index size. Then more specifically, they looked down at which indexes showed the size difference. So for example, their unique index is identical because it's a deduplication, but everything is unique in there, so nothing's duplicated. However, their post timings on the user ID and the summary did have some duplication. It goes from say, 21 GB for each index down to about seven, or a little bit less than 7GB for each index. So for them, this was a great reason to upgrade to Postgres 13. And if you're looking for these types of space savings, maybe you want to consider upgrading as well.
The next piece of content- "POSTGRESQL: GET MEMBER ROLES AND PERMISSIONS". This is from cybertec-postgresql.com and they're talking about roles and seeing who is a member of each role. So for example, in this post, they created two users, users A and B, and now, the difference between a user and a role is that a user is still just a role, but it can log in. But here they've defined essentially roles that can log in A and B or users and then C, D, E, and F for other roles. They granted roles to particular users or roles to other roles. So they ran these various grant statements. So basically, you have a hierarchy of roles. Certain roles belong to others and users belong to those roles and they want to see who belongs to what.
So the first thing they're looking at is the tables that will be involved. One is the pg_authid, which shows the role name and some information about it, such as the role can log in, as well as the pg_auth_members which shows the members of a particular role ID. Now, with this information, you can then build a recursive query using the WITH clause to be able to extract who are the members or what is the hierarchy of the role membership. So you can see the results of this query right here. So just a simple basic post about finding the hierarchy of roles. If you're interested in learning more, definitely check out this blog post.
The next piece of content is "Replicas Scaling by the Numbers". This is from momjian.us and he's talking about read-write workloads. The thing you have to keep in mind when you have a replica, is that whatever write load is on the primary will also be on the replica because it needs to keep up with everything that the primary is writing. So for example, he says, quote "Suppose the write load on the primary is 30%- that leaves 70% of the IO for read-only queries...". So each replicate can only use essentially 70% of its read performance because it's still having to do that 30% of writes. So this may be a scenario where having more replicas really helps your performance. But if you predominantly have a write workload, say 75-80% of the primary is just dealing with writes, it's going to be hard to scale out using read replicas. In this instance, you may want to go with sharding, which he mentions here. So if you want to learn more about this, you can check out this post.
The next piece of content- "pgBackRest- The Best Postgres Backup Tool with a very active community". This is from migops.com. And they're talking about pgBackRest. It looks from their opinion that this is the best backup tool to use with Postgres. They list 15 advantages for using it. Some of the big ones they mention are that it's open source, it does parallel backups and streams those files to a remote repository or somewhere in the cloud, and it can do incremental and differential backups. But of course, they list ten more reasons why it's a very, very good backup tool. Then the second part of the post covers how you install it, and how you get set up to be able to backup some Postgres database systems with it. So if you want to learn more about pgBackRest and get an implementation up and running, definitely check out this blog post.
The next piece of content- "Announcing Google Cloud Storage (GCS) Support for pgBackRest". This is from blog.crunchydata.com and they have now added Google Cloud Storage to the number of repositories that you can send pgBackRest data to. The others that already exist are for Amazon S3 and for the Azure repository. So this adds a third. So if you want to learn more about this addition, you can check out this blog post.
The next one, also from blog.crunchydata.com, is "Introducing pgBackRest Multiple Repository Support". So this means you can send your data to more than one repository, not just one. In this example, they are showing where they set up one repository as a locally mounted NFS volume and a second repository for Azure. So if you want to learn more about this feature in pgBackRest, definitely check out this blog post.
The next piece of content- "Deploying PostgreSQL for High Availability with Patroni, etcd and HAProxy- Part 1". This is from digitalis.io and they're talking about setting up High Availability Postgres using these tools. So they discuss a little bit about it and then they go into the process of starting to set it up to create a three-node cluster where you're going to have one lead and two followers in Postgres. They're all communicating using Patroni, etcd, and HAProxy. So that's part one. Part 2 goes into more depth on setting up the final system. So if you want to learn more about this, definitely check out this blog post.
The next piece of content- "How to run Hierarchical Queries with PostgreSQL". This is from highgo.ca. This post is a little bit different because it talks about how Oracle does it. It shows a typical example where you have someone being managed by another employee in terms of the data set that they're looking at, and they show you how you would do this in Oracle. Then they answer those questions using PostgreSQL, CTEs, or WITH clause queries. So if you want to learn how to do hierarchical queries like this, definitely check out this blog post.
The last piece of content is "GETTING STARTED WITH QGIS, POSTGRESQL AND POSTGIS". This is from cybertec-postgresql.com and they're talking about how to very simply get started up using geographical information systems and Postgres. So if you want to learn more about how to do that and set it up, definitely check out this blog post.