Terabyte Scale, Permissions, Fast Column Adds, pgBouncer | Scaling Postgres 6
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we review articles covering terabyte scale & analytics, database permissions, fast column adding coming to PosgreSQL 11 and pgBouncer.
Content Discussed
- PostgreSQL at 20TB and Beyond: Analytics at a Massive Scale
- PostgreSQL at 10 TB and beyond
- Building On Solid Ground: Getting Postgres Foundations Right With pgbedrock
- pgbedrock at GitHub
- Pain free ADD COLUMN with non-NULL defaults
- Making Hadoop Relatable Again
- Mountpoints and the Single PostgreSQL Server
- Change This: wal_compression
- Announcing general availability of Azure database services for MySQL and PostgreSQL
- A Guide to Using pgBouncer for PostgreSQL
- PostgreSQL WAL Archiving with pg_receivewal
YouTube Video
Podcast Audio
Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about terabyte, scale databasepermissions, faster column ads and PG bouncer. I'm Kristen Jameson, and this is scalingpostgres episode six.
[00:00:21] Alright, our first article is PostgreSQL at 20 terabytes and beyond analytics at amassive scale. Now, if you've been watching my episodes, that title might be familiar to youbecause in episode one of Scaling Postgres, I found the slides to this presentation. But herethey recently posted the YouTube video of the presentation by Chris Travers and this is part ofthe FOSSASIA YouTube channel.
[00:00:49] So again, just to remind you, 20 terabytes is a single database, but it's actuallymultiple databases that are handling all the data that this service supports. So it's up to 400terabytes at this point, I believe. So definitely a piece of content to check out if you're looking athow to potentially scale your PostgreSQL database and see some of the challenges that they'verun into.
[00:01:18] Now, related to this, he did an earlier presentation at an entirely different organizationand that presentation was called PostgreSQL at Ten Terabytes and beyond. So this was adifferent organization, different application, and also different problems and issues he had toencounter. Now, this is an older presentation. It was posted last February, or I should sayFebruary 2017 on the Edument YouTube channel.
[00:01:48] But again, if you are at this level, it would be a good presentation to review whenyou're looking how to scale your PostgreSQL database.
[00:01:57] The next article is from the Squarespace Engineering blog, and the title is Building onSolid Ground getting Postgres Foundations right with PG Bedrock.
[00:02:09] So basically, this is a tool that enables you to manage your PostgreSQL permissions.So they were having some issues because they have many users and different responsibilitiesfor each of those users for accessing data to their PostgreSQL database. And the issues theywere encountering is that permissions are managed in multiple tables across PostgreSQL and itwas challenging to correctly set the appropriate permissions for every user and maintainconsistency.
[00:02:44] So they developed this tool with these goals in mind to collate all configuration withinone file, simplify their permission complexity and assert that the configuration matches reality. Soagain, they developed this tool called PG Bedrock to manage the roles, role memberships,schema existence and ownership and object privileges. So it's doing a lot for maintainingpermissions in their database.
[00:03:16] And they define a configuration file, as you can see here. So they have a particularuser, they can log in, they're a member of a particular role, they have these attributes, they ownparticular schemas, and here are their privileges to other schemas tables or sequences.
[00:03:34] So in terms of scaling, this isn't necessarily a performance related post, but it's whenyou scale, say, staff members and you start to get really complex permissions that need to beestablished for access to the data within PostgreSQL. So definitely something to check out ifyou're experiencing these kind of pains to see if this particular tool would help you. Now onGitHub, here's the tool on the squarespace PG bedrock and I'll include the link in the shownotes, but it has a very, very comprehensive README that describes how everything gets setup. So again, if you're experiencing permissions issues, I would definitely suggest checking outthis new tool.
[00:04:20] The next post is pain free ad column with non null defaults waiting for PostgreSQLeleven. Fast alter table add column with a non null default so this is a feature that's coming inPostgreSQL eleven, and this was mentioned on the Dataegrid.com blog by Andrew Dunstan. Sothat you want. However, with this patch that should be coming in PostgreSQL eleven, itenables you to add a default value when you add the column without rewriting the entire table.And this blog post goes into a little bit of the details about it, so I suggest checking it out if youwant to learn more.
[00:05:49] The next article is from datanami.com and it's Making Hadoop relatable again. So Ifound this post and I was reading it a little bit, it's not about PostgreSQL, but I found itinteresting. They're talking about making Hadoop more like a relational database. And here's aquote if you make Hadoop more like a relational database, then people will do more with it. Soit's recognizing the power of a relational database. So I just found this interesting that all of thesedifferent database systems keep going back to the relational database and using SQL in order toallow easy access to the data. So it's interesting how these NoSQL options seem to keep goingback to SQL.
[00:06:43] The next couple of articles are very short, brief ones that give suggestions forPostgreSQL configuration.
[00:06:52] So for example, these are from the Build blog. The first one is mount points in thesingle PostgreSQL server. So basically it's a very quick little post that says two rules to keep inmind when choosing a name for the mount point in your directory structure for your PostgreSQLdatabase. One, always include the major version in the directory you're using, but never includethe major version in the mount point.
[00:07:23] For example, use something like PgSQL for a mount point, something that doesn'tinclude the version. But when you're establishing your PG data or the actual data directory douse a version for it. So here they had an example of within the mount point ten data, the reasonbeing because at some point when you're doing a PG upgrade, you're going to want topotentially have two different versions running when you're using PG upgrade. And having thedirectory structure in this manner enables upgrades to happen much more easily.
[00:07:58] The next article is again from thebuild.com Change this wall compression. Andbasically he's suggesting that pretty much everybody should go ahead and enable thisparameter even though it's off by default. And basically as long as you're not severely CPUbound, you get a benefit in disk space and a reduction in network traffic if you're sending to astreaming replica. So definitely something to take a look at if you have not enabled that setting.The next post is a guide to using PG Bouncer for PostgreSQL. And this is from the severalnines.com blog and this goes over the general reasons why you would potentially want to usePgbouncer and some other alternatives such as PG Pool. But this focuses on PG Bouncer andhow basically it helps make many, many connections to the database much more efficient thanthe stock PostgreSQL.
[00:08:56] And then at the bottom here, it actually goes into a configuration and differentparameters that you can use to set it up. So if your connections are starting to increase moreand you haven't looked into PG Bouncer, and I would say your connections going up to 200,300, 400, definitely check out PG Bouncer as a way to help manage that many connections toyour PostgreSQL database easier.
[00:09:24] The last article is PostgreSQL wall archiving with PG Receive wall. So this is a videotutorial on essentially how to use PG Receive wall. Now, if you haven't heard of that, that isutility that simply lets you use streaming replication protocol to stream wall files to another serverthat doesn't have to be a database or a regular replica. So I would say this is one of the mostefficient ways to get the wall files off of a master database. If you're doing that, you essentiallystreamlim and then produce the files on this other server. And I go into in depth how to set thatup and the different commands that you would use. So if you're looking for a potential solutionfor better wall archiving, I encourage you to check it out.
[00:10:16] That does it. For this episode of Scaling Postgres, you can get links to all the content