Account Configuration, Pricing, System Identifier, HyperLogLog | Scaling Postgres 96
In this episode of Scaling Postgres, we discuss user account configuration, pricing, the system identifier and the HyperLogLog extension.
Content Discussed
- My GOTO Postgres Configuration for Web Services
- RDS Pricing Has More Than Doubled
- Support for PostgreSQL’s System identifier in Barman
- Postgres HyperLogLog Extension
- How to create, test and debug an extension written in C for PostgreSQL
- How to build and debug PostgreSQL 12 using latest Eclipse IDE on Ubuntu 18.04
- pg_qualstats 2: Global index advisor
- Finally – fixed time calculations for parallel plans in explain.depesz.com
- PSQL_EDITOR: Fighting with Sublime Text under Windows
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "My GOTO Postgres Configuration for Web Services". This is from tightlycoupled.io. Even though he's talking about Postgres configuration for web services, my interpretation is that this really means web applications, because he mentions some Ruby and some Python. So for web applications, what's the best way to configure your database? Specifically, he mostly focuses on user accounts and some configuration settings related to those accounts. And I would say this piece of content is a must-read if you're a developer. I really liked the way he laid things out and the way he's chosen to set up his database. First, he runs through a few assumptions. The first is that you have superuser access to the database, although he says he added an update covering managed services like RDS, and that you're developing the database for a web service team.
Schema changes are done as part of an admin process. So you're only going to be creating objects, views, tables, sequences, and indexes through an admin process, for example using Rails migrations, Python migrations, or Sequelize migrations. The next assumption is that you have one application Postgres database and one schema that happens to be named after the database, but I imagine you could change that if you wanted to. So basically, how he has set it up is that you have one user that owns all the database objects. The database schema, tables, and indexes are all owned by one user, whom he happens to call the owner, but of course you can make that name whatever you would like. Any migrations that change objects in the database are run by this owner. Next, you have a role for read-write users. They can do SELECT, UPDATE, INSERT, and DELETE, but they can't create objects.
Next, you have read-only users, so they can just do SELECTs, but they can't do anything that the read-write users or the owner can do. What your application actually connects to the database with is an app user that belongs to the read-write role, so again, it can't create objects. That means you need to run your application migrations, if any exist, using the owner user. Now, he also makes some configuration settings at the database level as well as at the user level: defining statement_timeout to ensure statements don't run too long, lock_timeout to avoid lock queues so that nothing stays locked for too long, and idle_in_transaction_session_timeout to make sure you don't have long-running transactions sitting idle in transaction. He also defines connection limits so your app user doesn't use all of your connections and you have room to log in as the owner or a superuser to do certain activities.
Then he defines the search path, because he uses a separate schema as opposed to the public schema. Now, he lists a fair number of upsides that you're welcome to read through, and a few downsides, but I would agree with him that the upsides outweigh the downsides. Then he goes through the whole process of how he runs it, the exact commands to set up the database as well as the users and all the different permissions, and he even includes a gist of it as well. At the tail end, he has some considerations about how to handle certain things: making schema changes, handling local development, listing users, adding new users under this type of scheme, removing users, updating their settings, or changing other settings. So overall this is a really great blog post, and if you're a developer, I highly suggest you take a read through and see if you want to change or modify how you tend to set up or configure your database when working with web applications.
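To make that layout concrete, here is a minimal sketch of the kind of setup he describes. The role names, database name, schema name, and the specific timeout and connection-limit values are my own illustrative choices, not the exact commands from his post or his gist, so treat this as an approximation of the pattern rather than his implementation.

```sql
-- Run in psql as a superuser; names and values below are illustrative only.
-- One role owns the schema and all objects; migrations run as this role.
CREATE ROLE app_owner LOGIN PASSWORD 'changeme';

-- Group roles for read-write and read-only access; neither can create objects.
CREATE ROLE app_readwrite NOLOGIN;
CREATE ROLE app_readonly NOLOGIN;

-- The application logs in as a user that is a member of the read-write role.
CREATE ROLE app_user LOGIN PASSWORD 'changeme' IN ROLE app_readwrite;

-- Database plus a dedicated schema owned by the owner role (public is not used).
CREATE DATABASE app OWNER app_owner;
\connect app
CREATE SCHEMA app AUTHORIZATION app_owner;

-- Read-write gets DML, read-only gets SELECT, on tables the owner creates later.
GRANT USAGE ON SCHEMA app TO app_readwrite, app_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT USAGE ON SEQUENCES TO app_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT SELECT ON TABLES TO app_readonly;

-- Guardrail settings: timeouts at the database level, connection limit and
-- search_path on the application user.
ALTER DATABASE app SET statement_timeout = '30s';
ALTER DATABASE app SET lock_timeout = '5s';
ALTER DATABASE app SET idle_in_transaction_session_timeout = '60s';
ALTER ROLE app_user CONNECTION LIMIT 90;
ALTER ROLE app_user IN DATABASE app SET search_path = app;
```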
The next post is "RDS Pricing Has More Than Doubled". This is from Rick Branson on medium.com and he noticed this very interesting disparity between the EC2 cost. This is hourly on-demand costs and RDS, their hosted database service, hourly on-demand costs. And then these instant types list which generation like the first generation, the price premium for them managing the database was 33%. The second generation it's at 35%, 3rd generation at 39%. But then in the 4th generation, it jumps up to 75%. And then at the M5, this is their most recent generation of general use case instances; it's 78%. In addition, I checked Postgres and specifically the Postgres RDS.
It's even more of a premium there, getting closer to about 85% for these M5s. Furthermore, looking at the R5s, which are memory-optimized (and you would think people would want a lot of memory for their Postgres instances), the price premium is closer to almost 100% for Postgres. So this is something I didn't really realize: this price premium has gotten larger and larger as they've moved to new instance generations, hence the title, RDS pricing has more than doubled. So it's definitely something to keep in mind. Do you want to rely upon a cloud service provider to manage your database? Clearly, if you have a large enough budget, you can do that and not worry about it.
But if cost is a concern, particularly as your instance sizes grow larger and larger as you scale your application, you may want to consider running your own databases as opposed to relying on a vendor such as Amazon Web Services. So, a short but definitely interesting blog post about something I really wasn't aware of, because I actually use standard EC2 instances and manage my own databases. I've thought about using RDS for the smaller Postgres instances that I use, but I really haven't taken the jump yet. Seeing these price differentials, I think I may just keep managing them myself, even for small, lightly used databases.
The next post is "Support for PostgreSQL's System identifier in Barman". So this is from 2ndquadrant.com, and they are talking about Barman 20, which is a backup and restore manager for Postgres. They're talking about the system identifier, which is a unique identifier for each Postgres instance. And they've added support for it to make sure that when you're doing backups or restores, the WAL files you have match the actual database files so you don't get a mismatch between them. And they say how you can find out where the System Identifier is. You can use the application pg_controldata to get it and it's part of what's returned by the database System Identifier.
Or you can run a query from psql against pg_control_system() to get the system identifier, and you can even get it through the streaming replication protocol by issuing IDENTIFY_SYSTEM, which returns the system ID. The rest of the post talks about how Barman has implemented this, again to give you those types of protections to make sure you're not trying to restore WAL files from another instance with a different associated backup. This was an identifier I was not aware of, so be sure to check out this blog post if you want to learn more about it.
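If you want to look this up yourself, a quick sketch of the SQL route is below; pg_control_system() is available from PostgreSQL 9.6 onward, and the other two methods mentioned above are noted in the comments.

```sql
-- Read the system identifier from within a regular session (PostgreSQL 9.6+).
SELECT system_identifier FROM pg_control_system();

-- The same value appears in pg_controldata output as the
-- "Database system identifier" line, and the replication protocol
-- command IDENTIFY_SYSTEM returns it as well.
```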
The next post is "Postgres HyperLogLog Extension". This is from getwisdom.io and they're talking about HyperLogLog, which is an extension from Citus that actually enables you to do very fast, distinct counts at an approximation. So it's not 100% accurate. Even some of the things they mention here quote "The catch you get about a 1.5% accuracy, configurable of course, by taking up more space". So it seems very inaccurate. But again, the more space you dedicate to the statistical data structure should be able to give you more accuracy. But it basically lets you do very fast distinct counts over large amounts of data and they give an example of how you can set this up and actually use the HyperLogLog extension as well as some gotchas to be aware of. So if you have a use case dealing with a lot of data needing to get distinct counts, maybe check out the HyperLogLog extension and this blog post.
The next post is "How to create, test and debug an extension written in C for PostgreSQL". This is from highgo.ca. So this was an interesting post that basically describes how you can create your own C extension. So it goes through creating the extension, actually testing the extension, creating a test script that will test it and then doing a debug using the Eclipse IDE. So if you're interested in getting started with writing your own extensions for PostgreSQL, definitely a blog post to check out.
Now a second post from highgo.ca is "How to build and debug PostgreSQL 12 using latest Eclipse IDE on Ubuntu 18.04". So if you actually wanted to essentially build from source, this is another blog post to check out and do debugging using the Eclipse IDE.
The next post is "pg qualstats 2: Global index advisor". This is from rjuju.github.io. Basically, he runs POWA, which helps with Postgres performance optimization and monitoring. And there is an index advisor that's a part of that. And it looks like he's rolling out this feature into something he's calling pg qualstats 2 which basically does a prediction of what indexes are needed for particular queries that are hitting the database. So he basically has pulled this out of the POWA product. So if you're interested in using that, he said it's not released yet, but basically in a testing phase. So if you're looking for a tool that will enable you to look at different queries and statistics to give suggestions as to what indexes you may need, definitely a blog post and a tool to check out.
The next post is "Finally - fixed time calculations for parallel plans in explain.depesz.com". This is from depesz.com. So this is actually a graphical means of presenting an EXPLAIN plan. Now, I haven't really used it, but if you want a graphical representation use it, and apparently there was a time calculation for parallel plans that had an issue that has been resolved. So feel free to check this out if you want to get a more graphical representation of your EXPLAIN plans.
The last post is "PSQL_EDITOR: FIGHTING WITH SUBLIME TEXT UNDER WINDOWS". This is from cybertec-postgresql.com. So this is basically when you're using psql and wanting to define editors, there's a variable you can set, psql_editor and he's talking about using it under Windows and the best way to set it to be able to use, say, Sublime Text. So if you're interested in that, definitely a blog post to check out.