PgBouncer SCRAM, Art of PostgreSQL, Distributed Time-Series, Window Functions | Scaling Postgres 79
In this episode of Scaling Postgres, we discuss PgBouncer SCRAM support, The Art of PostgreSQL, making Postgres a distributed time-series DB and window functions.
Content Discussed
- Waiting for PostGIS 3: GEOS 3.8
- PgBouncer 1.11.0
- The Art of PostgreSQL
- Building a distributed time-series database on PostgreSQL
- Webinar: Business Intelligence with Window Functions in PostgreSQL [Follow up]
- Postgres Table Partitioning
- Pgpool vs PgBouncer
- Tips for Postgres from a Postgres Insider
- [YouTube] What's New in Postgres 12
- pgBackRest to Backup PostgreSQL
- Compute day working hours in PL/pgsql
- Why you need plpgsql_check (if you write procedures in PLpgSQL)
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is that PgBouncer 1.11.0 has been released, and the major new feature is support for SCRAM authentication. This is from pgbouncer.github.io. So this is pretty big: previously it only supported MD5 and a few other methods, but it finally has SCRAM authentication. There's a changelog here that covers adding SCRAM authentication, handling auth_type=password when the stored password is MD5 (the way a PostgreSQL server would), adding an option (log_stats) to disable printing stats to the log, adding the time zone to log timestamps, and putting the PID into brackets in the log prefix. It also lists a number of fixes, including one for PostgreSQL 12. So definitely a great set of enhancements, particularly now that SCRAM authentication is offered in PgBouncer.
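As a rough sketch, the relevant pgbouncer.ini settings might look something like this. The database name, addresses, and file paths here are placeholders of mine, not from the release notes:

```ini
; pgbouncer.ini -- a minimal sketch; names, paths, and addresses are placeholders
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
; new in 1.11: let clients authenticate with SCRAM
auth_type = scram-sha-256
; the userlist file can now hold SCRAM verifiers rather than md5 hashes
auth_file = /etc/pgbouncer/userlist.txt
```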
The next piece of content is that a book, "The Art of PostgreSQL", has been released with the subtitle "Turn Thousands of Lines of Code into Simple Queries". This is from theartofpostgresql.com. Now, I'm not too familiar with this book, but this is just another piece of educational content. So if this seems interesting to you in terms of learning more about PostgreSQL, check out the website and it looks like you can download a free chapter to see if it's something that you'd like.
The next post is "Building a distributed time-series database on PostgreSQL". This is from the timescale.com blog. So they made Timescale, which is a database extension for PostgreSQL, focus on time series data. And they're actually introducing a distributed version. So it enables you to distribute the data across multiple database systems. So essentially you have to scale up. Well, now you can scale out to multiple systems. And it looks like this is something in private beta right now. So this is an announcement about kind of why they're doing it. They're not necessarily considering sharding, but more chunking because they operate at the level of what they call chunks. That allows them to do some different things than sharding typically would that they say here.
Chunks let them manage more data in a scale-up scenario, provide elasticity by being able to move chunks around, give partitioning flexibility in terms of changing chunk sizes or partitioning dimensions, and make data retention simple by deleting chunks older than a threshold. So maybe they're kind of like partitions, but not really; they're their own thing. They also show some benchmarks. With a single node, INSERT performance was 2.7 million metrics processed per second; with eight data nodes plus one access node that routes requests to the data nodes, they reached 12.8 million metrics per second. So if you work with a lot of time-series data and you're looking to scale beyond a single instance, this may be something you'll want to take a look at. It's a pretty long post that goes into their rationale for structuring it the way they did and how they feel it differs from traditional sharding, so definitely a blog post to check out.
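To make the chunk idea concrete, here's a minimal sketch using TimescaleDB's public single-node API (the table and column names are mine, and the distributed variant was still in private beta at the time of this episode):

```sql
-- A minimal sketch of TimescaleDB's chunk-based model.
-- create_hypertable() splits the table into chunks behind the scenes.
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION
);

-- Partition rows into one-day chunks on the time column
SELECT create_hypertable('conditions', 'time',
                         chunk_time_interval => INTERVAL '1 day');
```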
The next post is actually a webinar, "Business Intelligence with Window Functions in PostgreSQL [Follow Up]". This is from 2ndquadrant.com. So you can register for the webinar right here, or look for the recording right here, and it basically covers window functions. I haven't had the opportunity to watch the whole thing yet due to other time constraints this week, but I'm always interested in learning more about window functions. They're the kind of thing I don't use on a daily basis, and I like to keep fresh on different capabilities for analyzing data. So if you'd like to do that too, it's definitely a webinar to check out.
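For a flavor of what window functions can do, here's a small example of my own (the daily_sales table and its columns are hypothetical): a running total and a seven-day moving average over daily figures.

```sql
-- Running total and 7-day moving average with window functions
SELECT order_date,
       amount,
       SUM(amount) OVER (ORDER BY order_date)      AS running_total,
       AVG(amount) OVER (ORDER BY order_date
                         ROWS BETWEEN 6 PRECEDING
                                  AND CURRENT ROW) AS moving_avg_7d
FROM daily_sales
ORDER BY order_date;
```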
The next post is "Postgres Table Partitioning". This is from enterprisedb.com. It describes the partition feature of PostgreSQL, talking about range partitioning, list partitioning, and the newly added hash partitioning and how you would use them in different scenarios. So range is typically for dates, you have the list of known values that you can specify and hash is being able to use a hash function on a wide variety of data to be able to break it into equal buckets. They go through showing what you can do with declarative partitioning, create the base table, and then create each partition along with inserting data. Talking about the default partition and then talks about the performance benefits of doing partition exclusion when doing queries. And it gives some advice on when to use partition tables. So if you're looking into partitioning, definitely a blog post to check out.
The next post, again from enterprisedb.com, is "Pgpool vs PgBouncer". This takes a rational look at each of these products. PgBouncer is marketed as a lightweight connection pooler, and that's pretty much what it does. They say, quote, "PgBouncer executes pooling correctly out of the box, whereas Pgpool II requires fine-tuning of certain parameters for ideal performance and functionality". So Pgpool II has a lot more features than PgBouncer, but it requires more configuration. If you're doing this kind of evaluation or potentially want to switch from one connection pooling solution to another, definitely a blog post to check out.
The next post is "Tips for Postgres from a Postgres Insider". This is from enterprisedb.com as well. So these are a set of 9 tips and they're not specifically PostgreSQL related, but they can be items that are tangential to PostgreSQL. In other words, something that will impact the database but is not in the database necessarily. So the first thing mentioned here is SMART Tools for assessing Hardware. So that's not necessarily a PostgreSQL configuration, but it's something if you have direct access to the hardware, something you want to monitor the health of your storage, as they say here. Pg_test_fsync to determine what fsync method is the fastest for your operating system and hardware.
They talk about the backend flowchart, which shows how data flows through PostgreSQL if you want some extra knowledge in that area. And they go over some others, such as command-line control, the GUC (Grand Unified Configuration) levels where you can set configuration at different scopes, how to set work_mem optimally, transactional DDL (one of Postgres's more interesting capabilities, as they mention), generated columns arriving in PostgreSQL 12, and of course time zones. A couple of these are sketched below. So if you want to get more in-depth on these tips, definitely check out this blog post.
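Here's a small sketch of the GUC levels and transactional DDL in action (the role, database, and table names are hypothetical):

```sql
-- GUC levels, using work_mem as the example
SET work_mem = '64MB';                             -- this session only
ALTER ROLE reporting_user SET work_mem = '128MB';  -- new sessions for a role
ALTER DATABASE analytics SET work_mem = '128MB';   -- new sessions in a database
ALTER SYSTEM SET work_mem = '32MB';                -- cluster-wide default
SELECT pg_reload_conf();                           -- pick up the ALTER SYSTEM change

-- Transactional DDL: schema changes roll back like any other change
BEGIN;
SET LOCAL work_mem = '256MB';   -- transaction-scoped setting
CREATE TABLE scratch (id int);
ROLLBACK;                       -- the table (and the setting) are gone
```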
The next piece of content is actually a YouTube video called "What's New in Postgres 12?". This is from a webinar, and it outlines some of the upcoming features in 12, including the partitioning improvements, B-tree index improvements, most-common-value statistics, and the fact that many CTE queries are now inlined by default, along with prepared plan control, just-in-time compilation, checksum control, and REINDEX CONCURRENTLY. So if you want to learn more about the Postgres 12 features that are coming, definitely a webinar to check out.
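As a small illustration of two of those features (the table and index names are hypothetical):

```sql
-- Postgres 12: CTEs are no longer an optimization fence by default.
-- The pre-12 behavior can still be requested explicitly:
WITH recent AS MATERIALIZED (
  SELECT * FROM events WHERE created_at > now() - interval '1 day'
)
SELECT count(*) FROM recent;

-- Also new in 12: rebuild an index without blocking writes
REINDEX INDEX CONCURRENTLY events_created_at_idx;
```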
The next post is "PGBACKREST TO BACKUP POSTGRESQL". This is from cybertec-postgresql.com. This is exactly what it says. It explains the pgBackRest solution, how to get it installed, and how to configure it for doing backups for your system. So if you're interested in moving to pgBackRest or evaluating it, definitely a blog post to check out.
The next post is "Computing day working hours in PL/pgsql". This is from fluca1978.github.io. So this is a relatively simple thing to do, basically, calculating the number of working days in a particular month. He actually designed a custom function in PL/pgSQL. So if you want to get more experience with writing PL/pgSQL functions, here's a blog post you can check out.
Related to that, the next post is "Why you need plpgsql_check (if you write procedures in PLpgSQL)". This is from okbob.blogspot.com. What this extension does is perform validity checks on the code you write that the built-in checks don't catch, and it even includes a profiler as well as some performance warnings. It looks like a great tool for PL/pgSQL development, so definitely check out this blog post if you do a lot of that; it might make your job easier.
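Basic usage looks like this (here I'm pointing it at the sketch function from the previous example; any PL/pgSQL function signature works):

```sql
CREATE EXTENSION IF NOT EXISTS plpgsql_check;

-- Returns errors and warnings the built-in compile-time checks miss,
-- such as references to columns or variables that don't exist
SELECT * FROM plpgsql_check_function('working_days_in_month(date)');
```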
The last post is "Waiting for PostGIS 3: GEOS 3.8". This is from crunchydata.com. Basically, they're talking about PostGIS 3 coming. A lot of what PostGIS does is link to specialized libraries, as they say here, to handle particular problems; one of those is GEOS, used for computational geometry, and there have been some issues because the library sometimes throws errors. They list some of the functions that are backed by GEOS, and they're working on improvements in PostGIS 3 where apparently you'll really see the benefits with GEOS 3.8 in terms of reliability. So if you use PostGIS, definitely something to look forward to in the upcoming PostGIS 3.
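For example, overlay operations such as ST_Intersection are among the functions PostGIS hands off to GEOS:

```sql
-- Intersect two crossing line segments; the overlay work happens in GEOS
SELECT ST_AsText(
         ST_Intersection('LINESTRING(0 0, 2 2)'::geometry,
                         'LINESTRING(0 2, 2 0)'::geometry));
-- returns POINT(1 1)
```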