Columnar Tables, Advent of Code, PG 16 Features, Commit Times | Scaling Postgres 246
In this episode of Scaling Postgres, we discuss columnar table storage, solving Advent of Code using Postgres, new features coming in Postgres 16 and when Postgres development happens.
Content Discussed
- Powering PostgreSQL 15 With Columnar Tables
- Citus Columnar
- Advent of Code 2022: Days 1-5 With PostgreSQL
- How to Solve Advent of Code 2022 Using Postgres - Day 1
- How to Solve Advent of Code 2022 Using Postgres - Day 2
- How to Solve Advent of Code 2022 Using Postgres - Day 3
- How to Solve Advent of Code 2022 Using Postgres - Day 4
- How to Solve Advent of Code 2022 Using Postgres - Day 5
- Waiting for PostgreSQL 16 – Add grantable MAINTAIN privilege and pg_maintain role.
- Waiting for PostgreSQL 16 – Non-decimal integer literals
- Waiting for PostgreSQL 16 – Add test scaffolding for soft error reporting from input functions.
- pg_dump compression specifications in PostgreSQL 16
- PostgreSQL commit times
- Benchmarking multi-column, covering and hash indexes in Postgres
- Just Use Postgres for Everything
- What's new in Postgres 15?
- Global Unique Index ATTACH Support and Its Potential Deficiency
- Oracle to PostgreSQL migration cost assessment made easy
- Tushar Ahuja
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family and coworkers continue to do well. Since we're at the end of the year, I'm going to be taking a few weeks off. So the next show will be coming in 2023. I look forward to seeing you and discussing Postgres in the new year. So for this week, the first piece of content is "Powering PostgreSQL 15 With Columnar Tables". This is from percona.com and they are talking about an extension that's called, I believe, Citus Columnar, which is actually the next generation of cstore_fdw (column store). Basically, instead of storing data in rows, the data is stored in columns, so all the data of a particular column is stored together. Now, the advantage of this is for online analytical processing, because typically you want to take a sum, take a count, or otherwise learn a lot about a particular column, and it also enables you to highly compress that data because it's all the same type of data within a given column.
And usually, you can get some speed and some space savings doing this. The disadvantage of storing data in columns is that it's impossible, or at least very difficult, to update or delete data, and even looking at single rows becomes much more difficult. But what this post does is cover this feature using the Citus Columnar extension, and he discusses some cases where you would want to use it: basically, when OLAP is a major component of what you want to do with the data, when INSERT performance is not necessarily a priority, because it takes time to compress and write new data into each column, and when you have append-only data, because you cannot perform update or delete operations on this type of table.
But it goes through and discusses more details with regard to that, and then he actually goes ahead and installs it, starts working with it, and shows different examples and use cases of using this extension. So here you can see the typical access method is heap, which is the standard row storage, but the access method for this particular table is columnar. So he tries different queries and compares the runtimes between the two. So if you have an append-only, analytical use case, you may find this post, and this particular extension, of interest. So definitely check it out if you want to learn more.
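To give a rough feel for what he's setting up, here is a minimal sketch, assuming the columnar access method comes from the citus_columnar extension; the table and query are illustrative, not taken from the post:

```sql
-- Assumes the Citus columnar extension is installed and available.
CREATE EXTENSION IF NOT EXISTS citus_columnar;

-- A normal table uses the default "heap" access method (row storage).
CREATE TABLE events_row (id bigint, recorded_at timestamptz, amount numeric);

-- The same schema stored column-by-column via the "columnar" access method.
CREATE TABLE events_columnar (id bigint, recorded_at timestamptz, amount numeric)
    USING columnar;

-- An OLAP-style aggregate like this only needs to read two columns,
-- which is where columnar storage and its compression tend to pay off.
SELECT date_trunc('day', recorded_at) AS day, sum(amount) AS total
FROM events_columnar
GROUP BY 1
ORDER BY 1;
```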
Next piece of content- "Advent of Code 2022: Days 1-5 With PostgreSQL". This is from softwareandbooze.com. And every year there is an advent of code challenge. This particular post addresses days one through five, which include calorie counting, rock, paper, Scissors, rucksack, organization, camp cleanup, and supply stacks. As you can see, for each of these days, he has a video of how he did the solution for this, just using PostgreSQL. He said he did try to avoid using functions, procedures, and things of that nature. So it's just using SQL and the base functions provided.
Now, related to this, Greg at crunchydata.com has been doing the same challenge, and he has days one through five of the Advent of Code challenges solved as well, but in separate blog posts. So you can see days one, two, three, four and five are each separate posts. And of course, all of these will be linked in the description of this episode if you want to check them out and see how they approached the Advent of Code puzzles using Postgres.
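As a taste of the kind of SQL involved, here is a minimal sketch of the Day 1 "Calorie Counting" idea; the inline sample input is made up, and in practice the puzzle input would be loaded one line per row with its line number preserved:

```sql
-- Hypothetical sample input: blank lines separate one elf's snacks from the next.
WITH input(line_no, line) AS (
    VALUES (1,'1000'), (2,'2000'), (3,'3000'), (4,''),
           (5,'4000'), (6,''),
           (7,'5000'), (8,'6000')
), grouped AS (
    -- Count the blank lines seen so far to assign each row to an elf.
    SELECT line,
           count(*) FILTER (WHERE line = '') OVER (ORDER BY line_no) AS elf
    FROM input
)
SELECT elf, sum(line::int) AS total_calories
FROM grouped
WHERE line <> ''
GROUP BY elf
ORDER BY total_calories DESC
LIMIT 1;   -- the elf carrying the most calories
```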
Next piece of content- "Waiting for PostgreSQL 16- Add grantable MAINTAIN privilege and pg_maintain role". This is from depesz.com, and he's talking about the new features added to 16 that enables you to grant a particular user the ability to do vacuums, analyze, reindexes, refresh, materialized views, clusters, and lock table and have no other superuser privileges, and also not be able to actually query the data. So he shows an example here where he granted a particular user the pg_maintain role, and now he can see that it can do a vacuum, but it can actually not select from that table. So this is a great addition and he discusses this enhancement here.
The next enhancement he discussed is "Non-decimal integer literals". So basically, Postgres can now take different kinds of integer literals, including hexadecimal, octal and binary. So you can see here that when he does a select statement using hexadecimal, octal and binary formats, they all resolve to the same integer. So check out this blog post if you want to learn more about that.
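For example, something along these lines should return the same value three times in 16, using the 0x, 0o and 0b prefixes the post describes:

```sql
-- Hexadecimal, octal and binary literals all resolve to the integer 42.
SELECT 0x2A AS hex, 0o52 AS oct, 0b101010 AS bin;
```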
And then his final discussion is on "Add test scaffolding for soft error reporting from input functions". Now, this predominantly affects testing, but they say some user-facing use cases may exist for these types of functions. And he discusses that in this blog post too.
The next post, also related to things coming in Postgres 16, is "pg_dump compression specifications in PostgreSQL 16". This is from cybertec-postgresql.com. They're saying that in 16, with pg_dump, you will now be able to choose what compression algorithm you use. So instead of just having none or gzip, you can now use things like gzip, lz4 and zstd, which I'm super happy about, because I find zstd the fastest, most efficient way to compress a pg_dump. It has parallel compression built right in, meaning it can use multiple CPU cores, which I hope will be a configurable option. Today I use pg_dump and pipe it to zstd to do my compression, so if that's built into pg_dump, that would be awesome. But you can check out this blog post if you want to learn more about that.
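For a rough before-and-after comparison (the database name and file names here are just examples, not from the post):

```bash
# Today: plain pg_dump piped through zstd, with -T0 using all available cores.
pg_dump mydb | zstd -T0 -o mydb.sql.zst

# PostgreSQL 16: choose the compression algorithm in pg_dump itself.
pg_dump --format=custom --compress=zstd --file=mydb.dump mydb
```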
Next piece of content- "PostgreSQL commit times". This is from peter.eisentraut.org and he looked through the git repository of Postgres to determine "Hey, when were these commits made, on average, at what time of the day over the last nearly 30 years?". So you can see examples here that prior to 2000, it was basically a late evening, early morning hours work being done on Postgres. Then up to maybe about 2012, or 2010, it started to be an evening occupation working on Postgres. But ever since, again between 2010, and maybe 2013, people committing to Postgres have been doing it in the middle of the day. So basically it's become more of their job responsibility to work on Postgres. And then he actually showed only his commits and basically it lines up well with that. When he started committing to his job, it became pretty much the middle of the day, whereas before it was part of his job it was primarily, as you can see, done at night. So pretty interesting insight and you can check out this blog post if you want to learn more.
Next piece of content- "Benchmarking multi-column, covering and hash indexes in Postgres". This is from pganalyze.com and these were two posts that were discussed in the last couple of episodes of Scaling Postgres, and here Lukas discusses them as well if you want his perspective. Definitely encourage you to check out his content.
Next piece of content- "Just Use Postgres for Everything". This is from amazingcto.com and this post, as opposed to using Redis for things, Mongo for things, and Elastasearch for searching things, you should just use Postgres as opposed to implementing all of this other technology. And I definitely agree with a lot of what is said here. I've never implemented a separate message queuing system, I've always used Postgres's capabilities. I've used plenty of the JSONB functionality in Postgres as opposed to other types of non-relational databases. I tend to only use Postgres for searching as opposed to using Elastic search. I'm not so sure about using Postgres for Cron jobs because I typically just run it on the operating system, although if you have a hosted database that may become more important. Also I haven't really used what they're describing here in terms of unlogged tables and stored procedures to add or force an expiry as opposed to using Redis. So I would probably still tend to use Redis in this case, but this is an alternative. But if you want to learn more, you can definitely check out this blog post.
Next piece of content- "What's new in Postgres 15?". This is from supabase.com and this is a post that covers all the different features in a pretty well organized way that have been introduced into Postgres 15. So if you want a quick overview of that, definitely welcome to check out this article.
Next piece of content- "Global Unique Index ATTACH Support and its Potential Deficiency". This is from highgo.ca and this is another post discussing their globally unique indexes that they're trying to get working for Postgres partition tables. So if you want to learn more about their progress on that, you can check out this blog post.
Next piece of content-" ORACLE TO POSTGRESQL MIGRATION COST ASSESSMENT MADE EASY". This is from cybertec-postgreslq.com. They're talking about a CYBERTEC Migrator they developed to help you make decisions when you want to migrate from Oracle to Postgres. Well, now they have a cost estimate as a part of it.
And the last piece of content. The PostgreSQL person of the week is Tushar Ahuja. If you're interested in learning more about Tushar and his contributions to Postgres, definitely check out this blog post.