Postgres is Fine, Generated Columns, Postgres Inserts, Synchronous Replication | Scaling Postgres 122
In this episode of Scaling Postgres, we discuss determining if Postgres is fine, generated columns vs. triggers, insert features and synchronous replication usage.
Content Discussed
- PG Phriday: It’s the End of the World as We Know it (and Postgres is Fine)
- Generated Columns Vs. Triggers in PostgreSQL 12
- A Walk Through PostgreSQL INSERT
- [YouTube] When to use Synchronous Replication in PostgreSQL
- Features In PG13 – Deduplication in B-Tree Indexes
- Random Strings and Integers That Actually Aren’t
- Getting Started with PostgreSQL Operator 4.3 in OpenShift
- A simple way to trace a bug
- pg_timetable: start-up improvements
- Kohei Kaigai
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PG Phriday: It's the End of the World as We Know it (and Postgres is Fine)". This is from 2ndquadrant.com, and it was a presentation given at the Chicago Postgres Users Group that was converted into a blog post. It's all about monitoring. He first covers why you should monitor, and of course, this is good for determining why something broke or failed. However, it's also important for knowing what could potentially break in the future. He then goes over some different areas you would want to monitor, such as transactions, sessions, status of replication, disk space, et cetera. He mentions a few key areas that you definitely want to be monitoring to make sure that your Postgres installation is running optimally. The first is transaction ID wraparound, to ensure that it never happens on your tables.
He gives you two different queries, one at the database level and one at the table level, to see where the txid age stands. The next is making sure you don't have any prepared transactions that have been orphaned. Now, prepared transactions aren't used very much, but if they're being used at all, you want to make sure none of them are orphaned, because that could definitely cause a problem. Also monitor your number of idle transactions, because you want to minimize those as much as possible. There's also the question of whether you have hot_standby_feedback on, because that could potentially cause WAL to build up. Then he talks about checking replication lag, looking at slots, and checking pg_stat_replication in general, as well as reviewing your temporary storage space, checkpoint activity, active transactions, and your potential WAL throughput. So if you are looking for a good set of areas to monitor for your Postgres installation, definitely check out this blog post. A sketch of a few of these checks appears below.
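To give a feel for these checks, here is a minimal sketch of wraparound, orphaned-prepared-transaction, and idle-in-transaction monitoring queries. These are common formulations against the system catalogs, not necessarily the exact queries from the post.

```sql
-- Transaction ID age per database (wraparound risk).
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Transaction ID age per table, worst offenders first.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;

-- Prepared transactions that may have been orphaned.
SELECT gid, prepared, owner
FROM pg_prepared_xacts
ORDER BY prepared;

-- Sessions sitting idle inside an open transaction.
SELECT pid, state, now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```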
The next post is "A COMPARATIVE APPROACH BETWEEN GENERATED COLUMNS AND TRIGGERS". This is from ongres.com and they're comparing the new feature-generated columns in Postgres 12 to using a trigger method. So a generated column enables you to run an immutable function against data in the table to generate a new column, essentially. So in this table here that they've created, they have a radius named R and it calculates the area using a generated column using the function to determine the area based upon the radius. Now they check the performance of this. They show the system that they used here. And here are the benchmarks that they looked at. So using generated columns was the fastest approach.
A trigger written in C, as well as an immutable trigger in C, got very close to the performance of generated columns, but generated columns still came out on top. The slowest was the PL/pgSQL trigger, with the immutable PL/pgSQL trigger as the second slowest. So in general, if you have an immutable function, you would want to use generated columns. If you have to use a trigger and need better performance, writing it in C will serve you better than, say, PL/pgSQL. Now, this is for INSERT operations; the UPDATE operations weren't that different between each of these methods, which is pretty interesting. A sketch of a generated column like the one benchmarked appears below. So if you want to learn more about these generated columns versus trigger benchmarks, definitely check out this post.
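As a rough illustration, here is a minimal generated column along the lines of the post's radius-and-area example; the table and column names here are hypothetical, not taken from the post.

```sql
-- Area is computed by an immutable expression whenever a row is
-- inserted or updated; no trigger needed.
CREATE TABLE circles (
    id   bigserial PRIMARY KEY,
    r    double precision NOT NULL,
    area double precision GENERATED ALWAYS AS (pi() * r * r) STORED
);

INSERT INTO circles (r) VALUES (2);
SELECT r, area FROM circles;  -- area = 12.566...
```

The trigger-based alternative computes the same value in a BEFORE INSERT OR UPDATE trigger, which is exactly where the PL/pgSQL-versus-C difference in the benchmarks shows up.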
The next post is "A Walk Through PostgreSQL INSERT". This is from crunchydata.com and it's talking about INSERT statements. The first thing it covers is that if you need to insert multiple values, you definitely want to do this as a part of one statement. So you can see here it has values and then a comma between each row you want to insert. So if you have multiple values, this is a much faster way to do INSERTS. The next thing they cover is that if you happen to have need of data after an insert, you can use the RETURNING function. So for example, in this case, you're returning the ID to know what the ID of the row you just inserted was.
So that could be beneficial if you then need to do work with that record. The last thing it covers is the different ON CONFLICT actions you can use. If you have a conflict during the INSERT, you could DO NOTHING; so if the INSERT violates, say, a primary key, it simply does nothing in that case. Or you could do an update, essentially an upsert: it's going to try the insert, and if that fails, it's going to do the update. So this is just a very quick post showing some different features of the INSERT statement in Postgres.
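Continuing the hypothetical users table from above, the two ON CONFLICT behaviors look roughly like this:

```sql
-- Ignore rows that would violate the unique email constraint.
INSERT INTO users (email, name)
VALUES ('a@example.com', 'Alice')
ON CONFLICT (email) DO NOTHING;

-- Upsert: insert if the email is new, otherwise update the name.
INSERT INTO users (email, name)
VALUES ('a@example.com', 'Alicia')
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name;
```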
The next piece of content is actually a YouTube video called "When to use Synchronous Replication in PostgreSQL". This is from the EnterpriseDB YouTube channel. This is a follow-on from the blog post that was mentioned in the previous episode of Scaling Postgres, but I think this goes into a little more depth, with the different individuals in the video discussing when to use synchronous replication. It mentions cases where you may want one setting at the database level, but then, for a particular session or transaction, use a more aggressive setting for the synchronous_commit parameter, so that you can ensure that a particular transaction or set of data gets committed on all replicas with a high degree of confidence. So if you're investigating synchronous replication, definitely check out this YouTube video for some additional insight.
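That per-transaction override might look something like the sketch below; it assumes a standby is already listed in synchronous_standby_names, and the payments table is hypothetical.

```sql
-- The cluster might default to synchronous_commit = 'local' or 'on',
-- while a critical transaction opts into waiting for the standby
-- to apply the commit before returning.
BEGIN;
SET LOCAL synchronous_commit = 'remote_apply';
INSERT INTO payments (account_id, amount) VALUES (42, 100.00);
COMMIT;
```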
The next piece of content is "Features In PG13- Deduplication in B-Tree Indexes". This is from highgo.ca. So this talks about the new feature that dedups more entries in the B-tree index in version 13 compared to 12. They talk a little bit about how it works. It doesn't dedup everything, but it significantly increases the amount of deduplication that is done. Now, in their example here, they inserted about a million rows, but then to create the duplicates, they actually just updated those rows three times. So this isn't really a duplicate of the data itself, but they're creating multiple rows that I'm assuming haven't been vacuumed.
It's interesting that they didn't create duplicate values outright, so that may have given a slightly different result. But looking at Postgres 12, you can see the index size is 86 megabytes, whereas in Postgres 13 Beta 2 the index size is 62 megabytes. So that's a pretty good space savings. Now, they also ran a query to see if it is any faster. In version 12, a query against this table returned in 190 milliseconds, whereas in Postgres 13 it returned in 174 milliseconds. So again, a little bit faster in terms of performance. So if you want to learn more about the performance implications and space savings coming with the deduplication of B-tree indexes in Postgres 13, definitely check out this blog post.
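A rough reproduction of their setup might look like the following sketch (the table name is an assumption, not the article's exact script); running it on both 12 and 13 and comparing the reported sizes shows the deduplication at work.

```sql
CREATE TABLE dedup_test (id int PRIMARY KEY, val int);
INSERT INTO dedup_test
SELECT g, g FROM generate_series(1, 1000000) g;

-- Each UPDATE creates a new row version, and therefore new
-- index entries pointing at the same key values.
UPDATE dedup_test SET val = val + 1;
UPDATE dedup_test SET val = val + 1;
UPDATE dedup_test SET val = val + 1;

SELECT pg_size_pretty(pg_relation_size('dedup_test_pkey'));
```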
The next post is "Random Strings and Integers That Actually Aren't". This is from endpoint.com. So this is a blog post that covers how to create random alphanumeric strings of this nature. So things used for maybe coupon codes or things related to that. What it does is it converts an integer into this type of string repeatedly and it uses a Feistel network. Now this is done using PL/pgSQL functions. So this function takes an integer. So in this case, imagine this is a serial data type.
So you're taking essentially the integer primary key and converting it into a random-looking number first. That's the first step, and because the Feistel network is reversible, you can use the same function to get the original number back. Once you have this random-looking number, you use the string generation algorithm, again in PL/pgSQL, to convert it into the alphanumeric string. As a final bonus, if you use Python, he gave a Python implementation as well. So if you need to generate these types of random strings for coupon codes or other use cases, maybe you want to check out this blog post.
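For a feel of the integer-scrambling step, here is the well-known pseudo_encrypt Feistel function from the PostgreSQL wiki. It illustrates the same technique, though it is not necessarily the exact code from the post.

```sql
-- A 3-round Feistel network over the two 16-bit halves of an int.
-- Because every round uses the same round function, the mapping is
-- its own inverse: applying it twice returns the original input.
CREATE FUNCTION pseudo_encrypt(value int) RETURNS int AS $$
DECLARE
    l1 int; l2 int; r1 int; r2 int; i int := 0;
BEGIN
    l1 := (value >> 16) & 65535;
    r1 := value & 65535;
    WHILE i < 3 LOOP
        l2 := r1;
        r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
        l1 := l2;
        r1 := r2;
        i := i + 1;
    END LOOP;
    RETURN ((r1 << 16) + l1);
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

SELECT pseudo_encrypt(42);                   -- a scrambled integer
SELECT pseudo_encrypt(pseudo_encrypt(42));   -- 42 again
```

The string step is then just a base conversion of that scrambled number into an alphanumeric alphabet.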
The next post is "Getting Started with PostgreSQL Operator 4.3 in OpenShift. This is from crunchydata.com. So if you use Kubernetes and want to use the PostgreSQL operator by Crunchy Data, check out this blog post on how to install it and how to use it with OpenShift.
The next piece of content is "A simple way to trace a bug". This is from highgo.ca and this is talking about a bug that the writer discovered in Postgres in the circle function. It talks about how he actually went through the source code using tracing to actually find where the bug was. So if you're interested in this technique, he used to be able to discover where this bug was, definitely check out this blog post.
The next piece of content is "PG_TIMETABLE: START-UP IMPROVEMENTS". So these are some improvements that cybertec-postgresql.com has implemented for their tool called pg_timetable, which is a scheduling tool for PostgreSQL. So if you want to learn about the changes and improvements to that, definitely check out this blog post.
The last piece of content, the PostgreSQL person of the week is Kohei Kaigai. If you want to learn more about Kohei and his contributions to PostgreSQL, definitely check out this blog post.