Custom Data Types, Row Level Triggers, Dynamic pl/pgsql Columns, citext Usage | Scaling Postgres 162
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we discuss implementing row level triggers, how to return dynamic columns using pl/pgsql, and when to use citext.
Content Discussed
- Creating Custom Postgres Data Types in Rails
- What to return from a PostgreSQL row level trigger?
- Getting value from dynamic column in pl/PgSQL triggers?
- Handling case insensitive strings - EXPLAINING POSTGRESQL
- Hash Index Internals
- Calling Stored Procedure from JAVA – Part 2
- Visibility with query cache
- Pavlo Golub
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Creating Custom Postgres Data Types in Rails". This is from pganalyze.com and they are talking about custom data types in Postgres. But also the second part of it applies to how you can actually use it with the Ruby framework, Ruby on Rails. So what they're talking about is a custom data type where, for example, you could create a composite data type where you have dimensions that are, itself, built from three integers. Then you can actually use this data type in a table. The first thing they cover there are two data types. The first one is a domain type where you can put a certain restriction or a constraint on that data. One example they gave is a data type with currency, for example, can't be negative, or in this case, it's a string that cannot contain an exclamation point. They just called it a string_without_bang.
This is a domain type. So they show some examples of using it and how Postgres will give you an error if you try to insert exclamation points into this column. And you can also nest these domain data types. So they added an additional check where an email must contain an @ symbol and that works as well. Then they talk about the composite data types, where it's a data type that contains more than one base data type. Here, they're using dimensions and three different integers for the depth, the width, and the height. How you generally insert into there is using these parentheses in order to insert the data. There is a way in SQL to be able to pull out one single column from that data type. Now, in the next part of the post, they talk about how you can implement this in the Ruby framework, Rails. The first thing they did was they did a migration to apply the DDL.
So they're creating the new type and creating their new domain and then creating the table using these new types. After they're created, the database is all set up and ready to use. However, it doesn't really know how to interpret it. It interprets pretty much anything it doesn't know as a string. So if you try to return the dimensions, it'll just return it as a string. Rails or the Active Record Interface has an attributes API where you can define these new types. So generally, in Ruby at least, you would create a class that defines the different entities it contains and define a few functions to use it, as well as create an ActiveRecord type value to be able to interpret this dimension type. So the database knows how to use it. Once you have this in place, as well as defining this attribute on your active record model, you can then use this new composite data type as you would any other data type. So if you want to learn more about composite data types and domain data types in Postgres, especially if you're using Ruby on Rails, definitely check out this blog post.
The next piece of content- "WHAT TO RETURN FROM A POSTGRESQL ROW LEVEL TRIGGER?". This is from cybertec-postgresql.com. They talk about when you create a trigger in Postgres, generally, the first thing you do is define the function of what actions you want the trigger to do. Then you define the actual trigger that determines when that function will be fired. Then they cover different trigger types. So there's event triggers and then there's triggers that get fired by data modifications and that's what they focus on here. With regard to that, there are statement-level triggers that fire for each statement. But what they're discussing here is actually whenever a row gets changed, those are the row-level triggers. Triggers can run either before, after, or instead of a data modification. They talk a little bit about the return value of a trigger in that generally statement-level triggers, it's null and row-level triggers, it depends on how they're being used.
But one thing to keep in mind when you're using row-level triggers is that you're going to have the definition of a new or an old variable in the trigger. NEW is set on inserts and updates. So this is the new data and OLD is set on updates and deletes. So it represents the old data that was in the row that's either been deleted or from what has been updated. So they created a basic row-level trigger. So basically they have a table with a value in it that's text. Then they created a history table to track modifications to it. They defined a function and they used a conditional looking at the trigger operation. So if it's a delete, insert the data into it this way, given the presence of the old or the new variable.
Otherwise, insert data into the history table this way. Then they define the trigger that on INSERT OR UPDATE OR DELETE for each row EXECUTE this function. Now, what they make note of here, a lot of audit triggers use a technique similar to this. But you can make this more efficient by removing this conditional because you can actually do it by just defining a coalesce to determine what is shown here. So for example, you could just do an insert into there and coalesce, whether it's a new or an old, so that that which is NOT NULL is the one that gets inserted, and then new will be inserted or it will be inserted as NULL. Essentially, you can significantly simplify this audit using this type of function versus this one with a conditional. So definitely an interesting use case of coalesce in this case for audit triggers. So if you're interested in learning more, you can check out this blog post.
The next piece of content- "Getting value from dynamic column in pl/PgSQL triggers". This is from depesz.com. He's talking about how when you want to pass in a row into a function and then be able to determine what column you want to return. So as an example, he created a basic function where you're passing in a row and it's always going to return the same column from that table that's defined. He did a test of returning all the data and then using the function to return, essentially, the payload again through the function. So this works as expected. Next, he says, how can you do it given a dynamic column that's passed in?
So you get to determine what column of data gets returned from this function. So here he developed another one and he's using EXECUTE and formatting the query to be done using the p_row being passed in into a variable that gets returned. As you can see, this works and gives you a dynamically returned column based on the variables that are passed in. Now, I actually tried doing the same thing, but using an H Store column to actually do the return. This implementation gives you the same answer as well as trying JSON and JSONB implementations.
To complete this, he even used an entirely different language using Perl as a way to do it. They needed some performance checks to see, okay, what is the most efficient at doing this? Now, of course, the fastest is to get static. So that's the fastest implementation out of all of them. But the next fastest was the dynamic_hstore. So the HStore implementation, which I was a little surprised by, then comes the JSON and the Perl. They're pretty much equivalent in the JSONB may be in the range of statistical variation, but by far the slowest was the pl/PgSQL, where you're doing an EXECUTE on a format. So if you have a need to do this, maybe you're interested in checking out this blog post to find out more.
The next piece of content is actually a YouTube video and it's "Handling case insensitive strings- EXPLAINING POSTGRESQL". This is from the YouTube channel CyberTech and it's a relatively short six-minute video, essentially talking about the citext extension and data type that you can use for case-insensitive strings. So as opposed to using a function “lower” to query a column to make sure that you ignore case, and then worrying about potential expression indexes because otherwise an index won't be used if you're applying a function to it, the citext kind of gets around that. Now, there may be some downsides and considerations you need to take into account for your application framework, but definitely, an interesting video that shows you how you could start using citext if you're interested.
The next piece of content- "Hash Index Internals". This is from highgo.ca, and they're talking about the hash indexes. Now, they do have a number of constraints that we've discussed in a previous episode of Scaling Postgres in terms of how they don't really allow unique constraints, index-only scans, or covering indexes. There may be some size advantages to B-trees in certain instances, but there may be some use cases for them. And this kind of goes into detail about how they work on the implementation side. So if you're interested in learning more about them, you can check out this post.
The next piece of content- "Calling Stored Procedure from JAVA- Part 2". This is from highgo.ca, and they're discussing how you can use stored procedures using not only multiple in parameters but also in-out parameters. So if you're interested in learning more about this, you can check out this post.
The next piece of content- "Visibility with query cache". This is from pgsqlpgpool.blogspot.com, where they're discussing how pgPool can actually do caching of queries. But there are some potential drawbacks to it depending upon how your transaction isolation levels are configured and what you're running as. So READ COMMITTED kind of gives you some expected results. But REPEATABLE READ using this feature can give some unexpected results or even potentially serializable transaction isolation. So if you use this feature or you're interested in using pgPool, you may want to check out this post to learn more about these potential gotchas.
The last piece of content, the PostgreSQL person of the week is Pavlo Gulub. So if you're interested in learning more about Pavlo and his contributions to Postgres, definitely check out this blog post.