Declarative Partitioning, Permissions, Trigger Speed | Scaling Postgres 20
In this episode of Scaling Postgres, we review articles covering declarative partitioning, handling permissions to view data, and the speed of triggers.
Content Discussed
- Declarative data partitioning with PostgreSQL
- PostgreSQL 10 – How to create a partitioned table based on JSONB column field value
- Scaling IoT Time Series Data with Postgres-BDR
- Using search_path and views to hide columns for reporting with Postgres
- rules or triggers to log bulk updates?
- PostgreSQL 11’s Support for SQL Standard GROUPS and EXCLUDE Window Function Clauses
- Diagnostic of an unexpected slowdown
- PostgreSQL Concurrency: Isolation and Locking
Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about declarative partitioning, permissions, trigger performance and window functions. I'm Creston Jamison and this is Scaling Postgres, episode 20.
[00:00:21] Alright, our first piece of content this week is actually a presentation called Declarative Data Partitioning with PostgreSQL. This is from PGDay, I believe in Italy, by Gabriele Bartolini. This presentation basically goes over declarative partitioning in PostgreSQL as of version ten, because that's when the feature was introduced, and it also covers what's coming in eleven. So this is a very comprehensive presentation about the state of partitioning: how it started out supporting partitioning through inheritance and having to use triggers, through to declarative partitioning, which makes it a lot easier, and even what's coming in version eleven, which are some really great changes. It talks about support for multilevel partitioning, so you can have a central master table, or in this case a parent, and then a first level using a range of dates and a second level breaking those dates out further by customer ID or account ID. He then goes into more depth about what is introduced in version eleven: hash partitioning in addition to list and range partitioning, being able to add indexes on partitioned tables, foreign key support for partitioned tables, local unique index support, updates moving rows across partitions, and also default partition support. In other words, if you have a piece of data that needs to be inserted into one of the partitions but there's no match for it, it can be put into this default partition. So a lot of great changes are coming in eleven. But he does mention in this presentation that there are a number of areas where eleven is still not all the way to where we want it to be, such as being able to prune not just for selects to choose the right partition but also for updates and deletes, making it easier to create partitions in the first place, foreign keys with partitions as the target, and concurrent detach and attach of partitions. And this last important point: performance still requires work. Partitioning by years is not the same thing as partitioning by hours, so when you get a lot of partitions you can still see some performance issues. But all of those will probably be worked on in the next versions, version twelve, version thirteen, et cetera. So if you're interested in partitioning, this is definitely a presentation to check out.
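As a rough illustration of what declarative partitioning looks like (a minimal sketch with hypothetical table and column names, not taken from the presentation):

```sql
-- Parent table partitioned by a date range (PostgreSQL 10+).
CREATE TABLE measurements (
    account_id  bigint NOT NULL,
    recorded_at timestamptz NOT NULL,
    reading     numeric
) PARTITION BY RANGE (recorded_at);

-- One partition per month; rows are routed automatically on INSERT.
CREATE TABLE measurements_2018_06 PARTITION OF measurements
    FOR VALUES FROM ('2018-06-01') TO ('2018-07-01');

-- PostgreSQL 11 adds a DEFAULT partition for rows that match no other partition.
CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;
```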
[00:02:48] Now, related to this, the next post is PostgreSQL 10 – How to create a partitioned table based on JSONB column field value. This is from the newbiedba WordPress.com blog, "the little things about databases." It's a super short post, but it shows you, in very minimal detail, how to set up partitioning in Postgres 10 using data that's contained within a JSONB field. So I think this is a great example of how easy it can be, and of course you don't have to use a JSONB field to set up partitioning this way.
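A minimal sketch of the general idea, with hypothetical names (the post walks through its own example): the partition key can be an expression that extracts a field from a JSONB column.

```sql
-- List-partition on a field extracted from a JSONB column (PostgreSQL 10+).
CREATE TABLE events (
    payload jsonb NOT NULL
) PARTITION BY LIST ((payload->>'region'));

CREATE TABLE events_us PARTITION OF events FOR VALUES IN ('us');
CREATE TABLE events_eu PARTITION OF events FOR VALUES IN ('eu');
```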
[00:03:26] The next post is also related to partitioning, although the title doesn't necessarily indicate it. The title is Scaling IoT Time Series Data with Postgres-BDR, and this is from the 2ndQuadrant blog. He talks about the Internet of Things and how time series data tends to be append only, which makes it a good candidate for partitioned tables, so a lot of this is covered in the section on time-based partitioning in Postgres. Now, you can pretty much ignore the BDR part, which stands for Bi-Directional Replication, something that 2ndQuadrant produces; in general this is a good post on how to set up partitioning. He shows you how to do it by partitioning over time, again using declarative partitioning, how you can insert the data and then select it out. And then he also shows how you can partition over time and space. By space he means the different devices that are collecting temperature sensor data in his example, but in the previous presentation, the very first piece of content I covered, they were doing it by account ID. So it's similar, just a different way to do it. If you're looking to get into partitioning, these last three posts are really great ones to look over on how to get started, and particularly on what's coming in version eleven, which are some really great changes.
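A minimal sketch of partitioning over "time and space" under hypothetical names (one range level by time, a sub-level by device), not the exact schema from the post:

```sql
-- First level: range partition by time.
CREATE TABLE readings (
    device_id   int NOT NULL,
    read_at     timestamptz NOT NULL,
    temperature numeric
) PARTITION BY RANGE (read_at);

-- Second level: each time partition is itself partitioned by device.
CREATE TABLE readings_2018_06 PARTITION OF readings
    FOR VALUES FROM ('2018-06-01') TO ('2018-07-01')
    PARTITION BY LIST (device_id);

CREATE TABLE readings_2018_06_dev1 PARTITION OF readings_2018_06
    FOR VALUES IN (1);
```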
[00:04:51] The next post is Using search_path and views to hide columns for reporting with Postgres. Now, this post is essentially about permissions and what type of data someone is allowed to view. As an example, they note that GDPR has gone into effect and there's increased concern about access to personally identifiable information.
[00:05:14] In this example they use a users table that has an email, a first name and a last name. But maybe there are other elements you want someone to be able to access by looking at the users table, without those identifying fields being included. In this example they use schemas and views, and then revoke and grant privileges to do that. So the first thing they do is create a new schema called mask and create a user called nonadmin, or this could essentially be a role. Then they revoke all privileges on the schema public (that's the default schema you receive for a database), so this nonadmin user can no longer access the public schema. Then you create a view in the mask schema that selects from the main users table, grant usage on the schema mask to nonadmin, and grant select on all tables in the schema mask to nonadmin. So basically nonadmin can only access data through this newly created mask schema. At this point, users that are not nonadmin can just do a select all from the public schema's users table and get all the data, whereas the nonadmin user can do a select all from mask.users and will see only the created_at date, because that's all that is included in the view. But you can of course include other columns that don't contain personally identifiable information.
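A minimal sketch of the approach, with hypothetical column names (the post's own SQL differs in detail):

```sql
-- Separate schema holding only the columns reporting users may see.
CREATE SCHEMA mask;
CREATE ROLE nonadmin LOGIN;

-- Remove the default access everyone gets to the public schema.
REVOKE ALL ON SCHEMA public FROM PUBLIC;

-- Expose only non-identifying columns through a view in the mask schema.
CREATE VIEW mask.users AS
    SELECT created_at FROM public.users;

GRANT USAGE ON SCHEMA mask TO nonadmin;
GRANT SELECT ON ALL TABLES IN SCHEMA mask TO nonadmin;

-- Optionally point the role's search_path at the mask schema:
-- ALTER ROLE nonadmin SET search_path = mask;
```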
[00:06:47] So this is a pretty brief post, but a great example of how you can use schemas and views in order to segment access to your data. And this tends to become important as you scale up your database and more users want access to that data.
[00:07:01] The next post is Rules or triggers to log bulk updates? There are a few pieces of this post that I found interesting, and they're talking about rules. Now, I haven't used these, but upon looking through this post it seems like rules are a bit faster, though they're more of an older technology, and he has a quote here: there have been attempts to deprecate them, but they are used in the implementation of views, so they're probably here to stay for a while. Not sure. Historically I've mostly used triggers; I haven't really used a rule. Now, just talking about the test case here, he's actually setting up an unlogged table, so there's no crash recovery for it, but it's also faster because it's not logging all the operations, and he has turned autovacuum off for the creation of this table. Basically he's creating a second table to serve as the audit log, and then he does a loop to generate a series of data, checking the performance using two different triggers or a rule. So he shows logging with a rule; he shows logging with a row-level trigger, which fires after the update of every row, so if you are updating ten rows it's going to fire ten times; and then he does logging with a statement-level trigger, which gets executed just once for an entire update. An interesting point he makes is that this statement-level trigger in PostgreSQL version ten uses transition tables, and he quotes: the REFERENCING clause will make all rows modified by the statement available in a temporary table called new_rows. I haven't tested this implementation, but presumably it will only work in version ten. Looking at the test results: the baseline for just inserting data was 295 milliseconds; the rule was 454 milliseconds, so about 54% slower; a row-level trigger was almost 4 seconds, so it really took a lot longer to execute; whereas the statement-level trigger was pretty respectable at 642 milliseconds, just a little over twice as slow. But given that the rule is, I don't know if it's necessarily antiquated, but may be deprecated in the future, it seems like for this particular implementation a statement-level trigger might be the way to go. So if you're using triggers, this post gives you a good examination of the different performance levels, and with some of the new features in version ten, maybe you would want to consider using a statement-level trigger to get a bit of a performance improvement versus a row-level trigger.
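A minimal sketch of a statement-level audit trigger using PostgreSQL 10 transition tables (hypothetical table and column names, not the post's exact code):

```sql
-- Hypothetical table being audited, plus its audit log.
CREATE TABLE account (id bigint, balance numeric);
CREATE TABLE account_audit (account_id bigint, changed_at timestamptz);

-- Trigger function reads every updated row from the transition table.
CREATE FUNCTION log_account_updates() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO account_audit (account_id, changed_at)
    SELECT id, now() FROM new_rows;
    RETURN NULL;
END;
$$;

-- Fires once per UPDATE statement rather than once per row (PostgreSQL 10+).
CREATE TRIGGER account_audit_trigger
    AFTER UPDATE ON account
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE log_account_updates();
```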
[00:09:40] The next post is PostgreSQL 11's Support for SQL Standard GROUPS and EXCLUDE Window Function Clauses. So of course these are features in version eleven with regard to window clauses. First, he talks about the frame clause, which gives you a sliding average over your data. Essentially it's a moving average: the current value you are on, the last two and the next two are averaged to give you the moving average, and he gives an example with the data here. Now, in this case he's using ROWS BETWEEN two preceding and two following, but he says there are other units you can use, such as RANGE, and now with version eleven you can use GROUPS. He says, quote: ROWS counts the exact number of rows in the frame, RANGE performs logical windowing where we don't count the number of rows but look for a value offset, and GROUPS counts all groups of tied rows within the window. He gives two comprehensive examples of how rows behave, how range behaves and how the new groups behave. This post also covers the EXCLUDE clause to be used in your window framing, and gives an example of that, again for rows, range and groups. So this is a very comprehensive post about these new features in PostgreSQL eleven with regard to window functions. If you're using window functions, or you want to use them, it's definitely a post to check out to really grasp the potential use cases. And I should say this is from the jooq.org blog (Java, SQL and jOOQ).
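A minimal sketch of the kind of frame clauses being discussed, over a hypothetical table (the post works through its own dataset):

```sql
-- Moving average over the two preceding and two following rows,
-- plus a PostgreSQL 11 GROUPS frame that excludes the current row.
SELECT reading,
       avg(reading) OVER (
           ORDER BY reading
           ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
       ) AS moving_avg,
       avg(reading) OVER (
           ORDER BY reading
           GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           EXCLUDE CURRENT ROW
       ) AS group_avg
FROM sensor_readings;
```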
[00:11:31] The next post is Diagnostic of an unexpected slowdown. This was an interesting post because, again, it gives you a real-world scenario of a problem that was encountered, how to diagnose what was going on, and how to get around it. Basically they had a 40-core server, essentially 80 threads with hyper-threading enabled, 128GB of RAM, shared_buffers at 16GB, and max_connections of 1500. That's a lot of connections; you don't normally see them that high, but they had some special cases: due to an application using the database, it had to run on 9.3.5, that specific version, and they could not use a connection pooler, so they had to have 1500 connections to the database. And with that they were getting some long-running queries for what should be relatively simple operations. For example, with normal timing, doing a count of pg_stat_activity returned in six milliseconds, whereas when they were having these particular issues it took as long as 670 milliseconds, so over 100 times as slow. So they used explain plans and looked into the different possible causes. Now, this does get very technical and a little bit deep in the weeds, but it's definitely an interesting read if you are into that sort of thing. It basically led down the trail of a potential memory issue, as frequently happens when you have 1500 connections. He talked about the solution they settled on: basically they chose to reduce the amount of shared_buffers to the point that it resolved this memory issue while still giving adequate performance. But he did note that, as of 9.4, if that had been an available version, they could have allocated the shared buffers in huge pages, and he believes that could have alleviated some of the memory issues. So this is definitely one of those behind-the-scenes posts about having a problem and the solution they came up with. I find these particular posts quite interesting.
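For concreteness, the kind of probe they timed, plus the settings under discussion (values are the ones described above; a sketch, not configuration advice):

```sql
-- Simple probe that went from roughly 6 ms to 670 ms under load.
\timing on
SELECT count(*) FROM pg_stat_activity;

-- postgresql.conf values described in the post (illustrative only):
--   shared_buffers  = 16GB    -- later reduced to relieve the memory pressure
--   max_connections = 1500    -- no connection pooler was possible here
--   huge_pages      = try     -- parameter added in 9.4, unavailable on 9.3.5
```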
[00:13:41] The last post is PostgreSQL Concurrency: Isolation and Locking. This is a pretty comprehensive post discussing isolation and locking, so it works well as documentation to augment what's available in the PostgreSQL documentation. It goes over the different levels of isolation as they relate to concurrent updates, and shows how this behaves with a practical example. So if you're wanting to learn more about that, it's definitely a blog post to check out.
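As a tiny illustration of the topic (generic SQL with a hypothetical accounts table, not taken from the post): the isolation level controls what a transaction sees, and what happens, when another session updates the same rows concurrently.

```sql
-- Under REPEATABLE READ, the first SELECT fixes the snapshot; if another
-- session commits an UPDATE to the same row, our own UPDATE then fails.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;
-- (another session commits: UPDATE accounts SET balance = 0 WHERE id = 1)
SELECT balance FROM accounts WHERE id = 1;  -- still sees the original value
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
-- ERROR: could not serialize access due to concurrent update
COMMIT;
```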
[00:14:15] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.