New CVEs, Postgres Programming, JSONB, Advisory Locks | Scaling Postgres 76
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we discuss new CVEs, programming Postgres, JSONB comparison and using advisory locks.
Content Discussed
- PostgreSQL 11.5, 10.10, 9.6.15, 9.5.19, 9.4.24, and 12 Beta 3 Released!
- Postgres is the coolest database – Reason #5: It can not be bought out
- Overview of Server-Side Programming in PostgreSQL
- Comparing Postgres JSONB with NoSQL
- Postgres 12 highlight - SQL/JSON path
- Postgres Advisory Locks with asyncio
- Doubling the Sorting Speed of Postgres Network Types with Abbreviated Keys
- Automatically updating materialized views
- PostgreSQL Graph Search Practices - 10 Billion-Scale Graph with Millisecond Response
- Waiting for PostGIS 3: Separate Raster Extension
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is actually new PostgreSQL releases on the postgresql.org website. So they have new releases including 12 Beta 3. Now most of these appear to be due to security issues that were identified, but also 40 bugs were addressed. Now two of these security issues involve the SQL server itself. One related to security definer functions which we've talked about in a previous episode of Scaling Postgres. If you're using these you should definitely consult the documentation on the best way to use them without causing a potential problem. And then the last one is memory disclosure and cross-type comparison for the hashed subplan. And this one says for an attack to become possible, a super user would need to create unusual operators. The other two involve Windows installers and of course, you also get a host of bug fixes as well. So go ahead and update your instance when it's best for your environment.
The next post is "Postgres is the coolest database - Reason #5: It cannot be bought out". This is from 2ndquadrant.com and they've had various different reasons coming up as to why Postgres is the coolest database. This is reason number five and they have links to some of the other ones that they've been talking about. But this was kind of interesting talking about the structure of the PostgreSQL community, how it develops, and why that kind of makes it pretty resilient and that it can't be bought out. For example, they use the example of MySQL being bought out by Oracle and how that's not really possible because there's really no one company behind it.
It says it assigns copyrights to the PGDG which is the PostgreSQL Global Development Group which is a global community that cuts across countries, enterprises, people, and cultures. Some other interesting things are something as important as the core team they say, which currently comprises five people. It does not have more than two members from the same company and community conferences don't allow more than 50% representation from the same company on talk selection committees. So basically this just gives more evidence that it is truly an open source project that cuts across a variety of organizations and individuals. And there are links to some of the other reasons. So an interesting blog post to check out.
The next post is "Overview of Server-Side Programming in PostgreSQL". This is from pgdash.io. So this goes over all the different ways that you can do server-side programming. The first area they address are SQL functions, so you can create a function and make sure the language is SQL and you can basically create user-defined functions that do some sort of purpose like updating items and inserting them into an audit table for example. Now they make a note here, that the function body can only contain SQL statements, however, that means there are no flow control statements if there are variables and the like. Now you can do that with procedural-level which is essentially procedural-level PostgreSQL. So because the language is SQL, you can't really use these capabilities.
But we'll see that in the post further down. The next thing they talk about is C functions. So you can develop C functions, as they say, to pretty much do anything. So you can create a function and then define essentially where your entry point is talking about. You have a shared library here and the entry point is a sum. So you can call that function within some C code and then they go into the PL/pgSQL functions. Again, the language specified here is PL/pgSQL and this does allow loops and conditionals and variables and things of that nature. So it is more of a programming environment. This is included by default in PostgreSQL. Then there are other core procedural languages, so you can use Python, Perl, as well as TCL, but you may need to install some additional packages in order to get that to work because it's not really included by default.
For example, they were saying on Debian, you may have to install this package in order to get the Python working. But then they say there are also some non-core procedural languages, so there are some open-source projects that have support for things like Java, Lua, R, et cetera. And they have a link to the list here. Then they're talking about aggregate functions and how you can essentially create your own. And again they're using the SQL language to be able to do that. So for example, this does a median and they do SELECT MEDIUM from a student's table. And of course, you also have user-defined types, so you can basically create your own types. And you remember how one of the CVEs was about creating some sort of custom operator? There's a risk associated with that. Here, they're literally creating an operator. The next area they cover are triggers and how you can set up a trigger to be before or after an INSERT, UPDATE, and DELETE a row of a table on the truncate of a table, or instead of INSERTING, UPDATING, and DELETING a row of a view.
So triggers perform some action when another action takes place. So their example here is after an update on the items table, basically, you're going to insert it into an audit table. Then they have event triggers which are similar to triggers because as they say quote "While triggers respond to DML events on a single table, event triggers can respond to DDL events on a particular database". Then you can define rules which essentially lets you rewrite queries as they say here. So for example, CREATE RULE rule1 AS ON AN INSERT TO items DO INSTEAD NOTHING. So basically don't allow it. Then they talk about stored procedures, which is new with Eleven, which is essentially like the standard functions before, but these procedures allow transaction control so you can do COMMITS and rollbacks within them, whereas with PL/pgSQL straight functions you can't do that.
It's all part of one transaction, but with procedures, they enable you to have this transaction control. Then they cover what they classify as a few other exotic things such as Foreign Data Wrappers, connecting to another database, another Postgres instance, or MySQL, Oracle, et cetera. How it's possible to create your own index types with upcoming PostgreSQL being able to create your own table access methods. They talked about logical replication plugins used with things like logical decoding procedures, language handlers as well as extensions. So of course, this allows a whole host of functionalities to extend what Postgres can do. So it's definitely an interesting blog post. I encourage you to check it out if you're more interested in more options when it comes to server-side programming in PostgreSQL.
The next post is "Comparing Postgres JSONB with NoSQL". This is from blog.couchbase.com. So of course they're talking about NoSQL in the context of CouchDB, which is a NoSQL database. So in terms of comparing them, there's probably going to be a little bit of bias. But whenever they talk about JSONB and different ways to use it in PostgreSQL, I always find that content interesting. So I like just looking through it just to kind of understand more how people are potentially data modeling and using JSONB in the context of PostgreSQL or even NoSQL databases. Now, of course, they do talk about some disadvantages. They're saying the JSONB syntax is not necessarily friendly, but again, with a JSON path that's going to be coming, I find that syntax much more appealing. I included the link here to the JSON path feature that's coming in PostgreSQL 12 where you can do syntax like equipment rings, get the first item of the ring array, and get the name. So I find this syntax much easier to understand compared to some of the other syntax that JSONB uses with our current set of operators. So if you're interested in more content as it relates to JSONB, definitely a post to check out.
The next post is "Postgres Advisory Locks with asyncio". This is from unexpectedof.net. Now, I believe asyncio and AsyncPG are related to Python here, but they are talking about advisory locks and how they're using it as a solution to have only one process interact with a database on a particular connection. You can use here's the syntax that they use in terms of just create an advisory lock with a given name. So there's not a lot of PostgreSQL-specific content in this article, meaning code, so a lot of the code is Python and using these particular tools. But if you want to know how someone is using an advisory lock in their use case. This is a pretty short post to check out and see how they're using it.
The next post is "Doubling the Sorting Speed of Postgres Network Types with Abbreviated Keys". This is from brandur.org. Now, this is a post that talks about how they developed a patch for Postgres to be able to speed up inet and cidr types in Postgres. So it's more of a technical discussion. It doesn't give you guidance on how to improve the sorting speed of PostgreSQL as it is, but it's what this individual did in terms of developing the patch, in terms of achieving almost doubling of speed with inet and cidr data types. So if you're interested in the content behind this patch and a lot of detail into how it was developed and the rationale with regard to it, definitely a blog post to check out.
The next post is "AUTOMATICALLY UPDATING MATERIALIZED VIEWS". This is from pgsqlpgpool.blogspot.com. Now, this is a short post, but it's basically communicating how to do it. Now, I don't see a lot of code with regard to the materialized views because of how they're doing it through triggers. So they don't actually have the code of the trigger, but they do explain some of the concepts of how you would need to develop triggers to be able to update a materialized view incrementally. So again, this is a very brief post, but it gives some ideas conceptually of how to automatically update materialized views.
The next post is "PostgreSQL Graph Search Practices - 10 Billion-Scale Graph with Millisecond Response". This is from alibabcloud.com. They're talking about using graph search using a CTE syntax that can easily implement a graph search in terms of in-depth search, shortest path, point, edge property, and so on. So this is a pretty long post that includes a lot of code in detail about how to do this. So if you have a data set that can be represented in a graph such as this, you might want to check out to see how you could potentially use CTEs and recursive CTEs to be able to achieve a similar type of response time in your application.
The last post is "Waiting for PostGIS 3: Separate Raster Extension". This is from crunchydata.com. So again, upcoming in PostGIS 3, they've broken out something that was included by default previously. Now, they are breaking out the Raster functionality allowing packagers to more easily strip down just the basics Postgres without also building the Raster dependencies. This also includes the somewhat heavy GDAL library. So just something to keep in mind when you upgrade PostGIS to version 3 is that this functionality is a separate extension.