Leveraging Indexes, Slugs, Addresses, Security Definer | Scaling Postgres 64
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we discuss leveraging indexes, a slug function, addresses and security definer in functions.
Content Discussed
- PostgreSQL 11.3, 10.8, 9.6.13, 9.5.17, and 9.4.22 Released!
- Getting The Most Out Of Your PostgreSQL Indexes
- Ultimate PostgreSQL Slug Function
- Quick and Dirty Address Matching with LibPostal
- Abusing SECURITY DEFINER functions
- PostgresPro Presentation
- PoWA 4 brings a remote mode, available in beta!
- How to play with upcoming, unreleased, PostgreSQL?
YouTube Video
Podcast Audio
Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about leveraging indexes, slugsaddresses and security. Definer. I'm creston, Jameson. And this is scaling postgres episode 64.
[00:00:21] Hi, I hope everyone is having a great week. So before we start off with the content forthis week, I wanted to just say thank you for watching so many episodes and for subscribingbecause actually the Scaling Postgres YouTube channel just crossed over a thousandsubscribers. So thank you very much for tuning in every week. So our first piece of content thisweek is actually a news item from PostgreSQL, where new versions of PostgreSQL have beenreleased since 9.4, so 9.4 up to 11.3. Now there are four security issues. Few that are related toWindows, one related to bypassing Row security policies. But this is probably the one that mayimpact probably most people if you're on eleven. So it only affects eleven and it's a memorydisclosure and partition routing. So it relates to partitioning. So if you are using that versioneleven and are using that feature, you probably want to go ahead and get patched as soon asyou can. And also with this release, they say they have over 60 bug fixes and list the differentthings that were resolved. So definitely something to check out if you need to do an upgrade.The next post is getting the most out of your PostgreSQL indexes. So this is sort of a bestpractice document for using PostgreSQL indexes. So the first section it covers is coveringindexes, where you can essentially attach an additional payload to an index so that you canincrease the probability of getting an index only scan. And in this example, they're looking foractive customers and they actually put in the include statement the email. So if you're wanting toget the email from the customers on who is active, you can get an index only scan by using acovering index using the include clause. But again, this is only available for PostgreSQL eleven,so just keep that in mind. The next area to think about is using partial indexes. So partial indexesonly cover a certain part of the data in the table. So in this case they have an address table andwhat they wanted to do is actually create a partial index where the district is California. So it'sonly going to be maintaining those index values for the part of the table where the district equalsCalifornia. Now, where I find partial indexes particularly useful is if you have a column that isindicating some sort of status for the items in the table. Like maybe you have three or fourstatuses, so maybe the cardinality how random the values is rather low. Using separate partialindexes will help pull out different records of different statuses. So I find those very effective touse in that case. And then they talk about MultiValue indexes. Now initially I thought they werereferring to multicolumn indexes, but they're actually talking about data types that store morethan one value. So like JSON fields or array fields and things of that nature. So in this examplethey're using the containment operator to see if the left hand is a superset of what's in the righthand and when they're just doing a query on a I believe they're using an array field here for thisvalue it just does a sequential scan. When they use a b tree index, again, it's still a sequentialscan. So with columns that contain multiple values, what you want to use is actually a gen index.So things like wanting to use it for full text search or arrays or JSON or JSON b data types, youwould want to use a gen index in that case. And then they talk about in general you want toeliminate duplicate indexes because of course it uses up more disk space, it requires resourcesto maintain those indexes. So you want to review and find unused indexes and remove themwhen they're no longer used. And then they talk about rebuilding indexes and how for versionsten and eleven you can bump up some of the max parallel settings to get faster index creationas well as once your database gets of a certain size or it's a best practice to do it in general isused concurrently with a live or production database. So definitely a lot of great advice to followwith regard to PostgreSQL indexes. Oh, and I should say this is from the PG IO blog.
[00:04:39] The next post is ultimate PostgreSQL slug function. And this is from Kdobson Net andhe said he's developed a few web apps and APIs and utilities and he's created Slug functions.That basica --lly takes some sort of a title. Like a blog title and converts it into a URL friendly format,generally undercasing it and putting a hyphen between usually where a space is and eliminatingcertain characters such as a double quote, exclamation points and then convert non ASCIIcharacters as well. Although he says that may not be 100% necessary. But he's actually lookedto do this in a PostgreSQL function that he can call from his applications. Now it's based uponone that was developed by Ianks on GitHub and he has the link here but he's done a little bit ofhis own in terms of removing a single and double quotes. And if you look it's pretty simple tounderstand. It creates a slugify function and it is using the unaccent function to handle some ofthose non ASCII characters. So you do need to install the extension to the database, but it'sbasically just a CTE that first removes the accents, lowercase the string, remove single doublequotes, replace anything that is not a letter number, Hyphen or underscore. With a hyphen andthen trims the hyphens if they exist on the head or tail of the strings so it goes through and doeseach of these steps in order to create this slug. Now, once you have that, you can just call thefunction and it will output the slug value right here. Now you can immediately start using it or youcan actually use it as a trigger. In his example here, he creates a trigger and before the insert foreach row where the title is not null and the slug is null, go ahead and execute the procedure. So,a quick little blog post if you want to look into slugifying using a PostgreSQL function.
[00:06:38] The next post is quick and dirty address matching with Lib Postal. And this is fromCrunchydata.com Blog and this is looking at Lib Postal which is a they describe as a naturallanguage processing library trained on over 1 billion address records in the internationalOpenStreetMap database. And they're actually using it through an extension called psql Postalto call the library from SQL. So what it does is it takes a string that represents some sort of anaddress and you can actually get normalized values of how this could potentially be interpreted.And then you can also do a parse and then outputting to JSON pretty to actually interpret thisstring. Now you could do this pretty well by yourself using like us for addresses for example.However, this library has the great ability to handle all sorts of different international addressesas well. So here's something from an address in Berlin and it automatically knows how totranslate the address in the example here, STR. It's able to know it means Strauss which isstreet as they say here. And then they go into if you need to use this type of tool, it tells you howto download and install the Lib Postal library, prep the extension and then go ahead and loadsome data in order to test the features of this. So if you're wanting or needing to work with a lotof different addresses, potentially from different sources and different formatting and want tocome up with a common way to do it, this is definitely a blog post to check out because they alsodiscuss searching them using the full text search engine. So definitely an interesting blog post ifyou're interested in that.
[00:08:29] The next article is Abusing security definer functions. This is from CyberTechpostgresql.com. Now the first question is what is a security definer function? And they say herePostgreSQL functions are defined as security invoker. By default, that means they're executedwith the user ID and security context of the user that calls them. Okay, that makes sense, but asecurity definer function will run with the user ID and security context of the function owner. Sothis enables you to pass on certain powers to someone. But as you can tell, this is pretty riskythat they're basically running something as you, so you need to be extra careful when you'reusing this. And one of the recommendations they have here is that the search path should be setto exclude any schemas writable by untrusted users. This prevents malicious users fromcreating objects, tables, functions operating that mask objects intended to be used by thefunction. And in terms of the dangers they look at this here where they have this pretty harmlessfunction that just says select what is passed in and add one to it looking for an integer. However,if you do these series of steps, you create a function that along with --an operator, set the search path here and then you execute this function. You can literallymake yourself a super user. So this looks to be something you need to be really be careful with.And of course, the last section is how can you protect yourself. And they suggest threemeasures here as recommended by the documentation always set search path on a securitydefiner function and put Pgtemp on the list as the last element and that's emphasized in thePostgreSQL documentation. Also don't have any schemas in the database where untrustedusers have the create privilege. In particular, remove the default public create privilege from thepublic schema and also revoke the public execute privilege on all security definer functions andgrant it only those users that need it. So again, a set of security guidelines you should definitelyfollow, particularly if you're using these security definer functions. The next post, I actually don'tknow what the title is because this is from Postgres Pro which I believe is located in Russia, butthis is a super comprehensive presentation that was given. It is in English, but I'm not sure as towhat the title is here, but it's over 90 slides long.
[00:10:55] A great many are talking about the JSON feature set that's available in PostgreSQL,talking about Controllable CTEs which are coming with PostgreSQL twelve where you candefine whether it will by default be materialized or not. Talking about the K nearest to neighbor,talking about different indexes features that are coming in twelve, the new pluggable storage thatwas implemented as well as potentially Zheep which will probably be one of the first new storagetype solutions available and talking about the different partitioning improvements. So if you wantto get a sense on where Postgres is today and going in twelve, definitely a great presentation tocheck out. The next post is Power four, brings a remote mode available in beta. This is from ourJuju GitHub IO. And this is talking about the postgres workflow analyzer. So it basically doesmonitoring and performance analysis of your PostgreSQL database instance.
[00:11:58] And before it ran on the node with its own database here, collecting statistics and ithad a web interface that would interface with a database. But with this new mode it can actuallyhave a separate power repository from your database systems and thus causing less load onthose to run it. So if you're interested in this type of monitoring solution, definitely a blog post tocheck out.
[00:12:26] The last post is how to play with upcoming unreleased PostgreSQL and this is fromDep and he goes through all the process to actually build from source the most recent commitsthat have been made to PostgreSQL. So if you want to get your hands on twelve early, this is ablog post that you can check out and get it running.
[00:12:49] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can signup to receive weekly notifications of each episode. Or you could subscribe via YouTube oritunes. Thanks. --