Performance Training, JSON Path, Hypothetical Indexes, Interpolation | Scaling Postgres 69
In this episode of Scaling Postgres, we discuss performance training, SQL/JSON path support, hypothetical indexes and linear interpolation of data.
Content Discussed
- PostgreSQL 11.4, 10.9, 9.6.14, 9.5.18, 9.4.23, and 12 Beta 2 Released!
- Explaining CVE-2019-10164 + PostgreSQL Security Best Practices
- CVE-2019-10164 Who's affected and how to protect your systems.
- [YouTube] PostgreSQL In-Depth Training: Performance Part 1
- [YouTube] PostgreSQL In-Depth Training: Performance Part 2
- Postgres 12 highlight - SQL/JSON path
- Hypothetical Indexes in PostgreSQL
- Per-project Postgres
- Linear Interpolation with PostgreSQL
- Physical Replication Mechanisms in PostgreSQL
- PostgreSQL Streaming Replication
- Install PostgreSQL 9.6 with Transparent Data Encryption
- Announcing Timescale Cloud: The first fully-managed time-series database service that runs on AWS, GCP, and Azure
Transcript
[00:00:00] In this episode of Scaling Postgres we talk about performance training, JSON path, hypothetical indexes and interpolation. I'm Creston Jamison, and this is Scaling Postgres episode 69.
[00:00:21] Alright, I hope you're having a great week. Our first piece of content is actually that new versions of PostgreSQL have been released. Now, primarily this is due to a security issue that affects versions 10, 11, and 12, because of an error in the SCRAM authentication code, so the security issue only impacts 10, 11, and 12. But there are numerous bug fixes and improvements for the version 9 series as well. So definitely, especially if you're on version 10 and higher, go ahead and upgrade your version of PostgreSQL as quickly as you can. Now, going a little bit more in depth into this security release is this next piece of content, Explaining CVE-2019-10164 + PostgreSQL Security Best Practices. So this talks about this essentially out-of-cycle update release for the CVE-2019-10164 vulnerability. They go over what it is: essentially it's an arbitrary code execution vulnerability, but you need to be an authenticated user, and you could come up with a password that can basically crash the PostgreSQL server or, in the worst case, have the operating system user postgres run arbitrary code. So if you have your system locked down, that mitigates some of this, but you'll definitely want to upgrade your version of PostgreSQL as soon as you can. And it also talks about fixing a potential denial-of-service attack vector as well. Now again, they reiterate the good news here: the attack can only be accomplished by a user who has already authenticated to your PostgreSQL server, so it's not someone who doesn't have access to it. They talk about how you protect against it. Basically, you do the upgrades to the next version, and they also did some updates to libpq on the client side, because the vulnerability can cause issues for clients as well, potentially causing those clients to crash. And one of the recommendations here is that you can set the client's certificate verification mode to verify-full to avoid connecting to rogue PostgreSQL servers that could potentially do something to your client. So if you want to get more information about this particular vulnerability, here is one blog post, and this is from the Crunchydata.com blog. There's also another one pretty similar in nature, stating close to the same thing, called CVE-2019-10164: Who's affected and how to protect your systems. And this is from Evolvenkey@blogs.com, so you can check over this post as well; it describes some additional details.
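As a rough illustration of that client-side recommendation (this is not from either post; the host name and certificate path are placeholders), a verify-full connection from psql looks something like this:

```sh
# Require full certificate verification so libpq refuses to connect to a
# server whose certificate does not match the expected host.
psql "host=db.example.com dbname=app sslmode=verify-full sslrootcert=$HOME/.postgresql/root.crt"
```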
[00:03:06] The next piece of content is actually two YouTube videos together, part one and part two: PostgreSQL In-Depth Training: Performance Part 1 and, the same thing, PostgreSQL In-Depth Training: Performance Part 2. These were put on by Jeremy Schneider, a database engineer at AWS. The first video is about 44 minutes and the second one is about an hour. Now, he had a t-shirt on that said Oracle Nerd, so it looks like he has a lot of history with Oracle, and there were some points during his talk where he needed to look up some information with regard to PostgreSQL. So it doesn't appear he's someone who has spent years and years in PostgreSQL, but has maybe come from an Oracle background; still, that does give you some different insight into performance than maybe you've seen with some other presentations. He covers issues with regard to monitoring and how important that is, talks about explicit locking, because that's clearly an area where you can get yourself in trouble, talks about prepared statements and how those can be utilized, and then spends a fair amount of time on wait events, which is something that, I believe, started to be added to pg_stat_activity in 9.6, with additional ways to get insight into what queries are waiting for added in 10 and 11. And he advocated using tools to identify or track what's going on with queries over time, even at a one- or five-second interval, to track the different wait states individual queries are waiting on. Now, this does have an emphasis on RDS, Amazon's relational database service, but it has relevance for PostgreSQL in general. And he does talk about Performance Insights, which is a tool on RDS. But what's valuable is his thinking in terms of performance: there's a production issue, how do you narrow down what the exact issue is using these wait event insights, then drill down using EXPLAIN ANALYZE once you've found the query, and then diagnose the problem. So if you want a little bit of a different perspective, these are two videos definitely to check out on optimizing your PostgreSQL performance.
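As a point of reference, here is a minimal sketch (not from the talks themselves) of sampling those wait events from pg_stat_activity, which is the kind of periodic tracking he advocates:

```sql
-- Sample what active queries are waiting on (columns added in 9.6).
-- Running this every second or every five seconds approximates the
-- wait-state tracking described in the talks.
SELECT pid,
       state,
       wait_event_type,  -- e.g. Lock, LWLock, IO, Client
       wait_event,       -- the specific event within that type
       query
FROM pg_stat_activity
WHERE state = 'active';
```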
[00:05:27] The next post is Postgres 12 highlight - SQL/JSON path. This is from Michael Paquier's blog at paquier.xyz, and he talks about how they've started to add compliance with the SQL:2016 standard's SQL/JSON feature set for JSON processing inside a relational database. So this is kind of their first cut at supporting the SQL:2016 standard for doing it. And he has an example here where he created a table with a jsonb column. Now, he does make reference to an additional data type that's been added, called jsonpath, although with his table here he's using jsonb. So I don't know if this is eventually going to be a data type that you would assign to a column, or whether you just use jsonb fields; I'm not too clear on that yet. However, in his example he uses a jsonb field and inserts a role-playing-game stat set, where a character has a name, an ID, an HP value, a level, his class as a warrior, and his equipment: he has rings, plus what is equipped on his right arm and his left arm. So it's basically a JSON structure, and then you use different path expressions to pull out data. The dollar sign, by my interpretation, essentially refers to the whole document in the column: $.level brings you the level value here and returns 10; $.equipment.arm_left first selects equipment, then chooses arm_left, and returns the shield of faith; and you can return the full array of rings. So these are different ways to pull data out of a jsonb field, and you use a query with the function jsonb_path_query to say, pull out the name. So there are some very interesting new features being added in version 12; if you're interested in that, it's definitely a blog post to check out.
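To make that concrete, here is a small sketch along the lines of his example on Postgres 12; the exact character data below is invented, but the path expressions behave as described:

```sql
-- A jsonb document similar to the role-playing-game example in the post.
CREATE TABLE characters (data jsonb);

INSERT INTO characters (data) VALUES ('{
  "name": "hero", "id": 1, "hp": 300, "level": 10, "class": "warrior",
  "equipment": {
    "rings": ["ring of despair", "ring of strength"],
    "arm_left": "shield of faith",
    "arm_right": "sword of flame"
  }
}');

-- "$" is the root of the document; dotted steps descend into it.
SELECT jsonb_path_query(data, '$.level') FROM characters;               -- 10
SELECT jsonb_path_query(data, '$.equipment.arm_left') FROM characters;  -- "shield of faith"
SELECT jsonb_path_query(data, '$.equipment.rings') FROM characters;     -- the whole array
SELECT jsonb_path_query(data, '$.name') FROM characters;                -- "hero"
```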
[00:07:27] The next post is Hypothetical Indexes in PostgreSQL, and this is from the Percona.com blog. So hypothetical indexes are essentially indexes that aren't there, but you can imagine they're there and see what the planner does. There's a PostgreSQL extension for hypothetical indexes called hypopg. First they give you some instructions on how to install hypopg; then you create the extension and you have access to its functions. It shows some of the different functions: for example, you call hypopg_create_index, and this essentially creates an index just for your session. It's not an index that actually gets created; it's just a reference that the extension keeps track of, to say that within the session I'm in, in memory, this index exists. And when you're doing query planning, like using EXPLAIN, it will actually take the index you've created into account and assess, through statistics, how it would change the query plan for you. So they have a section down here on testing hypothetical indexes. They created a table, inserted some values, did an EXPLAIN selecting particular values from the table, and it did a sequential scan. But when they added the hypothetical index here, the EXPLAIN plan uses that hypothetical index.
[00:08:51] Now, this is with a B-tree index. They did try a bloom index and it apparently worked; BRIN kind of worked, but had some unusual issues at times; and GIN was another index that they couldn't get working. But since most people use B-tree indexes, this could be pretty beneficial. So, definitely an interesting blog post to check out. If you want to test this, particularly if you have a large database, it could let you test a couple of different index variations to see what the planner interprets as the most efficient path it would take to execute queries.
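Here is a minimal sketch of that workflow, assuming the hypopg extension is already installed on the server; the table and data are invented for illustration:

```sql
CREATE EXTENSION hypopg;

CREATE TABLE orders (id int, customer_id int);
INSERT INTO orders
SELECT g, g % 1000 FROM generate_series(1, 100000) g;
ANALYZE orders;

-- Without an index, EXPLAIN shows a sequential scan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Create a hypothetical index: it lives only in this session's memory
-- and is never actually built on disk.
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- Plain EXPLAIN (not EXPLAIN ANALYZE) now considers the hypothetical index.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```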
[00:09:26] The next post is Linear Interpolation with PostgreSQL. They're referencing a previous post where someone wanted to ignore nulls, using window functions to fill in gaps in data. So for example, maybe you have data on three days, then four days with no data, and then some data in the next set of two days. Essentially you have gaps, and there are nulls there; how do you, say, make them zero instead? That's kind of what that post talked about, and this one takes it a step further. Here they have a climatological data set, so weather information, and they don't necessarily want to make a null value zero. Talking about why there are missing values, as they say here, quote, devices may break, networks might be down, disks may be full. So maybe the data is not zero, or shouldn't be zero; maybe it should be a value between the known existing values. And so essentially they're using the technique of linear interpolation to pinpoint the values that are missing. And he goes over a whole set of functions that he developed in PL/pgSQL to do this linear interpolation. So if you're interested in this particular technique, it's definitely a blog post to check out.
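As a sketch of the underlying idea (not the post's own PL/pgSQL functions), a missing reading at time t between known points (t0, v0) and (t1, v1) is estimated as v0 + (v1 - v0) * (t - t0) / (t1 - t0). Assuming a hypothetical readings(ts, value) table, that could be expressed as:

```sql
-- Fill NULL values by interpolating between the nearest known neighbors.
-- Rows at the edges with no neighbor on one side remain NULL.
SELECT r.ts,
       COALESCE(
         r.value,
         p.value + (n.value - p.value)
           * EXTRACT(EPOCH FROM r.ts - p.ts)
           / EXTRACT(EPOCH FROM n.ts - p.ts)
       ) AS value
FROM readings r
LEFT JOIN LATERAL (            -- last known value before this row
  SELECT ts, value FROM readings
  WHERE ts < r.ts AND value IS NOT NULL
  ORDER BY ts DESC LIMIT 1
) p ON true
LEFT JOIN LATERAL (            -- next known value after this row
  SELECT ts, value FROM readings
  WHERE ts > r.ts AND value IS NOT NULL
  ORDER BY ts LIMIT 1
) n ON true
ORDER BY r.ts;
```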
[00:10:42] The next post is Per-project Postgres, and this is from jamey.thesharps.us. He's talking about developing different projects where he has to use a particular database for each project, and he worked out a way to use a Postgres database that is, as he says, almost as convenient as SQLite. Basically, he does a local install into a directory called postgres inside the current working directory. So essentially there's one Postgres instance per project he's working on: instead of having one installed version of a database with all the different databases residing inside it, he has one Postgres cluster per project, within a local folder in that project, as shown in the sketch below. I've never seen this before, but it's definitely an interesting way to go, particularly if you have projects that are on different versions; like, if one is using 9.6 and one is using 11, I could see the benefits of potentially doing this. And he also mentions a few other tools that help him manage it: one is direnv, as well as a package manager called Nix. So if you have the need for multi-project databases, maybe this is a blog post you'll want to check out.
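In rough terms, the approach looks like the following; the directory name, port, and database name are placeholders rather than his exact setup:

```sh
# Initialize a cluster inside the project directory and run it on its own
# port, keeping the socket in /tmp so no system-wide install is touched.
initdb -D ./postgres
pg_ctl -D ./postgres -o "-k /tmp -p 5433" -l ./postgres/server.log start
createdb -h /tmp -p 5433 myproject_dev
psql -h /tmp -p 5433 myproject_dev
```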
[00:11:56] The next post is Physical Replication Mechanisms in PostgreSQL. This is from the pgDash blog, and essentially they're talking about physical replication, which is otherwise known as streaming replication: streaming the WAL from a primary to a replica. You can also do log shipping, where you actually take the WAL files, copy them onto the replica, and replay them there; that's another alternative. So they talk about WAL files, log shipping, recovery mode, and replication slots, and basically tell you how to set up different standbys. It goes over how you can set up physical replication between a primary and one or more replicas. Now, related to this, I did do a video tutorial called PostgreSQL Streaming Replication, so if you're interested in the video format, feel free to check out that piece of content as well.
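For flavor, the primary-side SQL pieces of such a setup boil down to something like this sketch; the role and slot names are made up, and postgresql.conf and pg_hba.conf changes (such as wal_level = replica) are also needed:

```sql
-- On the primary: a role for the replica to connect as, plus a physical
-- replication slot so the primary retains WAL the replica still needs.
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
SELECT pg_create_physical_replication_slot('replica_1');

-- Once a replica is streaming, check its status (column names per
-- Postgres 10+; 9.6 used sent_location / replay_location).
SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;
```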
[00:12:49] The next post is Install PostgreSQL 9.6 with Transparent Data Encryption. This is from Cybertec-postgresql.com, and they actually developed a method of doing transparent data encryption. So basically, as the data is written, it gets encrypted, and it doesn't rely on the disks already being encrypted; that's one way you can do it, but this is a way where certain parts of the database on disk get encrypted: tables, indexes, temp files, and the WAL. Things that are not encrypted are data in memory, the clog, and the textual logs. And this is a patch that they've developed for 9.6, and the post tells you how to install it and some of the caveats related to it. So if you're interested in doing transparent data encryption and you're using a version like 9.6, it's definitely a blog post to check out.
[00:13:43] The last post is Announcing Timescale Cloud: The first fully-managed time-series database service that runs on AWS, GCP, and Azure. So, as we've seen with some other open source tools, like Citus with their Citus Cloud that they brought out some time ago, essentially they manage it for you; here is a hosted TimescaleDB time-series database solution, and they talk about some of the benefits of it. One thing I thought was particularly interesting is that they talk about Amazon Timestream, and from their calculations, they say AWS Timestream is ten to 100 times more expensive than Timescale Cloud. Now, of course, that seems like a very big discrepancy, but the interesting thing is that they put these calculations in a public spreadsheet, listed below. So if you're interested in a time-series database and you would like them to potentially host it for you, this is a potential solution for that purpose, and definitely a blog post to check out.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.