DBMS of 2018, Data Anonymization, JIT, Reserved Keywords | Scaling Postgres 45
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we review articles covering Postgres as DBMS of 2018, data anonymization, implementing JIT and reserved keywords.
Content Discussed
- PostgreSQL is the DBMS of the Year 2018
- 8 Anonymization Strategies with PostgreSQL
- An Overview of Just-in-Time Compilation (JIT) for PostgreSQL
- Keyword mess
- Fun with SQL: Self joins
- PostgreSQL Replication for Disaster Recovery
YouTube Video
Podcast Audio
Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about DBMS of the Year, dataanonymization, JIT and reserved keywords. I'm creston, Jameson. And this is scaling postgresepisode 45.
[00:00:17] One.
[00:00:21] Alright, we're still a little light on content, but we do have few more articles this week,although not too many of them are with regard to scaling. But the first article is PostgreSQL isthe DBMS of the Year 2018, and this is from the Dbengins.com site. And basically like last year,PostgreSQL has been chosen as the database management system of the year, with the runnerup being MongoDB, and in third place Redis, which are one's essentially NoSQL document storeand the other is a key value store. So another good year for PostgreSQL.
[00:01:00] The next article is eight anonymization strategies with PostgreSQL.
[00:01:06] Now he's talking about data anonymization, which basically means making your dataanonymous or basically hiding it from view. And there he's working on a project called thePostgreSQL anonymizer. Now, the purpose of this is generally for using some data indevelopment or CI or functional testing or analytics, where you're anonymizing the data, trying toobscure it or scramble it, but still have some use cases for it. Now he's talking about a basicpeople table here that has a name, address, age, salary, phone number, et cetera, and how togo about perhaps making this data more anonymous. Now actually what I found the mostbeneficial is this sampling here, because there's actually a PostgreSQL feature called TableSample, which enables you to get a percentage sampling of your table. So if you have aproduction database and want to get a minimal sample of it for some sort of testing purposes,you can use this feature, Table Sample. They're using a Bernoulli at 20% to get 20% of therecords from that. And he also mentioned an extension here called PG Sample, which alsomaintains referential integrity while taking these samples. So this is something you haven't reallyheard about, but that's definitely a couple of interesting tools. Now he goes into differenttechniques that he believes could be used. You could use suppression, basically convertingeverything to null, do random substitution, random, inserting random characters into the fieldsvariants, basically varying the data that's in the field by some sort of percentage amount, usingencryption techniques, shuffling the data around, faking and mocking partial suppression,basically various different techniques to try to obscure the data. And at the end here, he givessome suggestions on when to use these different types of techniques. Now, I haven't had toomuch of a call to anonymize data, I basically use fabricated data or entirely fabricated data fortesting purposes or things of that nature. But if you're interested in trying to anonymize the datayou have, this is definitely a blog post to check out on some different techniques you couldpotentially use.
[00:03:21] The next post is an overview of just in time compilation or JIT for PostgreSQL. Andthis is from the Several nines.com blog and basically this is covering JIT. That is the new featurein PostgreSQL Eleven. Talks a little bit about why they're using LLVM and when to potentiallyuse JIT, which is pretty important. So it's mostly used for analytical databases and to speed upexpressions that you're calculating. So if you have to do a lot of calculations in an analyticaldatabase, that's probably where JIT is going to give you the most benefit for an OLTP ortransactional workload. Maybe not so much in that first you have to make sure it's compiled withthat support built into it and then you actually have to enable it and potentially adjust the costsettings to be able to use JIT. So if you think this is a potential use case for you, definitely a blogpost to check out because they do go over how to set it up and the different results that they getfrom using it.
[00:04:26] The next blog post is Keyword Mess and this is from the CyberTech Postgresql.comBlog and basically these talk about reserved keywords. So it starts off, for example, here once Ireceived a mail with a question, can you tell me why I can't select the column references? Sothat is a reserved keyword and basically you have to potentially use double quotes around it. Imyself have encountered this where say, a column name is named from or to and you have toreserve keywords when doing SQL statements. But this goes into much more depth ofidentifying the different sources and how these reserve keywords are set up. So if that issomething of interest to you, definitely a blog post to check out.
[00:05:13] The next post is fun with SQL selfjoin. And this is from the Citusdata.com blog. Nowgenerally where you would use self joins is when you have some sort of relationship, like maybea manager to the person they're managing. In this example they're using a gift exchange, sosomebody needs to get a gift for someone else. And how you would do this query is you wouldjoin a table to itself and give it an alias for it when you're doing your query. So if you want tobrush up on what self joins are, definitely a blog post to check out.
[00:05:45] The last post is PostgreSQL replication for disaster recovery. And this is from theSeveral nines.com blog and it talks about how you could handle disaster recovery withPostgreSQL in terms of having a master database with a replica and how you can promote thatreplica in case something happens with the primary database. It also goes in discussion aboutpoint in time recovery and how you can set that up. So if this is something that's of interest toyou, it's definitely a blog post to check out. I also encourage you to check out ScalingPostgresql.com because we have some tutorials I've done that talk about doing a backup andpoint in time recovery for your system, doing streaming replication and also talking aboutreplication slots. So these are video tutorials, so if you're interested in that content, just checkout Scalingposgrows.com Tutorials.
[00:06:38] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show notes. Be sure to head over to Scalingposgres.com, where you can signup to receive weekly notifications of each episode. Or you can subscribe via YouTube or itunes.Thanks. --