Non-Relational Data, Contributors, Security Features, High Availability | Scaling Postgres 98

In this episode of Scaling Postgres, we discuss storing non-relational data, recognizing contributors, Postgres security features and implementing high availability.

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "8 Ways RDBMS Postgres Handles Non-Relational Data". This is from the enterprisedb.com blog. Now, this is actually a brief synopsis of a presentation, and there's a link to the presentation slides, which I actually like a little bit better. These are on momjian.us, his site, so you can see in more detail what was covered. This presentation talks about how Postgres stores non-relational data, so something apart from your typical string of text or integer or boolean, things of that nature. He covers these areas: arrays, range types, geometry, XML, JSON, JSONB, row types (which are pretty much custom types, from my interpretation), and character strings. So with arrays, you can store more than one value within a column or within a field, which kind of goes against some of the normal forms of a relational database.

But Postgres allows you to do this, and he goes through different ways you can work with that array in terms of accessing particular entities of the array, unrolling, creating them, aggregating them, and things of that nature. The second one talks about range types, so you have something with a start and an end, and you can have non-bounded ranges as well. He shows how you can query to ask whether a value exists in a particular range, and also mentions that you would want to use a GiST index with a range type, as that's the most efficient index to use. And he goes over exclusion constraints and how you could check whether something is going to be double booked. For example, you can add an exclusion constraint to be able to handle something like that. In the third area, he talks about geometry and how, if you're using PostGIS, you can store geometric shapes within Postgres and how best to index them, again using a GiST index for that purpose.
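To make the double-booking idea concrete, here's a minimal sketch (not from the presentation itself, with a made-up room_booking table) of a range-type exclusion constraint backed by a GiST index:

```sql
-- btree_gist lets the GiST exclusion constraint use plain equality on room
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- The exclusion constraint rejects any two rows where the room is equal
-- AND the time ranges overlap (&&).
CREATE TABLE room_booking (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

INSERT INTO room_booking
VALUES (101, '[2020-01-10 10:00, 2020-01-10 11:00)');

-- This overlapping booking for the same room fails with a
-- "conflicting key value violates exclusion constraint" error:
INSERT INTO room_booking
VALUES (101, '[2020-01-10 10:30, 2020-01-10 11:30)');
```

A booking for a different room, or for a non-overlapping range in the same room, would succeed.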

Then he talks about XML and the XPath query on how you can work with XML that you're storing in the database, and conversion of different values from XML. Then he goes into JSON, and this is a JSON data type that's different from JSONB. It basically stores the whole JSON structure as it's represented and doesn't make a lot of changes to it. And he mentions that there are over 100 different JSON functions you can use to work with JSON. He goes over different ways of loading it, presenting it, accessing it, concatenating it, and things of that nature, as well as calculations. Then he goes into JSONB. Think of it as binary JSON. So it supports indexing of all keys and values for efficient retrieval of information. It's stored in a compressed format, but it doesn't preserve the original JSON: it doesn't preserve key order or white space, and it retains only the last duplicate key.
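You can see this difference directly by casting the same text to each type. A quick illustration (not from the presentation itself):

```sql
-- json keeps the text exactly as written:
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- {"b": 1, "a": 2, "a": 3}   (order, white space, duplicates preserved)

-- jsonb normalizes it: keys are reordered, white space is discarded,
-- and only the last duplicate key survives:
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- {"a": 3, "b": 1}
```

If round-tripping the exact original document matters, that's a reason to pick json; for querying and indexing, JSONB is usually the better choice.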

So this is something definitely to keep in mind depending upon how you're going to be using the JSON. And he goes over and shows the different examples here. Of course, when you're working with JSONB, generally you would want to use a GIN index for search purposes. Then he talks about row types, and essentially these are like creating your own custom types. So he created a custom type, driver's license, that is actually composed of three different subtypes, as it were: a character, an integer, and a date. Then he goes over character strings and how to work with them, how to query them, how to handle case situations, and how best to do indexes, and then follows up with a talk on full-text search and using tsqueries and tsvectors in order to handle full-text search capabilities. So if you're interested in this content, it's definitely a great presentation to check out.
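As a rough sketch of those last two ideas (the field names here are my own, not the ones from the slides), a composite row type and a GIN-indexed full-text search might look like:

```sql
-- A composite (row) type along the lines of the driver's license example:
CREATE TYPE drivers_license AS (
    state        char(2),
    license_no   integer,
    valid_until  date
);

CREATE TABLE driver (name text, license drivers_license);
INSERT INTO driver VALUES ('Alice', ROW('PA', 12345, '2025-06-30'));

-- Parentheses are required to access a field of a composite column:
SELECT (license).state FROM driver;

-- Full-text search with tsvector/tsquery, backed by a GIN index:
CREATE TABLE doc (body text);
CREATE INDEX ON doc USING gin (to_tsvector('english', body));

SELECT * FROM doc
WHERE to_tsvector('english', body)
      @@ to_tsquery('english', 'postgres & scaling');
```

The expression index only helps if the query uses the same `to_tsvector('english', ...)` expression.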

The next post is "Contributors Team, Redux". This is from rhaas.blogspot.com. This is a follow-up to a previous blog post that we mentioned on Scaling Postgres. It's talking about the difficulty of how to address contributors to Postgres — not just Postgres, but also libraries or tools associated with Postgres. And then how to weigh their contribution, meaning differentiating between a contributor and a major contributor, and what happens if someone is a major contributor but isn't contributing that much today? And just all of the difficulty of giving credit to the people who are really helping make Postgres what it is today. He is also asking for feedback on how to make this better. So this is basically an essay on the struggles of trying to recognize people appropriately and how best to handle it. So go ahead and give this a read and, if you have any thoughts, contribute a comment or reach out to him if you so choose. But definitely, since Postgres is a community-built and supported database system, this is a blog post to check out.

The next post is actually a series of three posts, parts one, two, and three, called "Understanding Security Features in PostgreSQL". This is from the highgo.ca blog, and he's basically covering different security areas of Postgres, mostly with regard to network security and some of the roles. So, for example, part one covers the addresses Postgres listens on, and host-based access in terms of how you're granting people the ability to connect to the database system over the network, or even locally. It also covers how you can integrate with an LDAP server or PAM, and then goes into roles, users, and privileges and how to set those up to grant access to the system. Now on to the second part, which is the next blog post here.
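For reference, those first two knobs live in postgresql.conf and pg_hba.conf. A minimal sketch (the addresses, database, and user names here are made up):

```
# postgresql.conf -- which interfaces Postgres accepts connections on
listen_addresses = 'localhost,10.0.0.5'

# pg_hba.conf -- host-based access rules, matched top to bottom
# TYPE    DATABASE   USER      ADDRESS           METHOD
local     all        all                         peer
host      appdb      appuser   10.0.0.0/24       md5
host      all        all       0.0.0.0/0         reject
```

Because rules are matched in order, the final `reject` line only applies to connections that didn't match an earlier, more permissive rule.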

Part two is about essentially SSL or TLS and how it works, or public key cryptography. And this doesn't have a lot that's directly relevant to Postgres; it's all about setting up certificates and public key encryption. But it's a good thing to read before reviewing part three, which is definitely more PostgreSQL-focused, because that one talks about how you can prepare your TLS certificates and set up transport layer security on the server and the client, giving examples of how you can connect. It even talks a little bit about transparent data encryption. This is where data is encrypted at rest, and Postgres can't handle that yet, but there is a plan to add some features in version 13, and additional features in version 14, to have it ready by then. But he talks about what it is and what it does. So if you're interested in learning more about the security aspects of Postgres, or there are particular areas you want to increase your knowledge on, it's definitely a set of three posts you may want to check out.
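The TLS setup in part three boils down to a couple of server parameters plus a client connection string. A rough sketch, assuming you've already generated the certificate files (the hostname and file names here are placeholders):

```
# postgresql.conf -- server side: enable TLS and point at the cert and key
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file  = 'server.key'

# Client side: require TLS and verify the server certificate against a CA.
# sslmode=verify-full also checks that the hostname matches the cert.
psql "host=db.example.com dbname=appdb sslmode=verify-full sslrootcert=root.crt"
```

Weaker `sslmode` settings like `require` encrypt the connection but skip certificate verification, so `verify-full` is the setting you want when you control the CA.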

The next post is "How to Automate PostgreSQL 12 Replication and Failover with repmgr - Part". This is from 2ndquadrant.com. So repmgr is a replication manager that was built by 2ndQuadrant. This blog post goes into a lot of detail about how to set up a High Availability system using it, specifically with Postgres 12. They're talking about setting up three different nodes across what looks like three different AWS Availability Zones. So one will be a primary, and then two standbys or replicas will follow the primary, and there will be an election process that repmgr uses to determine if one of the standbys or replicas should take over the job of the primary. So it goes into a lot of depth on how to set this up and how to configure all the different features to get started. So if you're interested in High Availability and want to potentially use repmgr, definitely check out this blog post.
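To give a flavor of what that configuration looks like, here's a minimal repmgr.conf sketch for one node — the node names, addresses, and paths are made up, and the post itself walks through the real values:

```
# repmgr.conf -- one file per node; node_id/node_name must be unique
node_id        = 1
node_name      = 'node1'
conninfo       = 'host=10.0.1.10 user=repmgr dbname=repmgr connect_timeout=2'
data_directory = '/var/lib/postgresql/12/main'

# Automatic failover: the repmgrd daemon runs these when the primary is lost
failover        = 'automatic'
promote_command = '/usr/bin/repmgr standby promote -f /etc/repmgr.conf'
follow_command  = '/usr/bin/repmgr standby follow -f /etc/repmgr.conf --upstream-node-id=%n'
```

Each standby gets a similar file with its own node_id and conninfo, and repmgrd on each node handles monitoring and the election.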

In a related post, relying on Kubernetes and Crunchy Data's PostgreSQL Operator is another way of achieving high availability. This blog post is "Deploy High-Availability PostgreSQL Clusters on Kubernetes by Example". This is from crunchydata.com. So they're using the same concept of three different nodes. There's one primary and two replicas, but it's being done on Kubernetes, and they have a load balancer that handles these systems, with the operator coordinating the setup so that there's an election to make sure there's only one primary and the other replicas follow it. And they're using the Distributed Consensus Store of Kubernetes to handle this. So again, it's another way of getting high availability. So if you want to look at it from a Kubernetes angle, definitely check out this blog post.

The next blog post is "jsonb_set_lax". This is from 2ndquadrant.com, and they're talking about the jsonb_set function. They say they've had some complaints that if the value argument of the function is NULL, the result will be NULL. Now, the reason this happens is that the function is declared as strict, which means it returns NULL if any input is NULL. However, some people wanted slightly different behavior, so they introduced this function, jsonb_set_lax, which is scheduled for version 13 of Postgres. With it, you can pass an additional parameter that lets you either use the default, which is to use a JSON null, or choose raise_exception, return_target, or delete_key. So if you use this function and want this additional functionality in version 13, definitely keep an eye out for it.
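Here's a quick sketch of the difference, based on the behavior described in the post (the jsonb_set_lax calls require Postgres 13 or later):

```sql
-- jsonb_set is STRICT, so a NULL new value makes the whole result NULL:
SELECT jsonb_set('{"a": 1}', '{a}', NULL);
-- NULL

-- jsonb_set_lax adds a fifth argument that says what a NULL new value
-- should mean: 'use_json_null' (the default), 'raise_exception',
-- 'return_target', or 'delete_key'.
SELECT jsonb_set_lax('{"a": 1}', '{a}', NULL, true, 'use_json_null');
-- {"a": null}

SELECT jsonb_set_lax('{"a": 1}', '{a}', NULL, true, 'delete_key');
-- {}
```

The fourth argument is the same create_if_missing flag that jsonb_set already has.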

The next blog post also covers something coming in version 13 of Postgres: "Waiting for PostgreSQL 13 - Allow vacuum command to process indexes in parallel". This is from depesz.com, and basically, for the index-vacuuming portion, you can assign multiple workers to vacuum indexes in parallel. And you can either set the number explicitly or rely on max_parallel_maintenance_workers to determine how many parallel workers will work on it. Now, he has built out an example here: a test table that he developed along with seven indexes. He disabled autovacuum, deleted some of the data, and then went ahead and ran vacuum and did timing tests.

So when he was doing no parallel vacuum, it ran in about 25 minutes. With two workers it ran in 11 minutes, so more than twice as fast. And with eight workers, it ran a little bit slower, at about 12 minutes. He's not quite sure why it was a little bit longer there. I mean, this is something in development, so it's essentially at a beta stage, so maybe it will be improved. But there was definitely a big win going from zero parallel workers to two parallel workers. So definitely a feature to look out for in the upcoming Postgres 13.
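The syntax he's testing looks roughly like this (the table name here is a placeholder, and the PARALLEL option requires Postgres 13):

```sql
-- Explicitly request two parallel workers for the index-vacuum phase.
-- Postgres caps the actual count by the number of eligible indexes on
-- the table and by max_parallel_maintenance_workers.
VACUUM (PARALLEL 2, VERBOSE) my_big_table;

-- With no PARALLEL option, vacuum chooses the worker count itself.
VACUUM my_big_table;
```

Only the index-vacuuming phase is parallelized; scanning the heap itself still happens in a single process.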

The last post is "PGWATCH2 V1.7.0 RELEASED". This is from cybertec-postgresql.com. So this is a means of monitoring your PostgreSQL installation. If you use this or want to consider using it, they've recently upgraded the version, adding log parsing, a recommendations engine, real-time stats tracking, and some new dashboards. So if you're interested in learning more about it, definitely check out this blog post and their tool.
