JSONB Types, Earth Distance, Dates, Times & Intervals, Authentication | Scaling Postgres 123

In this episode of Scaling Postgres, we discuss working with JSONB types, calculating earth distance, utilizing dates, times & intervals and Postgres authentication.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "JSONB: A Container of Types". This is from momjian.us, and it's talking about using JSONB columns. Now, I found this interesting because he's talking about how JSON types relate to Postgres types. In this example here, he puts plain values into a JSONB column: a text string, an integer, and a boolean. There's actually a function called jsonb_typeof that returns the JSON type of a stored value, so you can use this function to determine what actual type is stored within that JSONB field. Now, of course, the actual type according to Postgres is still JSONB for all of these different pieces of information being stored.

Now, you can convert each of these values to text using this operator here, or you could also cast it. What's interesting is that even though you can cast values on the way out, you also need to format them appropriately when storing data. So, for example, when he tries to insert a value that mixes text and numbers, it actually gives an error, because that's a text string and not a valid JSON numeric. He corrected it by changing the value to exponential notation, which can be interpreted as a numeric type. But if you want to store the original value as-is, you can make it a JSON string by wrapping it in double quotes.
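As a rough sketch of the behavior he describes (the table name and values here are illustrative, not taken from the post), valid JSON literals go in as-is while bare text fails:

```sql
CREATE TABLE test (data jsonb);

-- A bare string is not valid JSON, so this fails:
-- ERROR: invalid input syntax for type json
INSERT INTO test VALUES ('abc');

-- Numbers, including exponential notation, are valid JSON values:
INSERT INTO test VALUES ('1e100');

-- To store text, double-quote it so it becomes a JSON string:
INSERT INTO test VALUES ('"abc"');
```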

Then below here, you can see how he's casting it to a numeric using the double colon. Now, he's just been using scalar values, but he says you can also apply the same concept to documents as well. So with full documents, you can do the same thing and have jsonb_typeof tell you whether a value is a string or a number. So I found this pretty interesting because there are times I've had to store an unknown set of data. Typically, I've just used text strings to do it and had to do the type conversion at the application layer. But having this ability using the jsonb_typeof function, I might explore different alternatives when I need to implement a new solution to this. So if you're interested in this content, definitely check out this blog post.
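A minimal sketch of using jsonb_typeof together with casting, with illustrative values rather than the post's exact examples:

```sql
-- jsonb_typeof reports the JSON type stored in the jsonb value
SELECT jsonb_typeof('42'::jsonb),        -- number
       jsonb_typeof('"hello"'::jsonb),   -- string
       jsonb_typeof('true'::jsonb);      -- boolean

-- Extract a scalar as text with #>> '{}', then cast with ::
SELECT ('1e100'::jsonb #>> '{}')::numeric;

-- The same function works on keys inside a full document:
SELECT jsonb_typeof('{"a": 1, "b": "x"}'::jsonb -> 'b');  -- string
```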

The next piece of content is "Recreating YikYak with Postgres". This is from adamfallon.com. Basically, he's talking about calculating distances using Postgres. YikYak was a social network that would show people posting information within a certain distance around you. So he wanted to implement this using Postgres. He set up Postgres, set up a database table for Postgres. In it, he used floats to store a latitude and a longitude. Now, he does say you can do it as a point, so that's another data type he could have used. But he did it using two floats. He inserted two posts into the table, storing different latitudes and longitudes. Then from a third location, he posted another one. 

Now, to do this, he installed two extensions. So basically, here's post one, here's post two, and he's saying, let's imagine that we are at this location here. To calculate this, he installed the cube extension and the earthdistance extension. So this is separate from PostGIS. You can do the same sort of technique using PostGIS, and I've actually seen references that PostGIS is potentially more accurate in some of the functions it has. But this is a quick and easy way to do it using these two extensions. To find the nearby posts, you use this query right here: he's selecting from posts, putting in the latitude and longitude of the 'we are here' area.

For a certain distance in meters around this point, it uses a function called ll_to_earth, converting latitude and longitude to Earth coordinates, then uses earth_box as kind of the bounding area and the contains operator to look through the posts' latitude and longitude, again converting them using the ll_to_earth function. And with that, he gets the two posts that we were expecting. So it's a very quick and easy way to see locations that are close to a certain area. Then as a bonus in the conclusion, he says you can of course add a GiST index using a function against the table to get much better performance. So if you're interested in doing calculations involving Earth distance, definitely check out this blog post.
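A condensed sketch of the approach (the table layout, coordinates, and radius here are made up for illustration):

```sql
CREATE EXTENSION cube;
CREATE EXTENSION earthdistance;

CREATE TABLE posts (
    id        serial PRIMARY KEY,
    body      text,
    latitude  float8,
    longitude float8
);

-- Find posts within 10,000 meters of "where we are"
SELECT *
FROM posts
WHERE earth_box(ll_to_earth(51.5074, -0.1278), 10000)
      @> ll_to_earth(latitude, longitude);

-- The bonus: a functional GiST index to speed up the bounding-box check
CREATE INDEX posts_earth_idx
    ON posts USING gist (ll_to_earth(latitude, longitude));
```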

The next piece of content is "Representing Dates, Times and Intervals in PostgreSQL". This is from pgdash.io. Now, the first section here goes over a, I would say, pretty opinionated perspective of what types you should be using and how you should be using them. Although they don't give a lot of justification as to why, but just basically say do this, don't do this, et cetera. Now, the main types they advocate you using are dates. So dates do not contain a time. Timestampz is the timestamp that includes a time zone. So that's pretty much going to be most of your time tracking data type. Then an interval. So a duration, for example. They give an example here, one month, three days, for example. So these are generally the data types that you're going to want to use, working with dates and times. 

Now, actually, depending upon your application framework, I've also seen the case of using the timestamp without the time zone, although a lot of Postgres users advocate not using that. Some application frameworks have time zone capabilities built in and therefore the database just stores everything in UTC time and the conversion happens in the application. But of course, it's up to you to determine how best you would like to do that implementation. Then it goes through and shows you different ways you can use these different data types. So for example, you can convert what is now to a date and add a certain number of days, or determine how many days from now a certain day is. 
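For example, date arithmetic along those lines might look like this (the specific dates and offsets are illustrative, not the post's exact queries):

```sql
-- Cast now() to a date and add a number of days
SELECT now()::date + 7;

-- Add an interval to today's date
SELECT current_date + interval '1 month 3 days';

-- How many days until a given date
SELECT date '2020-12-25' - current_date AS days_remaining;
```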

How many days till Christmas, the ten longest courses in a table, and then they talk about the timestamp type and how that can be used, specifically the differences between transaction_timestamp, statement_timestamp, and clock_timestamp. So transaction_timestamp gives the timestamp at the start of the transaction, while statement_timestamp is the timestamp at the start of the statement that you're in, and clock_timestamp gives you the actual system clock. And lastly, it covers interval types and how you can work with those. Then they follow up looking at how you can extract different date time components from timestamps, as well as doing time zone conversions. So if you're interested in a post to learn more about dates, times, and intervals, definitely check out this post.
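The difference between the three timestamp functions is easiest to see inside a transaction (a sketch, not the post's exact queries):

```sql
BEGIN;
-- Fixed at the moment the transaction started
SELECT transaction_timestamp();
-- Taken at the start of the current statement
SELECT statement_timestamp();
-- Reads the actual system clock, so it advances even mid-statement
SELECT clock_timestamp(), clock_timestamp();
COMMIT;

-- Extracting components and converting time zones
SELECT extract(year FROM now());
SELECT now() AT TIME ZONE 'UTC';
```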

The next piece of content is "How Secure is your Postgres? Part 2: Authentication". This is from the enterprisedb.com blog. In this section, they're talking about authentication and primarily that's done through the pg_hba.conf file. So it has different entries within that file that when a client connection is attempted, this file is reviewed and the first matching entry is what's used for authentication purposes. So for example, this is using a local connection, connecting to a particular database for a particular user. Then they're going to use SCRAM authentication. In this next example, they're using host-based access to connect via a particular IP address to a specific database for a specific user from a specific client IP. 

And they're going to be using MD5 authentication. Now of course, what you are generally going to use with Postgres is hostssl, requiring SSL connections to the database server, or hostgssenc, which is a GSSAPI-encrypted connection. So generally those are the two you're going to want to use, though you can of course also use local connections if you're connecting to the database for administrative tasks. Then in the next section here, they cover the different authentication types. So trust gives no protection at all: if you're using host and the IP matches, you can get in with no password or anything required. They talk about peer and ident authentication. Peer is for local connections and matches up users: if the database user matches the user name in the operating system, it grants you access.

Ident is the same concept at a network level, so that is definitely not as secure and you probably want to avoid it. Then there are the password authentication methods, MD5 and SCRAM. As of right now, everyone's advocating moving towards SCRAM, which is supported in the most recent versions of Postgres. You could also choose to do external authentication using Kerberos or LDAP, as well as use certificates, and it describes how you can set up certificates to be able to authenticate connections between clients and servers. And lastly, they follow up by talking about the authentication_timeout you may want to set in your postgresql.conf file, as well as a contrib module you could add, auth_delay, that minimizes how quickly someone can try password attempts against your PostgreSQL database server. So if you're interested in learning more about authentication and PostgreSQL, definitely check out this post.
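To illustrate, pg_hba.conf entries along the lines discussed might look like this (the database, user, and address values are made up), together with the postgresql.conf settings mentioned at the end:

```
# pg_hba.conf
# TYPE     DATABASE   USER    ADDRESS        METHOD
local      mydb       alice                  scram-sha-256
host       mydb       bob     10.0.0.5/32    md5
hostssl    all        all     0.0.0.0/0      scram-sha-256

# postgresql.conf
authentication_timeout = 30s
shared_preload_libraries = 'auth_delay'   # contrib module
auth_delay.milliseconds = 500             # pause after each failed attempt
```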

The next piece of content is "How to SCRAM in Postgres with pgBouncer". This is from crunchydata.com, and it explains mostly how SCRAM works, but it actually doesn't show you how to set it up. So it doesn't give the commands and the steps to run through but explains how it works, how PgBouncer kind of handles it, and the advantages of using scrum over previous authentication methods. So if you want to learn more about how scram works with PgBouncer, definitely check out this post.

The next piece of content is "What are PostgreSQL templates?" And this is from supabase.io, and they're talking about Postgres database templates. Now by default, when you create a cluster, you're going to see a template0 database and a template1 database. So whenever you create a new database, it uses the contents of template1 to make a copy of that to create your new database. So you can make changes to template1. And then every database you create from that point forward will have any changes you've made to, say, objects you've added to that database. Now template0 is basically a backup in case some big problem happens to template1. So basically never make changes to this, but you can use it to create a new template1 if you need to. 

Actually, they advocate not making changes to template1 either; what they advocate is creating a custom template database. What you would do is create a new database, set it up the way you want with all the objects or changes you want in the template, and then execute ALTER DATABASE with your database's name WITH IS_TEMPLATE TRUE. Then when you want to create a new database from your template, you say CREATE DATABASE with the new name, then TEMPLATE and your template's name, and it will create that new database using your custom template. So if you want to learn more about templates in PostgreSQL, definitely check out this blog post.
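Put together, the template workflow they describe looks roughly like this (my_template and my_new_db are placeholder names):

```sql
-- Create a database and customize it (add extensions, tables, etc.)
CREATE DATABASE my_template;

-- Mark it as a template
ALTER DATABASE my_template WITH IS_TEMPLATE TRUE;

-- Create new databases from the custom template
CREATE DATABASE my_new_db TEMPLATE my_template;
```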

The next piece of content is "PGWATCH2 V1.8.0 RELEASED". This is from cybertec-postgresql.com, and this is a monitoring tool for Postgres, and it has a new release. The highlights for it are support for Pgpool II, PostgreSQL 13, and TimescaleDB metric storage. So if you're interested in pgwatch2 or looking for a monitoring solution, definitely check out this post.

The next piece of content is "Barman Cloud - Part 2: Cloud Backup". This is from 2ndquadrant.com.  Now, we covered the first part of this post in the previous episode of Scaling Postgres talking about the Barman WAL archive and it was basically a way to send the WAL archive to a cloud destination provider such as Amazon's S3. Well, this shows you how to actually do the cloud backup part. So if you're interested in that and use Barman, check out this post.

The last piece of content, the PostgreSQL person of the week is Stacey Haysler. So if you're interested in learning more about Stacey and her contributions to PostgreSQL, definitely check out this blog post.
