New Releases, AlloyDB, Time Zones, Connecting to Postgres | Scaling Postgres 215
In this episode of Scaling Postgres, we discuss new Postgres releases, AlloyDB from Google, time zone management and how to connect to Postgres.
Content Discussed
- PostgreSQL 14.3, 13.7, 12.11, 11.16, and 10.21 Released!
- Introducing AlloyDB for PostgreSQL: Free yourself from expensive, legacy databases
- Introducing AlloyDB, a PostgreSQL-compatible cloud database service
- Time zone management in PostgreSQL
- PG Phriday: What You Should Know About Basic Postgres Authentication
- The Vectors of Database Encryption
- Demystifying Database Performance for Developers
- 5mins of Postgres E17: Demystifying Postgres for application developers: A mental model for tables and indexes
- Changing Tires at 100mph: A Guide to Zero Downtime Migrations
- Episode 511: Ant Wilson on Supabase (Postgres as a Service)
- Queries in PostgreSQL: 4. Index scan
- pg_ivm 1.0 released!
- Abhijit Menon-Sen
- Rubber Duck Dev Show Episode 42 | Symbols, Strings and Freezing
YouTube Video
Podcast Audio
Transcript
All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 14.3, 13.7, 12.11, 11.16, and 10.21 Released!". This is from postgresql.org, and this release seems to be primarily driven by a security issue affecting autovacuum, REINDEX, and others, where these actions omit the security-restricted operation sandbox. So it says they, quote, "make incomplete efforts to operate safely when a privileged user is maintaining another user's objects". Now, along with this security patch, there are also a number of other improvements included with this release, and you can feel free to review this post to get more details about each one. Then, as part of this notification, they also included a reminder that PostgreSQL version 10 reaches end of life this November, on November 10, 2022. So if you're still on version 10, you'll want to plan your upgrade process, ideally before then.
The next piece of content- "Introducing AlloyDB for PostgreSQL: Free yourself from expensive, legacy databases". This is from cloud.google.com, and they have released this Postgres-compatible database they're calling AlloyDB. So it looks to be similar to what Amazon has done with AuroraDB and what makes them different from other solutions is in PostgreSQL, their number one is superior performance and scale. So basically, they're saying with whatever metric they've used, that AlloyDB is 4x faster than standard Postgres and 2x faster than Amazon's comparable PostgreSQL-compatible service, which I think they're referring to AuroraDB.
So in addition to this article, there's also a companion YouTube video called "Introducing AlloyDB, a PostgreSQL-compatible cloud database service" that's about 20 minutes long. Looking at that video and this article, it looks like they're separating the storage layer from the compute layer and dedicating resources to each in a unique combination, which bears some similarity to what Aurora did as well, where it was mostly the storage layer that changed from stock PostgreSQL. But in addition to the performance improvements, they're also advertising a pretty high SLA of 99.99%. And they're saying it works great for real-time analysis too, in that it automatically takes data and puts it in a columnar format that allows much faster aggregate queries. It says it uses machine learning to determine, based upon your queries, which data should be stored in columnar form.
So basically, it handles some of these hybrid transactional and analytical workloads, and then there's their machine learning-assisted management and insights. So they'll take care of handling patches, backups, scaling up and down if need be, and replication. So this is another hosted offering, and I guess it's not surprising that Google has come up with this, because Amazon has had Aurora for a while, which is a PostgreSQL-compatible hosted database service. Microsoft, presumably through their Citus acquisition, is kind of doing similar things as well, I suppose, but that seems to be a little bit more open source than this solution. So even though they're talking about getting away from legacy database systems, this is a new system, and you do kind of have vendor lock-in if you choose this route. But if the presumably lower maintenance and better performance are important to you, then maybe you choose to go this way.
The next piece of content- "TIME ZONE MANAGEMENT IN POSTGRESQL". This is from cybertec-postgresql.com and they're talking all about time zones. The first thing that they're talking about in this post is that the SQL standard doesn't have a lot of support for time zones, particularly daylight savings time. So they have the concept of zones and time zone displacement from UTC time. But that's basically the support. Again, there's nothing that helps handle things like daylight savings time. So then he gets into talking about the Postgres timestamps and there are basically two. There's a timestamp without a time zone, so you don't include a timestamp when you use it and then there's a timestamp with a time zone, and when you use it, you're generally including the time zone. Now, what's interesting here is that the timestamp with the time zone doesn't actually store the time zone.
How it's stored is as the number of seconds since midnight of January 1, 2000, UTC. So "with time zone" basically just means you use it with a time zone: when you're inserting data into it, you include the time zone, whereas no time zone is included when you're using a timestamp without a time zone. Now, he has a side discussion here that some people choose to store times as the number of seconds since the Unix epoch, but he says you should really try to avoid that. I personally agree, because when you go to look in the database, you're not going to be able to tell what date it's talking about. It also makes date/time arithmetic harder, as he says here: the support for intervals, adding or subtracting hours, months, and days from a particular time, doesn't work anymore, and it can potentially lead to bad performance for your queries.
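As a small illustration of that point (my own sketch, not code from the post, with made-up table and column names), here's roughly what interval arithmetic looks like against a proper timestamp column versus an integer column of epoch seconds:

```sql
-- Hypothetical tables, purely for illustration.
CREATE TABLE events_ts    (id bigint, occurred_at timestamptz);
CREATE TABLE events_epoch (id bigint, occurred_at_epoch bigint);  -- seconds since the Unix epoch

-- With a real timestamp, interval arithmetic is straightforward and readable:
SELECT * FROM events_ts
WHERE occurred_at >= now() - interval '3 days';

-- With epoch seconds, you have to convert every time, and wrapping the
-- column in a function like this also defeats a plain index on it:
SELECT * FROM events_epoch
WHERE to_timestamp(occurred_at_epoch) >= now() - interval '3 days';
```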
Now with time zones, assuming you're using timestamp with time zone, how does it actually calculate them? Well, basically it stores the value in one time zone (UTC) in the database, and then it uses a parameter to translate that to whatever time zone you're looking for. So for example, he created a table here with a timestamp with time zone column and inserted values that, because the column is time zone aware, included time zone offsets, +2 and -2. What gets sent back to the client is determined by how the client session's time zone is set. In this particular case, he set it to UTC, so the dates and times he gets back are in UTC. This could instead be Eastern Standard Time, the Pacific time zone, or a time zone in Europe, and it will translate those internally stored values to the proper time zone based on what you requested.
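Here's a minimal sketch along the lines of the example he describes; the table name and values here are just illustrative, not copied from the post:

```sql
-- Illustrative table with a timestamp-with-time-zone column.
CREATE TABLE tz_demo (ts timestamptz);

-- Insert values with explicit offsets; both are converted to UTC internally.
INSERT INTO tz_demo VALUES
    ('2022-05-15 12:00:00+02'),
    ('2022-05-15 12:00:00-02');

-- What you get back depends on the session's TimeZone setting.
SET timezone = 'UTC';
SELECT ts FROM tz_demo;            -- shown as 10:00 and 14:00 UTC

SET timezone = 'America/New_York';
SELECT ts FROM tz_demo;            -- same stored instants, shown in Eastern time
```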
Now, how does it handle this translation? Primarily it's through a time zone database maintained by IANA. The database, he says here, is also known as the Olson database after its founder. It includes an area and a location, and then essentially an offset from UTC. Now, Postgres ships with a copy of this database, but you can also use your operating system's time zone database instead. In fact, when I use Postgres on Ubuntu, the default packages do use the operating system's copy of the time zone data. He then shows an example of creating a timestamp with time zone and how it's output appropriately based upon the time zone that's set for the client or the session. He also mentions two caveats: you should probably avoid the time zone abbreviations, the three-letter or sometimes four-letter acronyms for the time zones.
He also suggests avoiding the POSIX-style time zones as well. Then he talks about conversion, so basically converting, say, a timestamp into a timestamp with time zone, or changing the time zone of a particular timestamp you have. There are two different ways to do it: one is a cast, but I tend to use the AT TIME ZONE construct when I'm doing these, since I think it reads a little more clearly. Now, the most important part of this post is that when you're going to be using time zones in your application, he says you have two choices, and I wholeheartedly agree. One, "Use timestamp throughout", meaning timestamp without time zone throughout: "Store UTC timestamps only in the database and let the application handle time zone conversions".
So basically, you allow the user to set their time zone, you set it automatically through JavaScript, or maybe you have time zone settings per account for your application. You store UTC timestamps in the database without the time zone, and you convert those from UTC into whatever the person's local time is based upon their settings. The second option is "Use timestamp with time zone throughout" your database, and for each session you set the time zone correctly. Basically, this lets Postgres handle the time zone conversions as opposed to your application. This is a great post and a really comprehensive discussion of the different time zone features within Postgres, so I definitely encourage you to check it out.
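As a rough sketch of those pieces (again my own illustration, not code from the post), here's what the AT TIME ZONE conversion looks like, plus the session-level setting that the second option relies on:

```sql
-- Converting a plain timestamp (assumed to be UTC) into a timestamptz:
SELECT TIMESTAMP '2022-05-15 10:00:00' AT TIME ZONE 'UTC';

-- Converting a timestamptz into the wall-clock time of a given zone
-- (the result is a plain timestamp without time zone):
SELECT now() AT TIME ZONE 'Europe/Vienna';

-- Option 1: store UTC in a timestamp-without-time-zone column and let the
-- application convert for display.
-- Option 2: store timestamptz and set the zone per session, letting
-- Postgres do the conversion:
SET timezone = 'Europe/Vienna';
SELECT now();   -- rendered in the session's time zone
```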
The next piece of content- "PG Phriday: What You Should Know About Basic Postgres Authentication". This is from enterprisedb.com, and this is an interesting post about what happens when someone installs Postgres, and then starts trying to use it if they've never used it before. So, for example, say you're on Debian or Ubuntu, and you do a "sudo app install postgresql-14", and you say, all right, I want to get started, so I'm going to type psql, and you get a connection error, and it says, "FATAL: role 'bones' does not exist". What is that? Well, Bones is presumably the user's name. And he's like, wait, my role doesn't exist? Okay, well, there's a command to create a user, so let me create user Bones. But of course, that command fails, giving you the same error message. So you do a little bit more digging, and you say, oh, I think I can do it if I assume the Postgres user, because the Postgres user has rights against the database, and I can create the user bones, then.
Well, connecting as bones still causes a problem, because the database "bones" does not exist. So basically, it can get very frustrating for new users to figure out how to connect to the actual database. One thing you can do is use psql's list option to get a list of the databases that exist; you can see that bones clearly doesn't exist, but there is a postgres database you can connect to, so psql postgres will actually connect. They then go into the process of creating a database for your app, how to get connected with a password, and how to modify the configuration to listen on particular addresses as well as configure pg_hba.conf. So, as this post clearly shows, there's a pretty steep learning curve here for a new user, and as a community we should probably try to figure out how to make this a little easier. But this is a very interesting post, highlighting the issues people experience when they're new to Postgres and trying to use it for the first time.
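For reference, a hedged sketch of the kind of bootstrap steps the post walks through (the role and database names just mirror the example user), run from a superuser session such as sudo -u postgres psql:

```sql
-- Create a login role matching the OS user, and a database of the same name,
-- so a bare `psql` from that OS account has something to connect to.
CREATE ROLE bones LOGIN PASSWORD 'change-me';
CREATE DATABASE bones OWNER bones;
-- Whether a password is actually required still depends on the authentication
-- method configured in pg_hba.conf (peer, md5, scram-sha-256, ...).
```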
The next piece of content -"The Vectors of Database Encryption". This is from crunchydata.com, and they're talking about basically three different ways that you can encrypt your data as it relates to data being stored in Postgres. The first area is data-at-rest. So presumably, this is data that's residing on a disk not necessarily at rest in memory, where it's just being stored there temporarily. But data-at-rest is considered on the disk and you can use an operating system level or a disk level way to encrypt that such that the encryption key must be used in order to access data on it. So he discusses this first vector and the advantages and disadvantages of it.
The second vector is data-in-transit. So this is data being transported, not within the database machine from disk to memory, but to another machine or some other location. There you're going to want to employ things like TLS and certificates to encrypt the data while it's in transit between machines, such as between your application server and your database server. The third vector is data-in-use. So this is data that is actively being accessed. Now, I would kind of prefer a slightly different name for this, but the idea is that even data actively moving around in, say, the memory of the database is still encrypted at that point. What encrypts and decrypts it is a key residing on the application server, or on a separate server the application server has access to, which dynamically encrypts and decrypts the data before the database even sees it.
So basically, when it's in memory and when it's moving to different locations within the database system, it's always encrypted, because the database doesn't have the keys; it's the application server or a separate server that holds the keys and performs the encryption and decryption operations. He then closes off the post talking about backups and how data-at-rest, in-transit, and in-use apply to them, and it's pretty much the same thing: when you take the data off the database server and put it in files or some other location, make sure it's encrypted there; when it's in transit, be sure to encrypt that transmission; and for data-in-use, since it's essentially always encrypted within the database, you don't need to take any additional action to protect it. This is a pretty interesting perspective on the advantages and disadvantages of the different levels of encryption with Postgres.
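For the data-in-transit vector specifically, here's a small illustration (not from the article) of turning TLS on server-side and checking whether your current connection is actually encrypted; certificate setup and pg_hba.conf "hostssl" rules are separate steps not shown here:

```sql
-- Enable TLS on the server (requires the server certificate and key files
-- to be in place, and a configuration reload to take effect).
ALTER SYSTEM SET ssl = on;
SELECT pg_reload_conf();

-- From a client session, check whether this connection is using TLS:
SELECT ssl, version, cipher
FROM pg_stat_ssl
WHERE pid = pg_backend_pid();
```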
The next piece of content- "Demystifying Database Performance for Developers". This is from crunchydata.com and they're using the example of a library, comparing it to a database and discussing how data interactions lead to better performance. And they talk about a few different areas. One is indexes and how advantageous indexes can be in terms of speeding up access to data and can also be a burden when you need to change that data. They talk about the important concept of index cardinality. So this is basically, as they say here, quote "...the number of records returned per value". So for example, if you have a unique column with an index on it, that's a very high cardinality. Whereas if you have an index on a Boolean field, which you really shouldn't, but essentially there are only two values in there. So if you have a billion rows in a table with two values, that cardinality is going to be super low.
If you think you need something like an index on a Boolean field, or on some kind of status field that only has a handful of values, a better approach would be to use a partial index. So you could index where the value is true or where the value is false, and in the Boolean case it would be more efficient to create two partial indexes, one for each case, as opposed to creating one index on the column. That would be a much more efficient way to pull the data. They also cover table scans and some of their disadvantages. And when you're building queries, be cautious about using functions on the columns you're filtering on. For example, if you take a column in the table and add an interval to it, the database basically has to compute that for every row before it can compare the result to now(). Functional indexes can potentially handle some of this, but it's much better to do the calculation on the other side, for example now() minus a three-day interval, and then compare that to the raw value in the database. Then a plain index can be used efficiently without having to set up a functional index. They also talk about updating and deleting records, and about scaling.
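Here's a brief sketch of both of those points (my own illustration with made-up table and column names): partial indexes for a low-cardinality flag, and keeping the function on the constant side of the comparison so a plain index on the column can be used.

```sql
-- Hypothetical table for illustration.
CREATE TABLE orders (
    id         bigint,
    archived   boolean,
    created_at timestamptz
);

-- Instead of one index on a two-value boolean column, partial indexes
-- cover each case you actually query for:
CREATE INDEX orders_archived_idx     ON orders (id) WHERE archived;
CREATE INDEX orders_not_archived_idx ON orders (id) WHERE NOT archived;

-- Avoid wrapping the column in an expression...
--   WHERE created_at + interval '3 days' > now()     -- computed for every row
-- ...and put the arithmetic on the constant side instead, so an index
-- on created_at can be used:
CREATE INDEX orders_created_at_idx ON orders (created_at);
SELECT * FROM orders
WHERE created_at > now() - interval '3 days';
```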
Now, if you want to learn more about this, this is actually the article that pganalyze.com covered in their post "5mins of Postgres E17: Demystifying Postgres for application developers: A mental model for tables and indexes". So if you want to learn a little bit more about it from Lukas's perspective, you can definitely check out that content.
The next piece of content- "Queries in PostgreSQL: 4. Index scan". This is from postgrespro.com, and they go into a lot of in-depth details about how index scans work in Postgres. So if you want a more internal look and understanding of how index scans work, you can definitely check out this blog post.
Next piece of content- "pg_ivm 1.0 released!". This is from yugonagata-pgsql.blogspot.com. So this is the Postgres incremental view maintenance extension 1.0 is now released.
Next piece of content. The PostgreSQL person of the week is Abhijit Menon-Sen. If you want to learn more about Abhijit and his contributions to Postgres, you can definitely check out this article.
The last piece of content, we did have another episode of The Rubber Duck Dev Show this past Wednesday evening. This one was on "Symbols, Strings, and Freezing". So if you're interested in more developer-based content, we encourage you to check out our show.