
Tablespaces, Streaming Replication, More Postgres 14, Security | Scaling Postgres 159


In this episode of Scaling Postgres, we discuss when to use tablespaces, setting up streaming replication, features coming in Postgres 14 and implementing security.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "WHEN TO USE TABLESPACES IN POSTGRESQL". This is from cybertec-postgresql.com. They're talking about when you should use tablespaces, and they compare it to Oracle, where Oracle basically has its own file system it uses to store the data, whereas Postgres relies on whatever operating system it's running on to store its data. They have an Oracle-versus-Postgres terminology chart. Here, they say an Oracle tablespace is similar to a file system in the operating system, an Oracle data file is equivalent to a logical or physical volume in a file system, a segment in Oracle is equivalent to the data files that are part of a table, and an extent in Oracle is equivalent to a segment of a data file in Postgres. Now, I don't necessarily 100% agree with this chart because it doesn't list where the tablespace exists.

So the tablespace is a logical representation of where you're storing the data that is connected to a physical location. So, for example, you logically create a tablespace, but you specify a physical location for it. Then when you create a new table, you specify that tablespace, and the table is actually stored in that location. So Postgres normally just uses a data directory and stores everything in there. You can direct, say, logs to a different location using symlinks or other methods like that, and you can redirect temporary files to a certain extent, but mainly everything is in the data directory, and tablespaces allow you to store things outside of that data directory. When you create objects, you need to specify the tablespace. Now, they do say that when you specify a table to go in a particular tablespace, when you create indexes, you need to specify that tablespace as well.
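As a minimal sketch of what that looks like (the path and object names here are hypothetical, and the directory must already exist and be owned by the postgres OS user):

```sql
-- Create a tablespace pointing at a physical location (hypothetical path).
CREATE TABLESPACE fast_storage LOCATION '/mnt/fast_disk/pgdata';

-- Store a new table there by naming the tablespace explicitly.
CREATE TABLE measurements (
    id bigint GENERATED ALWAYS AS IDENTITY,
    reading numeric
) TABLESPACE fast_storage;
```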

It doesn't automatically go with the table, but if you put a whole database on a particular tablespace, it will stay within that tablespace. Now, the blog talks a little bit about backups and how you need to take tablespaces into account, and then asks the question, when should you create a tablespace? Now, here are some reasons they've specified. A lot of these reasons don't necessarily apply anymore. They applied more in a time when we were using a lot of physical drives and magnetic storage; with SSDs and the virtual file systems in a cloud environment, a lot of these reasons don't really exist anymore. I've only used tablespaces when we were running out of space where the primary data directory is located and there wasn't an easy way to expand it. So what we typically did was bring up a new tablespace, create large indexes on that tablespace, and then delete them on the primary tablespace, or the default data directory, to free up some space and buy us time to address it.
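A sketch of that index behavior and the space-freeing workflow, assuming a tablespace named fast_storage already exists (all object names are hypothetical):

```sql
-- An index does not inherit its table's tablespace; name it explicitly.
CREATE INDEX orders_created_at_idx ON orders (created_at)
    TABLESPACE fast_storage;

-- An existing index can also be moved off the default tablespace to
-- reclaim space there (this rewrites the index and takes a lock):
ALTER INDEX orders_big_idx SET TABLESPACE fast_storage;
```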

But it is a bit of a hassle doing backups when you have tablespaces, so I generally wouldn't recommend them. That's what this post advocates as well: you probably shouldn't be reaching for tablespaces often, and they make a quote here. "Since almost everybody uses virtualization these days, tablespaces are becoming an increasingly irrelevant PostgreSQL feature". So while in Oracle tablespaces are more of a consideration, with tablespaces in Postgres it's really only for the certain use cases that they enumerate here. Mainly, the only way I've used them is when you're running out of space on the main data directory and cannot easily move or expand that data directory. But if you want to learn more, definitely check out this blog post.

The next piece of content is "Setting up Streaming Replication in PostgreSQL 13 and Streaming Replication Internals". This is from migops.com and they're talking about physical or streaming replication. They cover a bit of the internals: basically, the WAL files that are being generated get streamed to a standby server to create a replica of the primary database. They then go into all the different commands used to get this set up, with two systems up and running and replicating data from the primary to a replica. So if you're interested in that, you can check out this blog post.
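The core of a setup like theirs can be sketched with pg_basebackup (the host, user, and data directory path here are hypothetical):

```shell
# On the replica host: clone the primary's data directory over a
# replication connection, streaming WAL while the copy runs.
pg_basebackup -h primary.example.com -U replicator \
    -D /var/lib/postgresql/13/main -R -X stream -P
# -R writes standby.signal and sets primary_conninfo in
# postgresql.auto.conf, so starting the server begins streaming.
```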

The next piece of content- "Waiting for PostgreSQL 14- Add 'pg_database_owner' default role". This is from depesz.com. So this is a new default role that's been added, and the intent is to add it to things like template1. template1 is a database that all databases are created from, so it's a template for you to create new ones. In his example here, he created a security definer function to be able to query the pg_stat_activity view and then applied it to template1, so that when a new database is created with a particular owner, that owner can run this function, get_running_queries, against the database they've created because they are the owner. So it's a way to give more permissions to users at a more granular level. So if you're interested in learning more about this new feature, you can check it out.
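A rough sketch of that pattern (the function name comes from the post; the body and grants here are illustrative):

```sql
-- Runs with the privileges of the user who defined it.
CREATE FUNCTION get_running_queries()
    RETURNS SETOF pg_stat_activity
    LANGUAGE sql
    SECURITY DEFINER
AS $$ SELECT * FROM pg_stat_activity $$;

-- Lock it down, then allow whoever owns the current database to call it.
REVOKE EXECUTE ON FUNCTION get_running_queries() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION get_running_queries() TO pg_database_owner;
```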

Now there are two other posts related to the new PostgreSQL 14 features. The next one is "Add date_bin function". So this is similar to the date_trunc function, where you can take a date and truncate it down to a particular resolution, for example, microseconds, milliseconds, days, months, decades, et cetera. Well, now you can bin it to a particular time interval and also apply an offset. So values aren't always truncated to a standard boundary; they can be binned at an arbitrary interval starting from an origin you choose. So if you're interested in that, maybe you want to check out this blog post.
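For example, binning timestamps into 15-minute buckets, with and without an offset origin (values chosen purely for illustration):

```sql
-- date_bin(stride, source, origin): buckets aligned to the hour.
SELECT date_bin('15 minutes', TIMESTAMP '2021-04-01 10:37:22',
                TIMESTAMP '2021-04-01 00:00:00');
-- => 2021-04-01 10:30:00

-- With an offset origin, the buckets fall at :05, :20, :35, :50.
SELECT date_bin('15 minutes', TIMESTAMP '2021-04-01 10:37:22',
                TIMESTAMP '2021-04-01 00:05:00');
-- => 2021-04-01 10:35:00
```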

The third post is "Add unistr function". So this allows you to decode a string with Unicode escape sequences. So if you're interested in that, you can check out this one.
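For example, using the escape forms unistr understands (\XXXX with four hex digits, or \+XXXXXX with six):

```sql
SELECT unistr('d\0061t\+000061');
-- => data
```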

The next piece of content is "Is Postgres secure?". This is from blog.crunchydata.com and the answer to the question, of course, as they say here, quote "Short answer is YES". But it's all based on your implementation and how you set up Postgres. They have a number of resources that they've linked to here to give you guidance on how to set up Postgres in a secure fashion. The first thing they cover is the Common Criteria for Information Technology Security standard, and that there is a protection profile for database management systems that you can apply to a deployment of Postgres. They also talk about work done with the Center for Internet Security and the United States Defense Information Systems Agency to develop more guidelines for deployment, configuration, and administration. They include that here, along with the Security Technical Implementation Guides. They also talk about SQL injection in Postgres and encryption requirements. So if you're interested in making a more secure Postgres installation, definitely check out this blog post.

The next piece of content is, again, more webinars that have been posted to the EDB YouTube channel. So if you're interested in video content, you can check that here.

The next piece of content is "Preventing FreeBSD to kill PostgreSQL (aka OOM Killer prevention)". This is from fluca1978.github.io. FreeBSD is different from Linux, and we have covered articles talking about the out-of-memory killer in Linux; there's specific documentation for Postgres about how to handle that. But this post covers FreeBSD and the different utilities it provides, like protect, that can be used to do that, and how to configure them. So if you run Postgres on FreeBSD, maybe you want to check out this blog post.

The next piece of content is "Logging of Pgpool-II on Kubernetes". This is from b-peng.blogspot.com. So if you run Pgpool-II and you want to do it on Kubernetes and are interested in logging, definitely check out this blog post.

The next piece of content- "Dumping a ByteA with psql". This is from cleverelephant.ca, and he's talking about working with binary data in Postgres, specifically the bytea type used to store binary data. He was using that to do some debugging, creating a pipeline within Postgres to be able to take raster images that are stored as binary in Postgres and examine them. So if you're interested in learning more about the pipeline he put together for working with bytea data, definitely check out this blog post.
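One common way to do this kind of thing with psql (the table, column, and database names here are hypothetical, not the post's) is to emit the bytea as hex and convert it back to raw bytes in the shell:

```shell
# -A: unaligned output, -t: tuples only, so psql prints just the hex string;
# xxd -r -p turns the hex back into binary.
psql -At -c "SELECT encode(rast, 'hex') FROM rasters WHERE id = 1" mydb \
    | xxd -r -p > tile.bin
```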

The last piece of content, the PostgreSQL person of the week is Jan Karremans. So if you're interested in learning more about Jan and his contributions to Postgres, definitely check out this blog post.
