Postgres 13 Beta 2, System Stats Extension, Tuned, Join Strategy | Scaling Postgres 120

In this episode of Scaling Postgres, we discuss Postgres 13 Beta 2, a system stats extension, tuning your operating systems for Postgres and different join strategies.

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 13 Beta 2 Released!". This is from postgresql.org. This is leading up to the release of Postgres 13 in the fall, and it mentions some of the changes since Beta 1, predominantly bug fixes, along with a couple of other adjustments. So if you're interested, go ahead and check that out.

Now, to go along with this, a video was recently posted on the San Francisco Bay Area PostgreSQL Users Group YouTube channel called "A First Look at PostgreSQL Version 13". This talk was given by Christophe Pettus, and it's a good summary of the new features in Postgres 13, particularly the ones that are of interest to him. The video is about 30 minutes long, but it's really 20-22 minutes of presentation with questions at the end. So it's a pretty short overview of all the different changes coming. If you're interested, check out this YouTube video.

The next piece of content is "system_stats extension for PostgreSQL". This is from pgsnake.blogspot.com, which is Dave Page's Postgres blog. It's an extension for Postgres that exposes various operating system and hardware statistics. Basically, it gives you SQL access, via new system tables, to query the state of the hardware and the OS. So it presents operating system info, CPU info and usage statistics, memory info, IO and disk info, system load averages, process info, network info, and memory by process info. This is a great addition, and I'm definitely going to be checking out this extension to see if I could potentially use it to help with monitoring. So if you're interested, check out this extension as well as his blog post about it.

The next piece of content is "tuned, PG, and you". This is from hunleyd.github.io, and he's talking about tuning the OS that PostgreSQL runs on. He advocates using a tool called Tuned, which helps with dynamically making changes to your system. I believe it's written and maintained by Red Hat. He advocates using Tuned over editing sysctl.conf directly. He goes through the process of taking some existing Tuned configurations and adjusting them to come up with a PostgreSQL-specific profile. So this is beneficial not only for seeing whether you want to use Tuned, but also for looking at the OS configuration changes he makes for some of his systems. Of course, he says he uses benchmarks to determine what these changes should be. So definitely something to take into account, and if you're interested, definitely check out this blog post.
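For context, a custom Tuned profile is a small INI-style file, typically placed at /etc/tuned/&lt;name&gt;/tuned.conf. The sketch below only illustrates the profile format with made-up example values; it is not the configuration from the post:

```ini
# Hypothetical profile, e.g. /etc/tuned/postgres/tuned.conf --
# example values only, not recommendations from the post.
[main]
summary=Example PostgreSQL tuning profile
include=throughput-performance

[sysctl]
# kernel settings Tuned applies for you, instead of editing sysctl.conf
vm.swappiness=10
vm.dirty_background_ratio=5

[vm]
transparent_hugepages=never
```

Assuming the profile directory is named `postgres`, it could then be activated with `tuned-adm profile postgres`.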

The next piece of content is "JOIN STRATEGIES AND PERFORMANCE IN POSTGRESQL". This is from cybertec-postgresql.com. This post goes over the three join strategies Postgres typically uses: the nested loop join, the hash join, and the merge join. They talk about how each of these works, the algorithm used, the types of indexes that can help in certain situations, and when each is a good strategy. For example, the nested loop join is a good strategy if the outer table is small. A hash join, which builds a hash table, is advantageous if that table can fit into work_mem. A merge join can be useful if both of the tables are large. But they do mention that if the join condition is something other than equality, things pretty much drop back to a nested loop join.
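To make the first two strategies concrete, here is a minimal, purely illustrative Python sketch of their core algorithms. Postgres implements these in C with far more sophistication; the table data and the "cust_id" key below are made up:

```python
def nested_loop_join(outer, inner, key):
    """For every outer row, scan the whole inner input:
    O(len(outer) * len(inner)), fine when the outer input is small."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]

def hash_join(outer, inner, key):
    """Build a hash table on the inner input, then probe it once per
    outer row -- cheap as long as the table fits in memory (work_mem)."""
    table = {}
    for i in inner:
        table.setdefault(i[key], []).append(i)
    return [(o, i) for o in outer for i in table.get(o[key], [])]

customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [{"cust_id": 1, "total": 50}, {"cust_id": 2, "total": 75}]

# Both strategies produce the same join result.
assert nested_loop_join(orders, customers, "cust_id") == \
       hash_join(orders, customers, "cust_id")
```

The difference is purely in cost: the nested loop does a full inner scan per outer row, while the hash join pays once to build the table and then probes in constant time.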

Now, one thing they mention is how to make sure you get the best strategy, through some of the changes they suggest. Number one, of course, is having your statistics up to date. So either analyze frequently or make sure your default_statistics_target is set appropriately at the database level, and at the table level if you need to. You could also potentially increase your work_mem, if that makes sense for you, to get the cheaper hash joins. Also, communicate your hardware and resources to PostgreSQL by tuning things like random_page_cost, effective_cache_size, and effective_io_concurrency, so that the planner can make better decisions on which type of plan to use. He also mentions that, quote, "You can speed up nested loop and merge joins with index-only scans". So adding more columns to an index may be beneficial, and with versions 11 and above, you can use INCLUDE to add columns to an index without making them part of the key. So if you want to learn more about these strategies and how they could potentially help your performance, definitely check out this blog post.
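To round out the picture, here is a similarly hedged Python sketch of the merge step of a merge join. It assumes both inputs already arrive sorted on the join key, which is exactly what an index scan or index-only scan provides, and why the quoted tip helps this strategy (the data and "k" key name are made up):

```python
def merge_join(left, right, key):
    """Walk two inputs that are already sorted on the join key,
    advancing whichever side is behind; each input is read once."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # emit every right row matching this left row's key
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                result.append((left[i], right[k]))
                k += 1
            i += 1
    return result

left = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right = [{"k": 1, "b": "p"}, {"k": 1, "b": "q"}, {"k": 3, "b": "r"}]
assert len(merge_join(left, right, "k")) == 2  # (x,p) and (x,q)
```

Because each input is consumed in a single sorted pass, this stays cheap even when both tables are large, provided the sort order comes for free from an index.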

The next piece of content is actually a PDF talking about JSON unification with regard to Postgres. What they're talking about is that around 2014, JSONB was introduced in Postgres, which attracted some NoSQL users. You can see Postgres's rapid rise in popularity on DB-Engines, and they argue that, partly as a result of this, JSON has been added to the SQL standard itself. But the issue is that Postgres has both a JSON data type and a JSONB data type, while the standard is going for one common type. So this is talking about unifying JSON and JSONB and how that could potentially be done. Now, this is a substantial piece of content, over 89 slides, but it goes into the history of JSON, the different ways you can query it, and how to approach this unification process. So it's very dense in terms of content and material, but if you're interested in learning more about the JSON/JSONB unification that may have to happen for Postgres to align with the SQL standard, definitely check out this piece of content.

Next piece of content is "POSTGRESQL: LTREE VS. WITH RECURSIVE". This is from cybertec-postgresql.com. They had done a previous post about ltree and querying hierarchical data, and the question came up: what about performance, and what about WITH RECURSIVE? So this blog post looks at doing something similar, but optimized to get really fast queries. He basically does the same thing, still using ltree, but with a materialized view that can be refreshed, and which you can index to get better performance. After some of these changes, he got execution times under a millisecond for some of these queries. So if you have hierarchical data and want to look into ltree or WITH RECURSIVE, and get better performance by leveraging materialized views, maybe you want to check out this blog post.
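As a rough, purely illustrative sketch of what WITH RECURSIVE does, here is some Python that repeatedly "joins" a working set back against a hypothetical adjacency-list table until nothing new turns up (the rows and node IDs are made up):

```python
from collections import defaultdict

# Hypothetical (id, parent_id) rows for a small category tree.
rows = [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)]

children = defaultdict(list)
for node, parent in rows:
    children[parent].append(node)

def descendants(root):
    """Mimics WITH RECURSIVE: seed the working set, then repeatedly
    expand it against the table until no new rows appear."""
    found, frontier = [], [root]
    while frontier:
        nxt = [c for node in frontier for c in children[node]]
        found.extend(nxt)
        frontier = nxt
    return found

assert descendants(1) == [2, 3, 4, 5]
```

The post's approach effectively precomputes results like these into a materialized view, so that an index can answer the query instead of re-running the recursion each time.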

The next piece of content is "Advanced Active Record: Using Subqueries in Rails". This is from pganalyze.com. They're talking about Ruby on Rails, an application framework written in Ruby, and Active Record, the ORM (object-relational mapper) that Ruby on Rails uses. As such, it has its own syntax, and subqueries aren't the easiest thing to figure out in it. But this post goes over some different techniques you can use. For example, for a subquery in a WHERE clause, you can build a query object and insert it into the Active Record Ruby syntax, which is equivalent to writing the SQL with a subquery as part of the WHERE clause. He goes over different kinds, such as WHERE NOT EXISTS subqueries, SELECT subqueries, FROM subqueries, and HAVING subqueries. So if you have a need to use subqueries in your Ruby on Rails project, maybe you want to check out this blog post to see how best to do that.

The next post is "Composite Primary Keys, PostgreSQL and Django". This is from crunchydata.com. They were doing a database design where they ideally wanted a composite primary key, but when trying to set it up using Django's ORM, they ran into quite a few difficulties. They were able to work through them and get something working to their satisfaction. So if you're interested in learning how they rectified the situation with the Django ORM, definitely check out this piece of content.

The next piece of content is "Types of Indexes in PostgreSQL". This is from highgo.ca. It goes over all the different indexes, of which there are many in PostgreSQL, starting of course with the B-tree index, then hash, GiST, SP-GiST, GIN, and BRIN, as well as multicolumn indexes, unique indexes, expression indexes, partial indexes, index-only scans, and covering indexes. So if you want to learn all about indexes in PostgreSQL, definitely check out this blog post.

The next two pieces of content are about migrating from Oracle to PostgreSQL. The first one is "Oracle to PostgreSQL: Reasons to Migrate". This is from 2ndquadrant.com. It covers reasons such as Foreign Data Wrappers versus federation, and the many languages available for writing procedures and functions compared to PL/SQL in Oracle, as well as others. So if you're interested in this type of content, check out this blog post.

The second one is "Setup ora2pg for Oracle to Postgres Migration". This is from crunchydata.com. So this is an open source utility called ora2pg that helps you convert your data from Oracle to Postgres. So if you're interested in doing that, check out this blog post.

The next piece of content, the PostgreSQL person of the week is Simon Riggs. So if you're interested in learning more about Simon and his contributions to PostgreSQL, definitely check out this blog post.

The next piece of content is "How to Monitor PostgreSQL 12 Performance with OmniDB - Part 1". This is from 2ndquadrant.com. They're talking about using their monitoring tool OmniDB to monitor Postgres. So if you want to learn more about OmniDB and what it does in terms of monitoring, check out this blog post.

Next piece of content is "Authenticating pgpool II with LDAP". This is from highgo.ca. LDAP authentication is, to my understanding, a new feature in recent versions of Pgpool-II. So this shows how to get it working with the newer versions of PostgreSQL and Pgpool-II.

The last piece of content is "How to use Neural Network Machine Learning model with 2UDA - PostgreSQL and Orange (Part 7)". This is from 2ndquadrant.com. This is Part 7 in a series talking about neural network machine learning with PostgreSQL. So if you're interested in that, check out this post.
