Per-Table Autovacuum, FDW Synchronization, Distinct On | Scaling Postgres 33
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we review articles covering per-table autovacuum, FDW for data synchronization, distinct on and Postgres Open.
Content Discussed
- Per-Table Autovacuum Tuning
- Foreign Data Wrappers for Data Synchronization
- SELECT DISTINCT ON in PostgreSQL
- Postgres Open SV 2018 YouTube Channel
- New in Postgres 11: Monitoring JIT performance, Auto Prewarm & Stored Procedures
- Webinar : New Features in PostgreSQL 11 [Follow Up]
- Scaling PostgreSQL using Connection Poolers and Load Balancers for an Enterprise Grade environment
- Moving Tablespaces
- 5x Faster Spatial Join with this One Weird Trick
- PostGIS Code Sprint 2018 #1
- PostGIS Code Sprint 2018 #2
YouTube Video
Podcast Audio
Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about per table auto vacuum, foreigndata wrapper, data synchronization, distinct on and postgres open. I'm creston, Jameson. Andthis is scaling postgres episode 33.
[00:00:18] Point.
[00:00:22] All right, our first article is per table auto vacuum tuning. And this is from the Keith'sRamblings blog at keithf four. And basically this blog post covers auto vacuum configuration andhe has some suggested changes for how he likes to set up auto vacuum, some different scalefactor and threshold sizes he likes to adopt for the whole database system. But this post alsogoes into what you can do per table because as your tables get extraordinarily large, you maywant to consider setting auto vacuum settings per table and potentially getting rid of the scalefactor and only going with a threshold. So for example, he talks about using PGSTAT all tables inorder to look at the statistics with regard to when a table's been vacuum and analyzed. And thenfor a particular table, he shows you how you can update per Table auto vacuum settings for it.So as your database gets larger and even more specifically certain tables, this is a great post tocheck out to see if you want to potentially make changes on a per table basis to your autovacuum settings.
[00:01:28] The next post is foreign data wrappers for data synchronization. Now, normally whenyou think of synchronization, or at least when I think about it, you think of you're setting up areplica so using physical replication to replicate your database to a readonly copy or potentiallyeven doing logical replication where you're defining certain tables you want to replicate from onedatabase system to another. However, this post considers using data wrappers in conjunctionwith materialized views to do it. So he goes over installing the foreign data wrapper for thispurpose, which is basically creating the extension postgres foreign data wrapper, setting up orcreating the remote server and mapping a particular user to be able to access that foreign serverand then create the foreign tables so that you can reference them. And he shows the commandhere to use, for example, import foreign schema and then you can limit it to particular tables. Sothat pretty much sets up your foreign data wrapper. And then he went ahead and set up amaterialized view for these foreign tables and he says, quote, whenever the data needs to befresh, just refresh materialized view in the view name. Or you could also choose to do thisconcurrently if you'd like. So this is a little bit different way of setting up synchronization, kind oflike on demand. So you may have a use case for this.
[00:02:48] The next post is select distinct on in PostgreSQL and this is from theGeekytidbits.com blog. Now, this is a query that you will see a lot, or at least I've used it a lot.Basically you want to get the most recent record from a table and get more than just the datefrom it. So in a sub query or in a join table, you pull out an ID or the data that you're working forand the max timestamp, and you pull all that data for the most recent timestamp. Essentially.However, this is a perfect use case for distinct on. So instead of having to do code like this usingJoin or sub queries, you can simply do select distinct on. You're grouping the data by what youspecify in the on clause. The order of the statement places in the proper order with the mostrecent timestamp first, and then after the select distinct on, you specify the fields or the columnsyou want for each row returned. So essentially you do the same thing, but with a much cleanersyntax. So this is a very quick and easy post, and if you're not using select distinct on or familiarwith it, definitely a post to check out.
[00:03:56] The next piece of content is actually a YouTube channel and it's the Postgres OpenSV 2018 channel, and I mention it because it has five new videos that were posted this week.Now unfortunately, I haven't had time to go through all these videos, but they cover topics likeDebugging, the Postgres Planner, a Scalable version, document store in PostgreSQL, theEvolution of Postgres, High Availability Dude, Where's My Byte? And Bloat in PostgreSQL. Ataxonomy. So if you're interested in learning about any of these topics, definitely a piece ofcontent and YouTube channel to check out.
[00:04:31] The next post is new in postgres eleven, monitoring JIT performance, auto --prewarm and stored procedures. And this is from the Pganalyze.com blog. So last week I wentover a lot of content talking about the new features in postgres Eleven. So some of this is arepeat, but actually some of his JIT measurements were pretty interesting. But he covers the JITEndtime compilation in postgres Eleven, the auto prewarm feature that's coming as well as storeprocedures in postgres Eleven. So if you're interested in learning more about these specificfeatures, definitely a blog post to check out.
[00:05:08] The next post is webinar. New features in PostgreSQL eleven. Follow up. And this isfrom the second quadrant.com blog. And this is about an hour and five minute presentation thatgoes over all the features in PostgreSQL Eleven. Now, he did mention they're projecting therelease date for PostgreSQL Eleven coming around October 18, which is about two weeks away,so that's definitely some interesting news. So if you prefer a video format defining out about thenew features in PostgreSQL Eleven, this is definitely a webinar to check out. You do have toregister for it, but it's on demand so you immediately get access to it. Although I didn't noticeanything that wasn't in the other articles that mentioned the new features coming in the previousepisode of Scaling Postgres. So if you've already read over these features, this presentationmay be a bit redundant, but if you prefer a video format, he definitely goes over a presentationthat shows all the new features that are coming in PostgreSQL eleven.
[00:06:08] The next post is scaling PostgreSQL using connection poolers and load balancersbalancers for an enterprise grade environment. And this is from the Procona.com blog. Sothey've had a number of presentations that they've covered for what they called enterpriseenvironments in terms of handling security, high availability, backups and now this one is onscaling with connection poolers and load balancers. So these posts are mostly a summary andin terms of connection pooling, they're basically talking about setting up Pgbouncer and in termsof scaling across multiple servers, they're talking about using HAProxy for that purpose. Andthey go into the configuration not in detail but some of the main points about how to configurePG bouncer on your application server, to talk through HAProxy, to Direct, some statements toReplicas and some statements to your primary database. So it's relatively brief posts, but ifconnection pooling and high availability are of interest to you, this is definitely a blog post tocheck out.
[00:07:12] The next post is Moving Table Spaces and this is from the Momgen US blog andbasically he goes over the process of how you could potentially move a table space from onearea to another and he has a very quick reference for it right here. First record the OID of thetable space you want to move. Shut down the postgres cluster, move the table space directlyeither within the same file system or to a different file system. Update the OID symbolic link thatrepresents the move table space to the new table space directory location and then restart theserver. So basically you shut down the cluster, move the data over and then move the pointer sothat postgres can find where that new location is. So if you've ever had the need to do this, thisis definitely a piece of content to check out.
[00:08:01] The last set of postgres are all about postgres and all from the Paul Ramsey blog atCleverelifent CA. The first post is five times faster spatial. Join with this one weird trick andbasically he did it by not compressing the data using uncompressed data. And in his conclusionhe says for a 50% storage penalty we achieved a 500% performance improvement. Now I'm notthat familiar with postgres, but if you use it, this might be a blog post you want to check out. Thenext two posts also from the Paul Ramsay blog are talking about the PostGIS Code Sprints.First one is the code Sprint number one and Code Sprint number two. So again if you're usingPostGIS, definitely two pieces of content to check out.
[00:08:48] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can signup to receive weekly notifications of each episode. Or you could subscribe via YouTube oritunes. Thanks. --