
Tips & Tricks, Faster APIs, Full Text Search, pgbouncer_fdw | Scaling Postgres 97


In this episode of Scaling Postgres, we discuss tips & tricks, how to get faster APIs, full text search considerations and pgbouncer_fdw.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "TIPS AND TRICKS TO KICK-START POSTGRES YEAR 2020". This is from cybertec-postgresql.com and it's basically a list of tips and tricks, some of which I wasn't aware of, so it was great to get this content. The first tip is that you can do dynamic SQL loops with psql's \gexec command. So if you have the need to query the database and then do something from that query, say you want to select a set of tables and vacuum them all, you can use the \gexec command. You're basically building an SQL statement in what you're returning, and \gexec runs through the return value of that query and executes each statement. So this makes it easy to execute statements essentially in a loop, and they make a note here to be aware that this is not an SQL command, but a psql command.
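To illustrate the pattern, here's a minimal sketch run from psql; the schema filter is just an example:

```sql
-- Build one VACUUM statement per matching table, then have psql
-- execute each returned row with the \gexec meta-command.
SELECT format('VACUUM ANALYZE %I.%I', schemaname, tablename)
FROM pg_tables
WHERE schemaname = 'public'
\gexec
```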

The next tip is that you can use session variables. So you can set, under a particular scope, a variable to a value. And they say that the data types aren't enforced, so you could set it to an integer or a text value, and then you can SHOW that variable to actually print out what it is. You can also use current_setting to get that setting and use it in an SQL query. And they say something might change in version 13 with something called schema variables, but as of right now, this is something that works in current versions. The next tip is that in addition to pg_sleep, which sleeps for the number of seconds you specify, you can actually use pg_sleep_for, which takes an interval. So, like five minutes, two days, or whatever, you put in a more human-friendly version. Or you could do pg_sleep_until a specific time, so they're using tomorrow at 03:00 a.m., or you could do a specific date and time.
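Here's a rough sketch of both tips; the variable name is just an example:

```sql
-- Session variables: any dotted name works, and types aren't enforced.
SET myapp.user_id = 42;
SHOW myapp.user_id;
SELECT current_setting('myapp.user_id');  -- use it inside a query

-- Interval- and timestamp-based sleeping:
SELECT pg_sleep_for('5 minutes');
SELECT pg_sleep_until('tomorrow 03:00');
```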

So again, these are functions I was not aware of. The next tip is that you can set planner constants per function. So for example, for a particular function, you can alter that function and set enable_seqscan to off. Now, normally people use plain SQL statements and don't have a lot of functions, but this would be a way that you could change how a query is potentially executed. You can also alter the amount of work memory it has or set a particular statement timeout, so I could see some use cases where this could be useful. The next tip is duplicating databases for testing purposes. So, as they say here, you probably mostly do a pg_dump to back up and then restore elsewhere. But if you want to do it within the same cluster, you can use the CREATE DATABASE command and do it from a template. The thing to watch out for is that it actually locks up the source database while it's doing this.
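As a sketch, with hypothetical function and database names:

```sql
-- Planner constants and resource settings scoped to one function:
ALTER FUNCTION my_func(int) SET enable_seqscan = off;
ALTER FUNCTION my_func(int) SET work_mem = '64MB';
ALTER FUNCTION my_func(int) SET statement_timeout = '5s';

-- Duplicate a database within the same cluster; the source must
-- have no active connections while this runs:
CREATE DATABASE mydb_test TEMPLATE mydb_prod;
```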

So it's definitely something you need to be aware of, and no connections to the source should be active. But this is another alternative for duplicating a database that should be much faster than pg_dump. The next tip is Debian-specific, and I believe the next two might also apply to Ubuntu: you can define cluster and initialization defaults. So there actually is a cluster configuration at this location, /etc/postgresql-common/createcluster.conf, and it defines all sorts of configuration options for setting up clusters. So this could be really useful to alter if you want to change how clusters get created. The next tip is that there are sample configuration files located in /usr/share/postgresql/$PGVER for Debian systems and /usr/pgsql-$PGVER/share/ for Red Hat-based systems. So if you've altered your configuration files and you want to see what the original was, you can consult these directories.

The next set of tips is with regard to randomly sampling rows, and they're using the TABLESAMPLE SQL clause. So you can use TABLESAMPLE SYSTEM with 0.1 for a 0.1% sample of rows, which is pretty fast. Or if you want more randomness, albeit slower, you can use BERNOULLI as they demonstrate here, and they give a few other examples of how you can use TABLESAMPLE. And last but not least, the tip is that you can save characters when testing for NOT NULL: instead of writing IS NOT NULL, you can actually just write NOTNULL. So again, yet another tip I was not familiar with. So I'd say this is definitely a blog post you're going to want to keep as a reference to be able to use some of these tips and tricks.
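Here's a quick sketch of these last tips against a hypothetical table:

```sql
-- Fast page-level sampling of roughly 0.1% of rows:
SELECT * FROM big_table TABLESAMPLE SYSTEM (0.1);

-- More random, row-level sampling (slower):
SELECT * FROM big_table TABLESAMPLE BERNOULLI (0.1);

-- NOTNULL as shorthand for IS NOT NULL:
SELECT count(*) FROM big_table WHERE some_col NOTNULL;
```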

The next post is "Faster Ruby APIs with Postgres". This is from goiabada.blog. Now this is talking about Rails in Ruby. However, this is applicable to what the technique they're using is applicable for any application framework. What they're basically saying is that for API purposes, that is, returning JSON, you can do a SELECT of the database, pull all that data into Ruby, then build objects with it as the ORM for active record would do, and then format that output as JSON. An alternative is to do a SQL query directly to Postgres and then have it return JSON to you. So they do this for the Ruby on Rails case where they're using the json_build_object function to be able to build out the JSON that they want along with the JSON aggregate function. It essentially returns the data that you would expect and it essentially can return the same data that you could using Ruby on Rails or any other application framework, and it does it much more efficiently. 

So they have some graphs down here that show it ran 10x faster using this technique compared to going through the ORM and having Ruby build the JSON. This holds true when looking at single tables, a join table, or even when increasing the number of elements; it's still about ten times faster. Now, in other frameworks that may be faster than Ruby, the gain probably won't be ten times, but it's still going to be relatively significant. Now, this may involve more complex SQL queries, but if you have an API that requires high performance, then querying Postgres directly, returning JSON from it, and then just passing it on through the framework should be the most efficient option. So if you're interested in using this technique, definitely check out this blog post.

The next post is "Two edge cases in PostgreSQL full-text search". This is from jetrockets.pro.  So they're using full-text search and they have a tsvector column, this TSV that they've built along with their table. So if they have a title and they want to query it, they go ahead and build a two tsvector value here. They do it using a trigger function. And they were using the English catalog. As you can see, it reduces things like painting to paint, roofing to roof, heating and cooling to cool and heat. And that's fine. And you can query like Roof and it will find the record. So whatever you're searching for, you do it to a tsquery. 

However, it won't find roofing. Now, their solution to this was to actually use additional catalogs, and in this case, they used the simple catalog. So what they did was change their trigger function so that it actually concatenates the results of two different catalogs, the english catalog and the simple catalog. And those conversions turn painting into paint and painting, and roofing into roof and roofing. And now when you do a query, you get the one record expected. So if you're not necessarily getting what you expect with full-text search, maybe look at using a different catalog to see if that can give you the results you're looking for.
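A minimal sketch of the approach; the table, column, and function names are illustrative:

```sql
-- Index both the stemmed ('english') and unstemmed ('simple') forms
-- in the trigger function that maintains the tsv column:
CREATE OR REPLACE FUNCTION services_tsv_update() RETURNS trigger AS $$
BEGIN
  NEW.tsv := to_tsvector('english', NEW.title) ||
             to_tsvector('simple',  NEW.title);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Stemmed search still works, and exact-form search now works too:
SELECT * FROM services WHERE tsv @@ to_tsquery('english', 'roofing');
SELECT * FROM services WHERE tsv @@ to_tsquery('simple', 'roofing');
```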

The next post is "How to Simplify pgBouncer Monitoring with pgbouncer_fdw". This is from crunchydata.com. This is from Crunchydata.com. FDW stands for Foreign Data Wrapper.  So this is a new tool, essentially a Foreign Data Wrapper that allows you to query and monitor PgBouncer because by default PgBouncer, you get details on its running by actually connecting to it with a specific administrator or operator user, and then you SHOW commands. So for example, here it's showing the different pools to get the output. What they wanted to do is actually be able to do SQL queries to query the different metrics of PgBouncer to be able to easily monitor it. 

So they set up essentially a Foreign Data Wrapper. They're using the dblink_fdw and they go through the process of doing it. Then they're adding a FOREIGN SERVER and doing all the granting for the selects. And now you could do something like SELECT * FROM pgbouncer_pools and it returns all the same data. So basically you're querying your main database and it goes and talks to PgBouncer to get its metrics for monitoring purposes. They also mentioned that they've incorporated this into their pgMonitor open-source monitoring utility. So this seems like a really great addition. I really like this, and if you are interested in learning more, definitely check out this blog post.
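Roughly, the setup looks like this; the connection details and monitoring role here are illustrative, so consult the post for the exact steps:

```sql
-- dblink provides the dblink_fdw wrapper used underneath:
CREATE EXTENSION IF NOT EXISTS dblink;

-- A foreign server pointing at PgBouncer's admin console, plus a
-- user mapping for the monitoring role:
CREATE SERVER pgbouncer FOREIGN DATA WRAPPER dblink_fdw
  OPTIONS (host 'localhost', port '6432', dbname 'pgbouncer');

CREATE USER MAPPING FOR monitor_user SERVER pgbouncer
  OPTIONS (user 'monitor_user', password 'secret');

-- With pgbouncer_fdw's views installed, plain SQL replaces SHOW:
SELECT * FROM pgbouncer_pools;
```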

The next post is "Effectively Using Materialized Views in Ruby on Rails". This is from pganalyze.com. Now again, this is a general subject about Views and Materialized Views. It is specific to Ruby on Rails, but you can learn more about Views and Materialized Views if you want to consult this post. Basically, they cover what is a view, essentially a virtualized table and it helps minimize complexity. So if you have a very complex query, you can put it into a view and it will do that query in real time. Alternatively, you can use Materialized Views which actually creates an underlying table of all of this data. It has materialized this virtual view, so essentially it's getting a one-time snapshot of it. Now as a consequence, you have to refresh that snapshot periodically. So essentially it's like a cache. So it's like a cache for your query. So they do need to be refreshed periodically. But if you want to learn more about Views and Materialized Views and how you can use them specifically with a Ruby on Rails or a Ruby application framework, definitely a blog post to check out.

The next post is actually a YouTube video, "Why developers love Postgres". Now, this presentation was given at Microsoft Ignite, which happened recently. It is about 38 minutes in length, and for about the first 15 to 20 minutes it covers some of the reasons that developers love Postgres. The latter half talks about how an organization switched over from Oracle to Postgres and the benefits they garnered from it. So if you're interested in viewing this presentation, go ahead and check it out.

The next post is "Webinar: BDR - Latest Features & Updates [FollowUp]". This is from 2ndquadrant.com and this is a webinar that you can register for. There's a link right here to learn more about their bi-directional replication product. I haven't had the opportunity to look at it, but if you are interested in learning more about bi-directional replication, definitely check out this webinar.

The last post is "Waiting for PostgreSQL 13 - ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION". This is from depesz.com. They're mentioning here that Postgres 12's generated columns feature allows you to do, for example, calculations based on other columns. It's a generated column. There's actually no way to remove it once you add it to a table. So this new command actually enables you to do that. So if you want to learn more about it, definitely check out this blog post.
