Fast PostgreSQL Data Loading Using Ruby
Learn how to quickly load data into PostgreSQL using Ruby. We compare several techniques, from single-row inserts up to bulk COPY, and discuss which ones hold up when you need to load terabytes of data.
Load One Row Per Insert Statement with Parameters
Load data one row at a time using parameterized queries with the pg gem.
#!/usr/bin/env ruby
require 'pg'
require 'faker'
require 'csv'

# Return a random Time between `from` and `to`.
def time_rand(rt, from = 0.0, to = Time.now)
  Time.at(from + rt.rand * (to.to_f - from.to_f))
end

conn = PG.connect(
  dbname: "test",
  port: 5432,
  user: "creston",
  password: ENV["POSTGRES_DEV_PASSWORD"]
)

dir = "/tmp/"
counts = {}
starting_at = Time.now
puts "#{starting_at}: Starting"

table_name = "posts"
counts[table_name] = 100_000

i = 1
rt = Random.new
puts "#{Time.now}: Starting #{table_name}"
while i <= counts[table_name]
  conn.exec_params(
    "insert into posts (id, title, content, published_at, type)
     values ($1, $2, $3, $4, $5);",
    [
      i,
      Faker::Book.title,
      Faker::Lorem.paragraphs(number: 3).join(","),
      time_rand(rt, Time.local(2010, 1, 1)),
      Faker::Book.genre
    ]
  )
  i += 1
  # Log rows per second every 1,000 rows; the max guard avoids dividing by zero in the first second.
  puts "#{Time.now}: #{i} :: #{i / [Time.now.to_i - starting_at.to_i, 1].max}" if i % 1000 == 0
end
puts "#{Time.now}: Finished #{table_name}"
Load One Row Per Insert Statement without Parameterization
Load one row of data at a time without parameterized queries. Only the changes from the script above are shown.
conn.exec "
insert into posts (id, title, content, published_at, type)
values (#{i}, '#{Faker::Book.title.gsub("'","''")}', '#{Faker::Lorem.paragraphs(3).join(",")}', '#{time_rand(rt, Time.local(2010, 1, 1))}', '#{Faker::Book.genre}');"
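Interpolating values by hand is why the title needs the gsub to double up single quotes; the pg gem's escape_string does the same quoting for any field. A sketch of the same insert using it (our own restructuring, not from the episode):

# escape_string doubles embedded quotes, so every text field is safe to
# interpolate, not just the title.
title   = conn.escape_string(Faker::Book.title)
content = conn.escape_string(Faker::Lorem.paragraphs(number: 3).join(","))
genre   = conn.escape_string(Faker::Book.genre)
conn.exec "
  insert into posts (id, title, content, published_at, type)
  values (#{i}, '#{title}', '#{content}', '#{time_rand(rt, Time.local(2010, 1, 1))}', '#{genre}');"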
Load Multiple Rows Per Insert Statement
Load ten rows of data at a time with a nested loop. Only the changes from the original script are shown. In the video, we tested up to 500 rows per insert statement.
i = 0
rt = Random.new
puts "#{Time.now}: Starting #{table_name}"
while i < counts[table_name]
  sql = "insert into posts (id, title, content, published_at, type) values "
  # Build ten rows per statement: ids i + 1 through i + 10.
  for j in (i + 1)..(i + 10)
    sql += "(#{j}, '#{Faker::Book.title.gsub("'", "''")}', '#{Faker::Lorem.paragraphs(number: 3).join(",")}', '#{time_rand(rt, Time.local(2010, 1, 1))}', '#{Faker::Book.genre}'),"
  end
  conn.exec sql.chomp(",")
  i += 10
  puts "#{Time.now}: #{i} :: #{i / [Time.now.to_i - starting_at.to_i, 1].max}" if i % 1000 == 0
end
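The multi-row and parameterized approaches can also be combined by generating one placeholder group per row, which keeps the batching speedup without hand-escaping. The episode doesn't show this, so the following is a sketch of one loop iteration under the same setup:

# Ten rows per statement, but with bind parameters instead of string
# interpolation. Each row contributes five placeholders: $1..$5, $6..$10, etc.
rows = ((i + 1)..(i + 10)).map do |j|
  [j,
   Faker::Book.title,
   Faker::Lorem.paragraphs(number: 3).join(","),
   time_rand(rt, Time.local(2010, 1, 1)),
   Faker::Book.genre]
end
placeholders = rows.each_index.map do |n|
  "($#{n * 5 + 1}, $#{n * 5 + 2}, $#{n * 5 + 3}, $#{n * 5 + 4}, $#{n * 5 + 5})"
end.join(", ")
conn.exec_params(
  "insert into posts (id, title, content, published_at, type) values #{placeholders};",
  rows.flatten
)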
Generate CSV and COPY into PostgreSQL
Instead of loading data directly, generate a CSV file using Ruby, then load that file with PostgreSQL's COPY command. Only the changes from the original script are shown.
CSV.open(dir + "#{table_name}.csv", "wb") do |csv|
  while i <= counts[table_name]
    csv << [
      i,
      Faker::Book.title,
      Faker::Lorem.paragraphs(number: 3).join(","),
      time_rand(rt, Time.local(2010, 1, 1)),
      Faker::Book.genre
    ]
    i += 1
    puts "#{Time.now}: #{i} :: #{i / [Time.now.to_i - starting_at.to_i, 1].max}" if i % 1000 == 0
  end
end
copy posts (id, title, content, published_at, type)
from '/tmp/posts.csv'
with (format csv, delimiter ',');
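Note that COPY FROM a file path is read by the PostgreSQL server process, so the file must be on the database host (psql's \copy streams a local file instead). From Ruby you can skip the intermediate file entirely with the pg gem's copy_data, which the episode doesn't cover; a minimal sketch, assuming the same setup as the original script:

# Stream rows straight into COPY ... FROM STDIN, no temp file needed.
conn.copy_data "copy posts (id, title, content, published_at, type) from stdin with (format csv)" do
  while i <= counts[table_name]
    conn.put_copy_data CSV.generate_line([
      i,
      Faker::Book.title,
      Faker::Lorem.paragraphs(number: 3).join(","),
      time_rand(rt, Time.local(2010, 1, 1)),
      Faker::Book.genre
    ])
    i += 1
  end
end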