Bulk Tagging in Rails

Recently, I started using Postgres’ native array type to tag records in an eCommerce application. It was huge upgrade to move to this solution instead of having multiple tables storing related Tag records.

I’m going to pass on the “Intro to tagging” type post here, because it’s already been done better than I can.

But, there is an area that I have not found covered; bulk tagging.

For the sake of our article let’s assume that you’re running an online store, with a Product model that has a tags attribute. Periodically, you want to mark products as being a “sale” item, or a “best seller.” How can we add tags to multiple records at once?

A ProductTagger

Let’s throw aside the UI component for now, and just focus on our code. For the remainder of this post, we’ll build a new ProductTagger model to bulk tag records. We can start with 3 basic requirements:

  1. The tagger should add the tag to all records.
  2. The tagger should not add the same tag more than once.
  3. The tagger should not set each record to having the same set of tags.

As I have in previous posts, let’s also start with some failing tests that lead us to where we want to go. We’ll include 3 tests for the basic requirements described above.


# test/unit/product_tagger_test.rb

class ProductTaggerTest < ActiveSupport::TestCase

  def setup
    @shirt  = Product.create
    @shorts = Product.create
    @tagger = ProductTagger.new @shirt.id, @shorts.id
  end

  def test_tags_added_to_all_records
    @tagger.tag 'sale'

    assert_equal %w(sale), Product.find(@shirt.id).tags
    assert_equal %w(sale), Product.find(@shorts.id).tags
  end

  def test_tags_are_not_duplicated
    @tagger.tag 'sale'
    @tagger.tag 'sale'

    assert_equal %w(sale), Product.find(@shirt.id).tags
    assert_equal %w(sale), Product.find(@shorts.id).tags
  end

  def test_tags_not_set_identical
    @shirt.update_attributes tags: %w(sale)
    @shorts.update_attributes tags: %w(hot)

    @tagger.tag 'top'

    assert_equal %w(sale top), @shirt.tags
    assert_equal %w(hot top), @shorts.tags
  end
end

We can now start to build the ProductTagger model to make these tests green.


# app/models/product_tagger.rb

class ProductTagger
  attr_reader :product_ids

  def initialize(*product_ids)
    @product_ids = *product_ids
  end

  def tag(tag)
    product_ids.each do |id|
      product = Product.find id
      product.tags << tags
      product.save
    end
  end
end

Easy enough, right? But, I can already hear the screams. “You’re doing 2n queries!” You’re right, we do a find query, and an update query; FOR EACH RECORD. But, I didn’t say we were done…

Rails’ update_all Method

Our first helpful method is update_all provided by Rails. The full source is available here. The method is called on a relation, and can be passed a SQL string, an array, or a hash. For example, we could use the following to set the tags method.


# app/models/product_tagger.rb

def tag(tag)
  Product.find(product_ids).update_all tags: [tag]
end

This decreases our number of queries to one, but we now fail 2 of our tests. This will set the tags attribute, but overwrites the existing value. This is about as far as we can go with just Ruby/Rails. There is no way that I can find to do the work without iterating over the collection and updating one-by-one.

The Power of the Database

We’re already using some of the power of Postgres in the array type, let’s use even more of its features to handle these updates without so many queries.

The most helpful resource I found when solving this problem were, surprise, surprise, the Postgres docs.

The first Postgres function that jumped out at me was, array_append. This funtion takes 2 arguments.

  1. An array (the current tags field).
  2. An element to append to the array.

So, using our newfound function, let’s just append the tags attributes of each record.


# app/models/product_tagger.rb

def tag(tag)
  Product.find(product_ids).update_all "tags = array_append(tags, #{tag})"
end

Simple enough right? We are now only failing one test, the test that says that tags should not be duplicated. We’re able to continually append elements to the tags array.

The next Postgres function that has been incredibly useful to learn is, unnest. This function takes one argument, an existing array, and returns SQL rows of the values of each element in the array.

We can now use one of the more common bits of SQL, SELECT DISTINCT. We can unnest an array into a set of rows, select only those records that are unique, and then cast them back into an array.

So, our final implementation is as follows:


# app/models/product_tagger.rb

def tag(tag)
  Product.find(product_ids).
    update_all "tags = ARRAY(SELECT DISTINCT UNNEST(array_append(tags, #{tag})))"
end

Conclusion

And that’s it!

We now have an object that can smartly tag multiple objects, all in one query.

We’ve been able to use 2 very powerful tools, Rails and Postgres, to perform a task. It was a great learning experience to dive through the docs, because I knew something should be possible, I just didn’t know how.