code.eklund.io

space age technobabble for an electromatic boogaloo

Modeling Adjacency Lists in Ruby

graphs, lists, ruby, trees | Comments

Normally, this is a task I would keep in my data layer, since PostgreSQL is so great at it with recursive queries. Sometimes, however, you need to operate in Ruby and Ruby alone. Let’s say I want to model what I always call an adjacency list but what is more technically a directed graph. We have a set of nodes where any given node can have a parent node:

    1
   / \
  2   3
     / \
    4   5

We will define two classes: Tree and Node. A Node object is very simple:

class Node
  attr_accessor :id, :name, :parent, :children

  def initialize(opts = {})
    @id = opts[:id]
    @name = opts[:name]
    @parent = opts[:parent]

    # add children if passed an array or initialize as an empty array
    @children = opts[:children].is_a?(Array) ? opts[:children] : []
  end
end

We have accessors for the id, which is a unique identifier that gives us the ability to easily retrieve nodes from the tree. The name is just an arbitrary string. Parent is another Node object, and children is an array of Node objects that all have the current node as their parent. The nodes above could be modeled as:

n1 = Node.new id: 1, name: 'one'
n2 = Node.new id: 2, parent: n1, name: 'two'
n3 = Node.new id: 3, parent: n1, name: 'three'
n4 = Node.new id: 4, parent: n3, name: 'four'
n5 = Node.new id: 5, parent: n3, name: 'five'
n1.children = [n2, n3]
n3.children = [n4, n5]

Adding children like this is tedious, and we also don’t have a good way to traverse the tree. So let’s introduce a Tree class.

class Tree
  attr_accessor :nodes

  def initialize
    @nodes = {}
  end

  def node(id)
    @nodes[id]
  end

  # helper for hash-like access
  def [](id)
    node(id)
  end

  def add_node(node)
    raise "#{node.class} != Node" if node.class != Node
    if !node.parent.nil?
      # check if the parent is part of the tree
      existing_parent = self.node(node.parent.id)
      if existing_parent.nil?
        raise "Unable to add node #{node.id} to the tree. It specifies a parent #{node.parent.id} that has not been added to the tree."
      end
      node.parent = existing_parent
      existing_parent.children.push(node) unless existing_parent.children.include?(node)
    end
    @nodes[node.id] = node
  end
end

This class gives us a single accessor for nodes which is just a hash of Node objects. We can retrieve a node via a method or with hash-like syntax. The add_node method will add a node to the tree and it will also update the children on a parent node. Now we can do fun stuff like this:

nodes = [n1, n2, n3, n4, n5]
tree = Tree.new
nodes.each {|n| tree.add_node n}

tree.node(4).name         #=> 'four'
tree.node(4).parent.name  #=> 'three'
tree.node(1).children.map(&:id)     #=> [2,3]

Now let’s add some functionality to traverse the tree to get the ancestors as well as the path of a node. These methods are part of the Tree class.

def ancestors(node, ancestor_list = [])
  # traverse up the tree to get all parents
  # return the list if the parent of the current node is already in the list
  if node.parent && !ancestor_list.include?(node.parent)
    ancestor_list.push node.parent
    self.ancestors(node.parent, ancestor_list)
  else
    ancestor_list.empty? ? nil : ancestor_list.reverse
  end
end

def path(node)
  if node.parent.nil?
    node.id.to_s
  else
    ancestors(node).push(node).map { |n| n.id.to_s }.join(':')
  end
end

Ancestors will return an array of node objects ordered from the top of the graph down to the parent of the given node. The path method uses the ancestor array to generate a human-readable path to the node. Given the same tree as above:

tree.ancestors(n4).map(&:name) #=> ["one", "three"]
tree.path(n4) #=> '1:3:4'

The path method could be used, for example, to generate URL paths from an XML taxonomy tree or other data structure.
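A minimal sketch of that idea, mapping a node’s ancestry to a URL slug. The Node struct and url_path helper below are stand-ins for illustration, not the classes above, and it assumes node names are already slug-safe:

```ruby
# Walk from a node up to the root, collecting names into a URL path.
Node = Struct.new(:id, :name, :parent)

def url_path(node)
  segments = []
  current = node
  while current
    segments.unshift(current.name) # prepend so the root ends up first
    current = current.parent
  end
  '/' + segments.join('/')
end

one   = Node.new(1, 'one')
three = Node.new(3, 'three', one)
four  = Node.new(4, 'four', three)

url_path(four) #=> "/one/three/four"
```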

Complete code gist.

Managing Rewrites for a Rails App on Heroku With Nginx + Phusion Passenger

heroku, nginx, passenger, rails, rewrites | Comments

The custom CMS that we built for kaboom.org has to manage a lot of rewrite rules because the information architecture keeps getting rejiggered. We were managing them with Rack::Rewrite, but I always hated having a huge stack of rewrites in the application.rb configuration. Since the app is hosted on Heroku there weren’t really any other options until I realized that Phusion Passenger will now run in standalone mode on Heroku.

It’s quite easy to get Passenger running on Heroku: https://github.com/phusion/passenger-ruby-heroku-demo

Replace the app server in your Gemfile (unicorn, thin, puma, etc.) with

gem 'passenger'

Update your Procfile

web: bundle exec passenger start -p $PORT --max-pool-size 3

This works great but doesn’t come with an obvious way to edit the nginx.conf to add rewrite rules or an include directive. According to the advanced configuration section of the Passenger Standalone manual, there is an option to pass a configuration template.

After adding Passenger to your Gemfile and running bundle install, run this from your application root directory to create a configuration template.

cp $(passenger-config about resourcesdir)/templates/standalone/config.erb config/nginx.conf.erb

You could just add rewrites to this file, but because the default configuration template may change from time to time, I would recommend adding any custom configuration to a separate file. Just after the server { line, add

include '<%= app[:root] %>/config/nginx_custom_config.conf';

Create this file in the config directory of your app and add all of your rewrite rules and custom directives to this file.
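For example, config/nginx_custom_config.conf might look something like this (the rules themselves are made up for illustration):

```nginx
# config/nginx_custom_config.conf
# 301 an old URL to its new home
rewrite ^/old-page$ /new-page permanent;

# an entire section that was rejiggered
rewrite ^/blog/(.*)$ /articles/$1 permanent;
```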

If you are using this to perform rewrites and you are running on Heroku, you also need to include

<% if app[:environment] != 'development' %>
  port_in_redirect off;
<% end %>

Because Nginx listens on the random port assigned by the Heroku routing layer, it will by default include that port number in the rewrite rules that do HTTP redirects. The conditional makes sure that redirects still work as intended when running on localhost:5000 in your development environment.

Other use cases include enforcing SSL on your app, adding HTTP basic auth to an app, and easier generation of CORS headers. So far I’m loving Phusion Passenger standalone on Heroku. In a future post I plan to examine the performance and memory usage versus the existing Unicorn setup.

Autoscaling Resque Workers on Heroku With the Platform-api Gem

heroku, rails, resque | Comments

Heroku is a great service and is extremely powerful. It’s also not cheap, so leaving workers idling can cost you a lot of money. If you have background workers that do variable amounts of work, it makes sense to scale up the number of workers as new jobs are queued and then scale back down as the queue clears.

Here is how one might scale workers using Resque hooks and the Heroku platform-api gem. This example scales up one worker for every three pending jobs. It scales back down once the queue is empty and all the workers are finished, and also when the number of idling dynos grows larger than 5. Because it is not possible to control which dynos are killed off, this assumes that your Resque workers are handling TERM signals (and, ideally, that your jobs are idempotent). The downscale could kill off working dynos, so be sure to handle Resque::TermException in your jobs.
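A job that tolerates those TERM signals might be sketched like this. The class, queue, and do_work names are illustrative, not from the code below; with TERM_CHILD=1 set, Resque raises Resque::TermException inside the job when the dyno shuts down:

```ruby
# A TERM-aware Resque job: if the dyno is killed mid-job, rescue the
# exception and put the job back on the queue instead of losing it.
class ImportJob
  @queue = :imports

  def self.perform(record_id)
    do_work(record_id) # must be idempotent, since it may run again
  rescue Resque::TermException
    # the downscale is killing this dyno; re-enqueue the job
    Resque.enqueue(self, record_id)
  end
end
```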

# config/initializers/heroku-platform-api.rb

$heroku = PlatformAPI.connect_oauth(ENV['HEROKU_OAUTH_TOKEN'])


# lib/scaled_job.rb

require 'platform-api'

module ScaledJob
  def after_enqueue_upscale(*args)
    worker_count = $heroku.formation.info('app-name','worker')["quantity"]
    job_count = Resque.info[:pending].to_i
    # one worker for every 3 jobs (minimum of 1)
    new_worker_count = ((job_count / 3) + 1).to_i
    if new_worker_count > worker_count
      $heroku.formation.update('app-name', 'worker', {"quantity" => new_worker_count})
    end
  end

  def after_perform_downscale
    worker_count = $heroku.formation.info('app-name','worker')["quantity"]
    working = Resque.info[:working].to_i
    if Resque.info[:pending].to_i == 0 && working <= 1
      $heroku.formation.update('app-name', 'worker', {"quantity" => 1})
    elsif Resque.info[:pending].to_i <= 1 && ((working + 5) < worker_count)
      new_worker_count = working + 5
      $heroku.formation.update('app-name', 'worker', {"quantity" => new_worker_count})
    end
  end
end


# model class
class Foo
  extend ScaledJob

  def self.perform(*args)
    # work it!
  end
end

Careful With Those Inflectors: A Tale of Performance Tuning Rails

newrelic, performance, profiling, rails | Comments

We run our website at kaboom.org on a custom-built Rails 4-based CMS, and for a long time I was unhappy with the performance of the site. Not that it wasn’t responsive, and a huge (and I mean HUGE) improvement over the previous Drupal mess, but requests were spending more time than seemed necessary just in the Ruby layer. Most requests were spending almost 500ms there according to New Relic traces.

I installed some tools locally to help track down where and why requests were spending so much time. The gem rack-mini-profiler got me started, and it is a great way to get data on each and every request. It helped me track down some code that was dynamically creating store_accessors on every request. There is an easy way to check whether a given stored attribute already exists on a class:

Post.stored_attributes[:options].include?("some_attribute")

Implementing this check and only adding the store_accessors when needed provided a small performance increase, but each request was still spending way too much time in Ruby. I spent some time trying to get usable data from flame graphs and also experimented with adding ‘step’ blocks in my code to help track down where things were slow. The problem I found with rack-mini-profiler was that the Rails stack is so deep it makes it very hard to see exactly where the problems are. I still have it installed on a local branch and I find it very useful in some cases. But for this issue, it left me digging through a convoluted flame graph and I decided to move on.

It was at this point I took a few weeks off from profiling and also took a vacation. Several weeks later, while preparing for a bug day and pulling error traces from New Relic into Trello, I revisited the performance issues. We are on Heroku, so I decided to switch our free New Relic plan to the one that provides the paid-level features for a limited number of dynos. What this gave me was access to my new favorite profiling tool: the New Relic thread profiler. I ran the tool for a 5 minute trace and this was the result:

I had to uncollapse a few parts of the tree, but it’s pretty clear that requests were spending over 56% of their time in the name method on the ContentField model, which, as it turns out, was no longer even needed. The offending, extraneous code:

def name
  return super if super.nil? || super.blank?
  super.downcase.parameterize.underscore
end

I think this was added in the initial phases of building our CMS to ensure conformity between seeded names and field names imported from Drupal. The Drupal importer I wrote handled all the transformations, and there are validations to ensure that field names are created properly. This method does nothing at all except run downcase, parameterize, and underscore on strings that already conform to that format. And, as it turns out, inflectors don’t come cheap. I removed the method, ran my tests, and deployed a new version of the CMS. The performance increase was noticeable immediately:

It’s not every day that removing 4 lines of code (well, it was really just one) can give you such a boost in performance. Lessons learned:

  1. Careful with those inflectors! They can end up being performance bottlenecks.
  2. Just because it was cached in memcache doesn’t mean the object initialization doesn’t incur the same cost.
  3. NewRelic thread profiling is pretty awesome
  4. rack-mini-profiler combined with flame graphs is an incredible tool that I need to spend more time learning how to use properly.
  5. Careful with adding dynamic stored attributes in an after_initialize. They get added as class methods, so they only need to be added once.


Safely Ignoring Heroku Errors on Rake Db:migrate

heroku, postgresql, rails | Comments

(Repurposing my answer to this stackoverflow question as a blog post)

If you are using Rails 4.x on Heroku and you have your schema format set to sql (which you probably should), you will see this error when running heroku run rake db:migrate

Aborted!
Error dumping database
.../activerecord-4.0.4/lib/active_record/tasks/postgresql_database_tasks.rb:55:in `structure_dump'
.../activerecord-4.0.4/lib/active_record/tasks/database_tasks.rb:143:in `structure_dump'

This is an interesting bug that, as it turns out, can be ignored.

ActiveRecord will only run the task db:structure:dump if you are using ‘sql’ as your active_record.schema_format. And that rake task (db:structure:dump) will fail on Heroku because the pg_dump executable is (unsurprisingly) not in the binary path. Interestingly enough, Heroku states that db:schema:dump is not supported, but if you set the schema format to ruby, it works fine. Of course, you would end up with a schema.rb file on a one-off dyno that will cease to exist shortly after the rake task finishes running.

In Rails 3, the dump task would only raise an error if the exit code of the command was 1. On Unix-based systems, if a command is not found, the exit code is 127. So even though the pg_dump command fails on Rails 3 (which it does), it won’t raise an error and it won’t halt the rake task. Anyone using an SQL schema format with Rails 3 wouldn’t see this issue because it fails silently. The code was refactored in Rails 4 to properly raise an error if the dump failed. This means that db:migrate will raise an error on Heroku.

However, even though it errors with rake aborted, the DDL from the migration is actually executed and committed.

Here are some possible solutions (if the error really bothers you):

  • Ignore the error as the migrations are actually running.
  • Since you don’t care about the structure dump on production, set the schema_format to ruby. In config/environments/production.rb:
config.active_record.schema_format = :ruby
  • If for some reason you don’t want to change the config file, add a rake task with the following to suppress the error:
if Rails.env == 'production'
    Rake::Task["db:structure:dump"].clear
end

First Post

misc | Comments

Hello? Is this thing on?

I was testing out Octopress and decided it was time to finally start a blog. No, this won’t be the realization of my imaginary blog about the art of the movie trailer. This will be about my experiences with Ruby, Rails, PostgreSQL, Redis, etc. I hope that by documenting what I have learned I can help others along the way. Plus, as I age and my memory fails me, I can just google for my old posts.