skwpspace yan pritzker's home on the web

hello, i'm yan

This blog is about startups, blogging, Ruby on Rails, virtualization and cloud computing, photography, customer service, marketing, UX and design, git, and lots more.

Contact

Reach me at yan at pritzker.ws

Posted
11 June 2008 @ 3am

Tagged
background, code, rails, ruby, thoughts, threads

Long running Threads in Rails and metaprogramming fun

Disclaimer: This post contains evil (but highly fun!) code. Proceed at your own peril…

I was recently designing an application that needed to execute some long-running requests against an external host. If you’ve ever tried doing something like this in Rails, you’ll have found that your Mongrels block while waiting for the request to complete, bringing the experience for all other users to a halt.

I wanted to dispatch my long-running request, return to the user, and then poll for results using AJAX. There are many ways to do background tasks in Rails, most of which require running an out-of-process background server that you communicate with over some sort of queue or memcached. There’s BackgrounDRb, Bj, Workling, and so on, but this seemed like overkill for my problem.

After reading a post on using Ruby Threads, I decided to be brave and try this approach. I implemented a simple action that spawns a thread and then returns whatever result is cached, whether it is ready or not. The action is polled via AJAX, so the next poll picks up the result once the thread has written it. The pseudocode looks something like this:

def long_running_action
  # Spawn a background thread that refreshes the cached results.
  precache_the_results

  # read_cached_results raises DataNotAvailableException
  # if the file is missing or unreadable.
  results = read_cached_results

rescue DataNotAvailableException
  # No data yet: tell the page to fire another AJAX request
  # a couple of seconds after it loads to check for results again.
  flash[:update_right_away] = true
ensure
  respond_to do |wants|
    # render an RJS update with the results
  end
end

def precache_the_results
  # The thread is never joined, so it outlives the request.
  Thread.new do
    expensive_action_outputs_to("file.txt")
  end
end

Because I didn’t join the Thread to the request thread, it lives on after the request completes, which is just what I needed. Since the code inside my Thread is a call to an external provider and doesn’t write to the database, I am not concerned with ActiveRecord threading issues.
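Outside Rails, the same lazy-polling idea can be sketched in plain Ruby. This is only an illustration under assumptions: the `poll` helper, the temp-file path, and the short `sleep` standing in for the external call are all made up for the demo, while the other method names mirror the pseudocode above.

```ruby
require "tmpdir"

# Raised when the cache file has not been written yet (name taken from the post).
class DataNotAvailableException < StandardError; end

# Hypothetical cache location for this demo only.
DEMO_CACHE_FILE = File.join(Dir.tmpdir, "lazy_poll_demo_#{Process.pid}.txt")

# Stand-in for the slow external call; the sleep and the payload are assumptions.
def expensive_action_outputs_to(path)
  sleep 0.2
  File.write(path, "started")
end

# Fire and forget: the caller does not join the thread.
def precache_the_results(path)
  Thread.new { expensive_action_outputs_to(path) }
end

def read_cached_results(path)
  File.read(path)
rescue Errno::ENOENT, Errno::EACCES
  raise DataNotAvailableException, "no cached results at #{path}"
end

# One "poll": kick off a refresh, then return whatever is cached right now.
# Returns [results_or_nil, worker_thread]; the thread is returned only so a
# script or test can wait for it. A controller would simply drop it.
def poll(path)
  worker = precache_the_results(path)
  [read_cached_results(path), worker]
rescue DataNotAvailableException
  [nil, worker]
end
```

Calling `poll(DEMO_CACHE_FILE)` before any worker has finished yields `nil` for the results; a later call returns whatever an earlier worker wrote, which is the "one request old" behaviour the action relies on.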

The only problem with this approach is that in development mode, Rails reloads your classes on every request. If your thread runs past the request lifetime, the class that’s running it may be unloaded while it’s running, wreaking all sorts of havoc. But Ruby gives us the power to be truly evil. What if I just make Threads run their block synchronously in development mode? Turns out I can!

if ENV['RAILS_ENV'] == 'development'
  class Thread
    def initialize(&block)
      block.call
    end
  end
end

This code is defined in the class where I’m doing the magic. Do NOT just slap this into your environment.rb as you’ll horribly break the Rails startup logic. There’s probably a slightly smarter and safer way to do this by using a Factory pattern to create the threads and explicitly specifying the implementation you want. But this is my party and I’ll monkeypatch if I want to.
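For comparison, here is a minimal sketch of what the factory idea might look like. It is not what the application actually used; `BackgroundRunner`, `ThreadRunner`, `InlineRunner`, and `for_env` are hypothetical names invented for this example.

```ruby
# Hypothetical factory: choose how "background" work runs per environment
# instead of reopening Thread.
module BackgroundRunner
  # Runs the block in a new thread and returns the Thread.
  class ThreadRunner
    def run(&block)
      Thread.new(&block)
    end
  end

  # Runs the block synchronously in the calling thread and returns nil,
  # so no work outlives the request while classes are being reloaded.
  class InlineRunner
    def run
      yield
      nil
    end
  end

  # Assumption: only "development" needs the inline behaviour.
  def self.for_env(env)
    env == "development" ? InlineRunner.new : ThreadRunner.new
  end
end
```

The action would then call something like `BackgroundRunner.for_env(ENV['RAILS_ENV']).run { expensive_action_outputs_to("file.txt") }`, leaving `Thread` itself untouched.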

So… comments, suggestions, complaints? Is this going to die horribly in production? I guess we’ll have to see!


9 Comments

Posted by
RoR Tuesday June 17, 2008
17 June 2008 @ 1pm

[...] Long running Threads in Rails and metaprogramming fun. You know, I should just link to Yan’s blog and be done with it. [...]


Posted by
Ilya Grigorik
3 August 2008 @ 5pm

Yan, that’s an interesting approach. ;) One thing I still don’t fully understand: how are you polling for updates once the thread is spawned? Do you have another action synchronizing on the file you’re writing to?


Posted by
Yan
3 August 2008 @ 5pm

It’s basically ‘lazy polling’. The way it works is that there is an ajax poll on the front end that invokes the action. The first time the action is invoked, there is no data yet and a thread is spawned to populate. The next time it is polled, it reads the data from the first poll, and spawns another background thread to update. So essentially every time the frontend polls it is getting data that is approximately ‘one request old’. Does that make sense?

Also, I should note that in Rails 2.1 you can probably make this a lot cleaner by using the new cache read/write mechanism. Here I manually write to the file. Right now there is no mutex around the file writing, which means I can technically have a race condition that leaves the file with ‘incorrect’ data. In my case, though, there is really no such thing as incorrect data: I’m using this technique to poll Amazon EC2 for launched instance state, so at some point Amazon will start returning ‘started’, and that will be the final state. I am not sure whether I am opening the file up to corruption if two threads try to write to it at the same time, so a mutex would probably be a good idea in the future.
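A rough sketch of what that mutex might look like, combined with a write-to-temp-then-rename step so that readers never see a half-written file. `CachedResultFile` is a hypothetical helper, not code from the application, and the mutex only serializes threads inside one process; separate Mongrel processes would not share it.

```ruby
# Hypothetical helper: serialize writers with a Mutex and publish each write
# by renaming a temp file into place.
class CachedResultFile
  def initialize(path)
    @path = path
    @lock = Mutex.new
  end

  def write(data)
    @lock.synchronize do
      # The temp name is shared by all threads in this process, hence the lock.
      tmp = "#{@path}.#{Process.pid}.tmp"
      File.open(tmp, "w") { |f| f.write(data) }
      # On POSIX systems a rename within one filesystem replaces the target
      # atomically, so a reader sees either the old file or the new one.
      File.rename(tmp, @path)
    end
  end

  # Returns the cached contents, or nil if nothing has been written yet.
  def read
    File.read(@path)
  rescue Errno::ENOENT
    nil
  end
end
```

Each background thread would call `write` on a shared instance, and the polled action would call `read`.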


Posted by
Ilya Grigorik
3 August 2008 @ 6pm

Ah, gotcha – asynchronous RPC. In fact, we’re using a similar model for one of our services at AideRSS: when you query for a feed_id, if we don’t have it in our system, we spawn a worker, report ‘progress:0’ and return. Any subsequent call to the same endpoint will start reporting the progress. For synchronization between workers, we use a memcached server (the worker is on a different machine). It’s been working great so far!


Posted by
Scott A.
26 June 2009 @ 12pm

I’ve been thinking on this problem for a few days and just came across this post.

In an application I’m working on, I need to make a non-critical, potentially long-running remote API request for which I do not need to return a result. So far, this is looking like a great way to break that out of the context of the user’s request without having to store the data or spawn a background process.

Thanks for writing this up!

Scott


Posted by
Yan
26 June 2009 @ 1pm

I should say that I recently ran into very strange errors that I believe may be related to this implementation. We are currently moving away from it toward a standalone server that does our background jobs. Basically, we use RabbitMQ to pass a message to the server saying “do this”, and create a database record to mark that the job is in progress. When the rabbit server (which is a full Rails process) completes the job, it marks the database record as done. Meanwhile, the client is AJAX-polling for the result, and when the record is marked as done, the client gets the response generated by the action being polled.


Posted by
Amit
6 October 2009 @ 11am

Hi

Do you have code for this Long running Threads in Rails? Please send me if possible

Thanks
Amit

