RSS
18 Jan 2007

Lazy Loading BLOBs in Active Record

Author: ceefour | Filed under: Beginner, Complaints, Opinions, Rails, Ruby, Tips, Web 2.0

One of the performance-hurting issue in current Active Record is that it doesn’t allow you to lazy load BLOB columns (and sometimes, TEXT columns), which can contain huge data. Usually there’s nothing wrong, theoretically, with loading all columns. But let’s see how we can improve performance by lazy loading these columns…

Using Views to Lazy Load Columns

My workaround is not to create a separate table, but just create a separate view (I’m using PostgreSQL, but I believe you can do that too with recent MySQL). This view doesn’t contain the blob fields. I use set_table_name in the model to use this view. Since the view is updatable (in PostgreSQL you have to create several RULEs on the view to make it insert/update/delete-able), Active Record doesn’t know it’s dealing with a “fake” table.

Whenever I want to load/update the actual content all I had to do is issue a select_value(“SELECT content FROM photos WHERE ID=?”, …) something like that. Usually you won’t need finder features just to load the lazy blob, only the corresponding ID(s).

The steps:

  1. Create a database view based on the original table, omitting the columns you want to lazy-load.
  2. Make sure the view is updatable.
  3. Use ActiveRecord::Base.set_table_name class method in your model to use the database view you just created.
  4. Add accessors on your model to read/write lazyloaded columns using plain SQL.

Perhaps there is already a plugin to handle this, but a quick Google search didn’t reveal this. icon sad Lazy Loading BLOBs in Active Record
Caching Processed Images

The next thing you may want to do is cache some BLOBs (or processed BLOBs) to files so Apache (or whatever your web server is) serves them directly, which is much faster than Rails processing.
I don’t actually cache all images, since my photos table contain huge blobs (usually over 100 KB each). I only cache the thumbnails. At first request to a photo thumbnail, my app will process it using rmagick, then save the generated thumbnail into some folder in RAILS_ROOT/tmp. I have a special RewriteCond/RewriteRule directive in .htaccess so that if a requested thumbnail is already there, Rails won’t be ran at all and Apache serves it directly. This makes thumbnails *very* fast. (I symlink in RAILS_ROOT/public/tmp_thumbnails to the tmp/thumbnails folder)

Were I able to use memcached maybe I’ll use that too. Unfortunately not all hosting services provide this luxury.
Steps:

  1. Create a temporary folder. Example: #{RAILS_ROOT}/tmp/thumbnails
  2. Make a symbolic link in public to that folder: (how about Windows?)

    ln -s tmp/thumbnails public/tmp_thumbnails

  3. Make sure your controller/action works correctly. Save the resulting output to a file in your tmp/thumbnails folder.
  4. Edit your .htaccess as such:

    # Make sure the file exists
    RewriteCond %{DOCUMENT_ROOT}/tmp_thumbnails/$1.jpg -f

    # Rewrite the URL so it points to the file, not Rails
    RewriteRule ^thumbnails/(\d+)$ tmp_thumbnails/$1.jpg [L]

Sorry for the too theoretical vague steps. icon wink Lazy Loading BLOBs in Active Record

No related posts.

Related posts brought to you by Yet Another Related Posts Plugin.

  • Erik

    It might be better and more robust to use a “real” cache like Squid.

    Then all you would need to do is assign the proper Last Modified tags, ETags, expiration dates, caching pragmas, etc. Squid would do the rest, storing a local copy of the image the first time it is retrieved, and (depending on the combination of headers you use), either:

    (1) hold onto it and serve it automatically until the expiration date passes, or

    (2) Forward the request to your ruby app, along with the appropriate If-Modified-Since or ETag header (forgot the name of that one). Your ruby app would need to check the date/etag provided against the database and either return a 200 or a “use cached version” (304?) header.

    Admittedly, this could introduce an extra source of latency, but it relies on standard, existing components and software, requiring much less special configuration on the part of the deployer. What will your code do if you run out of disk space? Will you evict older images from the cache in favor of more commonly viewed ones? What if the image file gets corrupted somehow – will your code detect it? If you have to delete or modify an image, will the cached version be deleted?

    These are all things that are more likely to be easy to handle properly (or at all) with a real HTTP cache.

  • http://hendy.gauldong.net/ Hendy Irawan

    Thanks Eric!

    I agree that would be the proper way to do it… :-)

  • http://hendy.gauldong.net/ Hendy Irawan

    Whoops, sorry!

    I meant “ERIK”!!!

    Sorry for the typo :-(

  • Joao Borges

    Hi Hendy,
    After some googling about a problem I just found in a project of mine,
    I found your post “Lazy Loading BLOBs in Active Record” in the site http://www.adaruby.com

    I have been working on a RoR web application to serve as a simplified Document management system. My problem is the following:
    After having provided the application for production, I just found out about the stupid amount of time taken by the query executed to retrieve a file stored in a bytea field
    of my PostgreSQL DB (about 20 seconds for a 12 MB file!). In PHP, the same query takes just 0.5 seconds!
    It also takes 6 seconds to save this same file to the DB!

    I read somewhere that this has to do with the slowness of the parsing for blob objects to “escape and unescape” certain sequence characters…

    So if this is so, there is no workaround to this problem, is there?

    Much apreciated if you can give me some input to solve this problem.

  • http://www.hendyirawan.com/ Hendy Irawan

    Joao:

    As a quick workaround, I can only think of moving the huge BLOBs to flat files… :)

  • Casey

    I wrote a lazy loading plugin… It’s up on refactormycode.com ’cause I’m not particularly proud of it :)

    http://refactormycode.com/codes/219-activerecord-lazy-attribute-loading-plugin-for-rails