A little Ruby goes a long way

I’ve been coding PHP for 8 years now, and it can be hard to jump into another language when you’re so entrenched in the syntax and quirks of one. However, I still think it’s a useful intellectual exercise to dabble in “competitors” to PHP, like Python and Ruby. It can also bring real productivity gains when you take advantage of something another language does better than PHP.

The Web2.0 kids love to chatter about Ruby’s beauty and elegance and super awesomeness. I don’t give a crap about that per se, but their groovy love-in has produced some very nice libraries for handling Web2.0-ish stuff like XML feeds. The FeedTools lib is a very comprehensive, very powerful Ruby library that makes creating, consuming and manipulating feeds much easier, more so than anything I’ve found in PHP.

Ruby existed as a good general-purpose scripting language long before Rails made it a nerd-hipster household name, and that’s how we’re going to use it here: as a good ol’ fashioned CLI script to aggregate a few separate XML feeds into a single aggregated feed. You’ll need to have Ruby and the RubyGems package manager installed, which is an exercise left up to the reader (it varies from platform to platform). Once we have them working, you’ll need to install the FeedTools gem:

# gem install feedtools

Now that we have that installed, you can use the feed_tools library in your Ruby scripts. Here’s a script I wrote to aggregate three feeds from CERIAS into a single combo feed:

require "rubygems"
require "feed_tools"

feedurls = %w(http://www.cerias.purdue.edu/feeds/news
  http://www.cerias.purdue.edu/weblogs/feed/
  http://www.cerias.purdue.edu/feeds/seminars_podcast)

combo = FeedTools::build_merged_feed feedurls

combo.feed_type = 'atom'
combo.title = 'CERIAS Super Combined Feed'
combo.copyright = '2006 CERIAS'
combo.author = 'CERIAS <webmaster@cerias.purdue.edu>'
combo.id = "http://foo.bar/foobar/combo.xml"

File.open('./combo.xml', 'w') do |file|
  file.puts combo.build_xml()
end

puts "done writing"

To execute this, you’ll want to do something like:

# ruby /path/to/script/generatecombo.rb

Your best bet is to run this as a cron job, since combining even these three feeds can take about a minute.
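One thing to watch when cron regenerates a file that is also being served: a client could fetch combo.xml while it is only half written. A common workaround, sketched here as an assumption rather than anything FeedTools does for you, is to write to a temporary file in the same directory and then rename it over the old one. On POSIX systems a rename within a single filesystem replaces the target atomically. The atomic_write helper is hypothetical; in the real script you would pass combo.build_xml as the contents.

```ruby
require "tmpdir"

# Hypothetical helper: write contents to a temp file next to the target,
# then rename it into place. Readers see either the old file or the new one,
# never a partial write (assuming both paths are on the same POSIX filesystem).
def atomic_write(path, contents)
  tmp_path = "#{path}.tmp.#{Process.pid}"
  File.open(tmp_path, "w") { |file| file.write(contents) }
  File.rename(tmp_path, path)
ensure
  # Remove the temp file if something failed before the rename happened.
  File.delete(tmp_path) if tmp_path && File.exist?(tmp_path)
end

# Usage, in a throwaway directory:
Dir.mktmpdir do |dir|
  target = File.join(dir, "combo.xml")
  atomic_write(target, "<feed>first</feed>\n")
  atomic_write(target, "<feed>second</feed>\n")
  puts File.read(target) # prints <feed>second</feed>
end
```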

I think most of the script is pretty self-explanatory, but let’s break it down:

require "rubygems"
require "feed_tools"

This works just like you’d imagine. Note that to use gem libs, we need to require rubygems (all lowercase) first, as it does some mucketymuck with require so that it can handle gems properly.
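That requirement dates from Ruby 1.8: Ruby 1.9 and later load RubyGems automatically at startup, so the explicit require is redundant (but harmless) there. A quick, purely illustrative check you can run on a modern Ruby:

```ruby
# Ruby 1.9+ loads RubyGems at startup, so the Gem module usually exists
# before any require. On Ruby 1.8, this explicit require made gems loadable.
gem_loaded_at_startup = defined?(Gem) ? true : false

# Requiring it again is harmless: require skips files that are already
# loaded and returns false in that case.
newly_loaded = require "rubygems"

puts "RubyGems loaded at startup: #{gem_loaded_at_startup}"
puts "require \"rubygems\" loaded it just now: #{newly_loaded}"
puts "RubyGems version: #{Gem::VERSION}"
```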

feedurls = %w(http://www.cerias.purdue.edu/feeds/news
  http://www.cerias.purdue.edu/weblogs/feed/
  http://www.cerias.purdue.edu/feeds/seminars_podcast)

This section just makes an array called feedurls that contains the three feed URLs. %w(...) is just a handy little shorthand for “treat the stuff between the parentheses as a string, and explode it into an array with whitespace as the delimiter”. I really wish PHP had something like this, as writing out array data can be pretty tedious.
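Here is the same array written both ways, to show that %w(...) is just shorthand for a list of quoted strings. Note that the newlines and leading indentation between the URLs count as whitespace too:

```ruby
# %w(...) splits its contents on any whitespace (spaces, tabs, newlines)
# into an array of strings; no quotes or commas needed.
feedurls = %w(http://www.cerias.purdue.edu/feeds/news
  http://www.cerias.purdue.edu/weblogs/feed/
  http://www.cerias.purdue.edu/feeds/seminars_podcast)

# The equivalent long-hand array literal.
longhand = [
  "http://www.cerias.purdue.edu/feeds/news",
  "http://www.cerias.purdue.edu/weblogs/feed/",
  "http://www.cerias.purdue.edu/feeds/seminars_podcast"
]

puts feedurls == longhand          # prints true
puts %w(a b  c) == "a b  c".split  # prints true (runs of whitespace act as one delimiter)
p %w(one two three)                # prints ["one", "two", "three"]
```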

combo = FeedTools::build_merged_feed feedurls

This is the line that does all the work of combining the feeds. One line. Pretty smooth. So we now have a Feed object called combo.
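FeedTools hides the details, but conceptually a merged feed gathers the entries from every source feed and puts them in one list, presumably ordered by date. The sketch below is a hypothetical, stdlib-only illustration of that idea using in-memory entries; the Entry struct and merge_entries helper are made up for illustration and are not FeedTools’ API or its actual implementation.

```ruby
require "time"

# Hypothetical stand-in for a parsed feed entry (not a FeedTools class).
Entry = Struct.new(:title, :link, :published)

# Hypothetical helper: combine entries from several feeds into one list,
# dropping entries that share a link and putting the newest first.
def merge_entries(*feeds)
  feeds.flatten.uniq(&:link).sort_by(&:published).reverse
end

news = [
  Entry.new("News A", "http://example.com/a", Time.parse("2006-05-01T10:00:00Z")),
  Entry.new("News B", "http://example.com/b", Time.parse("2006-05-03T10:00:00Z"))
]
blog = [
  Entry.new("Blog C", "http://example.com/c", Time.parse("2006-05-02T10:00:00Z")),
  Entry.new("News A", "http://example.com/a", Time.parse("2006-05-01T10:00:00Z"))
]

merged = merge_entries(news, blog)
# Prints News B, then Blog C, then News A (the duplicate News A is dropped).
merged.each { |entry| puts "#{entry.published.getutc.iso8601}  #{entry.title}" }
```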

combo.feed_type = 'atom'
combo.title = 'CERIAS Super Combined Feed'
combo.copyright = '2006 CERIAS'
combo.author = 'CERIAS <webmaster@cerias.purdue.edu>'
combo.id = "http://foo.bar/foobar/combo.xml"

We need to set a few properties for combo before it can be published, though. A couple might not be self-explanatory:

  • feed_type determines the output feed format ('atom' here).
  • id sets a unique identifier for the feed, conventionally its URL.

There are many more attributes for feeds available; read the docs on the Feed class for more info.
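For a feel of what these properties turn into, here is a hand-rolled sketch of a minimal Atom 1.0 feed header (RFC 4287) built from the same values with plain Ruby string building. It is an illustration under assumptions, not FeedTools’ output: the element mapping (copyright to rights, the author string split into name and email), the hypothetical xml_escape and atom_header helpers, and the fixed updated timestamp are all assumptions made for the example.

```ruby
require "time"

# Hypothetical helper: minimal escaping for XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# Hypothetical helper: build the <feed> header of an Atom 1.0 document from
# the same properties the script sets. Entries are omitted for brevity.
def atom_header(title:, id:, author_name:, author_email:, rights:, updated:)
  <<~XML
    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>#{xml_escape(title)}</title>
      <id>#{xml_escape(id)}</id>
      <updated>#{updated.getutc.iso8601}</updated>
      <rights>#{xml_escape(rights)}</rights>
      <author>
        <name>#{xml_escape(author_name)}</name>
        <email>#{xml_escape(author_email)}</email>
      </author>
    </feed>
  XML
end

puts atom_header(
  title: "CERIAS Super Combined Feed",
  id: "http://foo.bar/foobar/combo.xml",
  author_name: "CERIAS",
  author_email: "webmaster@cerias.purdue.edu",
  rights: "2006 CERIAS",
  updated: Time.utc(2006, 5, 3, 10, 0, 0)
)
```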

File.open('./combo.xml', 'w') do |file|
  file.puts combo.build_xml()
end

The build_xml method generates the XML for the feed object we’ve created. This final piece of code opens a file called “combo.xml” in the current directory (creating it if it doesn’t exist and truncating it if it does) and writes that XML to it. The file is closed automatically when the block finishes executing.
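The block form of File.open is roughly equivalent to opening the file yourself and closing it in an ensure clause, so the file gets closed even if something inside the block raises. A small self-contained sketch in a temporary directory; the write_with_block and write_by_hand names are hypothetical:

```ruby
require "tmpdir"

# Block form: Ruby closes the file when the block exits, even if the block raises.
def write_with_block(path, text)
  File.open(path, "w") { |file| file.puts text }
end

# Roughly what the block form does for you behind the scenes.
def write_by_hand(path, text)
  file = File.open(path, "w") # "w" creates the file, or truncates an existing one
  begin
    file.puts text
  ensure
    file.close
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "combo.xml")
  write_by_hand(path, "old, longer contents that get truncated")
  write_with_block(path, "<feed/>")
  puts File.read(path)  # prints <feed/>

  handle = nil
  File.open(path) { |file| handle = file }
  puts handle.closed?   # prints true
end
```

On Ruby 1.9.3 and later, File.write(path, string) does the same open, write and close in a single call (without the trailing newline that puts adds).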

So, we now have a nice combined XML feed, ready to be parsed into our PHP apps using SimplePie or Zend_Feed, and/or served up as-is to feed reading clients.


  • engtech
    http://InternetDuctTape.com
    03/15/2008 10:59:44 PM

    I’m giving feed_tools a shot as a replacement for the default RSS module, as I don’t like that it doesn’t handle Atom seamlessly and that it gives me very different structures for different flavours of RSS.

    I’m doing my usual “blog search” before wasting time on a library, and I was wondering if you ended up having any issues with it?

  • funkatron
    http://funkatron.com
    03/15/2008 11:39:22 PM

    I can’t say that I’ve done a ton with feed_tools since then, especially because I do 99% of my web app work in PHP. On the PHP side, SimplePie (http://simplepie.org) is by far the best library for this — it seamlessly handles the various formats, but allows you to also customize for edge cases.