Friday, March 21, 2008

Tag clouds in Ruby and Rails

I love tags: they are a better way to classify content, and they are trendy, or at least they were back in 2005. A tag cloud also gives a good visual indication of the hot topics in a site's content. Here I will share the method I use for creating a tag cloud on my sites. The trick is to collect all the tags for a given model, give each tag a weight on a scale from, for example, 1 to 5, and then render the cloud: each tag is displayed according to its weight, so the most frequent tag stands out much more boldly than the less frequent ones.

Tag models
Before using tags we need to create them. I use a polymorphic relationship to tag several content types with two additional tables. The first is the "tags" table, which lists all the tags used on the site; it is composed of just an id and a name field. The other is the "taggings" table, which holds all the relationships between content and tags. It is a regular many-to-many association table, except that it serves more than one content type. It may be easier to explain in code.

class Book < ActiveRecord::Base
  has_many :taggings, :as => 'taggable'
end

class Link < ActiveRecord::Base
  has_many :taggings, :as => 'taggable'
end

class Tag < ActiveRecord::Base
  has_many :taggings
end

class Tagging < ActiveRecord::Base
  belongs_to :tag
  belongs_to :taggable, :polymorphic => true
end

In these ActiveRecord models, Link and Book are content models that have many taggings, and each Tagging in turn belongs to one tag. Notice the :as => 'taggable' option on the has_many associations in the Book and Link models: it tells ActiveRecord that Tagging belongs to a polymorphic model called 'taggable'. A taggable can be any model in your application, so this pattern lets your tagging mechanism support new models transparently as you create them; all you need to do is add the has_many :as association.

class Photo < ActiveRecord::Base
  has_many :taggings, :as => 'taggable'
end
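To make the polymorphic idea concrete without a database, here is a minimal plain-Ruby sketch in which the two tables are just arrays of rows. The struct names, the sample data and tag_names_for are hypothetical; the point is that one taggings table, keyed by both taggable_type and taggable_id, serves books, links and photos alike.

```ruby
# Hypothetical stand-ins for rows of the tags and taggings tables
TagRow     = Struct.new(:id, :name)
TaggingRow = Struct.new(:tag_id, :taggable_type, :taggable_id)

TAGS = [TagRow.new(1, 'ruby'), TagRow.new(2, 'rails'), TagRow.new(3, 'css')]

# A single taggings table holds rows for several content types
TAGGINGS = [
  TaggingRow.new(1, 'Book', 10),
  TaggingRow.new(2, 'Book', 10),
  TaggingRow.new(1, 'Link', 10),  # same id as the book, but a different record
  TaggingRow.new(3, 'Photo', 5)
]

# Roughly what book.taggings plus tagging.tag do: match on both the
# type column and the id column, then look up each tag's name
def tag_names_for(type, id)
  TAGGINGS
    .select { |t| t.taggable_type == type && t.taggable_id == id }
    .map { |t| TAGS.find { |tag| tag.id == t.tag_id }.name }
end

tag_names_for('Book', 10)  # => ["ruby", "rails"]
tag_names_for('Link', 10)  # => ["ruby"]
```

Matching on the id alone would mix up Book 10 and Link 10; the type column is what lets one table serve every taggable model.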

Create the database tables
The following is a basic MySQL schema for the tags and taggings tables:

CREATE TABLE `tags` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(255),
  `created_at` datetime,
  `updated_at` datetime,
  PRIMARY KEY (`id`)
);

CREATE TABLE `taggings` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `tag_id` int(11),
  `taggable_type` varchar(20),
  `taggable_id` int(11),
  `created_at` datetime,
  PRIMARY KEY (`id`)
);

Collecting the tags
The following method could live in a model, probably Tagging, or in a helper.

def tags_cloud(model, limit = 30)
  options = {
    :select     => "count(*) as count_all, tags.name as tag_name, tag_id",
    :conditions => { :taggable_type => model.to_s },
    :joins      => "left outer join tags on tags.id = taggings.tag_id",
    :group      => 'tag_id',
    :order      => 'count_all desc',
    :limit      => limit
  }

  # construct_finder_sql is private, so call it through send
  sql = Tagging.send(:construct_finder_sql, options)

  # select_all returns plain hashes instead of instantiated Tagging objects
  taggings = ActiveRecord::Base.connection.select_all(sql)

  return [] if taggings.blank?

  # Rows are ordered by count desc: first is the max, last is the min.
  # The adapter may return counts as strings, hence to_i.
  maxlog = Math.log(taggings.first['count_all'].to_i)
  minlog = Math.log(taggings.last['count_all'].to_i)
  rangelog = maxlog - minlog
  rangelog = 1 if maxlog == minlog
  min_font = 1
  max_font = 5
  font_range = max_font - min_font
  cloud = []
  # Display the cloud in alphabetical order
  taggings = taggings.sort_by { |t| t['tag_name'].to_s }

  taggings.each do |tagging|
    count = tagging['count_all'].to_i
    font_size = min_font + font_range * (Math.log(count) - minlog) / rangelog
    cloud << [tagging['tag_name'], tagging['tag_id'], font_size.to_i, count]
  end
  cloud
end

If I used ActiveRecord's find to fetch the taggings, it would return a bunch of instantiated Tagging objects, which wastes memory and resources. It is better to let ActiveRecord only construct the SQL for us and then run that SQL against the database directly.

I put the usual :select, :conditions, :joins and other arguments we normally pass to find into a hash, and then passed it to construct_finder_sql. That is a private method of ActiveRecord, which is why I call it through send. It returns the SQL statement, which I then execute with select_all on the connection obtained from ActiveRecord::Base.
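If the SQL is easier to follow as Ruby, the sketch below reproduces in memory what the generated query computes: keep only the taggings of the requested type, group them by tag_id, count each group, order by count descending and keep the top limit rows. The sample rows and the method name tag_counts are hypothetical; each result is a plain hash shaped like the rows select_all returns (a real adapter may hand back the count as a string rather than an integer).

```ruby
# Hypothetical sample data: tag id => name, and [tag_id, taggable_type] rows
TAG_NAMES = { 1 => 'ruby', 2 => 'rails', 3 => 'css' }
TAGGING_ROWS = [
  [1, 'Book'], [1, 'Book'], [1, 'Book'],
  [2, 'Book'], [2, 'Book'],
  [3, 'Book'],
  [3, 'Link'], [3, 'Link']
]

# In-memory equivalent of:
#   SELECT count(*) AS count_all, tags.name AS tag_name, tag_id
#   FROM taggings LEFT OUTER JOIN tags ON tags.id = taggings.tag_id
#   WHERE taggable_type = ? GROUP BY tag_id ORDER BY count_all DESC LIMIT ?
def tag_counts(model, limit = 30)
  TAGGING_ROWS
    .select { |_tag_id, type| type == model.to_s }  # WHERE taggable_type = ?
    .group_by { |tag_id, _type| tag_id }             # GROUP BY tag_id
    .map { |tag_id, rows|                            # count(*) plus the joined name
      { 'count_all' => rows.size, 'tag_name' => TAG_NAMES[tag_id], 'tag_id' => tag_id }
    }
    .sort_by { |row| -row['count_all'] }             # ORDER BY count_all DESC
    .first(limit)                                    # LIMIT
end

tag_counts('Book').map { |r| r['tag_name'] }  # => ["ruby", "rails", "css"]
```

Because the rows arrive sorted by count, the first and last rows give the maximum and minimum that the font-size scaling needs.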

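I learned a trick from one of O'Reilly's Web 2.0 books (I don't remember which one right now): use a logarithm to create contrast between the most frequent tags and the less frequent ones. Taking the log compresses large counts, so a handful of very popular tags doesn't push everything else down to the smallest size. You can work through the formula yourself, or just use it as it is :)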
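Here is a small standalone sketch of that scaling next to a plain linear scale, for comparison. It reuses the formula from tags_cloud (weights from 1 to 5, truncated with to_i); the method names and sample counts are hypothetical.

```ruby
MIN_FONT = 1
MAX_FONT = 5

# Logarithmic scale: the same formula as in tags_cloud
def log_weights(counts)
  minlog = Math.log(counts.min)
  maxlog = Math.log(counts.max)
  rangelog = maxlog == minlog ? 1 : maxlog - minlog
  counts.map do |c|
    (MIN_FONT + (MAX_FONT - MIN_FONT) * (Math.log(c) - minlog) / rangelog).to_i
  end
end

# Linear scale, for comparison
def linear_weights(counts)
  min, max = counts.min, counts.max
  range = max == min ? 1 : (max - min).to_f
  counts.map { |c| (MIN_FONT + (MAX_FONT - MIN_FONT) * (c - min) / range).to_i }
end

counts = [1, 5, 20, 100]
log_weights(counts)     # => [1, 2, 3, 5]
linear_weights(counts)  # => [1, 1, 1, 5]
```

With one very popular tag, the linear scale squeezes every other tag into the smallest size, while the logarithm spreads the middle counts across the scale.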

The View
This is a sample view template that renders the tag cloud. It can be used as a base for a more sophisticated cloud.

<% the_cloud = tags_cloud(model) %>
<% unless the_cloud.blank? %>
<div id="big_tags_cloud">
  <% the_cloud.each do |tag, id, fsize, count| %>
    <span class="t<%= fsize -%>">
      <a title="<%= count %>"
         class="tag_<%= id %>"
         href="/<%= model.to_s.downcase %>s/tag/<%= id -%>"><%= h(tag) %></a>
    </span>
  <% end %>
</div>
<% end %>
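To preview the markup outside Rails, a template like this can be exercised with Ruby's standard ERB library. This is a hypothetical standalone sketch: render_cloud and the sample cloud are made up, CGI.escapeHTML stands in for Rails' h helper, and the trim_mode keyword assumes Ruby 2.6 or later.

```ruby
require 'erb'
require 'cgi'

# Hypothetical cloud in the [name, id, font_size, count] shape built by tags_cloud
CLOUD = [['rails', 3, 5, 42], ['ruby', 7, 3, 12]]

TEMPLATE = <<~'HTML'
  <div id="big_tags_cloud">
  <% cloud.each do |tag, id, fsize, count| -%>
  <span class="t<%= fsize %>"><a title="<%= count %>" class="tag_<%= id %>" href="/<%= plural %>/tag/<%= id %>"><%= CGI.escapeHTML(tag) %></a></span>
  <% end -%>
  </div>
HTML

# Renders the cloud; trim_mode '-' lets "-%>" swallow the trailing newline
def render_cloud(cloud, model_name)
  plural = "#{model_name.downcase}s"  # e.g. "Book" -> "books", as in the view
  ERB.new(TEMPLATE, trim_mode: '-').result(binding)
end

puts render_cloud(CLOUD, 'Book')
```

Each tag becomes one span whose class (t1 to t5) carries the weight, so all the visual work is left to the stylesheet.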

Styling the cloud

#big_tags_cloud {padding:5px; text-align:justify; }
#big_tags_cloud a {font-family:'Times New Roman'; line-height:1.7;border-bottom:1px solid #779944;}
#big_tags_cloud span.t1 a{ color:#779944;font-size:11pt;}
#big_tags_cloud span.t2 a{ color:#779944;font-size:12pt;}
#big_tags_cloud span.t3 a{ color:#779944;font-size:13pt;}
#big_tags_cloud span.t4 a{ color:#558800;font-size:14pt;font-weight:bold;}
#big_tags_cloud span.t5 a{ color:#669900;font-size:15pt; font-weight:bold;}


Mike Subelsky said...

This was really helpful to get me going on my own tag cloud code, so thanks! One thing to fix, I had to round off the result of the font size calculation:

font_size = (min_font + font_range * (Math.log(tag['count_all']) - minlog) / rangelog).round
