My name is Simon Baumgartner.
I am really interested in Music and Programming.
I was born in Munich but moved to NYC a while ago. Recently, I started working at Amazon, where I write software to make things faster and better (and cheaper, you can thank me later).
If you want to get in touch: me at sens3
Since the URL needs to be as short as possible, only the information that is absolutely necessary is "stored" in it. In this case that is the hash "cH2j2V", an identifier used to look up the original URL in some kind of storage.
The hash is nothing more than the ID of the storage record for the original URL, written in base X instead of base 10. In the case of bit.ly it looks like base 62, which uses the characters [0-9A-Za-z].
(Ruby's built-in conversion only supports bases up to 36. I used rubyworks/radix for anything above that.)
Six base 62 digits can represent 62^6 = 56,800,235,584 distinct values, so there is room for IDs up to 56,800,235,583. Not bad.
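To make the base conversion concrete, here is a minimal hand-rolled sketch of base 62 encoding and decoding. It does not use rubyworks/radix; the method names `base62_encode` and `base62_decode` are made up for this illustration, and the alphabet order (digits, uppercase, lowercase) is simply the order given above, which may not be the order bit.ly actually uses.

```ruby
# Base 62 alphabet in the order mentioned above: 0-9, A-Z, a-z.
# (Assumption: the real bit.ly ordering is not known here.)
BASE62_ALPHABET = (('0'..'9').to_a + ('A'..'Z').to_a + ('a'..'z').to_a).freeze
BASE62 = BASE62_ALPHABET.length # 62

# Turn a non-negative integer ID into a base 62 string.
def base62_encode(id)
  raise ArgumentError, 'id must be a non-negative integer' unless id.is_a?(Integer) && id >= 0
  return BASE62_ALPHABET[0] if id.zero?

  digits = []
  while id > 0
    id, remainder = id.divmod(BASE62)
    digits << BASE62_ALPHABET[remainder]
  end
  digits.reverse.join
end

# Turn a base 62 string back into the integer ID.
def base62_decode(hash)
  hash.each_char.reduce(0) do |acc, char|
    value = BASE62_ALPHABET.index(char)
    raise ArgumentError, "invalid base 62 character: #{char.inspect}" if value.nil?

    acc * BASE62 + value
  end
end

base62_encode(62)                 # => "10"
base62_encode(62**6 - 1)          # => "zzzzzz" (the largest 6-digit hash)
base62_decode('zzzzzz')           # => 56800235583
```

The largest 6-digit hash decodes to 56,800,235,583, which is where the capacity figure comes from.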
Why not use one of the existing URL shortener plugins?
The concept described above works great for publicly accessible URL shorteners, since they need to shorten whatever URL comes along.
And most of the existing URL shortener plugins work the exact same way. They consist of a database table to store the URLs and a simple UI.
While this works fine, it is way more than what is actually necessary.
In most cases where a URL shortener plugin is used, the programmer just wants to provide his users with a short URL to one of his resources (profiles, posts, comments) to include in a tweet or somewhere else.
Hence, the URLs that need to be shortened are not arbitrary. Most likely all of them are "identifiers" for your own resources (URI = Uniform Resource Identifier).
For example: http://sens3.com/posts/tooshort-rubygem or http://sens3.com/posts/65
If the app that handles your resources is the same one that handles the short URLs, the process can be significantly simplified.
How does TooShort work?
This section explains the train of thought behind TooShort. If you want to know how to use it, please have a look at the README.
We need to be able to identify any of our resources. So what we need to store is the class and the ID.
For example: Post 64 or Comment 32
We use the same principle that all URL shorteners use to shorten the ID of our record. TooShort currently translates all IDs to base 36, so a 6-digit hash covers IDs up to 2,176,782,335 (36^6 - 1). That should be enough for most apps.
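Since base 36 is the highest radix Ruby converts natively, the translation can be sketched with nothing but `Integer#to_s` and `String#to_i`. The helper names `shorten_id` and `expand_hash` are hypothetical and are not taken from TooShort itself.

```ruby
# Integer#to_s and String#to_i both accept a radix between 2 and 36.

# Hypothetical helper: record ID -> base 36 hash.
def shorten_id(id)
  id.to_s(36)
end

# Hypothetical helper: base 36 hash -> record ID.
# Note: String#to_i stops at the first invalid character instead of raising.
def expand_hash(hash)
  hash.to_i(36)
end

shorten_id(123_456)    # => "2n9c"
expand_hash('2n9c')    # => 123456
shorten_id(36**6 - 1)  # => "zzzzzz" (ID 2176782335, the largest 6-digit hash)
```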
If we use TooShort for only one resource, the ID is all the information we need to look up an object.
For example, in http://2sh.de/2n9c the hash 2n9c translates to ID 123456. Since we know which resource we use short URLs for, we can look up the object. Done.
If we use TooShort for more than one resource, we need a way to store the class in the hash as well. To do this, TooShort lets you specify a scope:
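As a rough illustration of the idea only, and explicitly not TooShort's actual API (see the README for that), a scope can be pictured as a short path segment that selects the class, followed by the base 36 ID. Everything in this sketch is an assumption made for the example: the `SCOPES` table, the `/p/1t` path layout, the helper names, and the in-memory `RECORDS` hash that stands in for real models.

```ruby
# Stand-ins for ActiveRecord models so the sketch runs on its own.
Post    = Struct.new(:id, :title)
Comment = Struct.new(:id, :body)

RECORDS = {
  Post    => { 65 => Post.new(65, 'TooShort') },
  Comment => { 32 => Comment.new(32, 'Nice gem') }
}.freeze

# Hypothetical scope table: one short path segment per resource class.
SCOPES = { 'p' => Post, 'c' => Comment }.freeze

# Build the short path for a record, e.g. "/p/1t".
def short_path(record)
  scope = SCOPES.key(record.class)
  raise ArgumentError, "no scope registered for #{record.class}" if scope.nil?

  "/#{scope}/#{record.id.to_s(36)}"
end

# Resolve a short path back to the record it identifies (nil if not found).
def expand_short_path(path)
  scope, hash = path.split('/').reject(&:empty?)
  raise ArgumentError, "malformed short path: #{path.inspect}" if scope.nil? || hash.nil?

  klass = SCOPES.fetch(scope) { raise ArgumentError, "unknown scope: #{scope.inspect}" }
  # In a Rails app this lookup would be something like klass.find(id).
  RECORDS.fetch(klass)[hash.to_i(36)]
end

short_path(RECORDS[Post][65])       # => "/p/1t"
expand_short_path('/c/w').body      # => "Nice gem"
```

The class comes from the scope and the ID from the hash, so nothing beyond the short URL itself is needed to find the record.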
As you can see, no database table is required, since all the information we need is already stored in the URL. And once we have the object, we can do whatever we want with it (redirect_to, render :json, ...).
The second major benefit, besides not having to add another database table, is that we actually map the short URL to an object rather than another URL.
That way, the short URLs are persistent even if you change your URL structure.
If you want to see TooShort in action, look right here at the bottom of this post. The short URL for it is http://sens3.com/1t.
"1t" translates to ID 65. And since we map the short URL to the actual @post object, a simple redirect_to takes care of the rest. More details on expanding short URLs can be found here.
Oh, and if you want to see the real Too $hort in action: