-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

=pod

Tiny URLs, like the ones L uses, have been stirring up a lot of controversy lately. People wonder what will happen if the tiny URL service goes down, goes away, or goes rogue. The links in their Twitter posts could become useless, and that would be tragic.

This is a problem that's easier to work around than to whine about, so I've implemented a web service that unshortens any short URL. You can get the code from my github repository: L. Start it with C, then visit L.

An application like this is an interesting introduction to some neat modules, including C (in my github L), L), L, and L. Here's how I implemented it.

When we start the app, we spawn an HTTP server to handle the unshortening requests. We only do one thing -- accept short URLs as the request path, resolve them to long URLs, and return the long URL as the response body. We cache the mappings in KiokuDB, to save network traffic in case some URLs become popular.

We implement this as two classes, C, the main app, and C, the class that handles storage and lookup for us.

The application class, C, handles a few things. It is "runnable", so it has a run method that reads the command-line arguments and starts the app. That part looks like this:

    lang:Perl
    use MooseX::Declare;
    use POE;
    use feature 'say';

    class Unshorten with MooseX::Getopt with MooseX::Runnable {
        has 'dsn' => (
            is            => 'ro',
            isa           => 'Str',
            required      => 1,
            default       => 'hash',
            documentation => 'DSN of the KiokuDB',
        );

        method run() {
            POE::Kernel->run;
        }
    };

C is what gives us the class, with, and method keywords. (It also does a lot more.)

C will introspect our class and provide a C method to instantiate the class with options set via C<@ARGV>. It will also print a helpful help message when you run the app with C<--help>. (The "documentation" parameter to the attribute definition becomes a part of this message.)

C is a tag that tells a command line utility, C, that this is a runnable class.
C knows how to load the class, invoke C, and then start the application by calling C.

A small detail -- support for C is not something that's baked into C's internals. C actually introspects the class and determines how best to make an instance of it. So if you write your own C replacement, you can just write a small class in the C namespace, and C will automatically support classes that use your replacement. These schemes are composable, and there is an option to mix in user-specified modifiers (like "Daemonize") at application load time. Read the docs for details.

Finally, we also load POE and start it running when the app starts. Eventually, we will add POE sessions that handle HTTP requests and responses, DNS lookups, and the HTTP GETs we need to do to resolve URLs. Doing this with POE (or another event loop) ensures that our app can handle multiple concurrent requests. If, for example, an HTTP request is particularly slow, other clients can be serviced while we wait for the answer. This means that we only need to run one instance of the app per CPU, cutting down on memory usage dramatically. It's also The Right Way To Do Things.

(I will write a lot more about event-based web applications in the future. Consider this a very small taste of what's to come.)

So at this point, we can try running our app with C (installed when you install C). Since the app isn't in C<@INC> (it's in C), we invoke C with C<-I>, just like we'd invoke Perl:

    lang:undef
    $ mx-run -Ilib Unshorten

It's worth noting that both C and C accept C<--help>. You can see C's help string with C. You can see C's help string with C:

    $ mx-run -Ilib Unshorten --help
    usage: mx-run ... Unshorten [long options...]
        --dsn DSN of the KiokuDB

Now that we have a shell of an app to run, we can implement the fun parts.

The next thing we want to do is to add the C HTTP server. I wrote a role called C to make adding an C instance to a Moose class very easy. We just add C to the rest of the with statements.
Then we have C<< $self->engine >>, which is the HTTP server. Users also see a C<--port> option when they run the app with C<--help>, and can of course override the default port by specifying a value for that argument.

We do need to add one bit of configuration to the class; we need to tell C to use C's POE backend:

    lang:Perl
    has '_engine_type' => ( is => 'ro', default => 'POE' );

We also want to start the HTTP service when the app starts, so we replace the C method with one that does that:

    method run() {
        $self->engine->run;
        say "Starting server on port ". $self->port;
        POE::Kernel->run;
    }

C<< $self->engine->run >> creates the server and tells POE about it, while C<< POE::Kernel->run >> actually starts it running.

If you try to start the app now, it won't load, though, because the C role requires that we implement a C method. That is the method that accepts a Request and returns a Response. We can write this and finish off the app class if we assume that we have a C<< $self->model >> accessor that will return an instance of the C class. We also assume that C has a C method that takes a short URI and returns a long URI. (I say C here, since the input and output are both instances of the C class. But generally, I call them URLs.)

    use TryCatch;

    method handle_request(HTTP::Engine::Request $req){
        my $short = URI->new(substr $req->uri->path, 1);
        try {
            my $long = $self->model->unshorten($short)
              or die 'No long URL returned.';

            return HTTP::Engine::Response->new(
                content_type => 'text/plain',
                body         => $long,
            );
        }
        catch($msg) {
            $self->model->delete("$short"); # just to make sure bad data doesn't stay
            return HTTP::Engine::Response->new(
                code         => 500,
                content_type => 'text/plain',
                body         => "An error occurred while unshortening '$short': $msg",
            );
        }
    }

First, we turn everything in the URL after the first / into a URI object. Then we try to unshorten the URI (with our model). If that works, we return the long URI as the response.
If an exception is thrown, or the model returns no long URL, we return a helpful error message instead. (Note that C is essentially a Mooseified version of C. The same goes for C<::Request>.)

That's basically the whole application class. We do need to add a way to access the model, but that's very easy:

    has 'model' => (
        traits     => ['NoGetopt'],
        is         => 'ro',
        isa        => 'Unshorten::Model',
        lazy_build => 1,
    );

    method _build_model() {
        return Unshorten::Model->new( dsn => $self->dsn );
    }

Now C<< $self->model >> will return an instance of C that's connected to the database. (The NoGetopt meta-attribute trait says to not let the user specify a value for this on the command-line when running the app.)

That's all for the app class. (You can see the whole thing in one piece L.)

Now we need to write the Model class. This class will handle the actual URL unshortening, and it will interface with the database.

We are using L as the database. It's an object database that can store any in-memory structure somewhere permanent. It has backends that can store your data in-memory ("hash"), BerkeleyDB, files on disk, DBI databases, Amazon SimpleDB, CouchDB, etc. It also supports indexing, laziness, sets, and lots of other neat things. I will write more about it sometime in the future, but I will point out that it's not a toy; it's replaced traditional relational databases in most of our "real apps", and it's done so without slowing anything down (in fact, it's sped up a few of our queries that it can index but that Postgres can't). It has also significantly reduced the amount of code required in our apps. (No more C mapping; our "real" classes are the ones we get out of the database.)

It's probably overkill for our application, since we only need to map strings to strings, but it's actually simpler to use than any of the key/value databases on CPAN. It will also be easier to extend our app this way; if we ever need to store complex data structures instead of just strings, everything will Just Work.
We also end up writing 0 lines of code to get KiokuDB into our app, so while it may be overkill, it's cheap overkill.

If you look on CPAN, you'll see that there is a C class that's designed for gluing a KiokuDB to your application. That is exactly what we need to do here, so our C will be a subclass. That way we get the database connection "for free" and we just need to write the C function. To get started, our class will look like this:

    use MooseX::Declare;

    class Unshorten::Model extends KiokuX::Model {
        # MooseX::Types::URI exports the type as "Uri", not "URI"
        use MooseX::Types::URI qw(Uri);

        method unshorten(Uri $url does coerce) {
            return $url;
        };
    };

(Note that "extends KiokuX::Model" is on the same line as "class Unshorten::Model". This is where my "0 lines of code" figure comes from. You needed that line anyway :)

This snippet gives us enough of our API to start the app and try a real request. If you run the app and visit C, you'll get the unshortened version of C, namely C. Useless, but you now have a database-driven web app in just two files!

(Also, one neat feature of C is its reuse of Moose's attribute type constraints for method arguments. In this case, the C type can coerce strings to URI objects, so we can actually pass this method either a URI object or a plain string, and it will do the right thing. This is very convenient, especially for things that are more complicated than a 2-file web app.)

Now there is the small matter of actually unshortening URLs. The algorithm I chose to do this is to request the URL, follow all redirects, and see where we end up. This is really easy with C. It will handle the redirect following for us, and it does all the network transfers asynchronously (so that requests can run in parallel). It even does non-blocking DNS lookups.
All we need to do is this:

    use AnyEvent::HTTP qw(http_head);

    method _resolve_url(Uri $url) {
        my $done = AnyEvent->condvar;
        http_head $url, sub {
            my ($data, $headers) = @_;
            my $long_url = eval { URI->new($headers->{URL}) };
            confess 'Failed to get a URL' unless $long_url;
            $done->send($long_url);
        };
        return $done->recv;
    }

We start by creating a condvar, kicking off the HTTP request, and waiting for the condvar to have a value. This looks like it blocks on C, but C actually enters the event loop and lets other requests run. So if an HTTP request takes forever, we can still handle requests from other users while we are waiting.

The final piece of the app is to integrate the URL lookups with KiokuDB. The idea is to see if we've already resolved a URL, and if so, return that instead of actually doing the lookup. If we haven't resolved the URL yet, we should do that, save the result, and then return the unshortened URL. I wrote a function to abstract this process away:

    method _resolve_or_cache(Uri $url, CodeRef $expander) {
        my $scope = $self->new_scope;
        my $expanded = $self->lookup("$url");
        return $expanded->{redirects_to} if $expanded;

        # needs to be expanded
        $expanded = $expander->($url);

        # "Unknown reason" because the shortener should die if it knows
        # the reason
        confess "Unable to unshorten '$url': Unknown reason" unless $expanded;

        $self->insert("$url" => { redirects_to => $expanded });
        return $expanded;
    }

This takes a URL and a function to unshorten that URL, and then implements the caching semantics on top of those.

Now we can replace our "testing" C function with the real version:

    method unshorten(Uri $url) {
        return $self->_resolve_or_cache($url, sub {
            $self->_resolve_url(@_);
        });
    }

And with that, the app is fully functional. There's a total of 75 lines of code to implement a fully event-driven web app that knows how to configure itself from command-line arguments.
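If you want to poke at the look-up-or-resolve idea without installing anything, here is a minimal sketch in core Perl. It is not the app's code: a plain hash stands in for the KiokuDB directory, a made-up redirect table stands in for the network, and the names C<follow_redirects>, C<resolve_or_cache>, C<lookup_count>, and the example.* URLs are all hypothetical. The hop limit is an extra assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical redirect table standing in for the network; these
# hosts are invented for illustration only.
my %redirects = (
    'http://short.example/a' => 'http://short.example/b',
    'http://short.example/b' => 'http://long.example/some/article',
);

my %cache;          # stands in for the KiokuDB directory
my $lookups = 0;    # counts how often the expander actually runs

# Follow redirects until we reach a URL that does not redirect.
# The hop limit guards against redirect loops (an assumption of this
# sketch, not something taken from the app).
sub follow_redirects {
    my ($url) = @_;
    $lookups++;
    my $hops = 0;
    while (exists $redirects{$url}) {
        die "Too many redirects for '$url'\n" if ++$hops > 10;
        $url = $redirects{$url};
    }
    return $url;
}

# Same shape as the caching method above: return the stored answer if
# there is one; otherwise expand, store, and return.
sub resolve_or_cache {
    my ($url, $expander) = @_;
    return $cache{$url}{redirects_to} if exists $cache{$url};

    my $expanded = $expander->($url)
        or die "Unable to unshorten '$url': Unknown reason\n";

    $cache{$url} = { redirects_to => $expanded };
    return $expanded;
}

sub lookup_count { return $lookups }

# The second call is answered from the cache, so the expander runs once.
print resolve_or_cache('http://short.example/a', \&follow_redirects), "\n";
print resolve_or_cache('http://short.example/a', \&follow_redirects), "\n";
print "expander ran ", lookup_count(), " time(s)\n";
```

Passing the expander in as a code reference keeps the caching logic independent of how a URL actually gets resolved, which is the same split the model class makes.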
Thanks to C and C, there are no boilerplate scripts required to start the app and parse command-line arguments. And thanks to C, we don't even have to say C at the top of each method. Nice! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAknahfEACgkQ2rw+dVvzZm17mQCcDV2MtxhhT0VLxxvBM6Lx6sVu IfAAn11KB+D3w+i1rPrJcADnDf7pvwjy =adYu -----END PGP SIGNATURE-----