So Two-Oh

The Story Behind the Web 2.0 Validator

So Dan and I are sitting around the office …

... AKA the Common Ground coffee shop. And we’re commenting on how so many people are talking Web 2.0 this and Web 2.0 that, and while so doing these people point out one or a dozen sites or features or hacks, yet it’s still near impossible to get a good, clear understanding of what really makes something Web 2.0.

Is it the use of AJAX (AKA remote scripting and DHTML)? Hmm. Maybe, but people have been using that since 1998.

Maybe it’s simply using the word AJAX that makes a site Web 2.0. Nah; seems too easy.

Or is it the profuse display of pastel boxes with nice curved corners and gradient backgrounds? Or combining data from two or more other Web sites (ideally, each also being Web 2.0)?

It would be nice, we thought, if there was a simple tool, a validator, that could examine a Web site and declare, with complete scientific authority, if it was or was not really Web 2.0.

So James gets coding …

... and after about 30 minutes of Og/Nitro hacking the first Web 2.0 validator was born.

The idea was simple:

  • Allow the user to enter a URL
  • Fetch the source for the site (well, the main page at least)
  • Run the page through a battery of tests.

One page, with some modern styling, plus some spiffy DOM scripting tossed in for good effect.

It worked well enough, and the initial code was fairly clever in how validation rules were loaded and executed.

So the First Version Went Like This …

There was a Validation class, which received the URL in question and fetched the source HTML. In addition to a small number of helper methods, the class defined the Web 2.0 validation rules. Each rule method would be given the site HTML and return true or false, depending on whether the site passed the test.

The code used a bit of Ruby introspection magic to get the list of validation methods to call. To qualify as a validation rule, a method had to have a name that began with ‘test_’ and require no arguments, the presumption being that any data required by these methods would be available from instance methods.

A validation test method might look something like this:

  # Simple is good, right?
  def test_uses_superfly_Prototype_script?
    @html =~ /prototype\.js/im ? true : false 
  end

Surely any site that references this JavaScript library must be Web 2.0, no? Mais, oui!

A validation test set was then created like so:

  # Collect the names of all methods that look like validation tests.
  @tests = []
  self.methods.sort.each { |m|
    @tests << m.to_s if m.to_s =~ /^test_/
  }

And execution invoked thusly:

	@html = some_code_that_grabs_the_html
	@results = {}  # test name => pass/fail
	@tests.each do |test|
		@results[ test ] = self.send( test )
	end
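
For illustration only, the fetch step could be as simple as a call to open-uri (the validator’s actual fetching code isn’t shown here, and presumably does a bit more):

	require 'open-uri'

	# Illustrative only: one simple way to grab a page's HTML.
	# A real fetcher would also worry about errors, redirects, and the like.
	def fetch_html( url )
		open( url ).read
	end

	@html = fetch_html( 'http://www.example.com/' )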

There are, of course, all sorts of presumptions here. First, each test_* method should be returning true or false, depending on whether it deemed the target site to possess some Web 2.0-itudeness. Second, each method needs to know to grab the source HTML from @html, and of course do something reasonably interesting that evaluates to a boolean. And, when rendering the results a bit later in the code, the method name (minus the leading ‘test_’) should make for a passable English sentence or query.
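
For example, a small helper along these lines (hypothetical; not part of the original listing) could turn a test method’s name into something readable for the results page:

	# Hypothetical helper: strip the leading 'test_' and swap underscores
	# for spaces so the rule name reads as plain English.
	def readable_name( test_name )
		test_name.sub( /^test_/, '' ).tr( '_', ' ' )
	end

	readable_name( 'test_uses_superfly_Prototype_script?' )
	# => "uses superfly Prototype script?"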

So Web 2.0 Leads to Carpal Tunnel Syndrome?

Of course, no one item could arbitrate Web 2.0-inality; the validator needed to keep track of multiple tests and their results. More tests were added to the Validator class. But after writing three or four more, it became apparent that they were all quite similar. Each took the source HTML, ran some simple test (i.e., a regular expression match), and returned true or false. The alarm bells of coding tedium erupted. Rather than hand-code the repeated test-method scaffolding, it seemed there should be a way to express the essence of a rule in some simpler syntax, some sort of Web 2.0 validation DSL (domain-specific language).

Given that the only parts that changed were the method name and the set of regular expressions, rules really could be defined like this:

	"mentions_Ruby?", [ /ruby/ ]
	"has_blogroll?", [ /bloglines/ ]
	"refers_to_delDOTicioDOTus?", [ /del\.icio\.us/ ]
	"attempts_to_be_XHTML_strict?", [ /xhtml1-strict\.dtd/ ]

Indeed, these are sample arguments to the first version of a code generator method, make_test_method:

	def make_test_method( name, rules )
		rules.map! do |r|
			r = Regexp.new( r.to_s ) if r.class != Regexp
			"return false unless @html =~ /#{r.source}/im"
		end
		rule_set = rules.join( "\n" )
		code = " def test_#{name}\n #{rule_set}\n return true\n end\n "
		instance_eval( code )
	end

There are various ways to dynamically create Ruby methods. This is one of them. This iteration of the code allowed one to edit a map of rule_name => regexp_list pairs instead of managing a set of complete methods. This may seem a marginal reduction in typing, but it made it much easier to see what rules were defined; removing the method scaffolding enhanced the meaning. A handy tip: rewriting business logic using a DSL can make the meaning of your code more clear, both because the DSL eliminates scaffolding code that may obscure logic, and because DSL text typically has a visually different format, which highlights its special role.
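
For example, the rule pairs shown earlier might be gathered into a simple map and handed to make_test_method like so (a sketch of the idea; the exact wiring in the validator may differ):

	# Sketch: a map of rule_name => regexp_list pairs, fed to the code generator.
	rules = {
		"mentions_Ruby?"               => [ /ruby/ ],
		"has_blogroll?"                => [ /bloglines/ ],
		"refers_to_delDOTicioDOTus?"   => [ /del\.icio\.us/ ],
		"attempts_to_be_XHTML_strict?" => [ /xhtml1-strict\.dtd/ ]
	}

	rules.each { |name, regexps| make_test_method( name, regexps ) }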

But, more important for the lazy, having code capable of interpreting a DSL opens it up to outside contributions and control.

So Grab a Mitt, Join the Game

Once it became apparent how easy it was to add new rules, the next step (in the spirit of Web 2.0, of course) was to offload that task to others. A well-designed DSL can sufficiently abstract away the ‘codiness’ of task definitions, allowing more regular folks to define commands and instructions, focusing on some desired goal and not on the quirks and needs of any particular full-blown programming language. One needn’t know Ruby in order to create Web 2.0 validation rules.

The next question, then, was: how do we allow people to add new rules? Continuing with the Web 2.0 Zeitgeist, and offering a compelling opportunity to use the term ‘mash-up’, the obvious choice was del.icio.us.

Del.icio.us almost certainly qualifies as one of the quintessential Web 2.0 sites (though, naturally, the Web 2.0 Validator is the final decision maker on that). It is a prime example of ‘social software’. One uses it to track Web page bookmarks, with options to assign arbitrary tags to a given URL. The tagging is the first thing that makes the site special. Most conventional bookmark tools, such as are part of standard Web browsers, seem to assume that each bookmark can be neatly assigned to a single category or folder. The reality is that bookmarks, as with much of the data one deals with, may fall into any number of equally valid categories. For example, this very site might be bookmarked under Ruby, Web Development, and Clever Ideas Gone Hideously Astray. Del.icio.us makes such multiple categorization fairly simple.

The other thing that makes Del.icio.us special is that all of this bookmarking happens in public as part of an intertwined collective. Not only can you locate your own bookmarks by what tags have been assigned, you (and everyone else) can see all the bookmarks created for any set of tags. And, given a URL, you can see who else also bookmarked that site, and what tags were used, and what other sites these people bookmarked. As users, driven by personal interest, go about adding data, they are given access to, and expand, a network of related resources.

Del.icio.us offers all sorts of treats on top of this core set of features. Of particular interest to the Web 2.0 Validator is the option to get an RDF feed of posts created for any given URL. All people would have to do was bookmark the validator site URL, and use the ‘notes’ section to define their rule. The validator would then periodically fetch this feed and extract the rules. Wow! Isn’t the future great?

The DSL syntax had to be modified a bit to make this work. Whereas the simplified Ruby version combined a string with an array of regular expressions, the remote version had to be plain text. Now, it could have been Ruby plain text, but the idea of slurping in raw Ruby code for possible execution seemed like a bad idea. But the variation is not that different. To add a rule via del.icio.us, you write it like this, in the notes field, when bookmarking the Web 2.0 Validator URL:

	Mentions Neurogami and Web 2.0 :  /Web 2\.0/ /Neurogami/ 

The general format is:

	Some Descriptive Rule Name : /regex1/ /regex2/ ... /regexN/

The name of the rule, a colon, then one or more regular expressions separated by spaces.
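
A rough sketch of how such a line might be pulled apart (an illustration of the format, not the validator’s actual parsing code):

	# Illustrative parser: split on the first ':' for the rule name, then
	# collect each /.../ chunk as a regular expression source string.
	def parse_remote_rule( text )
		name, body = text.split( ':', 2 )
		return nil unless body
		patterns = body.scan( %r{/((?:\\.|[^/])*)/} ).flatten
		return nil if patterns.empty?
		[ name.strip, patterns ]
	end

	parse_remote_rule( 'Mentions Neurogami and Web 2.0 :  /Web 2\.0/ /Neurogami/' )
	# => ["Mentions Neurogami and Web 2.0", ["Web 2\\.0", "Neurogami"]]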

So Is This Thing Safe?

Once you’ve invited the general public to inject code of some sort into your Web site, the next trick is to load, parse, and ultimately execute it safely. When the validator grabs the feed, it looks for a description element in each item. Most people seem to leave the notes field blank, so most of the time there is no description element; blank notes are not included in the feed. (The description entry for the bookmark becomes the title element in the feed.)

<item rdf:about="http://web2.0validator.com/">
<title>Web 2.0 Validator : Some Clever Title</title>
<link>http://web2.0validator.com/</link>
<description>Mentions Neurogami and Web 2.0 :  /Web 2\.0/ /Neurogami/
</description>
<dc:creator>jamesbritt</dc:creator>
<dc:date>2005-10-26T03:36:10Z</dc:date>
<dc:subject>JamesBritt</dc:subject>
<taxo:topics>
	<rdf:Bag>
		<rdf:li resource="http://del.icio.us/tag/JamesBritt" />
	</rdf:Bag>
</taxo:topics>
</item>

Incidentally, you can get such feeds for any URL using this URL format:

	http://del.icio.us/rss/url?url=http://www.theurl.com

This feed is fetched a few times a day. REXML is used to extract those item elements containing a description child element. The rule definition is grabbed and split on that first ’:’ character. The first part is simply the rule name; it is treated as a string and never executed or eval-ed. The second part gets broken up into a set of regular expression strings. Each one is then further altered so that it may be correctly interpreted as YAML syntax for a regular expression. From there it goes through some additional transformations; if the input is not a simple regex, it will get munged beyond hope, and rejected. Ultimately, the rules are stored in a database for later execution.
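
In outline, that feed handling might look something like this (a sketch using open-uri and REXML; the production code adds the YAML conversion and the sanity checks described above):

	require 'open-uri'
	require 'rexml/document'

	# Sketch: fetch the del.icio.us feed for the validator's URL and pull
	# out the text of each item's description element as a candidate rule.
	feed_url = 'http://del.icio.us/rss/url?url=http://web2.0validator.com/'
	doc = REXML::Document.new( open( feed_url ).read )

	candidate_rules = []
	doc.elements.each( '//item/description' ) do |desc|
		candidate_rules << desc.text.strip if desc.text
	end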

If the given rule description text is either not an actual rule (for example, someone decides to use the notes field for, say, notes), or for one reason or another does not play well with the process, then the conversion from remote rule to stored rule quietly fails. The goal is not to make every effort to honor someone’s intention, but to help ensure that potentially dangerous code is not executed.

When offering up a DSL to the general public, you have to work out a balance among language features, security, and complexity. Ideally, people should be able to define validation rules that do more than simply run comparisons with one or more regular expressions; things are quite simple now, but in theory could be more sophisticated. But allowing for more control introduces more risks. Limiting the evaluation process makes things easier all around. And if a better scheme comes along, it can always be changed; this is Web 2.0, the world of eternal beta.

So What is Web 2.0, Really?

The validator came to life as the result of some goofing around, and was primarily intended to provide some amusement. But the implementation is itself an exploration into some of the themes that seem to pop up around the Web 2.0 meme. Tim O’Reilly makes the plausible claim of inventing the term, though some have pointed out that the “2.0” Web idea existed earlier (witness the birth of the magazine Business 2.0). Dare Obasanjo has done a good job of summarizing the salient features of at least the O’Reilly take on Web 2.0. (Which reminds me: The Web 2.0 Validator has something of a RESTful Web service exposed by a simple URL query: http://web2.0validator.com/validate?url=http://www.jamesbritt.com returns an XML string with the validation results. See, I told you we were hip. Points off, though, for not using ATOM or FOAF or something with at least some hint of useful semantics. Did I mention this is beta? Oh, wait; having a beta is not Web 2.0 anymore.)
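
For instance, one quick way to hit that service from Ruby (just open-uri; the shape of the returned XML is whatever the validator chooses to send back):

	require 'open-uri'

	# Ask the validator to check a site and print the raw XML response.
	target = 'http://www.jamesbritt.com'
	puts open( "http://web2.0validator.com/validate?url=#{target}" ).read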

There’s a decent graphic that maps so-called 1.0 aspects to their imagined 2.0 counterparts. Proof is left as an exercise for the reader. My apologies for having lost track of proper source attribution.

For some, we’re careening towards Web 4.0 any day now. We may be at that Web 3.11 for Workgroups sweet spot right about now.

One person who seems to have a really good take on all this is Peter Merholz. See his article Designing for the Sandbox, which begat the blog Designing for the Sandbox.

Common, compelling themes among all this are that it is decidedly not about flashy JavaScript widgets (that stuff, while good and useful, is really more in tune with creating Visual Basic for the 21st century, and the frequent pairing of the terms “AJAX” and “Web 2.0” should become something of a shibboleth, much as people referring to PERL tip their own clueless hand); that the human UI is perhaps secondary to the Web API; that 2.0 sites exist and gain value from the aggregation of user data, which thrives when users are trusted to be in charge.

So, Now It’s Your Turn

Go make some rules and see how different sites fare. Or, better yet, go make your own Web 2.0 application.

But remember, only the Web 2.0 Validator can tell you for sure.

James Britt