We have written a lot here about the vision of building a structured layer on
top of the current web. Annotating billions of HTML documents in a bottom-up way or building top-down tools that can automagically
interpret the existing information are the two approaches that we discussed. Together these approaches would result in a global
database which will make the web even more connected.
The ability to correlate content and concepts across web sites would reduce the time necessary for searching and would enable the discovery of related information.
In previous posts we discussed the difficulties with the bottom-up approach to the Semantic Web - a sophisticated form of annotating information using tools like RDF and OWL. Among the factors that impair the web-wide adoption of these tools are complexity and the lack of clear end user benefits.
On the other hand, the top-down approach that we discussed does not place any burden on content owners and delivers instant benefits to end users. Yet, the top-down tools run into a difficulty - interpreting raw information is not that simple. Typical solutions focus on a vertical, but still suffer from imperfections.
What if there was some minimal annotation in the content to help top-down tools interpret it? In this post we look at how content owners can implement simple annotation strategies which can help the top-down tools and search engines to make the web more structured.
It is striking how many sites today do not use meta tags in the head of the document to provide the bare minimum information about a page's content.
Forget building a smarter web; this is just plain bad SEO practice. The work that is being put into generating great content can be offset by the lack of a succinct, meaningful description
of that content. Every page on the web should have the following information filled in:
Note that it makes sense to provide different information for the root page and subsequent pages. For example, for a newspaper or a blog, the root page should provide information about the site at large, while individual article and post pages should contain information about that specific page, not the overall site.
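As a rough illustration of what this bare minimum buys a top-down tool, the sketch below pulls the title and the name/content meta pairs out of a page using Python's standard html.parser module. The sample page and its values are invented for the example, and the choice of description and keywords as the tags to show is an assumption based on the usual basic set, not a quote from any real site.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Meta names are case-insensitive, so normalize them.
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_meta(html):
    """Return (title, {meta name: content}) for an HTML string."""
    reader = MetaTagReader()
    reader.feed(html)
    return reader.title.strip(), reader.meta


# Hypothetical article page; every value here is made up for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Example Article</title>
<meta name="description" content="A one-sentence summary of this article.">
<meta name="keywords" content="example, metadata">
</head><body><p>Article text.</p></body></html>
"""

title, meta = read_meta(SAMPLE_PAGE)
print(title)
print(meta["description"])
```

A page with nothing in its head leaves such a tool with an empty dictionary, and it has to fall back to guessing from the body text.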
The New York Times' web site provides a good example of how to properly use meta tags. For example, this article on Slowdown in US Growth includes the following meta data:
The New York Times is actually a great example of taking the basics of annotation and building on top of them. Each page includes an extended set of rich meta data, including the author of the article, the date it was published, thumbnail image URL, creator, category and even ticker symbols for public companies that are mentioned in the article. Certainly, the New York Times provides a really great set of information, perhaps even wider than needed for most content, but let's focus on the ones that should be used on a wider scale.
author: Web content is produced by people and for people. With the rise of social culture we are increasingly interested in finding bits of everyone's identity around the web. If something piqued your interest enough for you to blog or to write an article, you can at least put your name on it. Having people attached to content would allow seamless navigation from one to another. There is already a standard meta tag for this, with a suggestive name: author.
thumbnail: We love pictures. Since the launch of Flickr we can't live without them. Facebook's success owes a lot to photo sharing. With bandwidth becoming cheap, we are becoming increasingly visual. We do not want text; we want pictures. So if a news article or blog post contains an image, it is simple to do what the Times did - generate a meta tag for it. There is no standard meta tag as far as I know, but any of these would do: thumb, image, picture, thumbnail, etc.
date: As we are becoming a real-time culture, the freshness of content becomes paramount. Tagging the page with a date is an important way of helping classify the page in time. Most blog posts and articles contain dates anyway, and having a standard date header would make it simple and obvious.
location: Location is becoming increasingly important as well. With GPS and widely available Internet access we are able to easily let people know where we are and to take advantage of local services. If an article or a post is related to a specific location, there is a conventional way of annotating it. The technical term for annotating content with location information is geotagging. It generally means attaching a pair of latitude and longitude coordinates. A more relaxed form would be specifying country/region/city, and is described in detail by the Geo microformat specification. While specifying exact position coordinates may be difficult, even something as simple as the geo header New York, NY would be very helpful.
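To make these four suggestions concrete, here is a minimal sketch that renders them as meta tags. Only author is described above as a standard name. The names thumbnail and date are simply picked from the options floated above, and geo.position and geo.placename follow a common geotagging convention for meta tags, which is an assumption here rather than something this post prescribes. The function name and all sample values are hypothetical.

```python
from datetime import date
from html import escape


def extended_meta_tags(author=None, thumbnail=None, published=None,
                       latitude=None, longitude=None, placename=None):
    """Return a list of <meta> tag strings for whichever fields are given."""
    fields = []
    if author:
        fields.append(("author", author))
    if thumbnail:
        fields.append(("thumbnail", thumbnail))
    if published:
        # ISO 8601 (YYYY-MM-DD) is unambiguous and sorts correctly as text.
        fields.append(("date", published.isoformat()))
    if latitude is not None and longitude is not None:
        # Assumed convention: "latitude;longitude" in decimal degrees.
        fields.append(("geo.position", f"{latitude};{longitude}"))
    if placename:
        # The relaxed form: a human-readable place instead of coordinates.
        fields.append(("geo.placename", placename))
    return [
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in fields
    ]


# Invented sample values for illustration.
tags = extended_meta_tags(
    author="Jane Doe",
    thumbnail="http://example.com/images/thumb.jpg",
    published=date(2008, 1, 30),
    placename="New York, NY",
)
print("\n".join(tags))
```

Fields that are left out produce no tag at all, so a blog template could call something like this with whatever it happens to know about a post.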
Comments
Microformats rock! It's one of those things that you wonder why nobody thought of them sooner. I'm excited to see how the new API in FF3 will allow better exposure of microformats in the future.
Posted by: Rory | January 31, 2008 2:11 AM
A fundamental microformat is still missing: one to represent measurable physical quantities (i.e. Any Data Value!) into semantic HTML.
This would allow browsers/plugin and js scripts to search and collect data from various URLs (in any unit-of-measure) and proceed straight into processing and mashup tasks; no more scraping, nor Readme.txt to describe how to interpret the downloaded .csv file.
Some preliminary notes are on the microformats wiki, but nobody seems interested enough to get the discussion going and the 'hmeasure' thing out of the Draft-state limbo.
Posted by: LucaPost | January 31, 2008 3:26 AM
Regarding the standardising of blogging templates, there has been some excellent research/analysis done on that front to see if there are any significant numbers of common objects in HTML.
After a bunch of folk provided some excellent data, Google thought they'd step in and provide a small billion document set for comparison.
You can find their work and links to others at Google's Web Authoring Statistics.
Posted by: Al | January 31, 2008 4:16 AM
Bravo - we'll get to that semantic web...even if we wear our fingers down with the extra typing!
Posted by: Don Jones | January 31, 2008 6:46 AM
For the structure for blog posts, I recommend following the microformat hAtom:
http://microformats.org/wiki/hatom
Posted by: Luigi Montanez | January 31, 2008 6:51 AM
Excellent post. Excellent subject. But I do not see any mention of the "anti-structure" forces at work.
I was a Joost pre-beta tester ONLY because of their ( now retracted ) promise of annotated television. Some secret meeting occurred at Joost and the decision was made to abandon RDF and/or Microformat markup for the videos. No explanation. Nothing.
Also, I want to bring to your attention the recent killing off by Adobe of the W3C's plan to have plain mark-up in HTML 5 for videos:
http://www.digitalcitizen.info/2007/12/30/ogg-theoravorbis-as-default-for-video-scuttled-in-html5-spec-who-benefits/
Ogg Theora isn't that great of a file format, but it shows that our altruistic attempts at making a "Read Write Web" will be stopped by very greedy, close minded people.
Posted by: Todd | January 31, 2008 7:39 AM
Rather than using a class of post for the post container, wouldn't it make more sense to use a class of hEntry and go with the hAtom microformat?
Posted by: Brian | January 31, 2008 8:27 AM
Microformats, standardized blogging templates, and even HTML meta tags are all great ways of exposing additional "structured" information in web pages.
Additionally, services like Orchestr8's AlchemyGrid ( http://grid.orch8.net/extractions/grab ) make it easy to 'apply semantic structure' to existing websites, exposing their data to semantic webapps, mashups, and other stuff.
Posted by: Dmitriy | January 31, 2008 8:55 AM
Really useful article. Web structuring is all the more important as it may help a site get a good rank in search engines.
Posted by: Ron | January 31, 2008 9:07 AM
Looks like this page is missing the keyword, date, and location info ;)
Posted by: Craig | January 31, 2008 9:56 AM
I agree; this is a great article. I think that having a more structured web will not just help search, but will help usability as more and more people are getting away from the large-screened terminals for accessing the web. Having the ability to search distinctly, and then manage what you've found seems to be a ripe recipe for making mobile and vertical applications of the web grow healthier.
Of course, when semantics are entered into the equation, advertising can become much more targeted, and trends are much easier to notice across users. Privacy folks could have a ball with this kind of info. And that's the cost of being more organized, more people can find ya.
Posted by: Antoine of MMM/Brighthand | January 31, 2008 11:14 AM
Sounds great, but be careful how much weight you give this meta data. Maybe there could be some way for the page author (or blog template for that matter) to have a very small blue square in the upper right most part of the page that, when clicked, allowed the user to thumb up or down the authored meta data as well as supplement it.
This can allow you to give more weight to top down meta based on qty of thumbs while allowing a community to validate it from the bottom up. Best of both worlds?
Posted by: Marc | January 31, 2008 1:16 PM
@LucaPost:
Agree on your point about measures, discussion does seem to have stagnated on this. I've looked into the hRecipe spec and that touches on the need to mark up quantities. I see some issues with the abbr pattern in that parsers would have to know the spellings and misspellings of measure units in order to convert them.
Also there are a wide range of units out there and marking all of them up in a standard way is a challenge and I think this is why not much is going on here. I don't think it's apathy from the very small microformat development community, more a case of the difficulties involved with quantity units.
Please feel free to join in on the mailing lists.
Cheers
Lee
Posted by: Lee Jordan | February 13, 2008 2:45 AM
yep, just a few of these steps have a great impact on your SERPs.
Posted by: Praveen | February 13, 2008 3:42 AM
In terms of location (geotagging) metadata elements or microformat tags are all very well and good--if you're a machine. They aren't generally visible to humans who may otherwise miss the fact that location-specific information is associated with a particular blog post/ photo/ feed etc. That's why I propose a web "standard" Geotag Icon to add visual identification to geotagged content:
http://www.bioneural.net/2008/02/21/a-web-standard-icon-for-geotagging/
If more people see other people tagging their content ("What's that icon?"), more people will start tagging their own content.
Posted by: Bruce McKenzie | February 21, 2008 1:06 PM
Wordpress.com’s semantic tools such as categories, tags, urls for individual posts, author’s name generated automatically to each post, dates per post, seem to mimic the function of metatags and are Search Engine friendly. As well, when my del.icio.us tags and wordpress.com’s tags and categories are synchronized, I think this performs a similar role to metatagging. To make it even more elegant, del.icio.us offers suggestions for popular tags used by other del.icio.us users on posts and sites that have already been entered into their database. For example, del.icio.us suggests these tags for Alex Iskold’s useful post on structuring the Internet through metatagging: Blog, blogging, code, CSS, Design, development, findability, folksonomy, howto, HTML, marketing, metadata, readwriteweb, semantic, semantic_web, semantics, semanticweb, tag, tags, tips, trends, visualization, web, web3.0, webdesign, XHTML, markup, Internet, microformats.
After reviewing the ReadWriteWeb article on structuring the Internet, I looked up the New York Times metatags offered as a best practice model by ReadWriteWeb and attempted to adapt them to my own Speechless blog. Wordpress quickly eliminated my outlaw codes leaving no trace.
On my wishlist for metatags or semantic web structure would be basic webliographic information for serious content users who want to attribute ideas and content.
I read your informative, up-to-date, well-researched blog regularly, often experimenting with ideas, technology and tools that you recommend.
Iskold, Alex. 2008. “How YOU Can Make the Web More Structured.” >> ReadWriteWeb. Uploaded. January 30, 2008 10:48 PM. Accessed February 2008.
Posted by: Maureen Flynn-Burhoe | February 24, 2008 2:14 PM
Really useful, thanks!
Posted by: VoiD | February 25, 2008 3:29 AM
Great post...excellent info..I learn something new today!
Posted by: Josh | February 25, 2008 5:32 AM
thanks for your subject. it is very important for internet users.i will write your site .. please write
me back. thank you
Posted by: Evden Eve Nakliyat | February 27, 2008 3:36 PM