The following guest post was submitted by Alister Cameron. Read his blog at Alister Cameron – Blogologist.
I was reading through the Google Blogsearch patent application today. It was filed back in September 2005 and never mentions Blogsearch by name, of course. In reading through all the convoluted legalese, I discovered what I think are some rather intriguing statements that give us a tantalizing insight into how the architects of Google’s “secret recipe” think…
Now, this post is probably not going to answer as many questions of yours as it may raise new ones. But the point I want to make here is this: Google determines a quality score for every blog post you write based on more factors than most of us have ever really understood. Indeed, the range of measurements contributing to Google’s quality score applied to your blog posts is nothing short of amazing.
Truthfully, I found myself thinking of Big Brother as I tried to grasp the magnitude of Google’s data-gathering capabilities. You’ll see what I mean as we dig deep into the bowels of this patent application. Along the way, I will consider some ramifications for how you blog and how you approach the marketing of your blog.
We need to dive into the text of the patent application here, specifically a section titled Determining a Quality Score for a Blog Document, starting with a summary of what Google will take into consideration when looking at the “positive indicators” of a blog post (we will not look at the negative indicators today):
 Positive indicators as to the quality of the blog document may be identified (act 620). Such indicators may include a popularity of the blog document, an implied popularity of the blog document, the existence of the blog document in blogrolls, the existence of the blog document in a high quality blogroll, tagging of the blog document, references to the blog document by other sources, and a pagerank of the blog document. It will be appreciated that other indicators may also be used.
Each of the “indicators” listed above is now detailed in turn, and it’s here that Google start to be more revealing about their methods and intentions…
A Key Indicator: Feed Readership
 The popularity of the blog document may be a positive indication of the quality of that blog document. A number of news aggregator sites (commonly called “news readers” or “feed readers”) exist where individuals can subscribe to a blog document (through its feed). Such aggregators store information describing how many individuals have subscribed to given blog documents. A blog document having a high number of subscriptions implies a higher quality for the blog document. Also, subscriptions can be validated against “subscriptions spam” (where spammers subscribe to their own blog documents in an attempt to make them “more popular”) by validating unique users who subscribed, or by filtering unique Internet Protocol (IP) addresses of the subscribers.
I’m intrigued that in this section Google seems to suggest that their main way of determining the popularity of a blog is the number of feed subscribers. This is an acknowledgement that a subscription to a blog feed is a much clearer indicator of a reader’s commitment to your blog than inbound traffic, which can be manipulated in all sorts of (e.g. black hat) ways.
However, the really important question to ask here is: How does Google know how many people are subscribed to your feed? Answer: Google Reader, the most popular feed reader at the moment, with a large share of the newsreader market. With this kind of market share, Google has 100% accurate data on what feeds 30% of people are subscribed to, and can make a very accurate “educated guess” on the other 70%. It’s an enviable and powerful position to hold.
So for starters — thanks to Reader — Google has a very accurate read on how many people are subscribed to your blog feed. But beyond just subscriber numbers, Google is using Reader to analyze the clicking and reading behaviour of feed subscribers. Google doesn’t just know if I’m subscribed to your blog feed; Google knows how often I actually show up to read your posts, when, where and how often I click through to your site, and so forth. And again, Google can make accurate extrapolations from how people interact with your feed on Reader, to how users of other newsreaders are doing the same.
And all this to derive a popularity rating of your blog and individual posts!
It makes me wonder: if Google sees themselves first and foremost as a search company, then Google Reader may exist, in their minds, first and foremost to measure feed readership accurately and thus derive an accurate quality score for Blogsearch… not (just) to give you and me another funky web-based newsreader to use. Think about it.
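Going back to the patent's point about “subscriptions spam”: here is a rough Python sketch of what “validating unique users” and “filtering unique IP addresses” could look like. To be clear, this is my own illustration, not anything taken from the patent or from Google. The function name, the tuple layout and the rule that a repeated user or a repeated IP address is simply ignored are all assumptions.

```python
from collections import defaultdict


def count_valid_subscribers(subscriptions):
    """Count subscribers per feed, counting each user and each IP at most once.

    `subscriptions` is an iterable of (feed_url, user_id, ip_address) tuples.
    This layout is hypothetical; a real aggregator would store far more detail.
    """
    seen_users = defaultdict(set)  # feed_url -> user ids already counted
    seen_ips = defaultdict(set)    # feed_url -> IP addresses already counted
    counts = defaultdict(int)

    for feed_url, user_id, ip_address in subscriptions:
        # Skip a subscription if this user or this IP was already counted
        # for the same feed (a crude stand-in for spam filtering).
        if user_id in seen_users[feed_url] or ip_address in seen_ips[feed_url]:
            continue
        seen_users[feed_url].add(user_id)
        seen_ips[feed_url].add(ip_address)
        counts[feed_url] += 1

    return dict(counts)
```

A spammer who subscribes to their own feed ten times from one machine would count once here. A real system would need something gentler, since plenty of genuine readers share an IP address at the office.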
Clickety Click Click
 An implied popularity may be identified for the blog document. This implied popularity may be identified by, for example, examining the click stream of search results. For example, if a certain blog document is clicked more than other blog documents when the blog document appears in result sets, this may be an indication that the blog document is popular and, thus, a positive indicator of the quality of the blog document.
Another indicator of the popularity of your blog (and of individual posts) is gleaned from a study of how people click on Google’s search results. For sure, Google records these outbound clicks, and somehow rewards the blogs that get the most clicks. Google will be looking for a lower-ranked blog post on a given page of search results that keeps getting more clicks than a higher-ranked listing. To Google, this would suggest that they need to honour the lower-ranked blog with a higher position in the search results, given that it’s getting more of the clicks.
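To make that idea concrete, here is a small Python sketch that looks at one page of search results and flags any listing that draws a higher click-through rate than something ranked above it. This is purely my guess at the kind of comparison involved. The patent gives no formula, and the function name and the (url, impressions, clicks) layout are assumptions of mine.

```python
def outperforming_results(results):
    """Return URLs whose click-through rate beats a result ranked above them.

    `results` is a list of (url, impressions, clicks) tuples for one query,
    ordered by position with the best-ranked result first. Hypothetical data
    layout, used only for illustration.
    """
    flagged = []
    lowest_ctr_above = None  # lowest click-through rate seen higher up the page

    for url, impressions, clicks in results:
        if impressions == 0:
            continue  # no data for this listing, so nothing to compare
        ctr = clicks / impressions
        # A listing is flagged when it out-clicks at least one listing above it.
        if lowest_ctr_above is not None and ctr > lowest_ctr_above:
            flagged.append(url)
        if lowest_ctr_above is None or ctr < lowest_ctr_above:
            lowest_ctr_above = ctr

    return flagged
```

A real ranking system would also have to correct for position bias, since the top listing gets more clicks simply for being on top. This sketch ignores that entirely.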
Of course, the important question to ask here is: What can I do to increase the likelihood that a searcher will click on my link when it appears on a page of Google search results? The answer has been well covered, and has to do with search engine optimization (SEO) techniques related to your blog post’s title and the snippet Google displays under the title of your listing in the search results.
Google’s Verdict: Blogrolls Matter
The Google Blogsearch patent application includes a full three paragraphs dedicated to blogrolls and the significance Google ascribes to them in determining the quality of a blog. So we better not miss this:
 The existence of the blog document in blogrolls may be a positive indication of the quality of the blog document. It will be appreciated that blog documents often contain not only recent entries (i.e., posts), but also “blogrolls,” which are a dense collection of links to external sites (usually other blogs) in which the author/blogger is interested. A blogroll link to a blog document is an indication of popularity of that blog document, so aggregated blogroll links to a blog document can be counted and used to infer magnitude of popularity for the blog document.
Google counts how many other blogs’ blogrolls include a link to your blog, and assigns your blog a score accordingly. This implies that Google knows about blogrolls and respects the fact that bloggers use them to indicate respect/trust/honour for/to other blogs. Blogrolls matter.
 The existence of the blog document in a high quality blogroll may be a positive indication of the quality of the blog document. A high quality blogroll is a blogroll that links to well-known or trusted bloggers. Therefore, a high quality blogroll that also links to the blog document is a positive indicator of the quality of the blog document.
If your humble C-list blog is listed on someone else’s blogroll that otherwise contains only A-list blog links… you’re going to get lots of lovin’ from Google! That’s the point of this paragraph, anyway: Google looks at the kind of company your link keeps on blogrolls.
 Similarly, the existence of the blog document in a blogroll of a well-known or trusted blogger may also be a positive indication of the quality of the blog document. In this situation, it is assumed that the well-known or trusted blogger would not link to a spamming blogger.
If you’re on Scoble’s blogroll, Google assumes you’re a) not a spam site and b) immediately worthy of respect… Scoble said so.
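Pulling those three blogroll paragraphs together, a scoring routine might look something like the Python sketch below. Everything specific in it is my assumption: the patent names the signals but not the arithmetic, so the weight of 3.0, the function name, and my reading of a “high quality blogroll” as one whose other links all point to trusted blogs are invented for the example.

```python
def blogroll_score(target, blogrolls, trusted, trusted_weight=3.0):
    """Score `target` by the blogrolls that link to it.

    `blogrolls` maps a blog's URL to the set of URLs in its blogroll.
    `trusted` is a set of URLs treated as well-known or trusted blogs.
    The weighting is hypothetical; the patent does not specify one.
    """
    score = 0.0
    for owner, links in blogrolls.items():
        if target not in links:
            continue
        others = links - {target}
        # Assumed definition of a "high quality blogroll": every other
        # link on it points to a trusted blog.
        high_quality = bool(others) and others <= trusted
        if owner in trusted or high_quality:
            score += trusted_weight  # trusted owner or A-list company
        else:
            score += 1.0             # an ordinary blogroll link
    return score
```

So a link from a buddy's blogroll counts once, while a link from a trusted blogger, or from a blogroll full of A-listers, counts for more.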
The critical question here is obvious: How can I get a link to my blog on more and better blogrolls? And again, the answers are many and varied but (I think) all come down to one key point: you have to earn a place on someone’s blogroll… with good content and consistency. Sure, you’ll get a few blogroll links from buddies and work associates perhaps, but if you want an ever growing number of blogroll links, you need to endear yourself to people you’ve never met, who have grown into committed readers over time, and will (perhaps) reward you with a blogroll link, just coz they want to.
Social Bookmarking is Good for Your Blog Rank!
 Tagging of the blog document may be a positive indication of the quality of the blog document. Some existing sites allow users to add “tags” to (i.e., to “categorize”) a blog document. These custom categorizations are an indicator that an individual has evaluated the content of the blog document and determined that one or more categories appropriately describe its content, and as such are a positive indicator of the quality of the blog document.
The shift away from local/desktop URL bookmarking to online services like Technorati, del.icio.us, Stumbleupon, ma.gnolia, reddit (and a ton of others) has created an entirely new “social” experience of shared bookmarks, affinity/interest groups and voting systems (like Digg). And for Google and other search engines, this has meant the ability to compare on-page and link-text keyword analysis with a new third factor: tagging.
So now there are three ways Google can do the keyword analysis to work out what your page is about (and rank you accordingly): a) on-page factors, b) inbound link-text, and c) tagging/categorization on social bookmarking sites. And Google respects tagging because it reflects people’s idea of what your blog (or post) is about.
Further, I’m guessing Google is very sophisticated in how they analyze the content of social bookmarking sites. (I bet they have a bot and analytical apparatus purpose-built for the job.) Rest assured they factor in the number of times a given post of yours has been bookmarked, and how frequently a given tag is used.
Here, the important question for fellow bloggers is: How can I get my blog posts properly and extensively tagged across the various social bookmarking sites?
The term Social Media Optimization (SMO) has been coined to, in part, encompass the various answers to this question. Rohit Bhargava is credited with coining this term and it was he who first suggested a number of rules or goals for SMO. I suggest you start there.
My personal challenge to you would be to see this SMO thing as an exercise in establishing and maintaining mutually beneficial relationships. It’s in the mutuality of these online friendships that people bookmark, tag, vote for and in other ways express support of each other’s blogging efforts. But that subject deserves a post of its own.
(Note: it would be remiss of me not to make one more point on tagging: tag your post content properly. That’s the love Technorati, in particular, is looking for. When people bookmark your site to, say, del.icio.us, they tag as they see fit. When you tag your own post content, you get the chance to cover all the bases you want covered. So get it right!)
Is Google Reading Your Mail?!
Read this carefully:
 References to the blog document by other sources may be a positive indication of the quality of the blog document. For example, content of emails or chat transcripts can contain URLs of blog documents. Email or chat discussions that include references to the blog document is a positive indicator of the quality of the blog document.
Are you thinking what I’m thinking?! Google has a massively popular hosted email service – Gmail. They also have Google Talk, a chat service. You probably knew that. But did you know Google has intentions of crawling the content of your Gmail emails and Google Talk chat sessions?! Now, I don’t know if they actually do that or not, and I haven’t gone hunting through their terms of service seeking clarity, but their stated aim is clear: to find URLs in two key forms of personal online communications (email and chats), and to use these discoveries to further rank blogs and blog posts.
I have to say it makes perfect sense. Why? Because Google is looking to build a more and more accurate profile of your and my blog. And to do this Google wants to see corroborating evidence of popularity across as many different “media” as possible: web pages, blog posts, search results click patterns, blogrolls, social bookmarking services, and now email and chat session content. Wow… that’s called being thorough.
(Note: Google will no doubt also be analyzing chat content from other services where the “transcripts” are indexed. Twitter immediately comes to mind, here.)
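For the curious, finding “references to the blog document” in a pile of messages is not hard to picture. The Python sketch below counts how many messages mention each URL. It is only an illustration under my own assumptions: the simple regular expression, the function name and the once-per-message counting rule are mine, and I have no idea how (or whether) Google does this.

```python
import re
from collections import Counter

# Deliberately simple pattern: "http://" or "https://" followed by anything
# that is not whitespace, an angle bracket or a double quote.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def count_url_references(messages):
    """Count how many messages mention each URL, once per message.

    `messages` is an iterable of plain-text strings (hypothetical email
    bodies or chat lines).
    """
    counts = Counter()
    for text in messages:
        # Strip trailing punctuation that usually belongs to the sentence,
        # and use a set so a URL repeated in one message counts only once.
        urls = {match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)}
        counts.update(urls)
    return counts
```

Counting once per message is the same anti-gaming instinct as the unique-subscriber check: pasting one link fifty times into a single email should not look like fifty recommendations.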
So what’s the big question to be asking? I think it’s this: What can I do as a blog author to ensure that my posts are being linked to, in email and chat conversations? And my answer remains the same: consistently write compelling (sometimes controversial) content that people will want to point others to. You just can’t get past this one… you need your own high-quality content pumped out on a regular basis.
Some Concluding Questions
I’ve quoted just a few paragraphs from a much longer (and largely boring) patent application for a product that ended up being called Google Blogsearch. Reading through the bits of the application that made any sense to my non-legal mind, and comparing that to what I know of Google Blogsearch, I was left with a few questions I thought I’d bring to the ProBlogger community:
- Does it make sense to have Blogsearch separate from the main Google search engine? I’m not sure about that, but in dedicating a unique search service to blog content, Google is telling us that in some sense it’s a different kind of content with a different indexing algorithm applied to it.
- As Google Blogsearch gains in popularity, will new (or adjusted) SEO strategies emerge along with it? Is Blogsearch different enough to warrant different strategies? How different are the search results compared to the same query in the main Google search engine?
- How many people actually use Google Blogsearch (http://blogsearch.google.com)? I haven’t seen any data out there on the popularity of that service.
- Do you use it? Why? What do you like about it? Anything you don’t like about it?
- How do you find Google Blogsearch compares with Technorati? What are the major differences you have observed in their search results? Do you have a preference?