Google Patents Indexing, Retrieval of Blogs
This week, Google Inc. was assigned a patent by the United States Patent and Trademark Office (USPTO) for a system and method for indexing and retrieval of blogs.
You can find details of this new patent, No. 7765209, in IP.com's Intellectual Property Library. First applied for in 2005 and granted in 2010, it appears to be the core patent for Google's system and method of indexing blogs, which are now included in the results of a Google search.
The patent is abstracted as follows: "A system may receive a feed associated with a blog. The system may extract information from the feed and the blog and create a hybrid document based on the extracted information. The system may further use the hybrid document to determine a relevance of the blog to a search query."
BACKGROUND OF THE INVENTION
The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly.
Search engines attempt to return hyperlinks to web pages in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to provide links to high quality, relevant results (e.g., web pages) to the user based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web pages. Web pages that contain the user's search terms are identified as search results and are returned to the user as links.
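The term matching described in the background can be illustrated with a small inverted index. This is only a minimal sketch and not anything taken from the patent: the function names `build_index` and `search`, the whitespace tokenizer, and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the pages matching the first term, then intersect.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical pre-stored corpus: URL -> page text.
pages = {
    "http://example.com/a": "patent search for blogs",
    "http://example.com/b": "cooking blogs and recipes",
}
index = build_index(pages)
print(sorted(search(index, "blogs patent")))
```

A production search engine would add ranking, stemming, and much more; the sketch only shows the "pages that contain the user's search terms" step.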
Over the past few years, a new medium, called a blog, has appeared on the web. Blogs (short for web logs) are publications of personal thoughts that are typically updated frequently with new journal entries, called posts. The content and quality of blogs and their posts can vary greatly depending on the purpose of the authors of the blogs. As blogging becomes more popular, the ability to provide quality blog search results becomes more important. [emphasis added]
SUMMARY OF THE INVENTION
In accordance with one implementation consistent with the principles of the invention, a method may include receiving a feed; fetching a blog and one or more posts associated with the feed; extracting information from the feed, the blog, and one or more posts; creating a hybrid document based on the extracted information; and using the hybrid document to determine a relevance of the blog or the one or more posts to a search query.
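The steps in this first implementation (receive a feed, fetch the blog and its posts, extract information, build a hybrid document) might be sketched roughly as follows. The patent does not disclose code, so everything here is an assumption: the `HybridDocument` fields, the choice of RSS 2.0 as the feed format, and the `fetch_page` callable, which stands in for whatever crawler retrieves the blog and post pages.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class HybridDocument:
    """Hypothetical container combining feed data with fetched page data."""
    blog_url: str
    blog_title: str
    blog_text: str = ""
    post_titles: list = field(default_factory=list)
    post_text: list = field(default_factory=list)


def extract_from_feed(feed_xml):
    """Pull the channel title, link, and item (title, link) pairs from RSS 2.0."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = [
        (item.findtext("title", ""), item.findtext("link", ""))
        for item in channel.findall("item")
    ]
    return channel.findtext("title", ""), channel.findtext("link", ""), items


def build_hybrid(feed_xml, fetch_page):
    """Combine feed metadata with text fetched from the blog and its posts.

    fetch_page is an assumed callable mapping a URL to page text.
    """
    title, link, items = extract_from_feed(feed_xml)
    doc = HybridDocument(blog_url=link, blog_title=title, blog_text=fetch_page(link))
    for post_title, post_link in items:
        doc.post_titles.append(post_title)
        doc.post_text.append(fetch_page(post_link))
    return doc


# Made-up feed and pages, used only to exercise the sketch.
FEED = """<rss version="2.0"><channel>
<title>Example Blog</title><link>http://blog.example.com/</link>
<item><title>First post</title><link>http://blog.example.com/1</link></item>
</channel></rss>"""
PAGES = {
    "http://blog.example.com/": "Home of the example blog",
    "http://blog.example.com/1": "Body of the first post",
}
doc = build_hybrid(FEED, PAGES.get)
print(doc.blog_title, doc.post_titles)
```

Passing the fetcher in as a parameter keeps the sketch runnable without network access; a dictionary lookup plays the role of the crawler here.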
In another implementation consistent with the principles of the invention, a device includes a memory to store instructions and a processor. The processor executes the instructions to receive a search query, determine a relevance of a blog or a blog post to the search query based on information extracted from the blog or blog post and information extracted from at least one other source, and provide information relating to the blog or the blog post when the blog or the blog post is determined to be relevant to the search query.
In yet another implementation consistent with the principles of the invention, a method may include receiving a search query; determining a relevance of a first set of documents to the search query using a second set of documents, where the first set of documents includes blogs and blog posts and the second set of documents includes hybrid documents created from the first set of documents and at least one other source; and providing information regarding documents in the first set of documents determined to be relevant.
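The two-set arrangement described here, where the query is scored against hybrid documents but the results returned are the original blogs and posts, could look something like the sketch below. The scoring function is a deliberately simple stand-in (the fraction of query terms present), since the patent summary does not say how relevance is computed; `score`, `search_blogs`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def score(query, hybrid_text):
    """Toy relevance: the fraction of query terms found in the hybrid text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(hybrid_text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def search_blogs(query, hybrids, threshold=0.5):
    """Rank original blog/post URLs using their hybrid documents.

    hybrids maps the URL of a blog or post (the first set of documents)
    to the text of its hybrid document (the second set).
    """
    scored = [(score(query, text), url) for url, text in hybrids.items()]
    relevant = [(s, url) for s, url in scored if s >= threshold]
    # Highest score first; ties are broken alphabetically by URL.
    return [url for s, url in sorted(relevant, key=lambda pair: (-pair[0], pair[1]))]


# Made-up hybrid documents keyed by the blog post they were built from.
hybrids = {
    "http://blog.example.com/1": "example blog first post patent news google",
    "http://food.example.com/7": "food blog soup recipe",
}
print(search_blogs("google patent", hybrids))
```

The point of the indirection is that the hybrid text can carry information from the feed and other sources that the post page itself lacks, while the user still receives a link to the post.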
In still another implementation consistent with the principles of the invention, a method may include receiving a search query; identifying a first set of documents to provide in response to the search query based on a second set of documents; and providing information relating to the identified first set of documents.
In yet still another implementation consistent with the principles of the invention, a method may include receiving feeds associated with blogs, extracting first information from the feeds, extracting second information from the blogs and associated posts, creating hybrid documents based on the first information and the second information, receiving a search query, determining a relevance of the blogs or posts to the search query based on the hybrid documents, and providing information relating to the blog or posts determined to be relevant.
The full patent record gives further details, including the specific claims of Google's patent for indexing and retrieving blog posts, which have been included in Google search results for some time now. The patent description concludes with this caveat:
It will be apparent to one of ordinary skill in the art that aspects of the invention, as described above, may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. The actual software code or specialized control hardware used to implement aspects consistent with the principles of the invention is not limiting of the invention. Thus, the operation and behavior of the aspects were described without reference to the specific software code—it being understood that one of ordinary skill in the art would be able to design software and control hardware to implement the aspects based on the description herein.


