My recent thinking about Web search, and about making the most useful query possible, has focused a lot on the idea of query-as-cloud-of-topics. Instead of thinking about George Washington as a single search term, you might think of him as a structure encompassing every possible way you can contextually describe George Washington: dentures, cherry trees, his military career, his political endeavors, his slave ownership, etc. Applying any of those contextual elements to a query for George Washington will bring you very different results; we are long past the time when a simple topical Web search would bring you all relevant, credible resources from around the Web.
If you can think about George Washington that way, it’s an easy step to place Wikipedia articles within the same metaphor: George Washington’s article is the container, and all the other articles it links to are the elemental contexts for the topic of George Washington.
So here we are looking at this George Washington bucket filled to the brim with George Washington vibes. But that’s only a partial step toward building a good search query; you need some additional filter to choose which topic elements are best applied to your search. So I’ve been experimenting.
Previously I’ve tried to use Wikipedia page views data to introduce a time/public interest element into my search query building, but that was comparing a topic to itself in terms of public interest over time. Instead, here I want to compare public interest in one of the contextual elements of a topic (one of the intra-wiki links within a Wikipedia article) against public interest in the topic itself. I want to see if the public interest in a topic “resonates” with one of its contextual elements in the form of similar public interest over a given time period. If it does, you could reasonably expect that topical element to have had a major impact on the topic during that time period.
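To make the raw material for that comparison concrete, here is a minimal sketch of how daily page views for a topic and for one of its linked articles might be requested. It assumes the public Wikimedia Pageviews REST API as the data source, which is my assumption for illustration and not necessarily what the experiment uses; the function names `pageviews_url` and `daily_views` are hypothetical. The actual HTTP request is left out, so the sketch only builds the URL and reads a decoded JSON response.

```python
from urllib.parse import quote

# Assumption: the public Wikimedia Pageviews REST API is the data source.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build a daily per-article page views URL.

    start and end are date strings in YYYYMMDD form.
    """
    # Wikipedia titles use underscores for spaces; escape everything else.
    title = quote(article.replace(" ", "_"), safe="")
    # "all-access" covers desktop and mobile; "user" leaves out known bots.
    return f"{BASE}/{project}/all-access/user/{title}/daily/{start}/{end}"


def daily_views(payload):
    """Turn a decoded JSON response into a {timestamp: views} dict."""
    return {item["timestamp"]: item["views"] for item in payload.get("items", [])}
```

You would call `pageviews_url` once for the topic (say, "George Washington") and once for each linked article you want to test, over the same date range, so the two series line up day by day.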
But how to do the comparisons? I didn’t want to just compare spikes or view surges because those might not line up perfectly between two articles. Instead I tried to analyze the page views more three-dimensionally and give them a “signature” that the other page views could be compared to.
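One way a “signature” comparison could work, purely as an illustration: reduce each daily view series to its shape by z-scoring it, then score how well two shapes match, trying small day shifts so that surges that are slightly out of step still count. This is a stand-in technique (a lagged Pearson correlation), not a description of the method actually used here, and the names `signature`, `similarity`, `resonance`, and `max_lag` are all hypothetical.

```python
from statistics import mean, pstdev


def signature(views):
    """Z-score a daily view series so only its shape remains, not its scale."""
    mu, sigma = mean(views), pstdev(views)
    if sigma == 0:
        # A perfectly flat series has no shape to compare.
        return [0.0] * len(views)
    return [(v - mu) / sigma for v in views]


def similarity(sig_a, sig_b):
    """Mean product of two equal-length signatures (the Pearson correlation)."""
    return sum(a * b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def resonance(topic_views, element_views, max_lag=2):
    """Best similarity between the two series over shifts of up to max_lag days.

    Returns -1.0 if the series are too short to compare at any shift.
    """
    n = min(len(topic_views), len(element_views))
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Pair day i of the topic with day i + lag of the linked article.
        pairs = [(topic_views[i], element_views[i + lag])
                 for i in range(n) if 0 <= i + lag < n]
        if len(pairs) < 3:
            continue
        xs, ys = zip(*pairs)
        best = max(best, similarity(signature(xs), signature(ys)))
    return best
```

A score near 1 would suggest the linked article's interest rose and fell along with the topic's during that period; a score near 0 would suggest no shared pattern. Because z-scoring removes scale, a heavily visited topic can still be compared against a lightly visited linked article.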
It’s been very interesting, but I think it will be more interesting when I can compare multiple time periods at once and see how elemental relevance waxes and wanes against a topic over time.