Tara Calishain

I dream in data flows

Perovskite Plain or Solar?

Experiment time!

Go do a Web search for ‘hairstyles “totally tubular”.’

Then do another search for ‘hairstyles groovy.’

Of course, the search results are VERY different in their content, but they have something in common: each set is oriented to a particular era. The first search will get you lots of mentions of the 1980s, while the second will get you mentions of the 1970s. That’s because you used vocabulary (slang) that’s specific to a particular time period. Just by adding a single keyword, you made your search time-oriented without making it a date-bounded search.

I love the idea of using words that weight a search query toward a certain era without date-restricting the results. Slang is an easy way to do it, as I’ve just shown. Proper names work well too in certain contexts, such as an athlete’s tenure with a team: “DeMar DeRozan” “Chicago Bulls” returns very different results than “Scottie Pippen” “Chicago Bulls”.

Unfortunately, your average topic might not have an easily applicable word of that type, or you might discover you’ve been accidentally using a word that adds temporal weight to your search!

Take perovskite, for example. Perovskite is a mineral. I learned about it by reading about innovations in solar panels. In fact, if I wanted to search for perovskite, I would probably add the keyword solar.

But is that the right thing to do? Perovskite was discovered in the 19th century, so it doesn’t exist because of solar power; it wasn’t created by the renewable energy industry. If I tied a perovskite search to the word “solar,” would I be limiting my search temporally as well as topically? Thanks to Stract’s open search API and a common URL pattern, I was able to build something to explore this question.

Many CMS platforms offer the option to structure a URL with a date included, like this:

rbfirehose .com/2024/03/19/technical-ly-wharton-created-a-free-series-for-entrepreneurs-to-learn-about-gen-ai/

As you can see, the year, month, and day of the post are included in the URL. If you use the search syntax inurl: to limit a search query to a year, you’re taking advantage of that URL pattern and creating a time-oriented search that’s robust against the inaccuracies dynamic content introduces into date-restricted searches, since the date in the URL is set when the post is published and doesn’t change afterward.
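To make the URL pattern concrete, here’s a minimal Python sketch (my own illustration, not part of any CMS or search API) that pulls the post date out of a /YYYY/MM/DD/ path and builds a year-limited query string. The function names are hypothetical. One caveat worth keeping in mind: inurl: generally matches the text anywhere in the URL, so a hit on 2005 could also be an ID or a page number rather than a date.

```python
import re
from datetime import date
from typing import Optional

# Matches a /YYYY/MM/DD/ path segment, the permalink pattern in the example URL.
DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_from_url(url: str) -> Optional[date]:
    """Return the post date embedded in a URL path, or None if there isn't one."""
    match = DATE_IN_PATH.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits shaped like a date that aren't a real one, e.g. /2024/13/45/.
        return None


def year_query(base_query: str, year: int) -> str:
    """Build a query that leans on the URL pattern, e.g. 'perovskite inurl:2005'."""
    return f"{base_query} inurl:{year}"


example = "rbfirehose.com/2024/03/19/technical-ly-wharton-created-a-free-series-for-entrepreneurs-to-learn-about-gen-ai/"
print(date_from_url(example))           # 2024-03-19
print(year_query("perovskite", 2005))   # perovskite inurl:2005
```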

So I made a tool called UPTM (URL-Powered Time Machine) that lets you specify a base query (like perovskite) and a span of years (like 2005-2015). For each year, UPTM runs an inurl: search for that year along with your base query (perovskite inurl:2005, perovskite inurl:2006, etc.) and gets 100 results per search. The titles of those 100 results are aggregated and analyzed for the most popular words, which are displayed in a list organized by year.
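The UPTM loop described above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not UPTM’s actual code: the search step is a placeholder function (with an assumed signature of query and result limit in, list of titles out) that you’d wire up to a search API such as Stract’s, and the tokenizer and stopword list are my own guesses at how “most popular words” might be counted. The titles in `fake_search` are made up purely to exercise the code.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Illustrative stopword list (an assumption; UPTM's real filtering isn't described).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is", "by", "at", "from"}

# Hypothetical signature for the search step: (query, max_results) -> list of titles.
SearchFn = Callable[[str, int], List[str]]


def title_words(title: str) -> List[str]:
    """Lowercase a title, split it into words, and drop stopwords and bare numbers."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [w for w in words if w not in STOPWORDS and not w.isdigit()]


def time_machine(base_query: str, start_year: int, end_year: int,
                 search_titles: SearchFn, results_per_year: int = 100,
                 top_n: int = 10) -> Dict[int, List[Tuple[str, int]]]:
    """For each year, search '<base_query> inurl:<year>' and rank the title words."""
    report: Dict[int, List[Tuple[str, int]]] = {}
    for year in range(start_year, end_year + 1):
        titles = search_titles(f"{base_query} inurl:{year}", results_per_year)
        counts = Counter(word for title in titles for word in title_words(title))
        # most_common() orders ties by first appearance, so the output is deterministic.
        report[year] = counts.most_common(top_n)
    return report


# Stand-in search function with made-up titles, only to exercise the code.
def fake_search(query: str, limit: int) -> List[str]:
    canned = {
        "perovskite inurl:2011": ["Perovskite minerals of the Urals",
                                  "Collecting minerals in Russia"],
        "perovskite inurl:2012": ["Perovskite solar cell efficiency",
                                  "A new solar material"],
    }
    return canned.get(query, [])[:limit]


for year, words in time_machine("perovskite", 2011, 2012, fake_search, top_n=3).items():
    print(year, words)
# 2011 [('minerals', 2), ('perovskite', 1), ('urals', 1)]
# 2012 [('solar', 2), ('perovskite', 1), ('cell', 1)]
```

Keeping the search step as a plain function argument means the counting logic can be tested offline, and swapping in a real search backend doesn’t touch the analysis.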

Testing UPTM with the query perovskite showed that adding solar to my initial perovskite query would indeed restrict my search results temporally as well as topically. Solar does not appear as a popular title keyword for the query perovskite until 2013. In fact, perovskite itself does not appear as a popular title keyword until 2015, suggesting that using intitle:perovskite in a query might also inappropriately restrict my search results.
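The kind of reading I did above, spotting the first year a word shows up among the popular title words, is easy to automate. Here’s a minimal sketch, assuming the per-year results are stored as a mapping from year to a list of (word, count) pairs; that format and the `first_popular_year` name are hypothetical, and the sample numbers are invented for illustration, not real UPTM output.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical report format: year -> [(word, count), ...], most popular first.
Report = Dict[int, List[Tuple[str, int]]]


def first_popular_year(report: Report, word: str) -> Optional[int]:
    """Return the earliest year whose popular-word list includes `word`, else None."""
    for year in sorted(report):  # sort so the dict's insertion order doesn't matter
        if any(w == word for w, _count in report[year]):
            return year
    return None


# Invented numbers for illustration only; these are not real UPTM results.
sample: Report = {
    2011: [("minerals", 4), ("crystal", 3)],
    2012: [("minerals", 5), ("solar", 2)],
    2013: [("solar", 9), ("perovskite", 6)],
}
print(first_popular_year(sample, "solar"))       # 2012
print(first_popular_year(sample, "perovskite"))  # 2013
print(first_popular_year(sample, "graphene"))    # None
```

Note that “popular” here just means “made the top-N list for that year,” so the answer depends on how many words you keep per year.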

While I’m pleased with how much this tool can teach me, I find myself confronted with a barely scratched surface. I want to do more analysis of the most popular words, maybe run some word analysis with Doug Beeferman’s Datamuse tools. There’s a lot I could do with ChatGPT calls in terms of analysis, but then I couldn’t add the tool to SearchTweaks. Stay tuned.

...