Wikipedia Categories + YouTube Channels = TubeTerrain.com

For some vague reason, can’t possibly imagine why, I have been thinking about ways to find voices and perspectives outside of the invertebrate media conglomerates which control television. There are several online video platforms which host many voices, the largest of which is probably YouTube.

The problem is that in my opinion YouTube’s search stinks. The filters are limited, there’s not much in the line of advanced syntax, and the search results don’t contain signals useful to the searcher, like number of subscribers and date of last video upload. This is frustrating because YouTube hosts a lot of great content. It’s just harder than it should be to find!

Last December I experimented with making a YouTube search based on Wikipedia. It worked pretty well; the keywords and dates provided by Wikipedia helped bring the YouTube results into a useful focus. (I will put that tool online if there’s interest, but I haven’t yet because it requires a YouTube API key.)

With that in mind, I decided to see if I could use Wikipedia data to find YouTube channels and put them into a directory. Wikidata has a YouTube Channel Property (P2397 for you nerds) and it’s easy to use the Wikidata Query Service to find those pages. I tried that, starting with a giant list of about 68,000 YouTube channel IDs to see if I could turn that into a useful directory.
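For the curious, a query along those lines might look like the following Python sketch. It is not the code behind Tube Terrain. The helper names (build_channel_query, build_request_url, parse_bindings) are made up for illustration, and the LIMIT is only there to keep a test run small. The endpoint is the public Wikidata Query Service, and P2397 is the property mentioned above.

```python
import urllib.parse

# Public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_channel_query(limit=100):
    """Return a SPARQL query for items that have a YouTube channel ID (P2397)."""
    return (
        "SELECT ?item ?channelId WHERE { "
        "?item wdt:P2397 ?channelId . "
        "} LIMIT %d" % int(limit)
    )


def build_request_url(query):
    """Build a GET URL that asks the query service for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WDQS_ENDPOINT + "?" + params


def parse_bindings(payload):
    """Extract (item URI, channel ID) pairs from a SPARQL JSON result."""
    return [
        (row["item"]["value"], row["channelId"]["value"])
        for row in payload["results"]["bindings"]
    ]
```

Fetching the URL (with a descriptive User-Agent header) and passing the decoded JSON to parse_bindings would give a list of item and channel ID pairs to work from.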

The answer was no. The channel content was all over the place. A lot of the channels were low activity. (I checked a sample of 3,500 IDs and discovered over a third of them hadn’t had a video upload in over a year.) I threw away that list and decided to start over with the bright idea to check the Wikidata Query Service for all the pages with YouTube channels within a particular category, to give the list some kind of topical similarity. But apparently you can’t query Wikipedia category information with the WQS?

After a lot of doodling around which I will not go into here, I built a scraper that uses the Wikipedia API to go through a category and its subcategories and find the pages with a YouTube channel. That list gets fed to another program which uses the YouTube API and Wikipedia API to build a dataset about the channels. And now, there is a browser for those datasets called Tube Terrain.
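The category-walking part of a scraper like that can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual scraper: walk_category, category_params, fetch_members, and max_depth are hypothetical names, and the step that checks each page for a YouTube channel is left out. The parameters in category_params are those of the MediaWiki API's list=categorymembers query. Fetching is passed in as a function so the walk itself can be tried without touching the network.

```python
from collections import deque


def category_params(category, cmcontinue=None):
    """Build MediaWiki API parameters for listing one category's members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",  # ordinary pages and subcategories only
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        # Continuation token returned by the previous batch, if any.
        params["cmcontinue"] = cmcontinue
    return params


def walk_category(root, fetch_members, max_depth=3):
    """Breadth-first walk of a category tree.

    fetch_members(category) must return a list of (title, is_subcat) tuples.
    Returns the sorted titles of every page found.
    """
    seen = {root}  # categories already queued, to survive loops in the tree
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for title, is_subcat in fetch_members(category):
            if not is_subcat:
                pages.add(title)
            elif depth < max_depth and title not in seen:
                seen.add(title)
                queue.append((title, depth + 1))
    return sorted(pages)
```

The depth limit and the set of visited categories matter because Wikipedia categories can loop back on themselves or wander far off topic a few levels down.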

Tube Terrain in action. The filtering keyword is "science", which returns 65 results. The first result, since they're sorted by number of subscribers, is Mark Rober.

Tube Terrain lets you browse datasets of YouTube channels built from Wikipedia categories/subcategories. The first dataset contains 1948 channels found in the YouTubers category and subcategories.

It’s very easy to use. The dataset is loaded in its entirety so you’re not searching so much as filtering the loaded contents. That makes it really fast after the initial dataset load (5-10 seconds). The keyword filter looks at both the Wikipedia and the YouTube descriptions, so the search pool is slightly larger. You can filter by a number of other parameters, including language, country, number of subscribers, or even channel age. Thanks to Wikidata you can also filter by gender and occupation when that information is available.
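To make the "filtering, not searching" idea concrete, here is a rough Python sketch of filtering an already-loaded list of channels. It is only an illustration: the function name and the field names (youtube_description, wikipedia_description, language, subscribers) are assumptions, not Tube Terrain's actual data format.

```python
def filter_channels(channels, keyword=None, language=None, min_subscribers=0):
    """Filter an in-memory list of channel records and sort by subscribers.

    Each record is a dict; the keys used here are hypothetical.
    """
    keyword = keyword.lower() if keyword else None
    matches = []
    for ch in channels:
        # Search both descriptions so a hit in either one counts.
        text = (
            ch.get("youtube_description", "")
            + " "
            + ch.get("wikipedia_description", "")
        ).lower()
        if keyword and keyword not in text:
            continue
        if language and ch.get("language") != language:
            continue
        if ch.get("subscribers", 0) < min_subscribers:
            continue
        matches.append(ch)
    # Largest channels first.
    return sorted(matches, key=lambda ch: ch.get("subscribers", 0), reverse=True)
```

Because everything is already in memory, each filter change is just another pass over the list, which is why it feels fast once the initial load is done.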

There are plenty of sorting options too; you can sort by subscriber count, title, channel creation date, etc. Each listing item includes keywords, statistics, and a description of the channel when available. (There’s also a quick link to the channel’s RSS feed. Viva RSS!) There’s no Wikipedia-obtained information here; for that click on the Details button.

A detail panel from Tube Terrain. The person being shown is Eddie Woo, a teacher from Australia. A detail list on the right side of the panel shows information about Mr. Woo and his channel, while the left side has an embedded video player.

The details panel blends information from YouTube with Wikipedia information like occupation, gender, and a brief alternate description when available. There’s an embedded video player as well; click the Load Videos button and you’ll get a list of the most recent videos sourced from the channel’s RSS feed. (Channels which don’t allow embedded content will show an error and point you to YouTube to watch the video directly.)
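Pulling recent videos out of a channel's RSS feed takes very little code. The Python sketch below is an illustration, not what Tube Terrain runs. It assumes the feed lives at the usual youtube.com/feeds/videos.xml address and is an Atom document with a yt:videoId element in each entry. The function names are made up.

```python
import xml.etree.ElementTree as ET

# XML namespaces assumed to be used by YouTube's channel feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def feed_url(channel_id):
    """Return the RSS (Atom) feed URL for a channel ID."""
    return "https://www.youtube.com/feeds/videos.xml?channel_id=" + channel_id


def parse_recent_videos(feed_xml):
    """Return a list of dicts (video_id, title, published) from feed XML text."""
    root = ET.fromstring(feed_xml)
    videos = []
    for entry in root.findall("atom:entry", NS):
        videos.append({
            "video_id": entry.findtext("yt:videoId", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    return videos
```

Downloading feed_url(channel_id) and handing the text to parse_recent_videos gives enough to build a list of recent uploads, with no API key involved.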

At the moment Tube Terrain supports only one dataset, but I am in the process of developing several others, including ones for scientists, academics, artists, activists, and politicians. Stay tuned. As always, Tube Terrain is free to use and free of ads.
