Building a Local News Aggregator With 1000+ Sources

As national-level media continues to conglomerate into shapes ever more horrifying, I have been thinking a lot about local news and how it might be made more discoverable and searchable.

Last week I went poking around and found a dataset of local news sources from the Media and Democracy Project. I used that as a quarry and mined it for RSS feeds with a couple of Google Apps Scripts. I’m not finished with that dataset (which is why some states aren’t represented yet), but so far I’ve come up with 1008 verified RSS feeds for local news sources in America. And I wanted to do something with them.
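The scripts mentioned above were Google Apps Scripts. As an illustration only (not those scripts), here is a minimal Python sketch of one common way to find a site’s feed: RSS autodiscovery, where a page’s HTML advertises its feed with a `<link rel="alternate" type="application/rss+xml">` tag. It assumes you’ve already downloaded the page’s HTML; `FeedLinkFinder` and `find_feeds` are hypothetical names, and the example URL is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that sites commonly use to advertise a feed in <head>.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs from <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel can hold several space-separated values; values may be None.
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower().strip()
        href = a.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs such as "/feed/" against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Hypothetical helper: return every advertised feed URL in a page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds


if __name__ == "__main__":
    page = (
        '<html><head><title>Example Gazette</title>'
        '<link rel="alternate" type="application/rss+xml" href="/feed/">'
        '</head><body></body></html>'
    )
    print(find_feeds(page, "https://example.com/news"))
    # prints ['https://example.com/feed/']
```

Finding an advertised URL isn’t the same as verifying it: a candidate feed still has to be fetched and parsed successfully before it counts as a working feed.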

Before this project, the “back end” of my sites was either a) nothing, b) a JSON file, or c) a Google Sheet. I didn’t feel that what I was doing warranted the complexity of a database. But with this project I wanted to fetch RSS feeds for a thousand sites, multiple times a day, so they’d stay fresh. Google Drive has a daily limit on how many fetches it will do, and keeping a thousand RSS feeds fresh would blow that limit out of the water. I needed to move that job to my own server, which meant I needed more than the shared hosting I had been using for a long time. So I upgraded my hosting to a VPS (Virtual Private Server).

Then I sat down, did some reading and searching, talked to Claude a bit, and ended up with a free-tier account at Supabase. Now those 1008 RSS feeds are building up a database of local news, with new content fetched every four hours.
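The fetch-and-store cycle described above can be sketched roughly like this. It’s a minimal Python illustration under stated assumptions, not the site’s actual code: it assumes RSS 2.0 feeds with `<item>` elements, uses each article’s link as its unique key, and keeps articles in an in-memory dict where the real setup writes to a Supabase database. `parse_rss`, `merge_new_items`, and the sample feed are hypothetical. On a VPS, a cron schedule such as `0 */4 * * *` is one way to run a job like this every four hours.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_date(text):
    """Parse an RFC 822 pubDate; return None if missing or malformed."""
    if not text:
        return None
    try:
        return parsedate_to_datetime(text.strip())
    except (TypeError, ValueError):
        return None


def parse_rss(xml_text, source, state):
    """Turn one RSS 2.0 document into a list of article dicts."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # no URL means nothing to link to or deduplicate on
        articles.append({
            "source": source,
            "state": state,
            "title": (item.findtext("title") or "").strip(),
            "snippet": (item.findtext("description") or "").strip(),
            "link": link,
            "published": parse_date(item.findtext("pubDate")),
        })
    return articles


def merge_new_items(store, articles):
    """Add articles whose link isn't stored yet; return how many were new."""
    added = 0
    for article in articles:
        if article["link"] not in store:
            store[article["link"]] = article
            added += 1
    return added


# Made-up sample feed for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example Gazette</title>
<item><title>Council weighs data center proposal</title>
<link>https://example.com/a</link>
<description>The proposal drew a crowd.</description>
<pubDate>Mon, 06 Jan 2025 09:00:00 GMT</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    store = {}
    batch = parse_rss(SAMPLE_FEED, "Example Gazette", "Missouri")
    print(merge_new_items(store, batch))  # 1: first run stores the article
    print(merge_new_items(store, batch))  # 0: a re-fetch adds no duplicates
```

Keying on the link is what lets the same feed be fetched every few hours without piling up duplicates. In a database, the equivalent would be a unique constraint on the URL column. Real feeds often put HTML inside `<description>`, which would need stripping before display.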

Have you ever seen The Three Stooges get stuck in a doorway? That’s my mind trying to think about all the ways I’d like to slice and dice this dataset. I try to think of too many things at a time and my brain reboots.

So I made a basic start at https://localsearchamerica.com/local-news.html . The indexed content can be browsed by state, searched by keyword, or both. Remember that the indexed content comes from RSS feeds, so it’s mostly snippets; searches work best when they’re general, a single word or a single phrase. I have found it very instructive to search for countrywide topics and see what different states are talking about. I just did a search for “data center.” Here are the results:
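Browsing by state, searching by keyword, or both amounts to two optional filters over the stored articles. Here’s a minimal in-memory Python sketch of that idea, assuming each article has `state`, `title`, `snippet`, and a timezone-aware `published` date (or `None`). The sample articles are made up, `search` is a hypothetical name, and a database-backed site would more likely push this filtering into its queries.

```python
from datetime import datetime, timezone

# Sort key for undated articles; assumes all real dates are timezone-aware.
OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def search(articles, keyword=None, state=None):
    """Filter by state and/or case-insensitive phrase; newest first."""
    needle = keyword.lower().strip() if keyword else None
    hits = []
    for a in articles:
        if state and a["state"] != state:
            continue
        text = f'{a["title"]} {a["snippet"]}'.lower()
        if needle and needle not in text:
            continue
        hits.append(a)
    return sorted(hits, key=lambda a: a["published"] or OLDEST, reverse=True)


def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)


# Made-up sample articles for illustration.
ARTICLES = [
    {"state": "Georgia", "title": "County debates data center zoning",
     "snippet": "Residents packed the meeting.", "published": utc(2025, 1, 6)},
    {"state": "Kentucky", "title": "School board meets",
     "snippet": "Data center tax revenue was discussed.", "published": utc(2025, 1, 7)},
    {"state": "Georgia", "title": "Library hours change",
     "snippet": "Weekend hours expand.", "published": None},
]

if __name__ == "__main__":
    for a in search(ARTICLES, "data center"):
        print(a["state"], "|", a["title"])
```

Because this is a plain substring match over short snippets, a short, general phrase like “data center” has the best chance of matching; a long, specific query has to appear verbatim in a headline or snippet.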

[Screenshot: Local News Feed (part of Local Search America) showing the results for the search “data center.” Each result shows the name of the source, its state, its date, the headline, a snippet, and a link to read the full article.]

Those six results came from six states: California, Missouri, Louisiana, Mississippi, Kentucky, and Georgia. I like being able to see at a glance which states the sources came from, and to know that these are six states with six different angles. From there I might zoom in on a state that looked particularly relevant to my interests. I also found it interesting to take a topic like Epstein and just look at it state by state by state. (That will get even more interesting as the database fills up; I’ve only just started aggregating.)
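Seeing at a glance which states a set of results comes from, and following a topic state by state, is essentially grouping results by their state field. A minimal Python sketch, assuming each result carries a `state` key; `group_by_state` is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict


def group_by_state(results):
    """Bucket search results by state, most-represented states first."""
    groups = defaultdict(list)
    for article in results:
        groups[article["state"]].append(article)
    # Sort by result count (descending), then by state name for ties.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return dict(ordered)


if __name__ == "__main__":
    results = [
        {"state": "Missouri", "title": "A"},
        {"state": "Georgia", "title": "B"},
        {"state": "Missouri", "title": "C"},
    ]
    for state, articles in group_by_state(results).items():
        print(f"{state}: {len(articles)}")
    # prints:
    # Missouri: 2
    # Georgia: 1
```

Ordering by count surfaces the states talking most about a topic first; alphabetical tie-breaking just keeps the output stable between runs.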

I have plans for a number of enhancements, including RSS feeds, topic-specific pages filterable by state, and possibly some careful use of AI for topic labeling and concept extraction, which would make things like emerging topics and topic leaderboards possible.

I’ve never made my own growing data pool before and I’m looking forward to splashing about in it!
