Building an RSS News Aggregator with GitHub Pages
I find myself doomscrolling through content that an algorithm somewhere has decided I want to read. My browser’s “new tab” page often shows articles I don’t care about or don’t want to see. I realized I could take control by setting my own homepage: a page that shows the news I actually want to read, because I curated the list myself. I could probably find an RSS reader website and build a feed that way, but this seemed like a simple enough problem that I should be able to solve it myself without worrying about ads or algorithms.
The Problem
I need a webpage that aggregates content I care about without maintaining a database or paying for hosting.
The Solution
Build a static website on GitHub Pages. It doesn’t need to be a Hugo site like my blog, just a static HTML page with some CSS that AI can help me refine since I’m not an HTML expert.
Tech Stack:
- GitHub Pages (free hosting, already have a custom domain for my blog)
- GitHub Actions (cron job to fetch RSS feeds on a schedule)
- Python using feedparser library (parse RSS feeds and store to a JSON file)
- JavaScript (client-side features like light/dark mode)
- Static HTML (fetch and display the JSON file)
Project Structure
gereader-homepage
├── .github
│   └── workflows
│       └── fetch-feeds.yml
├── .gitignore
├── archive.html
├── archive.json
├── feeds.json
├── index.html
└── scripts
    ├── fetch_rss.py
    └── requirements.txt
View the complete project on GitHub
Parsing RSS with Python
Starting out, I needed a way to parse content from various sources. I remembered RSS, an ancient technology I had heard of in the early 2000s, so I spent a little time with Google and found that it is still around. Some feeds use RSS, while others use the newer Atom standard. I quickly learned that the feedparser library handles both formats in a consistent way.
At a high level, I created a Python script with a list of feeds I want to subscribe to. Each feed is a dictionary containing the URL, the title I want to display, and any manual tags I want to set (we’ll also get tags from the feeds directly).
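Here is a minimal sketch of that idea. It is not the actual contents of fetch_rss.py: the names `FEEDS`, `entry_to_article`, and `fetch_feed` are hypothetical, and the feed URL is a placeholder. It shows one way to merge the manual tags with the tags feedparser reports on each entry:

```python
from datetime import datetime, timezone

# Hypothetical feed list: the URL, title, and tags are placeholders.
FEEDS = [
    {
        "url": "https://example.com/rss/",
        "title": "Example Blog",
        "tags": ["networking"],
    },
]


def entry_to_article(entry, feed):
    """Convert one parsed feed entry into a flat, JSON-friendly dict."""
    # feedparser exposes entry tags as a list of dicts with a "term" key.
    feed_tags = [t.get("term", "").lower() for t in entry.get("tags", [])]
    # Merge manual and feed tags, dropping blanks and duplicates.
    tags = sorted({t for t in feed["tags"] + feed_tags if t})

    # published_parsed is a UTC time tuple when the date could be parsed.
    parsed = entry.get("published_parsed")
    iso = (
        datetime(*parsed[:6], tzinfo=timezone.utc).isoformat()
        if parsed
        else None
    )
    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "published": entry.get("published", ""),
        "published_parsed": iso,
        "source": feed["title"],
        "tags": tags,
    }


def fetch_feed(feed):
    """Download and parse one feed (RSS or Atom) into article dicts."""
    # Imported here so the helper above stays usable without the dependency.
    import feedparser  # third-party: pip install feedparser

    result = feedparser.parse(feed["url"])
    return [entry_to_article(entry, feed) for entry in result.entries]
```

Calling `fetch_feed(FEEDS[0])` would return a list of dictionaries shaped like the JSON output, minus the image and summary fields, which this sketch leaves out.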
The feeds are then fetched, parsed, sorted, and filtered by a date range, then written to the appropriate JSON file: feeds.json for my main page and archive.json for items filtered out by posting date.
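The sort-and-split step could look roughly like this. It is a sketch under assumptions: the seven-day cutoff is a guess at the date range, `split_by_age` and `write_json` are hypothetical names, and I assume archive.json uses the same top-level `articles` key as feeds.json:

```python
import json
from datetime import datetime, timedelta, timezone

MAX_AGE_DAYS = 7  # assumed cutoff; the real window may differ


def _published(article):
    """Parse the ISO 8601 timestamp stored in published_parsed."""
    return datetime.fromisoformat(article["published_parsed"])


def split_by_age(articles, now=None, max_age_days=MAX_AGE_DAYS):
    """Return (recent, archived), each sorted newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)

    # Skip entries without a usable date so sorting cannot fail.
    dated = [a for a in articles if a.get("published_parsed")]
    dated.sort(key=_published, reverse=True)

    recent = [a for a in dated if _published(a) >= cutoff]
    archived = [a for a in dated if _published(a) < cutoff]
    return recent, archived


def write_json(path, articles):
    """Write articles under a top-level "articles" key."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"articles": articles}, fh, indent=2, ensure_ascii=False)
```

With those helpers, the end of the script would call `split_by_age` on the combined article list and pass the two results to `write_json("feeds.json", ...)` and `write_json("archive.json", ...)`.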
Here’s an example of the JSON output. Each article contains metadata that makes it easy to display and filter on the frontend:
"articles": [
  {
    "title": "Netpicker NetBox Plugin and Automation",
    "link": "https://www.packetswitch.co.uk/netpicker-netbox-plugin-and-automation/",
    "published": "Sat, 11 Oct 2025 09:58:58 GMT",
    "published_parsed": "2025-10-11T09:58:58+00:00",
    "source": "Packet Switch",
    "image": "https://www.packetswitch.co.uk/content/images/2025/10/netpicker-2-.png",
    "summary": "In this post, we'll focus on Netpicker Automation and how to use the Netpicker plugin with Netbox. This post assumes you already have a functioning Netpicker",
    "tags": [
      "netdevops",
      "networking"
    ]
  }
]
GitHub Actions
For my GitHub Actions workflow, I set an arbitrary schedule to pull new content based on when I’m likely to be looking at news to try to keep the feed as fresh as possible.
on:
  schedule:
    # Cron hours are in UTC. Runs at midnight, 7am, 11am, 4pm, and 6pm Pacific (UTC-7)
    - cron: '0 1,7,14,18,23 * * *'
The action runs my Python script on that schedule and commits the updated feeds.json and archive.json files to the repository.
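GitHub Actions evaluates cron expressions in UTC, so it is easy to get the local hours wrong. As a quick check, this sketch converts the hour field back to local time. It assumes a fixed UTC-7 offset, so it ignores daylight saving changes, and `cron_hours_to_local` is a hypothetical helper rather than part of the project:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset (an assumption; it does not follow DST changes).
PACIFIC = timezone(timedelta(hours=-7))


def cron_hours_to_local(hours_field, tz=PACIFIC):
    """Convert a comma-separated cron hour field (UTC) to local HH:MM labels."""
    labels = []
    for hour in sorted(int(h) for h in hours_field.split(",")):
        # The date is arbitrary because the offset is fixed.
        utc_time = datetime(2025, 1, 1, hour, tzinfo=timezone.utc)
        labels.append(utc_time.astimezone(tz).strftime("%H:%M"))
    return labels


print(cron_hours_to_local("1,7,14,18,23"))
# → ['18:00', '00:00', '07:00', '11:00', '16:00']
```

In 12-hour terms, that is 6pm, midnight, 7am, 11am, and 4pm Pacific.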
Permission Requirement
GitHub Actions needs to be able to push changes to the repository.
- Repository Settings → Actions → General → Workflow permissions
- Select: “Read and write permissions”
Frontend HTML Static Site
I have only basic experience with HTML, so I relied on AI to help me get beyond basic tables. With its help, I added features like tag filters, search, and a light/dark mode toggle. The JavaScript is the largest piece that AI wrote. You can see the HTML in the repo if you want to build your own, but the basic premise is:
- Read from the feeds.json file
- Display each article
- Limit articles to 20 per page
- Mark articles as read and store client-side in localStorage
During development, you can check that the site works locally by running a Python web server from the repository root:
python3 -m http.server 8000
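Before opening the page, it can also help to sanity-check the data file the frontend reads. This is a small sketch, not part of the project: `find_problems` and `check_file` are hypothetical helpers, and I treat the fields from the sample article above as required, which is an assumption:

```python
import json

# Fields taken from the sample article; treating them as required is an assumption.
REQUIRED_FIELDS = ("title", "link", "published_parsed", "source", "tags")


def find_problems(data):
    """Return a list of human-readable problems found in a feeds.json payload."""
    articles = data.get("articles")
    if not isinstance(articles, list):
        return ["top-level 'articles' list is missing"]

    problems = []
    for index, article in enumerate(articles):
        for field in REQUIRED_FIELDS:
            if field not in article:
                problems.append(f"article {index}: missing '{field}'")
    return problems


def check_file(path="feeds.json"):
    """Load a JSON file and return the problems found in it."""
    with open(path, encoding="utf-8") as fh:
        return find_problems(json.load(fh))
```

Running `check_file()` from the repository root returns an empty list when every article has the expected fields.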
Configure GitHub Pages
You’ll need to enable GitHub Pages for your repo.
Repo Settings > Code and Automation > Pages
Set your “Source” to “Deploy from a branch” and select the branch you want to publish. I’m using main.
Set Up a Custom Domain for GitHub Pages
I covered this in my Set Up a Hugo Blog post in more detail, but the basics are:
- Update the GitHub Pages “Custom domain” setting to your custom domain
- Update the DNS for your domain (my domain registrar is Cloudflare) to have a CNAME pointed to GitHub
- Wait for GitHub to validate DNS and tick the “Enforce HTTPS” option if you want to use an SSL certificate provided by GitHub
Project Repository