
Filtering Out Items in RSS Feeds with PHP

Updated: at 04:12 PM

I have been a devoted fan of RSS feeds for over 15 years, which is why I developed FeedPress. While I appreciate text RSS feeds, podcasts have become the most popular means of consuming RSS content.

Lately, I’ve noticed a trend among podcast publishers, particularly in France and among major media networks: they insert episodes from other podcasts into their popular feeds as a form of piggybacking. When I subscribe to a podcast, though, it’s because I want to listen to that specific podcast, not others.

After being increasingly bothered by this for several months, I decided to build my own RSS feed proxy to filter out the items I don’t want in my podcasts. Of course, I could simply skip to the next episode in my podcast app, but where’s the fun in that?

SimpleXML to the rescue

To begin, you will need to retrieve the RSS feed using PHP’s built-in functions. The most common way is by using the simplexml_load_file() function, which reads the XML content and returns a SimpleXMLElement object.

Once you have the feed, you can iterate through its items using a foreach loop. In RSS 2.0, each entry is an <item> element nested inside <channel>, so the items are available as $xml->channel->item.
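As a minimal sketch of that iteration, the snippet below parses a tiny feed and collects the item titles. The feed is a made-up two-item sample, inlined with simplexml_load_string() so it runs without network access; a real feed would be loaded with simplexml_load_file() instead.

```php
<?php

// Hypothetical sample feed, inlined so the sketch is self-contained
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item><title>Episode 1</title></item>
    <item><title>Episode 2</title></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($sample);

// Each <item> under <channel> is visited in document order
$titles = [];
foreach ($xml->channel->item as $item) {
    // Cast to string: $item->title is a SimpleXMLElement, not plain text
    $titles[] = (string) $item->title;
}

echo implode("\n", $titles), "\n";
```

The (string) cast is worth keeping in mind: without it, comparisons and string functions receive an object rather than the title text.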

The complete script looks like this. In this example, I remove every item whose title contains the word "SPAM" (the match is case-insensitive).

<?php

// Replace 'YOUR_FEED_URL' with the actual feed URL
$feedUrl = 'YOUR_FEED_URL';

// Load the feed
$xml = simplexml_load_file($feedUrl);

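// simplexml_load_file() returns false on failure; compare strictly,
// because a valid but empty element can also evaluate to false
if ($xml === false) {
    echo 'Unable to load the feed. Please check the URL.';
    exit;
}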

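// Filtering entries (walk backwards so removals don't shift unvisited indexes)
for ($i = count($xml->channel->item) - 1; $i >= 0; $i--) {
    $item = $xml->channel->item[$i];
    // Remove the entry if its title contains a bad word (case-insensitive)
    if (stripos((string) $item->title, 'SPAM') !== false) {
        unset($xml->channel->item[$i]);
        // No break here: keep going so every matching item is removed
    }
}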
      
// Output the modified feed  
header('Content-type: text/xml');  
echo $xml->asXML();

Notice that I loop through the item elements in reverse order. This matters because removing an element shifts every later element down by one index; in a forward loop, the element that slides into the freed slot would be skipped. Counting down from the end means the indexes still to be visited are never affected by a removal.
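To illustrate the reverse loop on a case where it matters, here is a small sketch with two unwanted items sitting next to each other. The helper name removeItemsMatching(), the word list, and the sample feed are all invented for this example; the removal itself uses the same unset() approach as the script above.

```php
<?php

// Hypothetical helper: removes every <item> whose title contains one of $words
// and returns how many items were removed.
function removeItemsMatching(SimpleXMLElement $xml, array $words): int
{
    $removed = 0;
    // Count down so a removal never changes an index we still have to visit
    for ($i = count($xml->channel->item) - 1; $i >= 0; $i--) {
        $title = (string) $xml->channel->item[$i]->title;
        foreach ($words as $word) {
            if (stripos($title, $word) !== false) {
                unset($xml->channel->item[$i]);
                $removed++;
                break; // leaves only the inner loop; the item is already gone
            }
        }
    }
    return $removed;
}

// Made-up feed with two adjacent unwanted items
$sample = <<<XML
<rss version="2.0">
  <channel>
    <item><title>Episode 1</title></item>
    <item><title>SPAM: promo</title></item>
    <item><title>Sponsored crossover</title></item>
    <item><title>Episode 2</title></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($sample);
$removed = removeItemsMatching($xml, ['SPAM', 'Sponsored']);

$remaining = [];
foreach ($xml->channel->item as $item) {
    $remaining[] = (string) $item->title;
}

echo $removed, ' removed; kept: ', implode(', ', $remaining), "\n";
```

Both adjacent unwanted items are removed and only the two regular episodes remain, which is the situation where a forward loop is most likely to miss one.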

Wrapping that into a Docker container

Since I want to run this script on my Kubernetes cluster, I need to create the corresponding Dockerfile to build my container. A straightforward Dockerfile based on PHP Alpine should suffice.
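One way to reuse the same image for several feeds is to pass the feed URL and the filter words through environment variables. The sketch below is only an assumption about how that could look: the variable names FEED_URL and FILTER_WORDS and the helper readProxyConfig() are invented for illustration and are not part of my actual setup.

```php
<?php

// Hypothetical: FEED_URL and FILTER_WORDS are names made up for this sketch.
// $env is passed in explicitly so the function is easy to test; inside the
// container it could be called as readProxyConfig(getenv()).
function readProxyConfig(array $env): array
{
    $feedUrl = trim($env['FEED_URL'] ?? '');
    if ($feedUrl === '') {
        throw new InvalidArgumentException('FEED_URL is not set');
    }

    // Comma-separated list such as "SPAM,Sponsored"; blank entries are dropped
    $words = array_values(array_filter(
        array_map('trim', explode(',', $env['FILTER_WORDS'] ?? '')),
        fn (string $word): bool => $word !== ''
    ));

    return ['feedUrl' => $feedUrl, 'words' => $words];
}

$config = readProxyConfig([
    'FEED_URL' => 'https://example.com/feed.xml',
    'FILTER_WORDS' => 'SPAM, Sponsored ,',
]);

echo $config['feedUrl'], ' filtered on: ', implode(', ', $config['words']), "\n";
```

With something like this, each deployment would differ only in its environment, while the script and the image stay identical.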

Once I configured my cluster according to the feeds I am subscribed to (including specific filters, of course), all I had to do was update the URLs of my feeds in Overcast. Now I can freely enjoy my podcasts without any spam.