Use a Shortcut to parse and extract data for a specific city from an Air Quality RSS feed

Given a web-based RSS feed of Air Quality data for multiple cities, extract the data for a specific city, convert the HTML entities &deg;, &acute; and &quot; to their characters (°, ´, "), and remove the <br /> tags from the result.

The first step is to understand the structure of the RSS content so that the xmllint XPath request can be formed correctly. The feed can be saved for inspection with curl and the feed's URL:

/usr/bin/curl -s "https://uk-air.defra.gov.uk/assets/rss/forecast.xml" > uk.rss

That structure is:

//rss/channel/item/description/text()

and a typical entry is of the format:

<rss version="2.0">
  <channel>
    <item>
      <title>LONDON CITY AIRPORT</title>
      <description><![CDATA[Location: 51&deg;30&acute;17.28&quot;N    0&deg;03&acute;28.80&quot;E <br />index levels are forecast to be  <br />Thu: 3 Fri: 3 Sat: 3 Sun: 3 Mon: 3 ]]></description>
      <pubDate>Thu, 19 Mar 2026 06:00:00 +0000</pubDate>
    </item>
  ⋮
  </channel>
</rss>

Now that I have the RSS structure, if I want the Air Quality data for the CITY OF LONDON, I can form an XPath expression like this:

/usr/bin/xmllint --xpath "string(//rss/channel/item[title='CITY OF LONDON']/description/text())" --format -

where the text() component gets the content of the CDATA.

When I pipe the output of curl into this xmllint command, I get:

Location: 51&deg;30&acute;36.72&quot;N    0&deg;05&acute;1.32&quot;W <br />index levels are forecast to be  <br />Thu: 3 Fri: 3 Sat: 3 Sun: 3 Mon: 3
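The pipe described above can be sketched as a small shell function. This is only an illustration: the function name city_description is made up here, the commands are called by bare name rather than by the /usr/bin paths used earlier, and it assumes the city name contains no single quote, because the name is pasted straight into the XPath string literal.

```shell
# Print the description text for one city from an RSS feed read on STDIN.
# city_description is a hypothetical helper name, not part of any tool.
# Assumption: the city name contains no single quote, since it is
# inserted directly into the XPath string literal below.
city_description() {
  local city="$1"
  # The trailing "-" tells xmllint to read the document from STDIN.
  xmllint --xpath "string(//rss/channel/item[title='${city}']/description/text())" -
}

# Usage (needs network access):
#   curl -s "https://uk-air.defra.gov.uk/assets/rss/forecast.xml" | city_description "CITY OF LONDON"
```

Because the function only reads STDIN, the same body works whether the feed comes from curl or from another source that pipes the XML in.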

To make this more legible, I use a Zsh associative array to replace the HTML entities with their character equivalents and strip the <br /> tags:

Location: 51°30´36.72"N    0°05´1.32"W  index levels are forecast to be Thu: 3 Fri: 3 Sat: 3 Sun: 3 Mon: 3 
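A minimal sketch of that replacement step follows. It is not the exact script from the Shortcut: the names entities and clean_entities are made up for illustration, and the syntax is kept to a subset that should behave the same in zsh and in bash 4 or later. The array is keyed by the bare entity name (deg, acute, quot) to avoid quoting characters such as & and ; inside subscripts.

```shell
# Map of HTML entity names to their characters (hypothetical variable name).
typeset -A entities
entities[deg]='°'
entities[acute]='´'
entities[quot]='"'

# Replace the three entities and drop <br /> tags in the given string.
clean_entities() {
  local text="$1" name pattern br
  for name in deg acute quot; do
    pattern="&${name};"
    # ${var//pattern/replacement} replaces every occurrence.
    text=${text//$pattern/${entities[$name]}}
  done
  br='<br />'
  text=${text//$br/}
  printf '%s\n' "$text"
}

# Usage:
#   clean_entities '51&deg;30&acute;36.72&quot;N <br />index'
```

The function removes each tag without inserting a space, so any spaces that surrounded a <br /> in the feed are left as they were.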

The Shortcut that achieves the above does not use curl. Instead, a Shortcut Get action fetches the feed and a Run Shell Script action reads it from STDIN.
