A lax Web news feed parser
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 

292 lines
6.9 KiB

Minimal feed:
input: >
<rss version="2.0"/>
output:
format: rss
version: '2.0'
Channel GUID:
input: >
<rss>
<channel>
<guid isPermalink="false">http://example.com/</guid>
</channel>
</rss>
output:
format: rss
id: 'http://example.com/'
Channel GUID with whitespace:
input: >
<rss>
<channel>
<guid isPermalink="false">
http://example.com/
</guid>
</channel>
</rss>
output:
format: rss
id: 'http://example.com/'
Root GUID: # Any elements on the RSS2 root element should be ignored
input: >
<rss>
<channel/>
<guid isPermalink="false">http://example.com/</guid>
</rss>
output:
format: rss
Bogus GUID before good:
input: >
<rss>
<channel>
<guid/>
<guid isPermalink="false">http://example.com/</guid>
</channel>
</rss>
output:
format: rss
id: 'http://example.com/'
Schedule interval 1:
input: >
<rss><channel>
<ttl>60</ttl>
</channel></rss>
output:
format: rss
sched:
interval: PT60M
Schedule interval 2:
input: >
<rss><channel>
<ttl>bogus</ttl>
<ttl>60</ttl>
</channel></rss>
output:
format: rss
sched:
interval: PT60M
Schedule interval 3:
input: >
<rss><channel>
<ttl>
0120
</ttl>
</channel></rss>
output:
format: rss
sched:
interval: PT120M
Schedule interval 4:
input: >
<rss><channel>
<ttl>0</ttl>
</channel></rss>
output:
format: rss
Skip days:
input: >
<rss><channel>
<skipDays>
<day>MONDAY</day>
<day>sunday</day>
<day>TuE </day>
<day>bogus</day>
</skipDays>
</channel></rss>
output:
format: rss
sched:
skip: 1124073472 # 0b1000011000000000000000000000000
Skip all days:
input: >
<rss><channel>
<skipDays>
<day>mon</day>
<day>tue</day>
<day>wed</day>
<day>thu</day>
<day>fri</day>
<day>sat</day>
<day>sun</day>
</skipDays>
</channel></rss>
output:
format: rss
sched:
expired: true
skip: 2130706432 # 0b1111111000000000000000000000000
Skip hours:
input: >
<rss><channel>
<skipHours>
<hour>24</hour>
<hour>04</hour>
<hour> 9</hour>
<hour>bogus</hour>
<hour>25</hour>
</skipHours>
</channel></rss>
output:
format: rss
sched:
skip: 529 # 0b000000000000001000010001
Skip all hours:
input: >
<rss><channel>
<skipHours>
<hour>00</hour>
<hour>01</hour>
<hour>02</hour>
<hour>03</hour>
<hour>04</hour>
<hour>05</hour>
<hour>06</hour>
<hour>07</hour>
<hour>08</hour>
<hour>09</hour>
<hour>10</hour>
<hour>11</hour>
<hour>12</hour>
<hour>13</hour>
<hour>14</hour>
<hour>15</hour>
<hour>16</hour>
<hour>17</hour>
<hour>18</hour>
<hour>19</hour>
<hour>20</hour>
<hour>21</hour>
<hour>22</hour>
<hour>23</hour>
</skipHours>
</channel></rss>
output:
format: rss
sched:
expired: true
skip: 16777215 # 0b111111111111111111111111
Skip all days and hours:
input: >
<rss><channel>
<skipDays>
<day>mon</day>
<day>tue</day>
<day>wed</day>
<day>thu</day>
<day>fri</day>
<day>sat</day>
<day>sun</day>
</skipDays>
<skipHours>
<hour>00</hour>
<hour>01</hour>
<hour>02</hour>
<hour>03</hour>
<hour>04</hour>
<hour>05</hour>
<hour>06</hour>
<hour>07</hour>
<hour>08</hour>
<hour>09</hour>
<hour>10</hour>
<hour>11</hour>
<hour>12</hour>
<hour>13</hour>
<hour>14</hour>
<hour>15</hour>
<hour>16</hour>
<hour>17</hour>
<hour>18</hour>
<hour>19</hour>
<hour>20</hour>
<hour>21</hour>
<hour>22</hour>
<hour>23</hour>
</skipHours>
</channel></rss>
output:
format: rss
sched:
expired: true
skip: 2147483647 # 0b1111111111111111111111111111111
Feed language:
input: >
<rss><channel>
<language>ja</language>
</channel></rss>
output:
format: rss
lang: ja
Feed link:
input: >
<rss><channel>
<link/>
<link>http://[example.net]/</link>
<link>http://example.com/</link>
</channel></rss>
output:
format: rss
link: 'http://example.com/'
Feed link via GUID 1:
input: >
<rss><channel>
<guid>http://example.com/</guid>
</channel></rss>
output:
format: rss
id: 'http://example.com/'
link: 'http://example.com/'
Feed link via GUID 2:
input: >
<rss><channel>
<guid isPermalink='true'>http://example.com/</guid>
</channel></rss>
output:
format: rss
id: 'http://example.com/'
link: 'http://example.com/'
GUID not a link:
input: >
<rss><channel>
<guid isPermalink='maybe not'>http://example.com/</guid>
</channel></rss>
output:
format: rss
id: 'http://example.com/'
Explicit link preferred:
input: >
<rss><channel>
<guid>http://example.net/</guid>
<link>http://example.com/</link>
</channel></rss>
output:
format: rss
id: 'http://example.net/'
link: 'http://example.com/'
GUID not a url:
input: >
<rss><channel>
<guid>http://[example.com]/</guid>
</channel></rss>
output:
format: rss
id: 'http://[example.com]/'