A lax Web news feed parser

Minimal feed:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom"/>
    output:
        format: atom
        version: '1.0'
Atom ID:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <id>http://example.com/</id>
        </feed>
    output:
        format: atom
        version: '1.0'
        id: 'http://example.com/'
Atom ID with whitespace:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <id>
                http://example.com/
            </id>
        </feed>
    output:
        format: atom
        version: '1.0'
        id: 'http://example.com/'
Bogus ID before good:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <id/>
            <id>http://example.com/</id>
        </feed>
    output:
        format: atom
        version: '1.0'
        id: 'http://example.com/'
Feed language:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"/>
    output:
        format: atom
        version: '1.0'
        lang: en
Bogus feed language:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom" xml:lang=""/>
    output:
        format: atom
        version: '1.0'
Canonical URL:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link rel="self"/>
            <link rel="self" href="http://example.com/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        url: 'http://example.com/'
Feed link 1:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link rel="alternate" href="http://example.com/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 2: # default relation is "alternate"
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link rel="" href="http://example.com/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 3: # default relation is "alternate"
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link href="http://example.com/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 4: # other relations are ignored
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link rel="bogus" href="http://example.net/"/>
            <link href="http://example.com/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 5: # XHTML is preferred
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link href="http://example.net/"/>
            <link href="http://example.com/" type="application/xhtml+xml; charset=utf-8"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 6: # HTML is even more preferred
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link href="http://example.net/"/>
            <link href="http://example.org/" type="application/xhtml+xml; charset=utf-8"/>
            <link href="http://example.com/" type="TEXT/HTML; charset=utf-8"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 7: # No type is better than an unacceptable type
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link href="http://example.net/" type="image/png"/>
            <link href="http://example.org/" type="application/xml"/>
            <link href="http://example.com/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 8: # Bad URLs are ignored
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link href="http://example.com/" type="application/xhtml+xml; charset=utf-8"/>
            <link href="http://[example.org]/" type="text/html; charset=utf-8"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
Feed link 9: # The first matching relation wins
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link href="http://example.com/" type="text/html"/>
            <link href="http://example.org/" type="text/html"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/'
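# Taken together, "Feed link 1" through "Feed link 9" appear to describe an
# ordering: links with a relation other than "alternate" are dropped, bad URLs
# are dropped, then HTML beats XHTML, XHTML beats no type, no type beats any
# other type, and ties go to the first link in document order.
# The case below is a HYPOTHETICAL addition that combines two of those rules.
# It is not part of the original suite; its name and expected output are
# inferred from "Feed link 4" and "Feed link 5" and have not been verified
# against the parser.
Feed link 10 (hypothetical): # relation filtering is assumed to precede type preference
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link rel="bogus" href="http://example.net/" type="text/html"/>
            <link href="http://example.com/" type="application/xhtml+xml"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: 'http://example.com/' # assumed: the "bogus" link is ignored despite its HTML type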
Relative feed link:
    doc_url: 'http://example.com/path/'
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom">
            <link href="/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        meta:
            url: 'http://example.com/path/'
        link: ['/', 'http://example.com/path/']
Relative feed link with xml:base:
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.com/">
            <link href="/" xml:base="path/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: ['/', 'http://example.com/path/']
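# In the two cases above the expected link is a pair: the href as written,
# followed by the base URL it should be resolved against. In the xml:base case
# that base is the link's own xml:base ("path/") resolved against the feed's
# xml:base ("http://example.com/").
# The case below is a HYPOTHETICAL addition showing the simpler situation where
# only the feed element carries xml:base. It is not part of the original suite;
# its name and expected output are inferred from the case above and have not
# been verified against the parser.
Relative feed link with inherited xml:base (hypothetical):
    input: >
        <feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.com/">
            <link href="/"/>
        </feed>
    output:
        format: atom
        version: '1.0'
        link: ['/', 'http://example.com/'] # assumed: the base is inherited unchanged from the feed element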