PicoFeed content scraper integration #60
While not part of the API, extant versions of NextCloud News do support full-text scraping of articles in the Web interface. We should support this in the backend as well, especially as it will be an exposed feature in the v2 API.
One wrinkle: feeds are not owned, so whose setting is authoritative when the feed is shared between users? Contention is probably unlikely, but it is possible.
Ideally, changing whether a feed is full-content for one user should not affect other users. Since feeds are deduplicated, there's probably only one way to handle this correctly: subscriptions which request scraped content would be attached to a separate, scraped copy of the feed, and the Database::articleMark() method would need to take these multiple feed IDs and times into consideration to mark the correct articles (see the sketch after the list below).

Options for how to represent and handle scraping preferences are, I believe, as follows:
1. Scraped content creates a separate feed when needed, as above
   - Pros:
   - Cons:
2. Content is scraped if any subscription requests it; stored as a separate full-content column
   - Pros:
   - Cons:
3. Content is always scraped; the setting only changes which column is returned
   - Pros:
   - Cons:
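For the separate-feed approach, the complication noted above for Database::articleMark() might be handled roughly as follows. This is a minimal sketch, not the actual Arsse implementation: the table names (`feeds`, `articles`) and columns (`url`, `marked_read`, `edited_date`) are invented for illustration.

```php
<?php
// Hypothetical sketch only: none of these table or column names are taken
// from the real Arsse schema. A deduplicated feed may exist as two physical
// rows (a scraped and a non-scraped copy sharing one source URL), so marking
// must resolve every copy of the feed before updating articles.
function articleMark(PDO $db, int $feedId, string $editedDate): void {
    // Collect the IDs of all feed rows which share this feed's source URL
    $q = $db->prepare(
        "SELECT id FROM feeds WHERE url = (SELECT url FROM feeds WHERE id = ?)"
    );
    $q->execute([$feedId]);
    $feedIds = $q->fetchAll(PDO::FETCH_COLUMN);

    // Because each copy may have been fetched at different times, article IDs
    // differ between copies; the edition timestamp is the common reference
    // point for deciding which articles to mark
    $in = implode(",", array_fill(0, count($feedIds), "?"));
    $u = $db->prepare(
        "UPDATE articles SET marked_read = 1
         WHERE feed IN ($in) AND edited_date <= ?"
    );
    $u->execute(array_merge($feedIds, [$editedDate]));
}
```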
Content scraping has now been exposed for Miniflux using the second strategy above, as of 86897af0b3. If scraping was manually enabled previously, it will remain enabled. Subscriptions which have scraping enabled will see scraped content and will also be able to search it; those which do not will not.
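As a rough illustration of what the second strategy implies at query time: scraped content is stored alongside the original, and the subscription's scrape flag selects which column is served. This is a hedged sketch with invented table and column names, not the actual Arsse query.

```php
<?php
// Hypothetical sketch of the second strategy: the article row carries both
// the original content and a scraped full-content column, and the
// per-subscription scrape flag decides which one is returned (and searched).
// Table and column names are invented, not the real Arsse schema.
function articleContent(PDO $db, int $subscriptionId, int $articleId): ?string {
    $q = $db->prepare(
        "SELECT CASE
                    WHEN s.scrape = 1 AND a.content_scraped IS NOT NULL
                    THEN a.content_scraped
                    ELSE a.content
                END AS content
         FROM articles AS a
         JOIN subscriptions AS s ON s.feed = a.feed
         WHERE s.id = ? AND a.id = ?"
    );
    $q->execute([$subscriptionId, $articleId]);
    $content = $q->fetchColumn();
    return $content === false ? null : $content;
}
```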