Also transfer any existing scraper booleans on database upgrade. It was
previously possible to enable scraping manually by editing the database,
and these settings will be honoured.
- The Database class is now responsible for preparing rules
- Rules are now returned in an array keyed by user
- Empty strings are now passed through during rule preparation
- Whitespace is now collapsed before evaluating rules
- Feed tests are fixed to retrieve a dumy set of rules
- Rule evaluation during feed parsing also filled out
The driver itself has not been expnaded; more is probably required to ensure
metadata is kept in sync and users created when the internal database does
not list a user an external database claims to have
This finally brings PostgreSQL to parity with SQLite and MySQL.
Two tests casting binary data to text were removed since behaviour here
should in fact be undefined
Accountinf for any encoding when retrieving data will be addressed by
a later commit
While IPv6 address normalization was originally planned, this was deemed
too much effort to bother with such a niche feature; IPv6 addresses are
instead passed through unmodified
Experience has proved programmatically setting joins is not useful, and
getting the types and values of query parts was not being maintained.
The programmatic setting of GROUP BY may be useful in future, however.
In theory the import (as opposed to parse) routine could be used for any
format; this could be used to implement an ad hoc JSON format to avoid
the loss of commas in tags with OPML
Folder 0 (the root folder) is a valid, though nonsensical selection:
using it as a positive option is the same as not using the option at
all, and using it as a negative option necessarily yields an empty set.
However, it can in some contexts be validly specified, and so it should
be handled consistently. It had not been previously, but is now.
- Ensure the last refresh time is included in authenticated requests
- Use a partial mock in auth tests so that other processing does not
get in the way of results
- Make sure the group list includes unused groups
- Make sure the update time of subscriptions is correct
Tokens are similar to sessions in that they stand in for users, but the
protocol handlers will manage them; Fever login hashes are the
originating use case for them. These must never expire, for example,
and we need to specify their values.
This commit also performs a bit of database clean-up
- Exec and lock timeouts now apply to MySQL
- Lock timeout now applies to PostgreSQL
- SQLite now uses a generic lock timeout setting which applies to all
- Each parameter is checked for type and normalized
- Interval strings are converted to DateInterval objects
- Timeouts can be specified as interval strings
- Most intervals can be null to signify infinity
- Driver classes are checked that they implement the correct interface
- Short driver names may be used, and are used by default
- Helpful errors messages are printed in case of erroneous configuration
Exporting is currently broken; this will be fixed in an upcoming commit
Three test failures remain, but these are minor and will be resolved
soon. Handling of binary data is also broken, but given that this works
fine with the PDO driver, there is presumably some correct method.
No testing has been performed yet, but changes are extensive enough to
warrant a commit. Of particular note:
- SQL states are enumerated in a separate trait to reduce duplication
- PDOStatement is now an abstract class to avoid duplication of
engine-specific error handling
- Error handling has been cleaned up somewhat
Reasons for failures included an unhandled error code, erroneous sorting
assumptions, and a broken computation of the next insert ID in tests
Five failures remain.
This involved changes to the driver interface as well as the database
schemata. The most significantly altered queries were for article
selection and marking, which relied upon unusual features of SQLite.
Overall query efficiency should not be adversely affected (it may have
even imprved) in the common case, while very rare cases (not presently
triggered by any REST handlers) require more queries.
One notable benefit of these changes is that functions which query
articles can now have complete control over which columns are returned.
This has not, however, been implemented yet: symbolic column groups are
still used for now.
Note that PostgreSQL still fails many tests, but the test suite runs to
completion. Note also that one line of the Database class is not
covered; later changes will eventually make it easier to cover the line
in question.
PDO does not adequately inform PostgreSQL of a parameter's type, so type
casts are required. Rather than adding these to each query manually, the
queries are instead processed to add type hints automatically.
Unfortunately the queries are processed rather naively; question-mark
characters in string constants, identifiers, regex patterns, or geometry
operators will break things spectacularly.
Tests for different drivers will have their own files, but all derive
from a common prototype test series where applicable, similar to the
existing arrangement for database function tests. However, the prototype
will reside with other test cases rather than in the library path. The
database function test series will hopefully be moved as well in time.
The Arsse is a news aggregator server which implements multiple synchronization protocols, including [version 1.2][NCNv1] of [NextCloud News][NCN]' protocol and the [Tiny Tiny RSS][TTRSS] protocol (details below). Unlike most other aggregator servers, The Arsse does not include a Web front-end (though one is planned as a separate project), and it relies on existing protocols to maximize compatibility with existing clients.
The Arsse is a news aggregator server, written in PHP, which implements multiple synchronization protocols. Unlike most other aggregator servers, The Arsse does not include a Web front-end (though one is planned as a separate project), and it relies on existing protocols to maximize compatibility with existing clients.
At present the software should be considered in an "alpha" state: though its core subsystems are covered by unit tests and should be free of major bugs, not everything has been rigorously tested. Additionally, many features one would expect from other similar software have yet to be implemented. Areas of future work include:
Information on how to install and use the software can be found in [the manual](https://thearsse.com/manual/), which is available online as well as with every copy of the software. This readme file instead focuses on how to set up a programming environment to modify the source code.
- Support for more database engines (PostgreSQL, MySQL, MariaDB)
- Providing more sync protocols (Google Reader, Fever, others)
- Complete tools for managing users
- Better packaging and configuration samples
# Installing from source
## Requirements
The main repository for The Arsse can be found at [code.mensbeam.com](https://code.mensbeam.com/MensBeam/arsse/), with a mirror also available [at GitHub](https://github.com/mensbeam/arsse/). The GitHub mirror does not accept bug reports, but the two should otherwise be equivalent.
The Arsse has the following requirements:
[Composer](https://getcomposer.org/) is required to manage PHP dependencies. After cloning the repository or downloading a source code tarball, running `composer install` will download all the required dependencies, and will advise if any PHP extensions need to be installed. If not installing as a programming environment, running `composer install --no-dev` is recommended.
- A Linux server utilizing systemd and Nginx (tested on Ubuntu 16.04)
- PHP 7.0.7 or later with the following extensions:
- [intl](http://php.net/manual/en/book.intl.php), [json](http://php.net/manual/en/book.json.php), [hash](http://php.net/manual/en/book.hash.php), and [pcre](http://php.net/manual/en/book.pcre.php)
- [dom](http://php.net/manual/en/book.dom.php), [simplexml](http://php.net/manual/en/book.simplexml.php), and [iconv](http://php.net/manual/en/book.iconv.php) (for picoFeed)
- [sqlite3](http://php.net/manual/en/book.sqlite3.php) or [pdo_sqlite](http://ca1.php.net/manual/en/ref.pdo-sqlite.php)
- Privileges to create and run daemon processes on the server
# Repository structure
## Installation
## Library code
At present, installation of The Arsse is rather manual. We hope to improve this in the future, but for now the steps below should help get you started. The instructions and configuration samples assume you will be using Ubuntu 16.04 (or equivalent Debian) and Nginx; we hope to expand official support for different configurations in the future as well.
The code which runs The Arsse, contained in `/arsse.php`, is only a short stub: the application itself is composed of the classes found under `/lib/`, with the main ones being:
| `REST.php` | General protocol handler for CORS, HTTP auth, etc. |
| `Arsse.php` | Singleton glueing the parts together |
1. Extract the tar archive to `/usr/share`
2. If desired, create `/usr/share/arsse/config.php` using `config.defaults.php` as a guide. The file you create only needs to contain non-default settings. The `userPreAuth` setting may be of particular interest
3. Copy `/usr/share/arsse/dist/arsse.service` to `/lib/systemd/system`
4. In a terminal, execute the following to start the feed fetching service:
The `/lib/Database.php` file is the heart of the application, performing queries on behalf of protocol implementations or the command-line interface.
``` sh
sudo systemctl enable arsse
sudo systemctl start arsse
```
Also necessary to the functioning of the application is the `/vendor/` directory, which contains PHP libraries which The Arsse depends upon. These are managed by Composer.
### Configuring the Web server and PHP
## Supporting data files
Sample configuration parameters for Nginx can be found in `arsse/dist/nginx.conf` and `arsse/dist/nginx-fcgi.conf`; the samples assume [a server group](http://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream) has already been defined for PHP. How to configure an Nginx service to use PHP and install the required PHP extensions is beyond the scope of this document, however.
The `/locale/` and `/sql/` directories contain human-language files and database schemata, both of which are occasionally used by the application in the course of execution. The `/www/` directory serves as a document root for a few static files to be made available to users by a Web server.
### Adding users
The `/dist/` directory, on the other hand, contains samples of configuration for Web servers and init systems. These are not used by The Arsse itself, but are merely distributed with it for reference.
The Arsse currently includes a `user add <username> [<password>]` console command to add users to the database; other user management tasks require manual database edits. Alternatively, if the Web server is configured to handle authentication, you may set the configuration option `userPreAuth` to `true` and The Arsse will defer to the server and automatically add any missing users as it encounters them.
## Documentation
## Installation from source
The source text for The Arsse's manual can be found in `/docs/`, with pages written in [Markdown](https://spec.commonmark.org/current/) and converted to HTML [with Daux](#building-the-manual). If a static manual is generated its files will appear under `/manual/`.
If installing from the Git repository rather than a download package, you will need to follow extra steps before the instructions in the section above.
In addition to the manual the files `/README.md` (this file), `/CHANGELOG`, `/UPGRADING`, `/LICENSE`, and `/AUTHORS` also document various things about the software, rather than the software itself.
First, you must install [Composer] to fetch required PHP libraries. Once Composer is installed, dependencies may be downloaded with the following command:
PHPUnit's configuration can be customized by copying its configuration file to `/tests/phpunit.xml` and modifying the copy accordingly.
## License
## Tooling
The Arsse is made available under the permissive MIT license. See the `LICENSE` and `AUTHORS` files included with the distribution or source code for exact legal text and copyright holders. Dependencies included in the distribution may be governed by other licenses.
The `/vendor-bin/` directory houses the files needed for the tools used in The Arsse's programming environment. These are managed by the Composer ["bin" plugin](https://github.com/bamarni/composer-bin-plugin) and are not used by The Arsse itself. The following files are also related to various programming tools:
| `/.gitattributes` | Git settings for handling files |
| `/.gitignore` | Git file exclusion patterns |
| `/.php_cs.dist` | Configuration for [php-cs-fixer](https://cs.symfony.com) |
| `/.php_cs.cache` | Cache for php-cs-fixer |
| `/composer.json` | Configuration for Composer |
| `/composer.lock` | Version synchronization data for Composer |
| `/RoboFile.php` | Task definitions for [Robo](https://robo.li/) |
| `/robo` | Simple wrapper for executing Robo on POSIX systems |
| `/robo.bat` | Simple wrapper for executing Robo on Windows |
Please refer to `CONTRIBUTING.md` for guidelines on contributing code to The Arsse.
In addition the files `/package.json`, `/yarn.lock`, and `/postcss.config.js` as well as the `/node_modules/` directory are used by [Yarn](https://yarnpkg.com/) and [PostCSS](https://postcss.org/) when modifying the stylesheet for The Arsse's manual.
## Protocol compatibility notes
# Common tasks
### General
We use a tool called [Robo](https://robo.li/) to simplify the execution of common tasks. It is installed with The Arsse's other dependencies, and its configured tasks can be listed by executing `./robo` without arguments.
#### Type casting
## Running tests
The Arsse does not guarantee it will handle type casting of input in the same way as reference implementations for its supported protocols. As a general rule, clients should endeavour to send only correct input.
The Arsse has an extensive [PHPUnit](https://phpunit.de/) test suite; tests can be run by executing `./robo test`, which can be supplemented with any arguments understoof by PHPUnit. For example, to test only the Tiny Tiny RSS protocol, one could run `./robo test --testsuite TTRSS`.
The Arsse does, however, guarantee _output_ to be of the same type. If it is not, this is [a bug][newIssue] and should be reported.
There is also a `test:quick` Robo task which excludes slower tests, and a `test:full` task which includes redundant tests in addition to the standard test suite
#### Content sanitization
### Test coverage
The Arsse makes use of the [picoFeed] newsfeed parsing library to sanitize article content. The exact sanitization parameters may differ from those of reference implementations for protocols The Arsse supports.
Computing the coverage of tests can be done by running `./robo coverage`, after which an HTML-format coverage report will be written to `/tests/coverage/`. Either [PCOV](https://github.com/krakjoe/pcov), [Xdebug](https://xdebug.org), or [phpdbg](https://php.net/manual/en/book.phpdbg.php) is required for this. PCOV is generally recommended as it is faster than Xdebug; phpdbg is faster still, but less accurate. If using either PCOV or Xdebug, the extension need not be enabled globally; PHPUnit will enable it when needed.
### NextCloud News v1.2
## Enforcing coding style
As a general rule, The Arsse should yield the same output as the reference implementation for all valid inputs (otherwise you've found [a bug][newIssue]), but there are exception, either because the NextCloud News (hereafter "NCN") [protocol description][NCNv1] is at times ambiguous or incomplete, or because implementation details necessitate it differ; this section along with the General section above detail these differences.
The [php-cs-fixer](https://cs.symfony.com) tool, executed via `./robo clean`, can be used to rewrite code to adhere to The Arsse's coding style. The style largely follows [PSR-2](https://www.php-fig.org/psr/psr-2/) with some exceptions:
#### Differences
- Classes, methods, and functions should have their opening brace on the same line as the signature
- Anonymous functions should have no space before the parameter list
- Article GUID hashes are not hashes like in NCN; they are integers rendered as strings
- Article fingerprints are a combination of hashes rather than a single hash
- When marking articles as starred the feed ID is ignored, as they are not needed to establish uniqueness
- The feed updater ignores the `userId` parameter: feeds in The Arsse are deduplicated, and have no owner
- The `/feeds/all` route lists only feeds which should be checked for updates, and it also returns all `userId` attributes as empty strings: feeds in The Arsse are deduplicated, and have no owner