API: Filter parameter (GQL filter queries spec)

This spec/issue describes the `filter` parameter originally mentioned in issue [#5463](https://github.com/TryGhost/Ghost/issues/5463), however it contains a more in-depth spec than usual, due to the complexity of the feature. This issue serves as a basis for documentation whilst the feature is in active development, as well as giving a basis for testing.

Table of Contents:
- [Summary](#summary)
- [Key Contextual Information](#context)
- [Proposal](#proposal)
- [Spec](#spec)
- [Implementation](#implementation)
- [Examples](#examples)

---
## Summary

The idea behind filter queries is including additional parameters in `Browse` API requests, causing the result to be filtered in some useful way. This provides an additional level of power and flexibility for users interacting with their data in themes, apps, and in the admin UI.

The initial focus is on the ability to do advanced queries against Posts, Tags, and Users. The following document outlines a detailed proposal for a 'query language' exposed via a top-level `filter` property available when accessing the API from various tools.

<a id="context"></a>
### Key Contextual Information

The Ghost JSON API supports full BREAD operations, however filtering only applies to Browse requests.
#### Resources

There are 3 main resources in the Ghost JSON API which are 'public' resources, and have clear use cases which require advanced filtering:
- Posts
- Tags
- Users
#### Relations

The following relations need special treatment when querying across resources.
- **Posts** have many **Tags**
- **Posts** have one Author (**Users**)
- **Tags** have many **Posts**
- **Users** have many **Posts**

Additionally, the relations & business logic in a blog reveal the following important relational data concepts: 
- Each **Post** has a **Next Post** and a **Previous Post**.
- Each **Post** has a **Tag Count**
- Each **User** has a **Post Count** 
- Each **Tag** has a **Post Count**
#### API Access Tools

There are 3 key tools which can be used to request data from the Ghost JSON API which need to be considered:
1. RESTful HTTP calls - E.g. `GET /api/posts?limit=5`
2. Method calls (used internally & in apps) - E.g. `api.posts.browse({limit: 5})`
3. The (in progress) 'get' handlebars helper - E.g. `{{#get "posts" limit="5"}}`

<a id="proposal"></a>
## Proposal

Adding advanced filtering to Ghost's JSON API via a top-level `filter` parameter which accepts a string representation of an SQL query, highly optimised for the kinds of queries which we expect will be common to Ghost, but flexible enough to grow outside of anticipated use cases.
### Top-Level Filter Parameter

A top-level `filter` parameter will provide advanced querying capabilities available in all 3 tools in a standard way, such that the parameter's value is always a string:
- HTTP: `GET /api/posts?limit=5&filter=???`
- Method Call: `api.posts.browse({limit: 5, filter: "???"})`
- Get Helper: `{{#get "posts" limit="5" filter="???"}}`

In both the method call and get helper syntaxes, the filter parameter value will be surrounded by double-quotes. Avoiding double-quotes inside the string is therefore a requirement.

In the HTTP version, the string will need to be url safe. Url encoding & decoding will be required. 
### Filter Parameter Values

The key requirement of the `filter` parameter is to accept values which can represent the rough equivalent of the `where` clause of an SQL statement ([language diagram](https://www.sqlite.org/syntaxdiagrams.html#expr)). The fundamentals of this proposal are therefore the definition of a language which fits Ghost's use cases and can be converted into SQL through Ghost's Bookshelf/Knex model layer.

SQL's capabilities are extensive. To start with, the focus for Ghost is on supporting logical **and** & **or** combinations, comparisons using **equals**, **greater than** & **less than** (with support for **not**). The importance of the relation between Post & Tags also drives the need to support **in**.
#### Filter Anatomy

An individual filter is composed of 3 parts, the **property** that is being filtered, the **operator** (currently always `=`) and the **value**. Filters need to be combined in groups of **or** & **and** rules. 

The **property** needs to support path matching: e.g. `tag` currently really means `tag.slug`. 

The **operators** need to include equals, not equals, and numerical comparisons. 

The **values** are strings, numbers, nulls etc that are matched/compared against.
#### Filter Style / Syntax

There are a number of common styles for combining properties, operators and values into expressions. This proposal borrows heavily from the style used in GitHub, Gmail & Slack amongst others: E.g.`label:api` or `from:-hannah` or `stars:<10`. 

The syntax of this style is the use of property-operator-value triplets in the form: property:operatorvalue, where the colon is a separator, and operator is optional. 

This syntax is short and compact for the common use cases in Ghost, for example: `featured:true`, `tags:photo`, `author:john`, `image:-null`, `posts.count:>10`

This syntax is also flexible enough to be extended to support multiple complex expressions by combining the property-operator-value triplets together with other symbols to represent **and** & **or.** Following on from using `-` for negation, the proposal is to use `+` for **and** as well as `,` for **or** e.g: `featured:true+tags.count:>10`, `tags:photo,tags:video`.

This is then used inside of the `filter=""` parameter, E.g: 
- HTTP: `GET /api/posts?limit=5&filter=tags:photo,featured:true`
- Method Call: `api.posts.browse({limit: 5, filter: "tags:photo,featured:true"})`
- Get Helper: `{{#get "posts" limit="5" filter="tags:photo,featured:true"}}`

<a id="spec"></a>
## Filter Query Language Spec
### Syntax
#### Filter Expressions

A **filter expression** is a string which provides the **property**, **operator** and **value** in the form property:operatorvalue:
- **property** - a path representing the key to filter on
- `:` - separator between **property** and an **operator**-**value** expression 
 - **operator** is optional, so `:` on its own is roughly `=`
#### Property

Matches: `[a-zA-Z_][a-zA-Z0-9_.]`
- can contain only alpha-numeric characters and `_`
- cannot contain whitespace
- must start with a letter
- supports `.` separated paths, E.g. `author.name` or `posts.count`
- is always lowercase, but accepts and converts uppercase

**Filter expressions** can be combined using the following:
- `+` - represents **and**
- `,` - represents **or**
- `(` **filter expression** `)` - overrides operator precedence
#### Operator

The following operators are spec'd for the first version:
- `-` - **not** operator
- `>` - **greater than** operator 
- `>=` - **greater than or equals** operator
- `<` - **less than** operator
- `<=` - **less than or equals** operator
#### Value

Can be one of the following
- **null**
- **true**
- **false**
- a *_number *_(integer)
- a **literal** 
 - Any character string which follows these rules:
 - Cannot start with `-` but may contain it
 - Cannot contain any of these symbols: `'"+,()><=[]` unless they are escaped
 - Cannot contain whitespace
- a **string** surrounded by single quotes
 - Any character except a single or double quote surrounded by single quotes
 - Single or Double quote *_MUST *_be escaped*
 - Can contain whitespace
 - A string can contain a date any format that can be understood by `new Date()`

Special characters:
- `:` - **filter expression** separator
- `+` - *_and *_symbol
- `,` - *_or *_symbol
- `(` - left parenthesis (**operator precedence**)
- `)` - right parenthesis (**operator precedence**)
- `-` - **not** operator
- `>` - **greater than** operator 
- `>=` - **greater than or equals** operator
- `<` - **less than** operator
- `<=` - **less than or equals** operator
- `[` - left bracket (*_in *_operator)
- `]` - right bracket (*_in *_operator)
- `'` - single quote (*_string *_syntax)
- `"` - double quote - **NOT PERMITTED UNESCAPED**
### Whitespace rules

Whitespace is not permitted inside of a **property**, or inside a **value** unless the value is surrounded with quotes. Everywhere else, whitespace is ignored and therefore becomes optional.

This allows for whitespace to be used to improve readability if necessary, as any of the following are equivalent and will be accepted.
- `featured:true+posts.count:>10`, `tags:photo,tags:video`
- `featured:true + posts.count:>10`, `tags:photo, tags:video`
- `featured: true + posts.count: >10`, `tags: photo, tags: video`
- `featured: true + posts.count :> 10`, `tags: photo, tags: video`

<a id="implementation"></a>
## Implementation

_Hold on to your hats, this is where shit gets real..._
### Creating a Parser

As far as I can determine, the set of rules above describes a language that can be represented by an unambiguous context-free grammar and therefore understood by an LALR(1) parser. 

What this means is that we can represent the set of rules in a standard way, and then use [JISON](http://zaach.github.io/jison/) to generate a parser in JavaScript, which can in turn be used in Ghost to understand **filter query strings**, and convert them into something more useful. Essentially, all that has to be done is write a set of rules & the hard part of this is done. Neat.

The parser should live in its own repository, with a full suite of tests, and become a dependency of Ghost. The repository for it is here: https://github.com/TryGhost/GQL
### Parser Output

The next step is to determine what the 'something more useful' that the parser is supposed to output looks like. The essence of what we're doing is representing SQL, so we could output the SQL equivalent of a **filter query string** and then pass that to `knex.whereRaw()` , which will then run the query for us inside the model layer of Ghost.

_In fact, this is what I did for a first test that all this works. I wrote a rudimentary version of the rules, output raw SQL, and ran it through knex. And it worked._

The downside to parsing straight to SQL is two-fold: firstly, by translating from one string format to another, there is little opportunity for semantic analysis - that is to validate or operate on the **property** and **value** parts of the query to check they exist, or are permitted (filtering by the `password` property of the User resource should not be permitted, for example). Secondly, it seems to me that running raw sql queries, even through knex, is likely to result in security problems.

Instead of churning out SQL, the parser needs to output an intermediary format that we can use to get all of the useful and contextual information out of the query. This format should be something we can easily pass to knex's query builder functions, like `.where()` and `.orWhere()`, (which take a **property**, and **operator** and a **value**) so that we can run the queries through the query builder. 

The intermediary format must not lose any information from the query. 
### JSON Format

The proposal is to use an array. Each **filter expression** will be converted into an object with key value pairs representing the **property**, **operator**, and **value**.

Arrays are ordered, and so the order of the array will match the order in which the query should be built.

All except the first item of the array should also indicate whether the expression is to be combined with **and** or **or**. Something like this: 

```
{statements: [
 {prop: 'tags.slug', op: '=', value: 'photo'},
 {prop: 'featured', op: '=', value: true, func: 'and'},
 {prop: 'tag.count', op: '>', value: 5, func: 'or'}
]}
```

To maintain precedence overrides, a single object in the array can alternatively represent a group that contains a sub-array of objects. The group object must also indicate whether it is combined using **and** or **or**.

```
{statements: [
 {prop: 'author.slug', op: '!=', value: 'joe'},
 {
 group: [
 {prop: 'tags.slug', op: '=', value: 'photo'},
 {prop: 'image', op: 'IS NOT', value: null, func: 'or'},
 {prop: 'featured', op: '=', value: true, func: 'or'}
 ], func: 'and'
 }
]}
```

This format lends itself nicely to building queries using the knex query builder functions, as we can cycle through the array (and recursively through groups) calling `where` or `orWhere` as necessary. Having a JSON format with named key-value pairs for **property**, **operator**, and **value**, makes it easy to use lodash magic to pull out parts of the query (e.g. all **properties** or all **values**) and validate them.
### Post-Processing

Once we have our filter in a JSON format, and we can pull out properties (table & field names) and values, we can start to do some validation and processing of the information in order to ensure the queries are safe and sensible.
#### Public vs Private

Certain properties should not be filtered upon, e.g. `user.email`, `user.password`, `user.status` and `post.status` as all of these properties are affected by permissions / not intended to be available to public endpoints. 

For now, as we're only intending to offer 'official' public-only beta API access via HTTP and the get helper, these properties could be santised out entirely if it's easier, or otherwise they should be santised out if the context is public (see the change coming with [public permissions](https://github.com/ErisDS/Ghost/commit/0d1ae6e163308975749cc352597c93faab6e0e2e#diff-6fb80b4d82076d007bffbc34d762d12aR23) for more context for what I mean here).
#### Standard validation

All of the properties should be known attributes of the models, and all of the values should be valid values for those attributes, according to our standard validation mechanisms.
#### Joins

Any reference to a related table needs to be translated into a join. 
#### Dates

Any String value used along with a date property E.g. `created_at`, `updated_at`, `published_at`, `last_login` should be converted into a Date that SQL can understand, providing it is something that can be parsed using JavaScript's `new Date()`. Some care needs to be taken here to check this will work across SQLite, MySQL and pg as they treat dates differently and Bookshelf/Knex is not good at handling this (yet).
#### Special Rules

There are a few queries which should have special rules for how they affect other parts of the query. The single example at the moment is filtering posts by tags where tag.slug is `IN` a set (or in fact, any use of `IN`).

Where using `IN` we need to be sure that we still return distinct items in the set (and the correct number). When using `IN` the default order should naturally change to be ordered by the number of matches in descending order.

E.g. when requesting posts with tags `IN` a group, the query output should be similar to (or have the same affect as) this:

`select * from "posts" 
inner join "posts_tags" on "posts_tags"."post_id" = "posts"."id" 
inner join "tags" on "posts_tags"."tag_id" = "tags"."id" 
where "page" = false and "status" = 'published' and "posts"."id" != 11 and "tags"."slug" in ('markdown', 'list') 
group by "posts"."id" 
order by count(tags.id) DESC, "posts"."status" ASC, "posts"."published_at" DESC, "posts"."updated_at" DESC, "posts"."id" DESC limit 5`

Note the `group by` and `order by count(tags.id)` parts. 

<a id="examples"></a>
## Examples

Here is a separate [Gist](https://gist.github.com/ErisDS/f516a859355d515aa6ad) outlining some example queries in HTTP, method call & get helper form: https://gist.github.com/ErisDS/f516a859355d515aa6ad, feel free to suggest more example use cases in the comments.

Uh oh!

API: Filter parameter (GQL filter queries spec) #5604

Description

Summary

Key Contextual Information

Resources

Relations

API Access Tools

Proposal

Top-Level Filter Parameter

Filter Parameter Values

Filter Anatomy

Filter Style / Syntax

Filter Query Language Spec

Syntax

Filter Expressions

Property

Operator

Value

Whitespace rules

Implementation

Creating a Parser

Parser Output

JSON Format

Post-Processing

Public vs Private

Standard validation

Joins

Dates

Special Rules

Examples

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions