Skip to content

API: Filter parameter (GQL filter queries spec) #5604

@ErisDS

Description

@ErisDS

This spec/issue describes the filter parameter originally mentioned in issue #5463, however it contains a more in-depth spec than usual, due to the complexity of the feature. This issue serves as a basis for documentation whilst the feature is in active development, as well as giving a basis for testing.

Table of Contents:


Summary

The idea behind filter queries is including additional parameters in Browse API requests, causing the result to be filtered in some useful way. This provides an additional level of power and flexibility for users interacting with their data in themes, apps, and in the admin UI.

The initial focus is on the ability to do advanced queries against Posts, Tags, and Users. The following document outlines a detailed proposal for a 'query language' exposed via a top-level filter property available when accessing the API from various tools.

Key Contextual Information

The Ghost JSON API supports full BREAD operations, however filtering only applies to Browse requests.

Resources

There are 3 main resources in the Ghost JSON API which are 'public' resources, and have clear use cases which require advanced filtering:

  • Posts
  • Tags
  • Users

Relations

The following relations need special treatment when querying across resources.

  • Posts have many Tags
  • Posts have one Author (Users)
  • Tags have many Posts
  • Users have many Posts

Additionally, the relations & business logic in a blog reveal the following important relational data concepts:

  • Each Post has a Next Post and a Previous Post.
  • Each Post has a Tag Count
  • Each User has a Post Count
  • Each Tag has a Post Count

API Access Tools

There are 3 key tools which can be used to request data from the Ghost JSON API which need to be considered:

  1. RESTful HTTP calls - E.g. GET /api/posts?limit=5
  2. Method calls (used internally & in apps) - E.g. api.posts.browse({limit: 5})
  3. The (in progress) 'get' handlebars helper - E.g. {{#get "posts" limit="5"}}

Proposal

Adding advanced filtering to Ghost's JSON API via a top-level filter parameter which accepts a string representation of an SQL query, highly optimised for the kinds of queries which we expect will be common to Ghost, but flexible enough to grow outside of anticipated use cases.

Top-Level Filter Parameter

A top-level filter parameter will provide advanced querying capabilities available in all 3 tools in a standard way, such that the parameter's value is always a string:

  • HTTP: GET /api/posts?limit=5&filter=???
  • Method Call: api.posts.browse({limit: 5, filter: "???"})
  • Get Helper: {{#get "posts" limit="5" filter="???"}}

In both the method call and get helper syntaxes, the filter parameter value will be surrounded by double-quotes. Avoiding double-quotes inside the string is therefore a requirement.

In the HTTP version, the string will need to be url safe. Url encoding & decoding will be required.

Filter Parameter Values

The key requirement of the filter parameter is to accept values which can represent the rough equivalent of the where clause of an SQL statement (language diagram). The fundamentals of this proposal are therefore the definition of a language which fits Ghost's use cases and can be converted into SQL through Ghost's Bookshelf/Knex model layer.

SQL's capabilities are extensive. To start with, the focus for Ghost is on supporting logical and & or combinations, comparisons using equals, greater than & less than (with support for not). The importance of the relation between Post & Tags also drives the need to support in.

Filter Anatomy

An individual filter is composed of 3 parts, the property that is being filtered, the operator (currently always =) and the value. Filters need to be combined in groups of or & and rules.

The property needs to support path matching: e.g. tag currently really means tag.slug.

The operators need to include equals, not equals, and numerical comparisons.

The values are strings, numbers, nulls etc that are matched/compared against.

Filter Style / Syntax

There are a number of common styles for combining properties, operators and values into expressions. This proposal borrows heavily from the style used in GitHub, Gmail & Slack amongst others: E.g.label:api or from:-hannah or stars:<10.

The syntax of this style is the use of property-operator-value triplets in the form: property:operatorvalue, where the colon is a separator, and operator is optional.

This syntax is short and compact for the common use cases in Ghost, for example: featured:true, tags:photo, author:john, image:-null, posts.count:>10

This syntax is also flexible enough to be extended to support multiple complex expressions by combining the property-operator-value triplets together with other symbols to represent and & or. Following on from using - for negation, the proposal is to use + for and as well as , for or e.g: featured:true+tags.count:>10, tags:photo,tags:video.

This is then used inside of the filter="" parameter, E.g:

  • HTTP: GET /api/posts?limit=5&filter=tags:photo,featured:true
  • Method Call: api.posts.browse({limit: 5, filter: "tags:photo,featured:true"})
  • Get Helper: {{#get "posts" limit="5" filter="tags:photo,featured:true"}}

Filter Query Language Spec

Syntax

Filter Expressions

A filter expression is a string which provides the property, operator and value in the form property:operatorvalue:

  • property - a path representing the key to filter on
  • : - separator between property and an operator-value expression
    • operator is optional, so : on its own is roughly =

Property

Matches: [a-zA-Z_][a-zA-Z0-9_.]

  • can contain only alpha-numeric characters and _
  • cannot contain whitespace
  • must start with a letter
  • supports . separated paths, E.g. author.name or posts.count
  • is always lowercase, but accepts and converts uppercase

Filter expressions can be combined using the following:

  • + - represents and
  • , - represents or
  • ( filter expression ) - overrides operator precedence

Operator

The following operators are spec'd for the first version:

  • - - not operator
  • > - greater than operator
  • >= - greater than or equals operator
  • < - less than operator
  • <= - less than or equals operator

Value

Can be one of the following

  • null
  • true
  • false
  • a *number *(integer)
  • a literal
    • Any character string which follows these rules:
    • Cannot start with - but may contain it
    • Cannot contain any of these symbols: '"+,()><=[] unless they are escaped
    • Cannot contain whitespace
  • a string surrounded by single quotes
    • Any character except a single or double quote surrounded by single quotes
    • Single or Double quote *_MUST _be escaped
    • Can contain whitespace
    • A string can contain a date any format that can be understood by new Date()

Special characters:

  • : - filter expression separator
  • + - *_and *_symbol
  • , - *_or *_symbol
  • ( - left parenthesis (operator precedence)
  • ) - right parenthesis (operator precedence)
  • - - not operator
  • > - greater than operator
  • >= - greater than or equals operator
  • < - less than operator
  • <= - less than or equals operator
  • [ - left bracket (*_in *_operator)
  • ] - right bracket (*_in *_operator)
  • ' - single quote (*_string *_syntax)
  • " - double quote - NOT PERMITTED UNESCAPED

Whitespace rules

Whitespace is not permitted inside of a property, or inside a value unless the value is surrounded with quotes. Everywhere else, whitespace is ignored and therefore becomes optional.

This allows for whitespace to be used to improve readability if necessary, as any of the following are equivalent and will be accepted.

  • featured:true+posts.count:>10, tags:photo,tags:video
  • featured:true + posts.count:>10, tags:photo, tags:video
  • featured: true + posts.count: >10, tags: photo, tags: video
  • featured: true + posts.count :> 10, tags: photo, tags: video

Implementation

Hold on to your hats, this is where shit gets real...

Creating a Parser

As far as I can determine, the set of rules above describes a language that can be represented by an unambiguous context-free grammar and therefore understood by an LALR(1) parser.

What this means is that we can represent the set of rules in a standard way, and then use JISON to generate a parser in JavaScript, which can in turn be used in Ghost to understand filter query strings, and convert them into something more useful. Essentially, all that has to be done is write a set of rules & the hard part of this is done. Neat.

The parser should live in its own repository, with a full suite of tests, and become a dependency of Ghost. The repository for it is here: https://github.com/TryGhost/GQL

Parser Output

The next step is to determine what the 'something more useful' that the parser is supposed to output looks like. The essence of what we're doing is representing SQL, so we could output the SQL equivalent of a filter query string and then pass that to knex.whereRaw() , which will then run the query for us inside the model layer of Ghost.

In fact, this is what I did for a first test that all this works. I wrote a rudimentary version of the rules, output raw SQL, and ran it through knex. And it worked.

The downside to parsing straight to SQL is two-fold: firstly, by translating from one string format to another, there is little opportunity for semantic analysis - that is to validate or operate on the property and value parts of the query to check they exist, or are permitted (filtering by the password property of the User resource should not be permitted, for example). Secondly, it seems to me that running raw sql queries, even through knex, is likely to result in security problems.

Instead of churning out SQL, the parser needs to output an intermediary format that we can use to get all of the useful and contextual information out of the query. This format should be something we can easily pass to knex's query builder functions, like .where() and .orWhere(), (which take a property, and operator and a value) so that we can run the queries through the query builder.

The intermediary format must not lose any information from the query.

JSON Format

The proposal is to use an array. Each filter expression will be converted into an object with key value pairs representing the property, operator, and value.

Arrays are ordered, and so the order of the array will match the order in which the query should be built.

All except the first item of the array should also indicate whether the expression is to be combined with and or or. Something like this:

{statements: [
    {prop: 'tags.slug', op: '=', value: 'photo'},
    {prop: 'featured', op: '=', value: true, func: 'and'},
    {prop: 'tag.count', op: '>', value: 5, func: 'or'}
]}

To maintain precedence overrides, a single object in the array can alternatively represent a group that contains a sub-array of objects. The group object must also indicate whether it is combined using and or or.

{statements: [
    {prop: 'author.slug', op: '!=', value: 'joe'},
    {
        group: [
            {prop: 'tags.slug', op: '=', value: 'photo'},
            {prop: 'image', op: 'IS NOT', value: null, func: 'or'},
            {prop: 'featured', op: '=', value: true, func: 'or'}
        ], func: 'and'
     }
]}

This format lends itself nicely to building queries using the knex query builder functions, as we can cycle through the array (and recursively through groups) calling where or orWhere as necessary. Having a JSON format with named key-value pairs for property, operator, and value, makes it easy to use lodash magic to pull out parts of the query (e.g. all properties or all values) and validate them.

Post-Processing

Once we have our filter in a JSON format, and we can pull out properties (table & field names) and values, we can start to do some validation and processing of the information in order to ensure the queries are safe and sensible.

Public vs Private

Certain properties should not be filtered upon, e.g. user.email, user.password, user.status and post.status as all of these properties are affected by permissions / not intended to be available to public endpoints.

For now, as we're only intending to offer 'official' public-only beta API access via HTTP and the get helper, these properties could be santised out entirely if it's easier, or otherwise they should be santised out if the context is public (see the change coming with public permissions for more context for what I mean here).

Standard validation

All of the properties should be known attributes of the models, and all of the values should be valid values for those attributes, according to our standard validation mechanisms.

Joins

Any reference to a related table needs to be translated into a join.

Dates

Any String value used along with a date property E.g. created_at, updated_at, published_at, last_login should be converted into a Date that SQL can understand, providing it is something that can be parsed using JavaScript's new Date(). Some care needs to be taken here to check this will work across SQLite, MySQL and pg as they treat dates differently and Bookshelf/Knex is not good at handling this (yet).

Special Rules

There are a few queries which should have special rules for how they affect other parts of the query. The single example at the moment is filtering posts by tags where tag.slug is IN a set (or in fact, any use of IN).

Where using IN we need to be sure that we still return distinct items in the set (and the correct number). When using IN the default order should naturally change to be ordered by the number of matches in descending order.

E.g. when requesting posts with tags IN a group, the query output should be similar to (or have the same affect as) this:

select * from "posts" inner join "posts_tags" on "posts_tags"."post_id" = "posts"."id" inner join "tags" on "posts_tags"."tag_id" = "tags"."id" where "page" = false and "status" = 'published' and "posts"."id" != 11 and "tags"."slug" in ('markdown', 'list') group by "posts"."id" order by count(tags.id) DESC, "posts"."status" ASC, "posts"."published_at" DESC, "posts"."updated_at" DESC, "posts"."id" DESC limit 5

Note the group by and order by count(tags.id) parts.

Examples

Here is a separate Gist outlining some example queries in HTTP, method call & get helper form: https://gist.github.com/ErisDS/f516a859355d515aa6ad, feel free to suggest more example use cases in the comments.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions