Skip to content

Conversation

mariusandra
Copy link
Collaborator

@mariusandra mariusandra commented Apr 25, 2023

Problem

HogQL properties (e.g. properties.$browser) are weird:

  1. If a property is not set, it's returned as an empty string.
  2. If a property is set to the JSON null, it's returned as the string null
  3. Numbers, booleans and datetimes are cast if specified under "data management"
  4. Everything else is always a string

This results in an environment where you use properties.$browser != '' to check if the field is not set, and can't rely on JS none values. Consequently the HogQL-translated "is set" and "is not set" event filters (in use in the event explorer) don't work as those rely on null checks.

Changes

Both unspecified (undefined) and null properties are now returned as the SQL null.

The previous code for property access:

"replaceRegexpAll(JSONExtractRaw(events.properties, '$browser'), '^\"|\"$', '')"

Got replaced with:

"replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$browser'), ''), 'null'), '^\"|\"$', '')"

The new part is nullIf(nullIf(x, ''), 'null'). This converts the property to a null if it's an empty string or the word "null", without quotes.

The changes are only in HogQL (including expressions). Nulls are checked differently (JSONHas) in non-HogQL filters.

How did you test this code?

I added a test case to check the new behaviour.

At first I wanted to just get undefined fields to return as null, but when adding the testcase I realised also just null fields don't work.

I validated quickly that I could only read the column once. Even code like if(properties.x, properties.x, 'null') slowed down the query significantly. I validated this by running a HogQL query like this on app.posthog.com:

   select count(*)
     from (
             select JSONExtractRaw(events.properties, '$browser')
               from events
              where timestamp > now() - interval 1 day
          )

Every extra property access resulted in more seconds on average. Only adding nullIf-s didn't. My data is however anecdotal and could use double checking.

Copy link
Contributor

@benjackwhite benjackwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved assuming my comment is the case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I understand - this only fixes it for HogQL?

I tried the filters that weren't working for me and they still don't work so assuming that standard Query stuff is still as is i.e.:

Looks legit. Only thing is my specific issue didn't seem to be solved. Checked it out and ran

// returned results
{
    key: '$session_id',
    value: '',
    operator: 'exact',
    type: 'event',
}

// did not return
{
    key: '$session_id',
    value: 'is_not_set',
    operator: 'is_not_set',
    type: 'event',
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it should fix it for just HogQL. However all filters on the "event explorer" page are converted to HogQL. Thus in this case everything should work.

However! This $session_id is actually stored as a materialized column... and that column is stored without the new nullIf checks :'(.

A proper fix would require altering the table in an async migration and retroactively changing all these columns as well. However this is above my pay grade right now, so calling in the big guns for support. CC @yakkomajuri

@mariusandra mariusandra marked this pull request as draft April 27, 2023 08:19
@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@mariusandra
Copy link
Collaborator Author

mariusandra commented May 11, 2023

I think I have the best/worst compromise we can make here.

  1. Nothing changes in the non-HogQL world. This means non-hogql "is set" and "is not set" filters keep working as they did now by checking if a property exists with JSONHas if not materialized or the string comparison notEmpty if materialized. For example (copied from an insight querty): AND (((notEmpty("$session_id")) AND (JSONHas(e.properties, 'realm')))). Please note that we have been equating empty strings with undefined properties after materialization, yet do make the distinction before.

  2. In HogQL, all non-materialized properties are handled properly. Empty strings are empty strings (''). Nulls and undefined properties are NULL.

  3. In HogQL, all materialized properties are converted to a NULL if they're an empty string or the word null. Effectively we're losing the support for empty strings. However we never truly supported empty strings after materialization anyway.

  4. We will work towards rematerializing all columns and bringing back support for empty strings. Work has started here: feat(materialization): remove backfill #15484 and RFC: HogQL null access and re-materialising all columns #15474

This will at least unblock us enough to go on with the HogQL public beta launch.

@mariusandra mariusandra marked this pull request as ready for review May 11, 2023 08:54
@mariusandra
Copy link
Collaborator Author

As a result of this PR, all is set checks work again in the event explorer.

Since we now actually return null in the event explorer and elsewhere, I had to add a special null display:

Before:
image

After:
image

@mariusandra mariusandra merged commit 5221eda into master May 11, 2023
@mariusandra mariusandra deleted the hogql-null-handling branch May 11, 2023 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants