Skip to content

Conversation

pauldambra
Copy link
Member

@pauldambra pauldambra commented Apr 4, 2022

Opening a PR as a mini-RFC

Dashboards Model relations

Currently

At the model level

Insights can be saved by themselves or on a Dashboard

If they are saved to a dashboard and the dashboard is deleted the insight is deleted too

The relationship is different depending on which side you start at

erDiagram
    INSIGHT ||--|o DASHBOARD : ""
    DASHBOARD ||--o{ INSIGHT: "items"
Loading
  • A single Insight can be on zero or one Dashboard
  • A single Dashboard can have zero to many Insights

The foreign key is specified on Insight. The relationship specifies a "related_name" so Dashboard has a relationship back to Insight

At the API level

loading a dashboard

https://app.posthog.com/api/projects/:team_id/dashboards/:dashboard_id/?refresh=false

Viewing a dashboard makes an API call for that dashboard by ID

That call returns a structure

{
  "id": dashboard_id,
  "items": [
    // list of insights on dashboard
  ],
  "created_at": "2022-01-12T19:39:35.014190Z",
  "created_by": {
    // user
  },
  // .... other fields
}

loading an insight

https://app.posthog.com/api/projects/:team_id/insights/?short_id=the-short-id

Viewing an insight directly makes an API call by short id

This call returns a paginated structure (ideally (always?) with one result)

{
  "count": 1,
  "next": null,
  "previous": null,
  "results": [
    {
      "short_id": "the-short-id",
      "dashboard": dashboard_id,
        // .... other fields
    }
  ]
}

The API maps the same relationship:

  • Dashboards have many "items"
  • Insights have a single dashboard

Desired relationship

erDiagram
    INSIGHT }o--o{ DASHBOARD : ""
Loading

The desired relationship is that:

  • Dashboards have zero to many insights
  • Any Insight can be on many dashboards

Both insights and dashboards can exist without the other

Django many-to-many relationships implicitly create a join table

erDiagram
    INSIGHT ||--o{ DASHBOARD_INSIGHT: ""
    DASHBOARD_INSIGHT }o--|| DASHBOARD: ""
Loading

Manually creating that join model would allow future changes which could store data unique to that dashboard/item relation

E.g. a different date range applied to the same insight on different dashboards

Complications

The insight model is stored in a posthog_dashboarditem table

Dashboard Item might not be the correct name for the relation. Dashboards may in future hold more than only insights. E.g. Markdown blocks

Leaving Insights on "dashboard item" constrains either future naming, or means a markdown block would be modelled as an insight in code.

To change this we would need a migration to rename the table or to copy the data to a new table

That migration would need to work for cloud and self-host

the insight model already has a one way relationship direct to dashboard

Any migration to a join table would need to migrate the items relationship off Insight and keep any existing relationship

Proposed Approach

Ultimately, the approach is to add a new relation onto Dashboard

insights = models.ManyToManyField(Insight)

and deprecate the existing dashboard field on Insight

This allows querying in both directions. And can have a manual "through" table specified in case future changes need to store data associated with a particular insight/dashboard combo. E.g. different date filters on the "last month" dashboard and "this week" dashboard

(optional) Rename the posthog_dashboarditems table

PostgreSQL documentation on alter table for rename says:

"There is no effect on the stored data."

It would need testing, but this suggests we can have a single migration and code change that renames the insight table and changes the table name in the model

TODO This could lead to a DB that has had the table renamed and running instances that are expecting the old table name. That would need to be addressed, adding complication to this work

It is quicker to leave Insights on this table until PostHog 2.0 allows breaking changes and add a TODO to that effect on the model

change the Insight API to present an array of dashboard ids

Currently, the API presents a single dashboard ID. The UI uses this to provide a button to jump to the dashboard. This would need to change to a dropdown button showing the one to many dashboards you could visit from a given insight

The following UI interactions would need to change:

  • adding to a dashboard becomes multiselect
  • showing what dashboard an insight is on can't be a single button

2022-04-04 14 42 30

There are 17 places that the UI code knows that dashboard ID is number | null and changing to number | number[]| null or adding a new number[] property would need to be accounted for

Put a new relation on dashboards to a new DashboardItem model

insights = models.ManyToManyField(Insight)

Initially empty, the dashboard API would read from both insights and items relations and union and deduplicate the results of this relation and the existing Items relation

3. Migrate existing dashboard-insight relation onto the new model

In a perfect world this could be as simple as looping through Insight models, prefetching their dashboard, and storing the relation onto the new insights relation on that instance of Dashboard

Based on the number of Insights stored on cloud. This would take more than long enough to cause deployments to fail or data to appear to be inconsistent.

The join migration would need to be manually written as a fast set based SQL operation see comment

Or the Insight model would union and deduplicate its old and new relation and present that to the outside world. While an async migration paged through the DB updates needed

@pauldambra
Copy link
Member Author

@clarkus the only UI impact I can see of this is the flow of showing what dashboards an insight is on, navigating to them from the insight, and adding it to more

https://user-images.githubusercontent.com/984817/161557524-11b44b2a-4032-4da0-8c5d-e7a30f80d818.gif

@Twixes
Copy link
Member

Twixes commented Apr 5, 2022

Yeah, we'll definitely need a reworked modal to go from a boolean approach (either is on a dashboard or isn't) to a number (anywhere between 0 and n dashboards).
One tricky thing we definitely need to keep in mind in this project is dashboard permissions: currently when an insight belongs to a dashboard the user can't edit, that user also can't edit that insight. But what if only one out of a few dashboards the insight is on is restricted (e.g. user can edit "Funnels usage" but can't edit "Board meeting metrics")? 🤔

@pauldambra
Copy link
Member Author

Added many insights using

insert into posthog_dashboarditem (short_id, team_id, filters, favorited, is_sample, deleted, saved, updated_at,
                                   layouts, last_modified_at, refreshing, dashboard_id)
select left(md5((i::int+3000000)::text), 12),
       1,
       to_jsonb('{}'::text),
       False,
       False,
       False,
       False,
       now(),
       to_jsonb('{}'::text),
       now(),
       False,
       1
from generate_series(1, 1000000) s(i);

and inserted them in one step as a synchronous migration

insert into posthog_dashboardinsight (insight_id, dashboard_id)
select id as insight_id, dashboard_id from posthog_dashboarditem
where dashboard_id is not null
ON CONFLICT ON CONSTRAINT posthog_dashboardinsight_dashboard_id_insight_id_675423e4_uniq
DO NOTHING;

This took between 18 and 25 seconds over three attempts

A million is a lot of insights but this demonstrates that a single insert migration could timeout and cause issues with upgrades

@Twixes
Copy link
Member

Twixes commented Apr 5, 2022

That's my main worry with splitting posthog_insight out of posthog_dashboarditem… It shouldn't be an issue for self-hosted setups, but Cloud is tricky. Chunking the migration into more doable portions may help, but if that still takes more than 10 seconds, I worry about the impact of (and on) users editing insights at that time. @guidoiaquinti may have some more/better tips here.

@guidoiaquinti
Copy link
Contributor

That's my main worry with splitting posthog_insight out of posthog_dashboarditem… It shouldn't be an issue for self-hosted setups, but Cloud is tricky. Chunking the migration into more doable portions may help, but if that still takes more than 10 seconds, I worry about the impact of (and on) users editing insights at that time. @guidoiaquinti may have some more/better tips here.

That's a valid concern. As we don't know what's our customer p100 sort by row count on self-hosted, we should make this migration idempotent and split it into chunks in a loop (e.g. 100 rows/query) that will be retried in case of failures.

There's also a (probably marginal) race condition:

t0 - app is running
t1 - user starts to upgrade the app -> migration starts, web is still running the old version
t2 - user adds an insight (as the app is the old version we sill write the record to the "old" table)
t3 - migration completes without migrating the insight
t4 - app migration completes, web is now running the new version
t5 - there's an orphan record

@clarkus
Copy link
Contributor

clarkus commented Apr 5, 2022

I made some progress from the insights side of things. This allows for adding an insight to one or more dashboards, or creating a new dashboard and then adding. I am working through the dashboard side of things next.

Share Modal - single share link
Group 8972
Screen Shot 2022-04-05 at 4 22 24 PM

@pauldambra
Copy link
Member Author

pauldambra commented Apr 6, 2022

TO-DO

  • add new many-to-many relation
  • hide insight/item split from read side
  • hide insight/item split from write side
  • add migration of existing relation into new one
  • add new dashboards property on insight API
  • if someone's access to an insight is read-only on any dashboard, it is read-only everywhere
  • update filter based hash key to cope with more than one dashboard
  • update insight caching to cope with multiple dashboards (in progress)
  • update layouts to cope with multiple dashboard relation

TO FOLLOW-UP_WITH

  • migrate FE data onto new dashboards property
  • update UI so insights can be added to multiple dashboards

@neilkakkar
Copy link
Contributor

Q: Is there a link to somewhere I can see where we decided doing this makes sense / solves a good use case / etc. etc. ? This mini-RFC feels more implementation focused, and I'm missing context 😅

Copy link
Collaborator

@mariusandra mariusandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably the least disruptive way to bring about this database change 👍.

This is still a draft, so I'm not sure what is in the TODO list and what is not, nevertheless here and inline are some comments/questions. Happy to address them myself as well.

  1. @pauldambra when you did your test here,... did the table you were inserting to have an index or not? Those 18-25 seconds for copying 1M rows from one table to another seem like a lot. Could it be that this was the case, and this insert could be made much faster by cleverly dropping and creating indexes? This feels way too long.

  2. We already need an explicit table for the relation, as the fields layouts and color should go there, and be removed from insight. They can be unique for each dashboards.

Comment on lines +22 to +25
# check permissions on any dashboard on legacy dashboard relation or the many-to-many dashboards relation
dashboards: List[Dashboard] = [insight.dashboard] + list(insight.dashboards.all())
edit_permissions = [d.can_user_edit(cast(User, request.user).id) for d in dashboards if d is not None]
return all(edit_permissions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue you should be able to edit an insight if you can edit any dashboard the insight is on, not only if you can edit all dashboards the insight is on.

As an example, when someone from the company's board adds an insight onto their private dashboard, you wouldn't expect everyone else to automatically lose edit rights? I'd expect the opposite: the private board dashboards should also show, that this specific insight has been shared more broadly.

from posthog.test.base import BaseTest


class TestDashboardInsightModel(BaseTest):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the DashboardInsight itself. Should this be the table between dashboards and insights? If so, is this the right name for such a table, given it'll contain 1) what to show 2) coordinates of the item on the dashboard, 3) color of the item. The "item" is currently just an insight, but it can also be a block of text, the events list table, or whatever. Should we make a new many-to-many model for every new type of dashboard element?

@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@pauldambra
Copy link
Member Author

superceded by #9416

@pauldambra pauldambra closed this Apr 20, 2022
@pauldambra
Copy link
Member Author

@neilkakkar I missed your comment, sorry

There's not much more there but the RFC for the insights work is here https://github.com/PostHog/product-internal/blob/5c4efc9f46416dabd239a5ef931613e035596a18/requests-for-comments/2022-03-14%20nail%20insights.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants