Skip to content

Avoid linear scans through all users, streams, etc. #3339

@gnprice

Description

@gnprice

There are a number of places where we make a linear search through an array of potentially thousands of items, when we really just want a specific item with a known ID or name.

This is a pattern that can be devastating for performance -- in particular, as soon as it happens in a loop, we have quadratic behavior. This is a common class of performance bug found in lots of software:
https://accidentallyquadratic.tumblr.com/post/113840433022/why-accidentally-quadratic
and we should avoid it.

Here's a rough catalog of where we do this (as of c5eb6a5):

$ git grep -ohP '\w+\.find\(.*?\)'  upstream/master src/ | sort | grep --color '^\w*'
actionSheetButtons.find(x => x.title === title)
allStreams.find(stream => stream.stream_id === subscription.stream_id)
allUsers.find(u => u.email === narrow[0].operand)
allUsers.find(x => x.email === email)
allUsers.find(x => x.email === r)
handlers.find(x => x.isFunc(narrow)
messages.find(msg => !isMessageRead(msg, flags, subscriptions, mute)
messages.find(x => x.id === messageId)
messages.find(x => x.id === messageId)
messages.find(x => x.isFunc(narrow)
mute.find(x => x[0] === stream.name && x[1] === name)
selected.find(user => user.user_id === item.user_id)
selected.find(x => x.email === email)
state.find(item => item.timestamp === action.outbox.timestamp)
state.find(y => x.stream_id === y.stream_id)
state.find(y => x.stream_id === y.stream_id)
streams.find(s => s.name === narrow[0].operand)
streams.find(s => s.name === narrow[0].operand)
streams.find(s => s.name === narrow[0].operand)
streams.find(s => s.name === narrow[0].operand)
streams.find(s => s.name === narrow[0].operand)
streams.find(stream_ => narrow[0].operand === stream_.name)
streams.find(sub => narrow[0].operand === sub.name)
streams.find(x => x.name === narrow[0].operand)
streams.find(x => x.name === narrow[0].operand)
streams.find(x => x.name === narrow[0].operand)
streams.find(x => x.name === narrow[0].operand)
streams.find(x => x.name === narrow[0].operand)
streams.find(x => x.stream_id === streamId)
streams.find(y => x && x.stream_id === y.stream_id)
subscriptions.find(sub => narrow[0].operand === sub.name)
subscriptions.find(sub => streamName === sub.name)
subscriptions.find(x => x.name === display_recipient)
subscriptions.find(x => x.name === message.display_recipient)
subscriptions.find(x => x.name === message.display_recipient)
subscriptions.find(x => x.name === message.display_recipient)
subscriptions.find(x => x.name === message.display_recipient)
subscriptions.find(x => x.name === narrow[0].operand)
subscriptions.find(x => x.stream_id === streamId)
subscriptions.find(y => x && y && x.stream_id === y.stream_id)
unreadHuddles.find(x => x.user_ids_string === userIds)
unreadPms.find(x => x.sender_id === sender.user_id)
users.find(user => user.email === email)
users.find(user => user.email === narrow[0].operand)
users.find(user => user.email === userEmail)
users.find(user => user.user_id === userId)
users.find(u => u.email === narrow[0].operand)
users.find(x => x.email === email)

A few of those are innocuous, searching through a fixed list of a handful of entries. But the majority of them appear to be searching through our entire list of known users, or non-bot users, or streams, or subscriptions -- all of them lists of unbounded length -- just looking for an ID, email, or name.

Instead, we should make these lookups through efficient, indexed data structures. A JS object works great. A Map works equally great, and can be a bit cleaner in the code.

Of these, the searches through users are the highest priority to fix; that can be a very long list indeed, and I suspect these are already a cause of sluggishness for users of large orgs like chat.zulip.org.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions