Skip to content

Collection update_if_needed() reform #4726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

simonask
Copy link
Contributor

@simonask simonask commented May 31, 2021

What, How & Why?

This PR shifts around the responsibility of updating internal state of collection accessors.

Accessors are now initialized lazily when they are first used. This makes certain things simpler, especially copy-construction of collection accessors, which previously required the source to be updated (or it would throw an exception). With this change, the new copy can be in an uninitialized state until it is first used.

Information about how to respond to an update to the underlying storage now flows upwards (through UpdateStatus). This isolates the responsibility of detecting an update to the accessor that actually manages that data.

The "detached" state is now explicitly called out during accessor updates (UpdateStatus::Detached).

The flow of accessor update is now:

  • The user calls a method on the collection accessor (such as insert() etc.)
  • The accessor calls update_if_needed() on its inner accessor (in the case of LnkLst) or its parent Obj.
  • The parent object tells the accessor via the return value whether it updated itself.
    • If the parent object was updated, the accessor must always update itself and return UpdateStatus::Updated.
    • If the parent object was not updated, and was not detached, the accessor has the opportunity to update itself based on other criteria (e.g. whether storage_version changed in the case of most collections). If it decided to update itself, it should return UpdateStatus::Updated to instruct potential users of the class to update themselves in turn (this is currently only relevant for Lst<ObjKey> and Set<ObjKey>, which are used by LnkLst and LnkSet, respectively).
    • If the parent object was detached, the accessor may choose to de-initialize itself and return to a detached state, freeing up resources associated with the accessor.
  • The accessor then performs the operation requested by the user, or throws an exception if it cannot be carried out.

One drawback of this design is that accessor updates cannot be initiated by the bottom-most accessor (unless it is a base class of the topmost accessor, and the topmost accessor calls Base::update_if_needed() in its implementation of update_if_needed()). In practice, it means that LnkLst must call its own update_if_needed() before carrying out any operations on its internal Lst<ObjKey>, or risk that its list of unresolved links becomes out of date.

The upside is that the accessor update flow becomes significantly easier to reason about, with fewer details leaking between different accessor types and a clearer flow of information.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)

@simonask simonask self-assigned this May 31, 2021
Copy link
Contributor

@ironage ironage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// `size()` will return 0 if we are detached, so the bounds check will fail.
auto sz = size();
if (from >= sz || to >= sz) {
throw std::out_of_range{"index out of bounds"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw std::out_of_range{"index out of bounds"};
throw std::out_of_range{util::format("index out of bounds when moving from %1 to %2 on list of size %3", from, to, sz)};

@@ -297,8 +292,7 @@ size_t Dictionary::find_any(Mixed value) const

Mixed Dictionary::min(size_t* return_ndx) const
{
update_if_needed();
if (m_clusters) {
if (update() && m_clusters) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I didn't want to encode too much of the knowledge of what a NULL m_clusters means.

Copy link
Contributor

@jedelbo jedelbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we re-evaluate the need for ensure_writable. We no longer take the approach to ensure writability up front in Obj. Potential check for parent update is done as a check afterwards. I think we could simplify things a bit more if we replace ensure_writablewith ensure_created where appropriate. I make a small test to demonstrate the idea in je/collection-update-if-needed branch.

@@ -555,20 +538,18 @@ Dictionary::Iterator Dictionary::find(Mixed key)
void Dictionary::erase(Mixed key)
{
validate_key_value(key);
update_if_needed();
ensure_writable(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a check on return value here.

void do_set(size_t ndx, T value);
void do_insert(size_t ndx, T value);
void do_remove(size_t ndx);

friend class LnkLst;

// FIXME: BPlusTree must be wrapped in an `std::unique_ptr` for the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rule must be that we should only introduce FIXMEs if this is something that should be fixed before next release. This is probably not. I could be turned into a TODO, but then I think a possible solution should be outlined. Otherwise it can just be left an a informative comment.

REALM_UNREACHABLE();
}

UpdateStatus ensure_writable(bool allow_create) final
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather introduce a function called ensure_created and only make a call to that in places where subsequent actions requires an existing bplustree - like insert. In other places like remove you implicitly call size() that will ensure that the object is updated. It is no longer necessary to make the owning m_obj writable. That should be handled when the ref is set from possible copy_on_write.

@@ -515,7 +496,10 @@ const Mixed Dictionary::operator[](Mixed key)
ret = get(key);
}
catch (const KeyNotFound&) {
create();
// FIXME: This may throw `KeyNotFound` again, but for a different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be fixed? If yes, let's fix it in this PR or file a ticket to follow up on it. Same with the other FIXME-s - we should either file follow-up tickets and add links in the code or have them as regular comments.

@jedelbo jedelbo assigned jedelbo and unassigned simonask Aug 31, 2021
@jedelbo
Copy link
Contributor

jedelbo commented Sep 28, 2021

superseded by #4891

@jedelbo jedelbo closed this Sep 28, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2024
@tgoyne tgoyne deleted the su/collection-update-if-needed branch May 29, 2024 18:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants