Skip to content

Conversation

mary3000
Copy link
Contributor

@mary3000 mary3000 commented Mar 18, 2021

Update operations can't insert with gaps.
But there exist "virtual nulls" in tarantool -
i.e. fields that are specified in format and marked as is_nullable.
They are not actually stored in the database, and currently
tarantool refuses to do update ops with such nulls as "gaps".

But from the user side these nulls exist. So this patch changes
behaviour to such that update ops will fill virtual nulls with
the real ones.

Closes #3378

@mary3000 mary3000 requested review from alyapunov and Gerold103 March 18, 2021 09:07
@Gerold103
Copy link
Collaborator

Hi! I didn't review it in details yet, but I see you can simplify and speed up xrow_update_op_append_nulls() a lot. MP_NIL is a single byte, so you can insert a big rope batch in the end (no need to store the nils in individual rope nodes I think), and memset() it with the nil byte code.

@mary3000 mary3000 force-pushed the mary3000/gh-3378-absent-null-update branch from 1e1b3e1 to ad8f01e Compare March 22, 2021 20:44
@mary3000
Copy link
Contributor Author

Hi! I didn't review it in details yet, but I see you can simplify and speed up xrow_update_op_append_nulls() a lot. MP_NIL is a single byte, so you can insert a big rope batch in the end (no need to store the nils in individual rope nodes I think), and memset() it with the nil byte code.

Ok, fixed

Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for working on this!

Please, add a docbot request and a changelog file.

@@ -310,6 +310,36 @@ xrow_update_array_store(struct xrow_update_field *field,
return out - out_begin;
}

/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of functions we always use /** as comment starter. Lets conform.

* Helper function that appends nils in the end so that op will insert
* without gaps
*/
int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, make the function static. Also, as you could see, for update field type the functions has xrow_update_array prefix and pass the field as a first argument (like this in C++). Please, try to keep up.

xrow_update_alloc(rope->ctx, sizeof(*item));
if (item == NULL)
return -1;
uint32_t nil_size = mp_sizeof_nil();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add an assertion that mp_sizeof_nil() == 1, and drop nil_size.

return -1;
uint32_t nil_size = mp_sizeof_nil();
uint32_t byte_count = nil_size * nil_count;
char *pos = (char *) region_alloc(rope->ctx, byte_count);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Region_alloc does not set diag. You need to do it manually.

return -1;
memset(pos, 0xc0, byte_count);
xrow_update_array_item_create(item, XUPDATE_NOP, pos, byte_count, 0);
if (xrow_update_rope_insert(rope, op->field_no, item, nil_count) != 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do return xrow_update_rope_insert() instead of the if. It might even activate the tail-call optimization.

...
s:upsert({0}, {{'=', 42, 42}})
---
...
s:select{}
---
- - [0, 1, 2]
- - [0, 1, 2, null, null, null, null, null, null, null, null, null, null, null, null,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the purpose of the test with setting a value with an index out of range was to ensure it does not work. Now it works, so better just drop these upsert() with 42th field.

s:replace{1, 2}
s:update({1}, {{'!', 4, 0}})
s:update({1}, {{'#', 4, 1}})
s:drop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add more tests with json paths. What if I use ! and = on a field whose parent's index is out of range? For example [5][1] = 100, where field [5] does not exist. The same with maps ([5].key).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Even behavior isn't considered to be expected (I mean like @Gerold103 suggest to do right now), test on the subj would be nice to see.

@mary3000 mary3000 force-pushed the mary3000/gh-3378-absent-null-update branch from ad8f01e to 71f3a55 Compare March 23, 2021 16:07
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for working on this!

You still didn't add a changelog file and didn't add a docbot request. Please, do. See more comments below.

if (item == NULL)
return -1;
assert(mp_sizeof_nil() == 1);
char *item_data = (char *) region_alloc(rope->ctx, nil_count);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new code for unary operators (including cast) we don't use whitespace after them.

}
memset(item_data, 0xc0, nil_count);
xrow_update_array_item_create(item, XUPDATE_NOP, item_data,
nil_count, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misaligned line wrap.

...
s:update({1}, {{'!', '[2][42]', 0}})
---
- error: Field ''[2][42]'' was not found in the tuple
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not seem right. Field [2] exists and it is an array. It means there should no be a reason why it can't auto-fill the fields until [42]. Tuple's root array is not any different from the nested arrays. At least this was the idea behind the xrow update code.

Also I realized that probably # should not throw an error. Because if the feature's main purpose is to fix the 'virtual' fields assignment and = will work on them now, then # should work as well. But now it fails. For example:

t = box.tuple.new({1, 2, 3})
t:update({{'#', 5, 1}})

This fails. But if I replace it with = or ! - it works. That makes # and ! not symmetric. Maybe such deletion should become nop when out of range or a key wasn't found (if it deletes from a map). I am afraid you need to ask Mons about that. 2 questions in total:

  1. Do we auto-fill the nested arrays or only the root array?
  2. Should we make # nop when it is out of range? To be symmetric with the new behaviour of !.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think we should auto-fill nested structures in every case, except type collision (update array field on map)
  2. Agree with you, It's ok to update # also

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean as every case? For instance, I have a tuple {1, 2, 3}. And I make update {{'=', '[5].key[3]', 100}}. Should it also auto-create the intermediate array and map so it would result into this: {1, 2, 3, nil, {key = {nil, nil, 100}}}? Or is this an error, and we only auto-fill arrays and never create new arrays and maps on the path?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mons, see above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question.
By example from Perl (called autovivification) I know, that feature may lead to unexpected results, but in most cases, it is rather handy.
But since we give an option to update the deep field and give no handy option to update such a field in case of it's absence, I consider we should support auto-creation of missing structures and raise an error only in case of conflicting types (array vs map)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to our last conversation with Mons and Alexander, we decided to make changes regarding auto-fill nested structures in a separate task

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the example I showed above about filling a non-root array? It does not need creation of the nested structures, the array exists already, it is just is not in the root. And yet it does not work now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case will also be implemented in a separate task, according to our discussion. @Mons, @alyapunov i am right?

Copy link
Collaborator

@Gerold103 Gerold103 Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are not going to implement it now, it will work 50/50. For example, it will work if I will json-update some internal array using 2 operations instead of one. The second one would turn it from a bar into a rope in the code, and the auto-fill will work like in the root. But it won't work if the update operation is just one. Which looks broken and inconsistent, when result of an operation depends on presence of other independent operations. That brings the question why is such a half-done and half-working feature even needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move discussion of the issue to the issue.

@EvgenyMekhanik EvgenyMekhanik force-pushed the mary3000/gh-3378-absent-null-update branch from 71f3a55 to ed7fab9 Compare April 6, 2021 15:49
@EvgenyMekhanik EvgenyMekhanik self-assigned this Apr 6, 2021
@EvgenyMekhanik

This comment has been minimized.


* Update operations can't insert with gaps. This patch changes
the behavior so that the update operation fills the missing
fields with zeros.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing (gh-3378) mark at the end

-- gh-3378: allow update absent nullable fields

-- '!'
s = box.schema.create_space('test')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is engine test, let's check it for both memtx and vinyl (just pass engine to create_space method)

pk = s:create_index('pk')
s:replace{1, 2}
s:update({1}, {{'!', 4, 0}})
s:drop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to drop space, just use another tuple or delete it - it should be a bit faster.

---
- [1, 2, null]
...
s:update({1}, {{'+', 3, 0}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't get the purpose of the test: I believe the fact that update operation involving null and actual value is checked somewhere else.

---
- [1, 2, null, 0]
...
s:update({1}, {{'#', 4, 1}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about another one 'trim' operation (so that we verify that null is actually stored in the tuple - result should be [1, 2] and not [1, null])?

@EvgenyMekhanik EvgenyMekhanik force-pushed the mary3000/gh-3378-absent-null-update branch 2 times, most recently from 8f96b63 to 3fd2984 Compare April 9, 2021 16:44
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch! In the commit message you duplicated the same paragraph. Please, don't. Just keep the docbot request only.

Here is the docbot request with my 2 comments inlined.

    Update operations can't insert with gaps. This patch changes the
    behavior so that the update operation fills the missing fields
    with zeros.

Firstly, the docbot request is not a 'patch'. Secondly, it fills the fields with nulls, no zeros.

    For example we create space s = box.schema.create_space('s'),
    then create index for this space pk = s:create_index('pk'), and then
    insert tuple in space s:insert{1, 2}. After all of this we try to update
    this tuple s:update({1}, {{'!', 5, 6}}), in previous version this operation
    fails with ER_NO_SUCH_FIELD_NO error, and after this patch it's finished
    with success and there is [1, 2, null, null, 6] tuple in space.

The code snippets must be wrapped into the ` signs. Otherwise some of the parts of the code might turn on a formatting rule and the text would turn into a broken mess.


* Update operations can't insert with gaps. This patch changes
the behavior so that the update operation fills the missing
fields with zeros (gh-3378).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with zeros. With nulls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -393,8 +426,11 @@ xrow_update_op_do_array_delete(struct xrow_update_op *op,

struct xrow_update_rope *rope = field->array.rope;
uint32_t size = xrow_update_rope_size(rope);
if (xrow_update_op_adjust_field_no(op, size) != 0)
if (xrow_update_op_adjust_field_no(op, size) != 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a separate behaviour change. Better do it in another commit with its own tests.

--
-- gh-3378: allow update absent nullable fields
-- '!'
s = box.schema.create_space('test', {engine = engine})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't really need spaces, indexes, or any other storage things. The updates work on tuples. So for non-formatted tuples you could simply use t = box.tuple.new(...) and t:update(...).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is is engine test, and @Korablev77 write (see his comment above) that i need to check both memtx and vinyl engines. So i need to create space with specified engine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an engine test only because you made so. You could have a separate file, or use box/ folder. Talking of the "need to check both engines" - why? It is not related to engines anyhow as I demonstrated. But up to you.

@Gerold103
Copy link
Collaborator

When I repeatedly said that this implementation leads to inconsistent behaviour, I meant this example:

t = box.tuple.new({1, 2, {11, 22}})
op1 = {'=', '[3][1]', 11}
op2 = {'=', '[3][4]', 44}

-- The operation does not do anything and "works".
t:update({op1})
---
- [1, 2, [11, 22]]
...

-- The operation does not work since the array is not in the root.
t:update({op2})
---
- error: Field ''[3][4]'' was not found in the tuple
...

-- But when I combine them - magic, both work.
t:update({op1, op2})
---
- [1, 2, [11, 22, null, 44]]
...

This is not because of a format, or nullable/not nullable. This is because a single update operation in the xrow update code is optimized as so called 'bar' update (see xrow_update_bar.c). And when there is a second operation touching the same field, it is expanded to a proper array/map. The first version of the patch didn't address that, and the current version also ignores it. The example above is not a feature, it is rather a bug. The behaviour in the example literally depends on whether I add a nop update operation. And I don't see what is so hard in fixing that by making the auto-fill with nils work properly. If you got a bar update, and you see you need nils, you just need to expand it to a proper array right away.

@EvgenyMekhanik
Copy link
Contributor

When I repeatedly said that this implementation leads to inconsistent behaviour, I meant this example:

t = box.tuple.new({1, 2, {11, 22}})
op1 = {'=', '[3][1]', 11}
op2 = {'=', '[3][4]', 44}

-- The operation does not do anything and "works".
t:update({op1})
---
- [1, 2, [11, 22]]
...

-- The operation does not work since the array is not in the root.
t:update({op2})
---
- error: Field ''[3][4]'' was not found in the tuple
...

-- But when I combine them - magic, both work.
t:update({op1, op2})
---
- [1, 2, [11, 22, null, 44]]
...

This is not because of a format, or nullable/not nullable. This is because a single update operation in the xrow update code is optimized as so called 'bar' update (see xrow_update_bar.c). And when there is a second operation touching the same field, it is expanded to a proper array/map. The first version of the patch didn't address that, and the current version also ignores it. The example above is not a feature, it is rather a bug. The behaviour in the example literally depends on whether I add a nop update operation. And I don't see what is so hard in fixing that by making the auto-fill with nils work properly. If you got a bar update, and you see you need nils, you just need to expand it to a proper array right away.

When I repeatedly said that this implementation leads to inconsistent behaviour, I meant this example:

t = box.tuple.new({1, 2, {11, 22}})
op1 = {'=', '[3][1]', 11}
op2 = {'=', '[3][4]', 44}

-- The operation does not do anything and "works".
t:update({op1})
---
- [1, 2, [11, 22]]
...

-- The operation does not work since the array is not in the root.
t:update({op2})
---
- error: Field ''[3][4]'' was not found in the tuple
...

-- But when I combine them - magic, both work.
t:update({op1, op2})
---
- [1, 2, [11, 22, null, 44]]
...

This is not because of a format, or nullable/not nullable. This is because a single update operation in the xrow update code is optimized as so called 'bar' update (see xrow_update_bar.c). And when there is a second operation touching the same field, it is expanded to a proper array/map. The first version of the patch didn't address that, and the current version also ignores it. The example above is not a feature, it is rather a bug. The behaviour in the example literally depends on whether I add a nop update operation. And I don't see what is so hard in fixing that by making the auto-fill with nils work properly. If you got a bar update, and you see you need nils, you just need to expand it to a proper array right away.

Yes, but what we must do in this case for example:
t = box.tuple.new({1, 2, {11, 22}})
op = {'=', '[3][3][1]', 11}
create third field with array or check that first to field are not arrays and fail?
and if we have case ?
t = box.tuple.new({1, 2, {{11, 11}, {22, 22}}})
op = {'=', '[3][3][1]', 11}

@Gerold103
Copy link
Collaborator

Both examples are invalid, because it was decided by Mons, that we should not create maps and arrays along the JSON path. The only thing you can do is to add nils to an array. It does not matter what are the types of the fields in array. Only field count matters.

Prepare this test for upcoming #3378 fix:
bad upserts will become good, so we need
another way to do them.
@EvgenyMekhanik EvgenyMekhanik force-pushed the mary3000/gh-3378-absent-null-update branch from 3fd2984 to dee2ad1 Compare April 13, 2021 11:35
mary3000 and others added 2 commits April 13, 2021 15:20
Update operations could not insert with gaps. This patch changes
the behavior so that the update operation fills the missing fields
with nulls.
Part of #3378

@TarantoolBot document
Title: Allow update absent nullable fields
Update operations could not insert with gaps. Changed the behavior
so that the update operation fills the missing fields with nulls.
For example we create space `s = box.schema.create_space('s')`,
then create index for this space `pk = s:create_index('pk')`, and
then insert tuple in space `s:insert{1, 2}`. After all of this we
try to update this tuple `s:update({1}, {{'!', 5, 6}})`. In previous
version this operation fails with ER_NO_SUCH_FIELD_NO error, and now
it's finished with success and there is [1, 2, null, null, 6] tuple in
space.
In previous patch update(insert) operation for absent nullable fields
was allowed. This patch allows to update(delete) operation for absent
nullable fileds.
Closes #3378
@EvgenyMekhanik EvgenyMekhanik force-pushed the mary3000/gh-3378-absent-null-update branch from dee2ad1 to 1fe1ce3 Compare April 13, 2021 12:20
Copy link
Contributor

@Korablev77 Korablev77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should push patch-set to master as (since it solves the most of user's pain) is and consider what to do with maps/arrays in scope of next release. LGTM

Copy link
Contributor

@alyapunov alyapunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Need to create follow-up.

@Gerold103
Copy link
Collaborator

The patch still introduces inconsistent behaviour as I showed in the example above. So I can't vote it up. But you don't need my opinion here anyway, I see there are 2 other LGTMs overriding my complaint.

@Korablev77 Korablev77 merged commit 6451411 into master Apr 14, 2021
@Korablev77 Korablev77 deleted the mary3000/gh-3378-absent-null-update branch April 14, 2021 08:05
@Gerold103
Copy link
Collaborator

Btw, this commit also introduces a bug similar to this one: #5105. In vinyl, if the operations in my inconsistency example are in separate upserts, the behaviour will depend on whether they were merged and applied, or applied separately. I will file a ticket later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

update: allow update absent nullable fields
6 participants