-
Notifications
You must be signed in to change notification settings - Fork 15
Fix using outdated space schema #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
crud/common/utils.lua
Outdated
local space = replicaset.master.conn.space[space_name] | ||
|
||
local master = replicaset.master | ||
if not master.conn:ping({timeout = ping_timeout}) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we'll do an extra PING call on each crud operation?
Although it is a working solution, we need to measure the performance impact. What about pinging once, saving the schema internally and update it when the schema version changes, as it is done inside the connection?
Also, it is possible to fetch schema from replicas too. Probably it may be faster and save time for the master a bit.
I don't believe that issuing ping requests on every operation is a good measure. Instead, we can do the same thing regular box operations do: send our schema ID along with the call and if they diverge with what is currently on the storage, return a special error and the router will then retry. |
That's still not a general purpose solution. There are some examples when schema could be dynamic and schema will be changed quite often - this will lead to continuous retries. while true do
local schema_id = conn.schema_version
local data, err = conn:call(...)
if err ~= nil and conn.schema_version == schema_id then
return nil, err
elseif data ~= nil then
return data
end
end I don't like it. I could update some spaces that don't related to my queries but it will be cause some retries. |
Ugh… I was sure that So calling However we still can send a request without the client side checking of a space presence and it'll be the correct solution. You can even use the local Sure, there are cases, when you need to use some schema information to produce a request, but I don't see such cases here. |
b3b067f
to
fddb05e
Compare
4bf1b1b
to
a54215a
Compare
19bf303
to
a80f6c2
Compare
crud/common/schema.lua
Outdated
return digest.md5(msgpack.encode(space_info)) | ||
end | ||
|
||
function schema.wrap_box_space_func_result(add_space_schema_hash, space, func, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we need to describe why every function in this module are needed in comment. Otherwise it's your personal secret knowledge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what do you mean =(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently you don't understand me. But without good comments you will not understand you in several weeks/months. And nobody won't understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I didn't understand that you mean adding comments for functions, sorry, that's my fault.
Anyway, I've added comments.
crud/delete.lua
Outdated
@@ -21,14 +22,55 @@ local function delete_on_storage(space_name, key) | |||
return nil, DeleteError:new("Space %q doesn't exist", space_name) | |||
end | |||
|
|||
local tuple = space:delete(key) | |||
return tuple | |||
return schema.wrap_box_space_func_result(false, space, space.delete, space, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "false"? Please describe for all places where you use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false
? What?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What? Describe why did you pass "false" as the first argument here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it looks bad now (I mean passing this flag as a first argument).
I can add schema.wrap_box_space_func_result_no_schema_hash
for such places, but it will require a comment anyway (to describe a logic "why it don't need space hash").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments to all calls.
I've also changed schema.wrap_box_space_func_result
signature to avoid passing space, space:<func>, space
.
a80f6c2
to
9d96220
Compare
crud/common/const.lua
Outdated
local const = {} | ||
|
||
const.RELOAD_RETRIES_NUM = 1 | ||
const.RELOAD_SCHEMA_TIMEOUT = 30 -- half of minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with 1k rps you will create 30k fibers. It's too huge amount. I think you need to decrease to 3-5 seconds timeout.
if result == nil then | ||
result = nil | ||
if tuple == nil then | ||
tuple = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_tuple
-> tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, indentation was wrong, but now it's OK.
46bbc14
to
ffa237f
Compare
crud/common/schema.lua
Outdated
|
||
local ok, reload_schema_err = reload_schema(vshard.router.routeall()) | ||
if not ok then | ||
log.debug("Failed to reload schema: %s", reload_schema_err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe verbose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest warn
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"In CRUD we trust" say them.
Currently it's not so.
ffa237f
to
a257bd8
Compare
310f820
to
cf454cb
Compare
The problem is that if schema is changed on storages, router still uses
outdated schema from vshard connection. It leads to errors like
"non-existent space" when space is already created.
The solution is quite simple - all such errors leads to
conn:reload_schema()
call and then crud retries. In case of error on the storage-side (e.g. in
*_object
functions when object was flattened using outdated space format) space schema
hash is returned. Router compares this hash with hash of schema that was used
on router-side, and reloads schema if hashes are different.
conn:reload_schema()
is called by one fiber at time - if reloading is in progress,other fibers just wait for it to end.
Closes #98