Skip to content

Conversation

zhongweiy
Copy link

Set 2 type values (~3 and ~4) to lightud type. It is enabled only for
ARM64, whose kernel uses 48bit virtual address.

Change-Id: I331eb1997af35c3ed63d6b15d6d59cfc5bc355e0

Set 2 type values (~3 and ~4) to lightud type. It is enabled only for
ARM64, whose kernel uses 48bit virtual address.

Change-Id: I331eb1997af35c3ed63d6b15d6d59cfc5bc355e0
@MikePall
Copy link
Member

This changeset is way too complex and probably incomplete. And it doesn't solve the general issue, e.g. for 52 bit VA spaces.

lightuserdata is a legacy data type, primarily intended for the classic Lua/C API. And even there, it doesn't have many good use cases.

It's easy enough to work around the 47 bit limit. The Solaris/x64 community has had the same problem, due to their use of the negative part of the address space in user mode. Have a look what they did to change the (few) libraries that didn't play nicely:

  • lightuserdata used as a unique table key, e.g. for the registry: with lua_pushlightuserdata(L, &x) and x on the C stack, replace x with a static variable.
  • lightuserdata used as some kind of cross-object pointer: don't do that -- it doesn't play nicely with the GC, anyway. Replace with proper userdata references.
  • lightuserdata used as a misguided userdata replacement, possibly even setting the lightuserdata metatable (ugh): throw away any such library, replace with a proper FFI binding.

@MikePall MikePall closed this Nov 13, 2016
@Zheng-Xu
Copy link

Currently, we see crashes on 48-bit VA in at least :

  1. LuaJIT-test-cleanup : https://github.com/LuaJIT/LuaJIT-test-cleanup/blob/master/test/misc/catch_wrap.lua
  2. OpenResty core tests : https://github.com/openresty/lua-resty-core , https://github.com/openresty/test-nginx
  3. wrk : https://github.com/wg/wrk

I am afraid there will be too much work if we have to do the work around in the applications.

Is it possible for us to treat lightuserdata differently and do not tag this type?

@MikePall
Copy link
Member

Ignore the failure in catch_wrap.lua.

The OpenResty author is pretty responsive. Just tell him the spots where lightuserdata causes trouble.

There's not a single use of lightuserdata in the current wrk codebase. Any failure must be due to something else.

@Zheng-Xu
Copy link

Zheng-Xu commented Nov 14, 2016

Thanks for your comment!
I see wrk works on 39-bit but crashes on 48-bit. It works with the patch applied on 48-bit. So I guess there might be something wrong with the lightuserdata. We will investigate the root cause.

@zhongweiy
Copy link
Author

@MikePall After some investigation, it turns out that the failure in wrk is caused by lightuserdata use in LuaJIT itself.

I try to work around it in this pull request: #245 . Could you take a look? Thanks!

@Zheng-Xu
Copy link

@MikePall Can we put the limitation and the way of work around somewhere publicly on the LuaJIT website? So that it will be much easy for the users to fix the issue themselves.

@daurnimator
Copy link

lightuserdata used as a unique table key, e.g. for the registry: with lua_pushlightuserdata(L, &x) and x on the C stack, replace x with a static variable.

Will this work with lua_cpcall? It's good practice to allocate a struct on the stack and pass it through from a library callback (where you shouldn't longjmp out) by using lua_cpcall. To use a static variable to work around this would not only be bad code, but break re-entrancy.

ZyX-I added a commit to ZyX-I/neovim that referenced this pull request Jan 20, 2018
vt-alt pushed a commit to altlinux/specs that referenced this pull request Aug 30, 2018
- mike:
  introduce emotion knob (on by default)
  E2K: avoid not-ported-yet BRs (libunwind, luajit)
- disabled elua for aarch64 (LuaJIT/LuaJIT#230)
@waruqi
Copy link

waruqi commented Aug 4, 2020

We can also store the address in string to solve this problem, although there will be some performance loss, but
Usually, the address of most aarch64 devices does not exceed 47bits, we can still use lua_pushlightuserdata normally.

My workaround implementation:

 static __tb_inline__ tb_void_t xm_lua_pushpointer(lua_State* lua, tb_pointer_t ptr) 
 { 
     tb_uint64_t ptrval = (tb_uint64_t)ptr; 
     if ((ptrval >> 47) == 0) 
         lua_pushlightuserdata(lua, ptr); 
     else 
     { 
         tb_char_t str[64]; 
         tb_long_t len = tb_snprintf(str, sizeof(str), "%p", ptr); 
         lua_pushlstring(lua, str, len); 
     } 
 } 

The full implmentation:

https://github.com/xmake-io/xmake/blob/96b0eae26058482a83f5bbf57bab15a834339464/core/src/xmake/prefix.h#L36-L83

@daurnimator
Copy link

We can also store the address in string to solve this problem, although there will be some performance loss, but
Usually, the address of most aarch64 devices does not exceed 47bits, we can still use lua_pushlightuserdata normally.

The main point of a lightuserdata instead of a full userdata is that the allocation of one cannot fail (lua_pushlstring can fail).

@MikePall MikePall added obsolete and removed wontfix labels Sep 29, 2020
@MikePall
Copy link
Member

Resolved in a different way.

sbabic pushed a commit to sbabic/swupdate that referenced this pull request Sep 9, 2021
Current aarch64 (distro) kernels ship with CONFIG_ARM64_VA_BITS_48
enabled, see, e.g., [1] for Debian's kernel, resulting in a LuaJIT
panic "bad light userdata pointer" on aarch64 with distro-packaged
old(er) LuaJIT versions that limit lightuserdata to 47 bit pointers.

One mitigation option is to use a kernel with less VA bits, ruling
out using distro kernels untouched.
Another mitigation is to use the latest LuaJIT as this issue has
been resolved, see [2,3], ruling out using the (older) distro's
LuaJIT package untouched -- until distros eventually catch up.

Hence, to allow SWUpdate on systems with >47 bits VA to run on current
distros' kernel and LuaJIT packages, replace lightuserdata with "full"
userdata. For other systems with with <=47 bits VA, this change has no
effect other than Lua's gc having to clean up the "full" userdata of
sizeof(struct dict*) when it becomes unreferenced:
For sw-description embedded scripts, this is happens after having
interpreted sw-description.
For a Lua handler, this happens at its next call as an explicit free
and gc cycle call would occupy more permanent space.

[1] https://salsa.debian.org/kernel-team/linux/-/blob/bullseye/debian/config/arm64/config#L13
[2] LuaJIT/LuaJIT#49
[3] LuaJIT/LuaJIT#230

Signed-off-by: Christian Storm <christian.storm@siemens.com>
Signed-off-by: Michael Adler <michael.adler@siemens.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants