ggcrunchy
Contributor

In this post I attached some work in progress on a MoonNuklear plugin, and more specifically its Solar backend. (Links are one post up.) @Shchvova This is the same thing I sent you earlier.

Nuklear outputs to a couple of big memory blocks, one for index data and one for vertex data. I was able to use a memory-aware serialization library to extract the results and feed them into meshes, which are the best fit since the data involves indices and texture coordinates.

This is quite expensive: there are a lot of Lua calls just to deserialize from the memory blocks, but then we also need to stuff the results back into arrays and / or use a lot of per-element set*() calls.

More serious still, we're getting successive bulk loads out of the same memory buffers, so Nuklear has to offset into them. With the current mesh API, that means that after the first call we need to pad the front of our vertices / uvs arrays with a lot of junk, which costs us both when populating the arrays and when reading them back out.

With the above test, it's laggy even on a desktop machine.


This PR lets a mesh use memory buffers directly, both from display.newMesh() and mesh.path:update(). At the moment these must be full userdata—so originating from a plugin—but ideally would only need to be "readable bytes" down the road.

These buffers can be supplied to indices, vertices, and uvs in lieu of the current table input, and are checked for before resorting to that. A raw buffer may be used, or else a table with the following keys:

  • buffer should be a memory buffer as described, and its presence distinguishes this table from the usual input.
  • count is optional and says how many elements to use, if it's fewer than what would fit in the buffer (the default).
  • offset says how far into the buffer the first element starts, and is 0 by default.
  • stride says how far to go from the start of one element to the next, and is the element size by default.

Only count elements are provided to the output.

The element sizes are assumed to be:

  • index = 2 (unsigned 16-bit integer)
  • vertex or uv = 8 (two single-precision floats)
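
To make the defaults above concrete, here is a minimal C++ sketch of how a count might be resolved from a buffer's byte length. It is not the code in this PR: `BufferView`, `ResolveCount`, `kUseAll`, and the exact "how many fit" rule are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Element sizes as stated in the description.
constexpr std::size_t kIndexSize = 2;   // unsigned 16-bit integer
constexpr std::size_t kVertexSize = 8;  // two single-precision floats

// Hypothetical sentinel meaning "use as many elements as fit".
constexpr std::size_t kUseAll = static_cast<std::size_t>(-1);

// Hypothetical mirror of the { buffer, count, offset, stride } table.
struct BufferView {
    std::size_t byteLength;        // size of the userdata block, in bytes
    std::size_t offset = 0;        // bytes before the first element
    std::size_t stride = 0;        // 0 means "use the element size"
    std::size_t count = kUseAll;   // optional explicit element count
};

// Writes the number of usable elements to outCount.
// Returns false if an explicit count asks for more than fits.
bool ResolveCount(const BufferView& view, std::size_t elementSize, std::size_t& outCount)
{
    assert(elementSize > 0);
    const std::size_t stride = view.stride != 0 ? view.stride : elementSize;

    std::size_t fit = 0;
    if (view.offset <= view.byteLength && view.byteLength - view.offset >= elementSize) {
        // The last element only needs elementSize bytes, not a full stride.
        fit = 1 + (view.byteLength - view.offset - elementSize) / stride;
    }

    if (view.count == kUseAll) {
        outCount = fit;
        return true;
    }
    if (view.count > fit) {
        return false;
    }
    outCount = view.count;
    return true;
}
```

Under these assumptions, a 64-byte block with no other keys holds 8 vertices, and the same block with `offset = 16` holds 6.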

(I've tried to error-check this quite thoroughly but might still be missing a case or two.)

This approach cuts out a huge amount of the overhead described above, and the results I'm seeing are quite smooth. Here's a revised test for the plugin with this PR:

TEST_MOONNUKLEAR.zip

(No changes were needed to the plugin binaries, only the Lua backend.)


I haven't attempted anything in this regard, but I imagine at least some of this could be applied to a native Spine library, like what @zhiyangyou was exploring.

There are probably other new avenues to explore, like fast mesh native plugins. I'm thinking of some old deformation stuff I've looked at, or maybe a "Take Three" at fur rendering.


I also added support for fillVertexColors to mesh.path:update().

It follows the same buffer policy as the other inputs, with element size = 4 (four unsigned 8-bit integers).
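
As a rough illustration of that layout, the sketch below pulls 4-byte colors out of a byte block using the same offset / stride convention. `Rgba8` and `ReadColors` are hypothetical names, the RGBA channel order is an assumption, and the caller is assumed to have already validated the range.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One color element: four unsigned 8-bit integers.
// The r, g, b, a ordering is an assumption for this sketch.
struct Rgba8 {
    std::uint8_t r, g, b, a;
};
static_assert(sizeof(Rgba8) == 4, "color element size should be 4 bytes");

// Copies `count` colors out of `bytes`, starting at `offset` and stepping by `stride`.
// No bounds checking: the caller must ensure the last element lies inside the block.
std::vector<Rgba8> ReadColors(const unsigned char* bytes,
                              std::size_t count,
                              std::size_t offset = 0,
                              std::size_t stride = sizeof(Rgba8))
{
    std::vector<Rgba8> colors(count);
    for (std::size_t i = 0; i < count; ++i) {
        // memcpy avoids any alignment assumptions about the source block.
        std::memcpy(&colors[i], bytes + offset + i * stride, sizeof(Rgba8));
    }
    return colors;
}
```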

There were a couple small functions added to Rtt_ShapePath to accommodate this. (One for stroke colors, which I haven't added, but probably wouldn't be too much work, really.)

So far this is buffer-only, but there's a stub for a table approach too.


I alluded to Nuklear offsetting into the buffer. You could do that manually with the offset for everything, but this can be mostly automated in the indexed case by limiting any ranges according to the indices available.

If indices is present and narrowIndexedRanges is true, the lowest and highest index values are found. The indices are then rebased, i.e. the lowest value is subtracted from each index. Furthermore, only the referenced vertices, uvs, and / or colors (i.e. those in the range from the lowest to highest index) are used.
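
The narrowing step can be sketched in a few lines of C++. This is only an illustration of the idea described above, not the PR's implementation; `NarrowedRange` and `NarrowIndexedRange` are made-up names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of narrowing: which slice of the vertex / uv / color data is referenced.
struct NarrowedRange {
    std::uint16_t lowestIndex;  // value subtracted from every index
    std::size_t vertexCount;    // highest - lowest + 1
};

// Rebases `indices` in place so the lowest one becomes 0.
// An empty index list yields { 0, 0 } and is left untouched.
NarrowedRange NarrowIndexedRange(std::vector<std::uint16_t>& indices)
{
    if (indices.empty()) {
        return {0, 0};
    }

    const auto bounds = std::minmax_element(indices.begin(), indices.end());
    // Copy the values before the loop below overwrites the elements.
    const std::uint16_t lowest = *bounds.first;
    const std::uint16_t highest = *bounds.second;

    for (std::uint16_t& index : indices) {
        index = static_cast<std::uint16_t>(index - lowest);
    }

    return {lowest, static_cast<std::size_t>(highest - lowest) + 1};
}
```

For indices { 100, 101, 102, 102, 101, 103 } this gives a lowest index of 100 and four referenced vertices, with the indices becoming { 0, 1, 2, 2, 1, 3 }, so only elements 100 through 103 of the other inputs need to be read.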

In mesh.path:update() we might not want to revisit the indices, so lowestIndex can be provided instead of indices.

This required a little rearrangement of the logic to put the indices first. @depilz has tested some recent results and said it was still working well. 🙂


On that note, mesh:getLowestIndex() is available. This value is updated after a display.newMesh() or mesh.path:update() when it has indices + narrowIndexedRanges.


I discovered some speed issues with Array::PadToSize() early on. I slightly revised it (if the type is "plain old data") to just set the length up-front and then fill memory in a loop. (I explored memset if all bytes of the pad value matched, but it was a bit complex and I didn't want to do much testing.) Otherwise it'll still do the iterated Append()s with constructors.
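
For reference, the plain-old-data path amounts to something like the sketch below. `PodArray` is a stand-in written for this example, not the real Array class, and it only covers trivially copyable types (the non-POD path with Append() is left out).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Minimal stand-in for an array of plain-old-data elements (hypothetical).
template <typename T>
class PodArray {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only covers plain-old-data types");

public:
    PodArray() = default;
    PodArray(const PodArray&) = delete;
    PodArray& operator=(const PodArray&) = delete;
    ~PodArray() { std::free(fData); }

    // Grows the storage to hold at least `count` elements.
    void Reserve(std::size_t count)
    {
        if (count <= fCapacity) {
            return;
        }
        void* grown = std::realloc(fData, count * sizeof(T));
        if (!grown) {
            throw std::bad_alloc();
        }
        fData = static_cast<T*>(grown);
        fCapacity = count;
    }

    // Sets the length up front, then fills the new tail in a plain loop,
    // instead of calling an Append() per element.
    void PadToSize(std::size_t size, const T& pad)
    {
        if (size <= fLength) {
            return;
        }
        const T value = pad;  // copy first, in case `pad` refers into our own storage
        Reserve(size);
        const std::size_t oldLength = fLength;
        fLength = size;
        for (std::size_t i = oldLength; i < size; ++i) {
            fData[i] = value;
        }
    }

    std::size_t Length() const { return fLength; }

    const T& operator[](std::size_t i) const
    {
        assert(i < fLength);
        return fData[i];
    }

private:
    T* fData = nullptr;
    std::size_t fLength = 0;
    std::size_t fCapacity = 0;
};
```

The point is that there is a single allocation per pad rather than a growth check on every appended element.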

More concerning, I saw that Reserve() was passing the element count and size in reverse order to Preallocate(). Allocating 128K indices this way (full buffer, during early usage), fLength was getting 2 (size of a U16) and the element size 128K! The aforementioned Append() loop then had to keep growing the length and reallocating. (I don't know if the overblown element size still figured into it, but for a while my attempts just weren't running, period, so maybe it did.)
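
The swap is easy to miss because both arguments have the same integer type, so the compiler accepts either order. A stripped-down sketch of the situation follows; the `Preallocate` signature here is assumed for illustration and is not copied from the engine.

```cpp
#include <cstddef>
#include <cstdint>

// What the allocator believes it was asked for.
struct Storage {
    std::size_t elementCount;
    std::size_t elementSize;
};

// Hypothetical signature: count first, then size.
Storage Preallocate(std::size_t elementCount, std::size_t elementSize)
{
    return {elementCount, elementSize};
}

// The problem described above: arguments passed in reverse order.
Storage ReserveSwapped(std::size_t count)
{
    return Preallocate(sizeof(std::uint16_t), count);
}

// The intended call.
Storage ReserveFixed(std::size_t count)
{
    return Preallocate(count, sizeof(std::uint16_t));
}
```

With `count = 128 * 1024`, the swapped call records room for 2 elements of 131072 bytes each, while the fixed call records 131072 elements of 2 bytes. The byte totals match, but any logic keyed on the element count sees almost no capacity.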

ggcrunchy and others added 28 commits February 7, 2019 17:09
…nd buffer and assignment logic

Next up, prepare details from graphics.defineEffect()
…from Program

Slight redesign of TimeTransform with caching (to synchronize multiple invocations with a frame) and slightly less heavyweight call site
Err, was calling Modulo for sine method, heh

That said, time transform seems to work in basic test
Fixed error detection for positive number time transform inputs

Relaxed amplitude positivity requirement
Maintenance Revert Test
My mistake adding back changes made
Some additions to vertex colors for same
Little refinements to InitializeMesh() and update()... but still nothing visible
Relocated buffer hacks to display library

Some rearrangement and fixes for buffer stuff, in particular adding more size support in InitializeMesh()
… removed associated hacks

Added getLowestIndex() method to mesh path, which will contain the lowest index (or 0, by default) as of last update

A tiny bit of cleanup of logic with some utilities
…lly, positions) due to details of vertex offset (still okay for uvs, fill colors)

Owing to that, removed any padding logic

Added lowestIndex as update parameter (explicit, rather than using method)

Unifying the vector lengths referred by GetBuffer() in update(), since they will all be the vertices count

update() now also takes adjustIndices, rather than rebase... might still change both to some better name
Cleanup of development notes-to-self

Fixup of some warnings
…e been 0xFFFF, new check handles 0 instead, since it's maxIndex's default value)
@ggcrunchy ggcrunchy requested a review from Shchvova as a code owner December 22, 2022 03:41
@depilz
Contributor

depilz commented Dec 30, 2022

I tested it on Simulator and Android running some complex Spine animations. It works flawlessly.

@Shchvova Shchvova changed the base branch from master to experimental January 24, 2023 04:27
@Shchvova Shchvova merged commit 92129e3 into coronalabs:experimental Jan 24, 2023

4 participants