moc3 and caff: add decoder #747
Had a quick look and I think it looks great overall 👍 I had some minor comments. I will take a deeper look and try the decoder myself later today.
BTW, do you know if there are MIT-licensed (or similarly licensed) model files somewhere that we can use for tests? Or could we alternatively create some simple models in Live2D, save them, and use those? The reason I really want test files is to be able to refactor and make changes with some kind of insurance that we don't break or change things in a bad way.
obfsU64 := func(d *decode.D) uint64 { return d.U64() ^ (obfsKey<<32 | obfsKey) }
obfsBool := func(d *decode.D) bool { return obfsU8(d) != 0 }

// "Big Endian Base 128" - LEB128's strange sibling
😄 I guess it's strange enough not to include in *decode.D?
It could be, but dealing with the obfuscation makes it awkward and I don't know of any other format that uses it.
Agree, let's keep it here.
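For anyone reading along who hasn't met this encoding: a minimal Go sketch of decoding a "Big Endian Base 128" varint, assuming it works like LEB128 but with the most significant 7-bit group first (the same scheme as MIDI variable-length quantities). The obfuscation mentioned above is deliberately left out, and `decodeBEB128` is a hypothetical name, not code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// decodeBEB128 decodes a "Big Endian Base 128" varint from b. Each byte
// carries 7 payload bits, the most significant group comes first, and a set
// high bit (0x80) means another byte follows. It returns the value and the
// number of bytes consumed.
func decodeBEB128(b []byte) (uint64, int, error) {
	var v uint64
	for i, c := range b {
		// Shifting left by 7 would drop bits once v exceeds 57 bits.
		if v > math.MaxUint64>>7 {
			return 0, 0, errors.New("beb128: value overflows uint64")
		}
		v = v<<7 | uint64(c&0x7f)
		if c&0x80 == 0 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errors.New("beb128: truncated input")
}

func main() {
	v, n, err := decodeBEB128([]byte{0x81, 0x00})
	fmt.Println(v, n, err) // 128 2 <nil>
}
```

Compared with LEB128, only the accumulation order changes: the value is shifted left before each new group is ORed in, instead of each group being shifted into place by its index.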
format/moc3/moc3.go (outdated)
moc3Version3_00_00: {Sym: "3_00_00", Description: "3.0.00 - 3.2.07"},
moc3Version3_03_00: {Sym: "3_03_00", Description: "3.3.00 - 3.3.03"},
moc3Version4_00_00: {Sym: "4_00_00", Description: "4.0.00 - 4.1.05"},
moc3Version4_02_00: {Sym: "4_02_00", Description: "4.2.00 - 4.2.02"},
If you want, sym can also be a number, but if it's major.minor.patch etc. I guess that makes no sense... Maybe it could be a struct with "value" ints (d.FieldValue*), but this is probably fine too.
I mainly chose the V3_00_00, etc. strings to match the enum from the spec.
format/moc3/moc3.go (outdated)
if !isBigEndian {
	d.Endian = decode.LittleEndian
}
d.SeekRel(58 * 8)
Does this cause "unknown" fields? One approach I've used is to explicitly add something like "unused" fields for bit ranges that I know to be unused. fq automatically adds "unknown" fields for ranges that no field has claimed.
The i386 failure is just the probe order and help text tests failing. Should be fixable with
https://github.com/RavioliMavioli/archlive2d https://github.com/shuhaiwen/live2dw
And I have to admit that I am not involved in the development of this PR, nor do I have experience with the moc3 format, so I do not know if these models are enough for testing. -- EDIT -- https://github.com/imuncle/live2d
@Richardn2002 Hey! Thanks, those are good pointers. Ideally, the best test files are the simplest possible files that still use lots of features. Maybe the easiest option is to download Live2D, run it, and save some simple files?
Yeah, it would be nice to use Live2D with as many features as possible to create a test model. Sadly, I have not purchased Live2D Cubism. I hope @Ronsor has a copy.
Yep, that would be great, but I see there is a free 42-day version 🤔
Besides the old format, they seem to be taken from Live2D's previous sample downloads page.
…can't be decoded as a format
The obfuscation key is actually a signed integer, and because of two's complement arithmetic and sign extension, this is significant.
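To see why the signedness matters, here is a small sketch that builds the 64-bit mask as `key<<32 | key` (the shape used by the obfsU64 helper earlier in the thread), once from a sign-extended key and once from a zero-extended key. The function names are hypothetical. For a non-negative key both masks agree; for a negative key such as 0x80000001, sign extension fills the upper 32 bits with ones (0xffffffff80000001) while zero extension repeats the key (0x8000000180000001), so XORing with the wrong one yields different 64-bit values.

```go
package main

import "fmt"

// maskSigned builds the 64-bit XOR mask after the 32-bit key has been read
// as a signed integer and widened to 64 bits.
func maskSigned(key int32) uint64 {
	k := uint64(int64(key)) // sign extension: a negative key fills the high 32 bits with ones
	return k<<32 | k
}

// maskUnsigned builds the same mask from a zero-extended key.
func maskUnsigned(key int32) uint64 {
	k := uint64(uint32(key)) // zero extension: the high 32 bits stay zero
	return k<<32 | k
}

func main() {
	for _, key := range []int32{0x12345678, -0x7fffffff} { // -0x7fffffff is 0x80000001
		fmt.Printf("key=%#x signed=%#x unsigned=%#x\n",
			uint32(key), maskSigned(key), maskUnsigned(key))
	}
}
```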
I made some improvements:
Nice work on this! It already looked good and keeps getting better; have a look at my comments.
It would be great to cut down a bit on gap fields and turn them into explicit unknown/reserved/align/padding fields and whatnot where possible.
And great to have test files; they will make future maintenance much easier. Are they OK license-wise? Test files are not shipped in the binary, but I guess it's good to keep the source distribution OK license-wise too.
d.FieldStruct("archive_version", decodeVersion)
d.FieldUTF8("format_id", 4)
d.FieldStruct("format_version", decodeVersion)
obfsKey = d.FieldS32("obfuscate_key")
If signedness is not important, I would probably do d.FieldU32("obfuscate_key", scalar.UintHex) to display it as hex. Have a look at whether there are more fields that would be better as hex, like addresses, checksums, etc.
I found out the hard way that it needs to be signed; otherwise obfsU64() can return bad data.
Aha i see 👍
format/caff/caff.go (outdated)
d.FieldStruct("preview_image", func(d *decode.D) {
	d.FieldU8("image_format", imageFormatNames)
	d.FieldU8("color_type", colorTypeNames)
	d.SeekRel(2 * 8)
Could this and some of the other d.SeekRel calls be an unused/alignment/padding etc. field? I usually try to have no gaps, or as few as possible, but I understand it might be hard when reversing a format.
BTW, when I was talking about "unknown" fields earlier I was confused: they are called "gap" fields now. They were renamed some time ago so that decoders can use "unknown".
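Conceptually, a gap is just a bit range that no field claims. The sketch below illustrates that idea on its own and is not fq's implementation: the `field` type and `gaps` function are hypothetical, and the example reuses the `preview_image` header from the CAFF snippet, where two bytes are skipped with `d.SeekRel(2 * 8)`. Claiming those bytes as an explicit `unused0` field leaves nothing unclaimed.

```go
package main

import "fmt"

// field records a named bit range, loosely modelled on how a decoder claims
// ranges of its input. Hypothetical type, not fq's API.
type field struct {
	name       string
	start, end int64 // bit offsets, end exclusive
}

// gaps returns the bit ranges in [0, total) that no field claims. Fields are
// assumed to be sorted by start and non-overlapping.
func gaps(fields []field, total int64) [][2]int64 {
	var out [][2]int64
	pos := int64(0)
	for _, f := range fields {
		if f.start > pos {
			out = append(out, [2]int64{pos, f.start})
		}
		pos = f.end
	}
	if pos < total {
		out = append(out, [2]int64{pos, total})
	}
	return out
}

func main() {
	// Two u8 fields, then 2 bytes skipped with SeekRel.
	skipped := []field{{"image_format", 0, 8}, {"color_type", 8, 16}}
	// Same header, but the skipped bytes are claimed as an explicit field.
	explicit := append(skipped[:2:2], field{"unused0", 16, 32})

	fmt.Println(gaps(skipped, 32))  // [[16 32]]
	fmt.Println(gaps(explicit, 32)) // []
}
```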
I'll change those to unusedN fields since I know they're unused here.
For moc3, I should probably handle the section alignment (seems to be 64 bytes).
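If the sections really are aligned to 64 bytes (the comment says "seems to be", so treat that as an assumption), the amount of padding to skip after a section can be computed from the current offset. A minimal sketch with a hypothetical helper name, measuring offsets in bytes from the start of the file:

```go
package main

import "fmt"

// sectionAlign is the alignment observed for MOC3 sections; this is an
// assumption from the review discussion, not a value taken from a spec.
const sectionAlign = 64

// alignPadding returns how many bytes to skip from a non-negative offset to
// reach the next multiple of align. It returns 0 when offset is already aligned.
func alignPadding(offset, align int64) int64 {
	return (align - offset%align) % align
}

func main() {
	for _, off := range []int64{0, 1, 64, 100} {
		fmt.Println(off, alignPadding(off, sectionAlign))
	}
	// 0 0
	// 1 63
	// 64 0
	// 100 28
}
```

The outer `% align` is what turns an already-aligned offset into 0 padding instead of a full extra block.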
	}
}

br := bitio.NewBitReader(rawBytes, -1)
Sorry for the confusing APIs for doing this at the moment. Hopefully in the future there will be some helpers for dealing with compressed/massaged data.
format/moc3/moc3.go (outdated)
func decodeMOC3(d *decode.D) any {
	d.FieldUTF8("magic", 4, d.StrAssert("MOC3"))
	version := d.FieldU8("version", moc3VersionNames)
	isBigEndian := d.FieldBoolFn("is_big_endian", func(d *decode.D) bool { return d.U8() != 0 })
Maybe you want to use a sym map to a bool? I usually try to make the actual value the decoded value and the sym the interpretation of that value. That way a user has access to both (toactual/tosym; the default is the sym value if set, otherwise the actual value).
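A minimal sketch of the actual/sym split suggested above, applied to the `is_big_endian` byte: the raw byte is kept as the actual value, its interpretation as a bool is the symbol, and the symbol then selects a byte order. `decodedValue` and the helper functions are hypothetical stand-ins written with the standard library only; fq's own scalar types are not used here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodedValue keeps the raw ("actual") value next to its interpretation
// ("sym"). Hypothetical type for illustration.
type decodedValue struct {
	Actual uint64
	Sym    any
}

// isBigEndianSym maps the raw flag byte to a bool symbol while keeping the byte.
func isBigEndianSym(raw uint8) decodedValue {
	return decodedValue{Actual: uint64(raw), Sym: raw != 0}
}

// byteOrderFor picks the byte order for the rest of the file from the symbol.
func byteOrderFor(v decodedValue) binary.ByteOrder {
	if v.Sym.(bool) {
		return binary.BigEndian
	}
	return binary.LittleEndian
}

func main() {
	flag := isBigEndianSym(0)
	order := byteOrderFor(flag)
	fmt.Println(flag.Actual, flag.Sym, order.Uint32([]byte{1, 0, 0, 0})) // 0 false 1
}
```

Keeping the raw byte means an unexpected value such as 2 is still visible to the user, even though it is interpreted as `true`.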
@@ -0,0 +1,1766 @@
package moc3

// https://github.com/OpenL2D/moc3ingbird/blob/master/src/moc3.hexpat
👍 Interesting to see two implementations. Are they comparable completeness-wise? And as someone who has worked with both, how do they compare? The Go code feels a bit more verbose?
The Go code is largely autogenerated from the original implementation and then edited. They're at feature parity.
Some of the verbosity of the Go decoder is probably due to code duplication I can fix.
sectionOffsets.countInfo = int64(d.FieldU32("count_info"))
sectionOffsets.canvasInfo = int64(d.FieldU32("canvas_info"))

d.FieldStruct("parts", func(d *decode.D) {
This code makes me think that maybe the decode API should have something similar to Read in encoding/binary, hmm.
FieldUnmarshalStruct(field string, out any)?
Yep, something like that. Automatic snake_case names and maybe struct tags to support some casting. Something for the future, no need for this PR.
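Nothing like `FieldUnmarshalStruct` exists in this PR; the sketch below only explores the naming part of the idea. It derives snake_case field names from Go struct field names via reflection and lets a struct tag override the result. The `fq` tag key, `snakeCase`, and `fieldNames` are all hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
	"unicode"
)

// snakeCase converts a Go field name such as "CanvasInfo" to "canvas_info".
// Simplified: acronyms like "IDs" become "i_d_s" and need a tag override.
func snakeCase(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

// fieldNames lists the names a hypothetical FieldUnmarshalStruct could use
// for a struct value: an explicit `fq:"..."` tag wins, otherwise snake_case.
func fieldNames(v any) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := f.Tag.Get("fq")
		if name == "" {
			name = snakeCase(f.Name)
		}
		names = append(names, name)
	}
	return names
}

type sectionOffsets struct {
	CountInfo  uint32
	CanvasInfo uint32
	PartIDs    uint32 `fq:"part_ids"`
}

func main() {
	fmt.Println(fieldNames(sectionOffsets{})) // [count_info canvas_info part_ids]
}
```

The tag override is what keeps acronym-heavy Go names readable; a real implementation would also need per-field readers and casting, which this sketch leaves out.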
The MOC3 test is CC0 data, the CAFF test is my own work. Licensing should be no problem.
Great!
@wader What do you think is the best way to add alignment for arrays (at the end, not per field)?
format/moc3/moc3.go (outdated)
	for i := int64(0); i < countInfo.parts; i++ {
		d.FieldUTF8NullFixedLen("id", 64)
	}
})
@Ronsor Do you mean alignment here, for example? The easiest is probably just an extra "ids_alignment" etc. field after it; another option, maybe more correct but less user-friendly, is to make "ids" into a struct with "entries" and "alignment", I think.
Yes, there. I'll try both ways to see which is more usable.
I settled on making each section a struct with both
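A sketch of the "entries" plus "alignment" layout discussed in the thread, reading a run of fixed-length, NUL-padded IDs (like the `d.FieldUTF8NullFixedLen("id", 64)` loop) and then recording how many padding bytes follow. The `idSection` type and `readIDSection` helper are hypothetical, alignment is measured from the start of the buffer, and the usage uses 8-byte IDs and 32-byte alignment instead of 64/64 only to keep the sample data short.

```go
package main

import (
	"bytes"
	"fmt"
)

// idSection holds the decoded IDs and the number of padding bytes that
// follow them. Hypothetical type for illustration.
type idSection struct {
	Entries   []string
	Alignment int // padding bytes after the last entry
}

// readIDSection reads count NUL-padded IDs of idLen bytes each, starting at
// start in buf, then computes the padding up to the next multiple of align.
// It returns the section and the offset just past the padding.
func readIDSection(buf []byte, start, count, idLen, align int) (idSection, int, error) {
	var s idSection
	pos := start
	for i := 0; i < count; i++ {
		if pos+idLen > len(buf) {
			return s, pos, fmt.Errorf("id %d: truncated", i)
		}
		raw := buf[pos : pos+idLen]
		if n := bytes.IndexByte(raw, 0); n >= 0 {
			raw = raw[:n] // drop the NUL terminator and padding
		}
		s.Entries = append(s.Entries, string(raw))
		pos += idLen
	}
	s.Alignment = (align - pos%align) % align
	return s, pos + s.Alignment, nil
}

func main() {
	buf := make([]byte, 32)
	copy(buf, "Head")
	copy(buf[8:], "Body")
	s, next, err := readIDSection(buf, 0, 2, 8, 32)
	fmt.Println(s.Entries, s.Alignment, next, err) // [Head Body] 16 32 <nil>
}
```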
LGTM! I think I'm ready to merge this once CI is green. Do you have anything more you want to change or add? The CI failure seems to be just a field that has been renamed.
Whoops, forgot to update the test. I think it's ready to be merged now. 🥳
Thanks! Great work, and I hope you find fq useful! Any plans for future contributions?
Right now there are about three other proprietary formats I plan to add decoders for in the near future.
Interesting! Looking forward to it. BTW, I forgot to say that you can add documentation and authorship to a decoder by doing something like https://github.com/wader/fq/blob/master/format/apple/macho/macho.md and https://github.com/wader/fq/blob/master/format/apple/macho/macho.go#L17-L28 and it will show up in CLI help and
This PR adds support for the proprietary MOC3 and CAFF formats used by the Live2D Cubism software (https://live2d.com/en/).
Notes:
.cmo3 and .can3.