moc3 and caff: add decoder #747

Merged
merged 20 commits into from
Aug 20, 2023

Conversation

Ronsor
Contributor

@Ronsor Ronsor commented Aug 18, 2023

This PR adds support for the proprietary MOC3 and CAFF formats used by the Live2D Cubism software (https://live2d.com/en/).

Notes:

Owner

@wader wader left a comment

Had a quick look and think it looks great overall 👍 had some minor comments. I will take a deeper look and try the decoder myself later today.

BTW do you know if there are MIT-licensed (or similar) model files somewhere we can use for tests? or could we alternatively create some simple models in live2d, save them and use those? the reason i really want test files is to be able to refactor and make changes with some kind of insurance against breaking or changing things in a bad way.

obfsU64 := func(d *decode.D) uint64 { return d.U64() ^ (obfsKey<<32 | obfsKey) }
obfsBool := func(d *decode.D) bool { return obfsU8(d) != 0 }

// "Big Endian Base 128" - LEB128's strange sibling
Owner

😄 i guess strange enough to not include in *decode.D?

Contributor Author

It could be, but dealing with the obfuscation makes it awkward and I don't know of any other format that uses it.

Owner

agree, lets keep it here
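
For readers unfamiliar with the encoding named in the code comment, here is a minimal standalone sketch. It assumes "Big Endian Base 128" means 7 payload bits per byte, most significant group first, with a set high bit signalling that another byte follows. The function name `decodeBEB128` is hypothetical, and the sketch leaves out the obfuscation step that the decoder in this PR applies to each byte.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeBEB128 reads one big-endian base-128 integer from b.
// Each byte carries 7 payload bits, most significant group first;
// a set high bit means another byte follows.
// It returns the value and the number of bytes consumed.
func decodeBEB128(b []byte) (uint64, int, error) {
	var v uint64
	for i, c := range b {
		// Shifting left by 7 would drop bits if the top 7 are in use.
		if v>>57 != 0 {
			return 0, 0, errors.New("beb128: value overflows 64 bits")
		}
		v = v<<7 | uint64(c&0x7f)
		if c&0x80 == 0 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errors.New("beb128: truncated input")
}

func main() {
	v, n, err := decodeBEB128([]byte{0x81, 0x00})
	fmt.Println(v, n, err)
}
```

The only difference from LEB128 in this sketch is the group order: LEB128 places each new 7-bit group above the previous ones, while this form shifts the accumulated value up and appends the new group at the bottom.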

moc3Version3_00_00: {Sym: "3_00_00", Description: "3.0.00 - 3.2.07"},
moc3Version3_03_00: {Sym: "3_03_00", Description: "3.3.00 - 3.3.03"},
moc3Version4_00_00: {Sym: "4_00_00", Description: "4.0.00 - 4.1.05"},
moc3Version4_02_00: {Sym: "4_02_00", Description: "4.2.00 - 4.2.02"},
Owner

If you want, sym can be a number also, but if it's major.patch.minor etc i guess it makes no sense... maybe it could be a struct with "value" ints (d.FieldValue*) but this is probably fine also

Contributor Author

I mainly chose the V3_00_00, etc. strings to match the enum from the spec.

if !isBigEndian {
d.Endian = decode.LittleEndian
}
d.SeekRel(58 * 8)
Owner

Does this cause "unknown" fields? one way i've done it is to explicitly add something like "unused" fields for bit ranges that i know to be unused. fq adds "unknown" fields automatically for ranges that no field has "claimed"

@wader
Owner

wader commented Aug 18, 2023

The i386 failure is just the tests for probe order and help text failing. Should be fixable by running WRITE_ACTUAL=1 go test ./format and committing the changes to those tests

@Richardn2002

Richardn2002 commented Aug 18, 2023

https://github.com/RavioliMavioli/archlive2d
This is a repo containing one moc3 file under the CC0 1.0 license.

https://github.com/shuhaiwen/live2dw
This is a repo containing tons of live2d models but:

  1. The author does not specify the sources clearly, nor specify a license.
  2. They are moc instead of moc3 files.

And I have to admit that I am not involved in the development of this PR, nor do I have experience with the moc3 format, which means I do not know if these models are enough for testing.

-- EDIT --

https://github.com/imuncle/live2d
Tons of moc3 files under live2d_3/model/Azue Lane(JP). Still, the license for this repo is questionable, and I suspect these models are Azur Lane game properties.

@wader
Owner

wader commented Aug 18, 2023

@Richardn2002 Hey! thanks, those are good pointers. Ideally for tests, the simplest possible files that still use lots of features are best.

Maybe easiest is to download live2d, run it and save some simple files?

@Richardn2002

Yeah, would be nice to use Live2D with all functionalities possible to create a test model. I sadly have not purchased Live2D Cubism. I hope @Ronsor has one copy.

@wader
Owner

wader commented Aug 18, 2023

Yeap would be great, but i see there is a free for 42 days version 🤔

@Ronsor
Contributor Author

Ronsor commented Aug 18, 2023

https://github.com/shuhaiwen/live2dw This is a repo containing tons of live2d models but:

  1. The author does not specify the sources clearly, nor specify a license.
  2. They are moc instead of moc3 files.

Besides the old format, they seem to be taken from Live2D's previous sample downloads page.

@Ronsor
Contributor Author

Ronsor commented Aug 19, 2023

I made some improvements:

  • Included CAFF test data (test.cmo3)
  • Fixed bug with CAFF deobfuscation
  • Updated the MOC3 specification to add the new fields for version 5 (released a couple days ago).
  • Updated the MOC3 decoder to reflect this.
  • Included MOC3 test data (based on the Arch Chan MOC3, converted to version 5)
  • Improved generated tree structure for MOC3 with better names for array elements and grouping section fields together regardless of version

Owner

@wader wader left a comment

Nice work on this, already looked good and gets better, have a look at my comments.

Would be great to cut down a bit on gap fields and make them into explicit unknown/reserved/align/padding and whatnot fields if possible.

And great with test files, will make future maintenance much easier. Are they ok licensewise? test files are not shipped in the binary but i guess its good to keep the source distribution ok licensewise also.

d.FieldStruct("archive_version", decodeVersion)
d.FieldUTF8("format_id", 4)
d.FieldStruct("format_version", decodeVersion)
obfsKey = d.FieldS32("obfuscate_key")
Owner

@wader wader Aug 19, 2023

If signedness is not important i would probably do d.FieldU32("obfuscate_key", scalar.UintHex) to display it as hex. Have a look if there are more fields that would be better as hex, like addresses, checksums etc

Contributor Author

I found out the hard way that it needs to be signed otherwise obfsU64() can return bad data.

Owner

Aha i see 👍
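
A small sketch of why signedness can change the result here. It assumes the 32-bit key is widened to 64 bits before the `key<<32 | key` mask from the earlier hunk is formed; the variable types in the real decoder are not visible in this excerpt, and `maskSigned` and `maskUnsigned` are hypothetical names. With a signed key whose top bit is set, sign extension fills the upper 32 bits with ones, so the two masks differ.

```go
package main

import "fmt"

// maskSigned widens a signed 32-bit key to 64 bits (sign-extending it)
// and then builds the XOR mask as key<<32 | key.
func maskSigned(key int32) uint64 {
	k := int64(key)
	return uint64(k<<32 | k)
}

// maskUnsigned does the same with a zero-extended key.
func maskUnsigned(key uint32) uint64 {
	k := uint64(key)
	return k<<32 | k
}

func main() {
	raw := uint32(0x80000001) // example key with the top bit set

	fmt.Printf("signed:   %#x\n", maskSigned(int32(raw)))
	fmt.Printf("unsigned: %#x\n", maskUnsigned(raw))
}
```

For keys with the top bit clear, both functions return the same mask, which is one way a signedness mistake could go unnoticed on some files and surface on others.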

d.FieldStruct("preview_image", func(d *decode.D) {
d.FieldU8("image_format", imageFormatNames)
d.FieldU8("color_type", colorTypeNames)
d.SeekRel(2 * 8)
Owner

Could this and some of the other d.SeekRel calls be an unused/alignment/padding etc field? i usually try to have no gaps, or as few as possible, but i understand it might be hard when reversing a format.

BTW when i was talking about "unknown" fields earlier i was confused, they are called "gap" fields now. They were renamed some time ago so that decoders can use "unknown".

Contributor Author

@Ronsor Ronsor Aug 19, 2023

I'll change those to unusedN fields since I know they're unused here.

For moc3, I should probably handle the section alignment (seems to be 64 bytes).
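
The difference between skipping bytes and claiming them with an explicit field can be sketched outside fq. The snippet below is a standalone illustration and not fq's implementation: `bitRange` and `findGaps` are hypothetical names, and the `{32, 64}` range stands in for whatever fields follow the two skipped bytes.

```go
package main

import (
	"fmt"
	"sort"
)

// bitRange is a half-open range [Start, Stop) measured in bits.
type bitRange struct{ Start, Stop int64 }

// findGaps returns the parts of [0, total) not covered by any claimed range.
func findGaps(total int64, claimed []bitRange) []bitRange {
	sorted := append([]bitRange(nil), claimed...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })

	var gaps []bitRange
	pos := int64(0)
	for _, r := range sorted {
		if r.Start > pos {
			gaps = append(gaps, bitRange{pos, r.Start})
		}
		if r.Stop > pos {
			pos = r.Stop
		}
	}
	if pos < total {
		gaps = append(gaps, bitRange{pos, total})
	}
	return gaps
}

func main() {
	// Two one-byte fields, then two bytes skipped with a seek:
	// nothing claims bits 16..32, so they show up as a gap.
	skipped := []bitRange{{0, 8}, {8, 16}, {32, 64}}
	fmt.Println(findGaps(64, skipped))

	// Claiming the same bits with an explicit "unused" field removes the gap.
	explicit := []bitRange{{0, 8}, {8, 16}, {16, 32}, {32, 64}}
	fmt.Println(findGaps(64, explicit))
}
```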

}
}

br := bitio.NewBitReader(rawBytes, -1)
Owner

Sorry for confusing APIs for doing this atm. Hopefully in the future there could be some helpers for dealing with compressed/massaged data

func decodeMOC3(d *decode.D) any {
d.FieldUTF8("magic", 4, d.StrAssert("MOC3"))
version := d.FieldU8("version", moc3VersionNames)
isBigEndian := d.FieldBoolFn("is_big_endian", func(d *decode.D) bool { return d.U8() != 0 })
Owner

Maybe you want to use a sym map to a bool? i usually try to make actual the decoded value and then sym the interpretation of the value. That way a user has access to both (toactual/tosym, default is the sym value if set, otherwise actual)
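
The actual/sym split described in this comment can be illustrated with a small standalone sketch. It is not fq's API: the `field` type, `mapBool`, and `endianSym` are hypothetical names chosen only to show the idea of keeping the raw number and its interpretation side by side.

```go
package main

import "fmt"

// field keeps the raw decoded number (Actual) next to its
// interpretation (Sym). Sym is nil when no mapping applies.
type field struct {
	Actual uint64
	Sym    any
}

// endianSym is a hypothetical mapping from the raw flag byte to a bool.
var endianSym = map[uint64]bool{0: false, 1: true}

// mapBool builds a field and attaches a Sym if the mapping has an entry.
func mapBool(actual uint64, m map[uint64]bool) field {
	f := field{Actual: actual}
	if s, ok := m[actual]; ok {
		f.Sym = s
	}
	return f
}

// value returns Sym when set, otherwise Actual.
func (f field) value() any {
	if f.Sym != nil {
		return f.Sym
	}
	return f.Actual
}

func main() {
	fmt.Println(mapBool(1, endianSym).value()) // mapped value
	fmt.Println(mapBool(7, endianSym).value()) // unmapped, raw value kept
}
```

Compared with decoding straight to a bool via `d.U8() != 0`, this keeps an unexpected raw value such as 7 visible instead of folding it into `true`.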

@@ -0,0 +1,1766 @@
package moc3

// https://github.com/OpenL2D/moc3ingbird/blob/master/src/moc3.hexpat
Owner

👍 interesting to see two implementations, are they comparable completeness-wise? and as someone who has worked with both, how do they compare? the go code feels a bit more verbose?

Contributor Author

The Go code is largely autogenerated from the original implementation and then edited. They're at feature parity.

Some of the verbosity of the go decoder is probably due to some code duplication I can fix.

sectionOffsets.countInfo = int64(d.FieldU32("count_info"))
sectionOffsets.canvasInfo = int64(d.FieldU32("canvas_info"))

d.FieldStruct("parts", func(d *decode.D) {
Owner

This code makes me think that maybe the decode API should have something similar to Read in encoding/binary hmm

Contributor Author

FieldUnmarshalStruct(field string, out any)?

Owner

Yeap something like that. Auto camel_case names and maybe use struct tags to support some casting. Something for the future, no need for this PR.
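
For reference, this is roughly what the `encoding/binary` approach mentioned above looks like with the standard library alone. The struct mirrors two of the offsets read one by one in the hunk; `readSectionOffsets` is a hypothetical helper, and little-endian order is assumed here only for the example, since the real decoder picks the byte order from a header flag.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// sectionOffsets mirrors two of the offsets from the hunk above.
// binary.Read needs exported, fixed-size fields.
type sectionOffsets struct {
	CountInfo  uint32
	CanvasInfo uint32
}

// readSectionOffsets fills the whole struct in one call instead of
// reading each field separately.
func readSectionOffsets(b []byte) (sectionOffsets, error) {
	var so sectionOffsets
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &so)
	return so, err
}

func main() {
	raw := []byte{0x40, 0x00, 0x00, 0x00, 0x80, 0x02, 0x00, 0x00}
	so, err := readSectionOffsets(raw)
	fmt.Println(so.CountInfo, so.CanvasInfo, err)
}
```

What `binary.Read` does not give you is the per-field names, ranges, and display mappings that a decode tree records, which is presumably where the struct tags suggested above would come in.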

@Ronsor
Contributor Author

Ronsor commented Aug 19, 2023

Are they ok licensewise?

The MOC3 test is CC0 data, the CAFF test is my own work. Licensing should be no problem.

@wader
Owner

wader commented Aug 19, 2023

Are they ok licensewise?

The MOC3 test is CC0 data, the CAFF test is my own work. Licensing should be no problem.

Great!

@Ronsor
Contributor Author

Ronsor commented Aug 19, 2023

@wader What do you think is the best way to add alignment for arrays (at the end, not per field)?

for i := int64(0); i < countInfo.parts; i++ {
d.FieldUTF8NullFixedLen("id", 64)
}
})
Owner

@Ronsor you mean alignment here for example? easiest is probably just an extra "ids_alignment" etc field after it. another option, maybe more correct but less user friendly, is to make "ids" into a struct with "entries" and "alignment" i think

Contributor Author

Yes, there. I'll try both ways to see which is more usable.

@Ronsor
Contributor Author

Ronsor commented Aug 20, 2023

I settled on making each section a struct with both array and padding fields, and I refactored the code to be a lot less repetitive. Also fixed an issue with decoding version 5 files (the correction is reflected in the specification).
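
The size of such a padding field follows from the section's end offset and the alignment. A minimal sketch, assuming the 64-byte alignment mentioned earlier in the thread ("seems to be 64 bytes"); `paddingTo` is a hypothetical helper, not a function from the PR.

```go
package main

import "fmt"

// paddingTo returns how many bytes must follow offset to reach the
// next multiple of align. align must be positive.
func paddingTo(offset, align int64) int64 {
	return (align - offset%align) % align
}

func main() {
	const sectionAlign = 64 // assumption taken from the thread

	// Three 64-byte ids end exactly on a boundary: no padding needed.
	fmt.Println(paddingTo(3*64, sectionAlign))

	// An array ending 4 bytes past a boundary needs the rest filled in.
	fmt.Println(paddingTo(3*64+4, sectionAlign))
}
```

The outer modulo is what makes an already aligned offset yield zero rather than a full extra block.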

@wader
Owner

wader commented Aug 20, 2023

LGTM! Think i'm ready to merge this once CI is green. Do you have something more you want to change or add?

The CI failure seems to be just some field that has been renamed

@Ronsor
Contributor Author

Ronsor commented Aug 20, 2023

Whoops, forgot to update the test. I think it's ready to be merged now. 🥳

@wader wader merged commit 2eae4c2 into wader:master Aug 20, 2023
@wader
Owner

wader commented Aug 20, 2023

Thanks! Great work and hope you find fq useful! any plans for future contributions?

@Ronsor
Contributor Author

Ronsor commented Aug 20, 2023

Right now there's about three other proprietary formats I plan to add decoders for in the near future.

@wader
Owner

wader commented Aug 20, 2023

Right now there's about three other proprietary formats I plan to add decoders for in the near future.

Interesting! looking forward

BTW forgot to say that you can add documentation and authorship to a decoder by doing something like https://github.com/wader/fq/blob/master/format/apple/macho/macho.md and https://github.com/wader/fq/blob/master/format/apple/macho/macho.go#L17-L28 and it will show up in cli help and make doc will generate documentation.

3 participants