Skip to content

Add overlay modes for metadata; overwrite, add missing etc. #598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 18, 2024

Conversation

mizaki
Copy link
Contributor

@mizaki mizaki commented Feb 4, 2024

This is the barebones of some overlay options. It's missing the settings which I will add if you are happy with the general idea. It's also missing any tests for the new options. I have done nothing with the notes as yet.

When it comes to the notes, using CVDB xxxxx will still work with Mylar3 and as long as Issue ID is not used (such as Metron Issue ID and instead Metron ID) then the multi-source Tagged with CT etc. etc. CVDB xxxxx, GCD xxxxxx, Metron xxxxx will not cause problems. I've checked Komga, Kavita and LANraragi and none appear to use the notes field.

@mizaki
Copy link
Contributor Author

mizaki commented Feb 14, 2024

I've tried to simplify things with ddbc5d8 (just the assign_ changes for now) I didn't know if it's better check if a string is 0 rather that just set the cur to whatever new is even if it was an empty string?

@mizaki
Copy link
Contributor Author

mizaki commented Feb 14, 2024

When it comes to saving the setting, is it best to use the Enum number or name?

@lordwelch
Copy link
Member

When it comes to saving the setting, is it best to use the Enum number or name?

parser.add_setting("value", default=ENUM.VALUE, type=ENUM, cmdline=False)

This won't work with int values with our current file loading code.
We would need or isinstance(default, Enum) or (isinstance(setting.type, type) and issubclass(setting.type, Enum)) added to

if isinstance(value, str):

@mizaki
Copy link
Contributor Author

mizaki commented Feb 14, 2024

Is 75c825b in the right area with the class?

Copy link
Member

@lordwelch lordwelch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a single test for each mode that covers each type so that we know what it actually does, it will also make some of the mistakes a little more obvious.

Comments are in the wrong place now but still apply.

@mizaki
Copy link
Contributor Author

mizaki commented Feb 15, 2024

I'm trying to understand the current overlay:

    def assign_overlay(self, cur: str, new: Any) -> None:
        """Overlay - When the new object has non-None values, over-write them to this one."""
        if new is not None:
            if isinstance(new, str) and len(new) == 0:
                if isinstance(getattr(self, cur), (list, set)): # If an empty string is passed to a list/set field then it is cleared
                    getattr(self, cur).clear()
                else:
                    setattr(self, cur, None) # Otherwise it's any empty string in a string field and should set to None
            elif isinstance(new, (list, set)) and len(new) == 0: # If an empty list/set is passed, do nothing. Why does this not clear the field the same as an empty string?
                pass
            else:
                setattr(self, cur, new) # No type checking, just set the `cur` to `new`

Should an empty list/set not clear it? Am I missing something?

And in general, should assign be doing type checking?

@lordwelch
Copy link
Member

An empty list is the default

@mizaki
Copy link
Contributor Author

mizaki commented Feb 16, 2024

An empty list is the default

But should a list/set expect an empty string? That should be a type error?

@mizaki
Copy link
Contributor Author

mizaki commented Feb 16, 2024

I've added some tests and given each one a doc string so hopefully it makes more sense now. Credit tests are left to do.

@lordwelch
Copy link
Member

But should a list/set expect an empty string? That should be a type error?

I believe this was set in order to make the -m option work for clearing lists

@mizaki
Copy link
Contributor Author

mizaki commented Feb 16, 2024

I had to duplicate the StrEnum code from resulttype.py as using it caused an import loop. I guess the best thing to do is move it to a new/another file, if so, which one?

@lordwelch
Copy link
Member

I had to duplicate the StrEnum code from resulttype.py as using it caused an import loop. I guess the best thing to do is move it to a new/another file, if so, which one?

Copying it is fine, it'll get removed eventually when CT no longer supports 3.10

@lordwelch
Copy link
Member

Also as a nitpick (because I don't remember to do this all the time anyways) you can change:

        if new is not None:
            ...

to

    if new is None:
        return None
    ...

to remove another indent level.

As it is now it's 10x better than the 8 levels of indent originally.

@mizaki
Copy link
Contributor Author

mizaki commented Feb 17, 2024

Part of the problem with overwrite and overwrite_no_clear was me not understanding the issue of a string clearing a list/set.

Currently

def parse_metadata_from_string(mdstr: str) -> GenericMetadata:
doesn't do any type conversions which it probably should here?
setattr(md, key, value)

Or, the GenericMetadata.__post_init__ could do checks and conversions?
That way, overlay doesn't need to worry about the wrong type when replacing/adding/merging.

@mizaki
Copy link
Contributor Author

mizaki commented Mar 2, 2024

I've think I finally see the issue with the -m and clearing list/sets. Using the wrong type for the commandline metadata for a list/set does seem the easiest solution. The other option might be: Filename, commandline, read style(s) and talker metadata have different OverlayModes. Maybe not all need to be a user available settings (but they could).

I've put comments of Should always be "overlay" mode? where I thought the option should be static.

Another (bad) idea would be to mark the origin of the data (maybe re-purpose/use TagOrigin) in the GenericMetadata.

@lordwelch
Copy link
Member

You use the quirk on the commandline that an empty string value clears a list as a reason for why these options need to exist..... And then you explicitly say that none of the options you created should be used......

        md.overlay(self.config.Runtime_Options__metadata)  # Should always be "overlay" mode?

@mizaki
Copy link
Contributor Author

mizaki commented Mar 3, 2024

You use the quirk on the commandline that an empty string value clears a list as a reason for why these options need to exist..... And then you explicitly say that none of the options you created should be used......

        md.overlay(self.config.Runtime_Options__metadata)  # Should always be "overlay" mode?

It depends on the answer to the question of keep using the quirk or not? If it is used then OverlayMode.overwrite for example is required for that reason (i.e. having to check if a list is a list). If, say, TagOrigin is used to demarcate if the data is from commandline, file name, etc. then OverlayMode.overwrite could be the same (and thus removed and do a copy). It could also be that

# finally, use explicit stuff
md.overlay(self.config.Runtime_Options__metadata)

Does not use the user available setting and is always OverlayMode.overlay with the list/set == str in whereas the other modes do not check.

Having the user choose for all four of the origins (file name, read style, commandline and talker) might lead to confusion?

@lordwelch
Copy link
Member

It depends on the answer to the question of keep using the quirk or not

No. You have several different modes that you have created. When I said to remove them or asked why they exist you explicitly said that they existed because of the quirk and then you explicitly don't use them the only place the quirk happens. Remove the useless modes. It's a stupid quirk that probably shouldn't ever have been added in the first place.

If something needs to change about how metadata is merged from different sources that needs to change where the decision is made, in this case cli.py create_local_metadata. Creating an entire complicated system to facilitate a quirk is stupid. It was implemented as a tiny 7 line change in 539aac1 to allow unsetting the credits #587. If you can turn this into a 50 line change (minus tests) I'll merge it as is but this is over 700 lines changed, most of which is just to satisfy a stupid quirk that doesn't even get executed.

@mizaki mizaki force-pushed the metadata_overlay branch from 3203103 to 6063553 Compare March 5, 2024 01:00
@mizaki
Copy link
Contributor Author

mizaki commented Mar 5, 2024

I've squished and targeted what is currently already there. The overlay options will only apply to source (talker) information currently. So this option will become more relevant when the multi-read of metadata styles is done.

I have also renamed Comic Book Lover to Metadata Options and moved those that are to do with metadata into it.

# otherwise, add it!
else:
self.add_credit(c["person"], c["role"], primary)
cur_credits.append(Credit(person=c["person"], role=c["role"], primary=primary))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the credit appends need to do the same checks as self.add_credit

It might be better to merge them and then clear the current credits and just add them at the end

credits = assign_credit_func(cur_credits, new_credits)
self.credits.clear()
for credit in credits:
    self.add_credit(**credit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have kept it a return value so it's not different to the others. I've also removed the empty person to remove role.

As the add_credit is done to each md I did a dupe check so using add_credit isn't used again. I think it makes sense but if you want to still use it, I'll change it.

Comment on lines 534 to 538
<string>Overlay - When the new source metadata has non-empty values, overwrite the current metadata

Add Missing - Any values that are empty in the current metadata that are in the new source metadata, add those values

Combine - Combine lists and use the new (non-empty) source metadata for all other values</string>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like the wording here. What do you think of

Overlay - Overlays new metadata on top of the original.
(Series=batman, Issue=1, Tags=[batman,joker,robin])
+ (Series=Batman, Publisher=DC Comics, Tags=[mystery,action])
= (Series=Batman, Issue=1, Publisher=DC Comics, Tags=[mystery,action])

Add Missing - Metadata missing from the original is added from the the new metadata.
(Series=batman, Issue=1, Tags=[batman,joker,robin])
+ (Series=Superman, Issue=10, Publisher=DC Comics, Tags=[superman,lois lane,lex luthor])
= (Series=batman, Issue=1, Publisher=DC Comics, Tags=[batman,joker,robin])

Combine - Same as Overlay except lists (tags, characters, genres, credits, etc...) are combined.
(Series=batman, Issue=1, Tags=[batman,joker,robin])
+ (Series=Batman, Publisher=DC Comics, Tags=[mystery,action])
= (Series=Batman, Issue=1, Publisher=DC Comics, Tags=[batman,joker,robin,mystery,action])

But now that's a lot more text....

How about making this a radio box so each item can have it's own description on the screen and we can make the example the tool tip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples should make things easier to understand.

I'm thinking of a text box (on the right as there is space) that changes to reflect the current chosen or, a tabbed text box for the options.

This is kind of related but I think it makes sense to have a choice for the read styles and the source data (with commandline always in overlay with clearing of lists etc. and having primacy over read styles). So having two combo boxes makes the most sense then rather than radio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I went with. Could possible have it switch tab when a combo box is changed.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read style and source info need to have similar text as they are the same operation but the metadata comes from a different place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about covering both? Something like:

Source Info: Overlays the new 'source info'  metadata on top of the current metadata as taken from the file name parser or read style (unless 'Overwrite metadata was selected)
Read Style: Overlays subsequent metadata after the first read style from the next read style as set by the user

The read style text above probably needs some simplification but that general idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have the origin of the data "read", "source" and then we have "style" indicating there is some sort of style to the reading, then we have "info" which doesn't contribute any real meaning and doesn't follow the other control. They need to have consistent names. Imagine if on the front page we had this
Screenshot 2024-03-17 at 4 19 18 PM
That just adds confusion, because instead of keeping consistent names for the controls it uses completely different terms. I really care what terminology you use but it needs to be consistent. And if the name can indicate that "source" is from comicvine (or one of the others) then it needs a description as well (either as a tooltip or a label)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC you said to use "style" for the read and modify when it's user facing so I did that. "Data Source" IIRC I changed because of the space it takes up in the tagger window. And then in the settings it's still "Metadata Source", so I think that's the root of the problem I made. In the auto-tag window the option is labelled Overwrite metadata so maybe removing the word "Source" from the tagger window and simply put "Metadata" is the answer?

The tool tip for both the label and combox for source info is:

The operation to perform when overlaying source data (Comic Vine, Metron, GCD, etc.)

and for read style(s):

The operation to perform when overlaying read styles (ComicRack, ComicBookInfo, etc.)

Are they okay as is?

And the general format of using the one tab to explain both "Metadata" and "Read Style(s)" is okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the point. That was just an example. I don't really care what terminology you use, it just needs to be consistent with the control it is next to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to "Data Source" to match the tagger window.

I've changed the description to this:

Read Style: Overlays all the none empty values of the read style metadata on top of the current metadata (i.e. file name parsing, other read style(s)).
Data Source: Overlays all the none empty values from the data source (e.g. Comic Vine) on top of the current metadata (i.e. file name parsing, read style(s)).

I think now that with the wording there is no need for separate descriptions for "Read Style" and "Data Source"?

@mizaki mizaki force-pushed the metadata_overlay branch from 6fb2d58 to 1eee20d Compare March 16, 2024 01:54
<item>
<widget class="QPlainTextEdit" name="combineTextEdit">
<property name="plainText">
<string>Same as Overlay except lists (tags, characters, genres, credits, etc.) are combined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurate with the current code anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's this:

Combines lists (tags, characters, genres, credits, etc.) replacing any duplicates with the new record. All other non-empty values replace any current.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the code to not match the description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What am I missing?

Combines lists (tags, characters, genres, credits, etc.) replacing any duplicates with the new record. All other non-empty values replace any current.

I think that is correct? "replacing" maybe isn't the perfect word, "using the new value with any duplicate items"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current description says "Same as Overlay except for lists...." but, it isn't, can we just make the code match the description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you now. It was a 50/50; change the description or change the code and I chose poorly.

If the command line empty string is taken out (presuming that command line metadata will always be used in overlay mode) then they look to do the same outside of lists/sets?

    def assign_overlay(self, cur: Any, new: Any) -> Any:
        """Overlay - When the 'new' object has non-None values, overwrite 'cur'(rent) with 'new'."""
        if new is None:
            return cur
        # An empty string can be used to clear a list or set via the commandline
        if isinstance(new, str) and len(new) == 0:
            if isinstance(cur, (list, set)):
                cur.clear()
                return cur
            else:
                return None
        elif isinstance(new, (list, set)) and len(new) == 0:
            return cur
        else:
            return new
    def assign_combine(self, cur: Any, new: Any) -> Any:
        """Combine - Combine lists and sets. All non-None values from new replace cur(rent)"""
        if new is None:
            return cur
        if isinstance(new, (list, set)) and isinstance(cur, (list, set)):
            if len(new) > 0 and isinstance(new, list) and isinstance(new[0], Url):
                return list(set(new).union(cur))
            return self.assign_dedupe(new, cur)
        else:
            return new

Apologies if I'm missing the obvious.

@@ -1166,7 +1166,8 @@ def query_online(self, autoselect: bool = False, literal: bool = False) -> None:
description=cleanup_html(
new_metadata.description, self.config[0].Sources__remove_html_tables
),
)
),
self.config[0].Metadata_Options__source_overlay, # command line?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the question here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a question I meant to ask: On the command line the styles from the file setting can be overridden but not saved. Should overlay do the same?

Comment on lines 337 to 328
if len(new) > 0 and isinstance(new, list) and isinstance(new[0], Url):
return self.weblinks_dedupe(new, cur) # type: ignore[arg-type]
return self.assign_dedupe(new, cur)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails when there are no new links and there is a current link. Probably a good test case.

@mizaki mizaki force-pushed the metadata_overlay branch from 5f1b4f7 to b0cc000 Compare April 21, 2024 00:08
@mizaki mizaki force-pushed the metadata_overlay branch 2 times, most recently from 67bb38a to 2072150 Compare May 6, 2024 21:58
@mizaki
Copy link
Contributor Author

mizaki commented May 6, 2024

Only other possible change is tag_origin to a list?

@mizaki mizaki force-pushed the metadata_overlay branch from 7298a67 to 250d777 Compare May 11, 2024 21:26
@mizaki
Copy link
Contributor Author

mizaki commented May 11, 2024

I've added the "lists" option. I feel like something is missing now but I have no idea what.

Two dropdowns with only two options seems too few!

@lordwelch lordwelch closed this May 11, 2024
@lordwelch lordwelch reopened this May 11, 2024
@lordwelch lordwelch closed this May 12, 2024
@lordwelch lordwelch reopened this May 12, 2024
@mizaki
Copy link
Contributor Author

mizaki commented May 14, 2024

Thinking about "lists", perhaps it should be three options: merge, replace or ignore (leave original data)?

@lordwelch
Copy link
Member

replace is the wrong word. It falls back to whatever the merge mode is

@mizaki
Copy link
Contributor Author

mizaki commented May 15, 2024

"ignore" might be the final option needed for attributes and lists. That way a user can update only attributes or lists. That would bring lists to a select the same as attributes and bring attributes to three options as well.

Shall I add an ignore and convert "lists" options from a bool?

@lordwelch
Copy link
Member

I don't see why someone would treat lists like that. Let's wait until someone has a usecase for it

@lordwelch lordwelch merged commit 250d777 into comictagger:develop May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants