Skip to content

Conversation

cthoyt
Copy link
Contributor

@cthoyt cthoyt commented Sep 4, 2024

Closes #10152

Enable annotating author roles using the Contribution Role Taxonomy (CRediT) and export this information in conformant JATS (#10153, Charles Tapley Hoyt and Jez Cope).

Motivation

I'm motivated to add this to Pandoc since I want the Journal of Open Source Software (JOSS), which is built on top of Pandoc, to be able to create compliant JATS. We're already adding support for encoding this information in article metadata in parallel in openjournals/inara#87.

Demo

Now, you can specify an authors' role(s) in the metadata like this:

title: CRediT Test
author:
  - name: Max Mustermann
    roles:
      - credit: software
        credit-name: Software    # optional, looked up automatically
        degree: Lead             # optional
        name: Programas          # optional, useful for internationalization

where the only required key inside each roles dictionary is credit. If that's there, then the primary English label can be automatically looked up and put into credit-name. If you want to override for internationalization purposes, you can use name.

The degree field is optional, and can be one of Lead, Supporting, or Equal as defined in the JATS standard.

Testing

The tests (with narrative documentation) can be found in https://github.com/cthoyt/pandoc/blob/patch-1/test/command/10152.md. Run these tests specifically with cabal test --test-options="-p 10152.md"

I also put the 7 outputs in 10152.md as XML files inside examples.zip. I tested all of them are valid/conformant JATS using the web-based tool at https://jats4r-validator.niso.org. The zip archive can be downloaded and you can run these tests yourself, if desired.

Acknowledgements

huge shout-out to John for many rounds of helping me get comfortable with Haskell and the Pandoc codebase, and @jezcope for helping me get across the finish line. This PR would have only been a half-finished idea without them.

@jgm
Copy link
Owner

jgm commented Sep 5, 2024

You should add a "command test" to test the proposed feature, and to illustrate how it works.

See test/command/*.md for examples.

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 24, 2024

Hi @jgm , I think I figured out making a test file. I have a few questions I bet you could answer very quickly but would take me a long time to figure out:

  1. Is there a specific naming convention I should use for my test command file?
  2. I created a Credit.hs that contains a dictionary that I want to inject into the JATS output process. Where should I look to do that? you can see in the diff what I am currently trying in the JATS writer

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

Is there a specific naming convention I should use for my test command file?

Usually we try to name them after the associated issue. So, if there's an open issue that this fixes you could use that number; otherwise you could use the PR's number, 10153.md. That makes it easy to figure out later why the test was added.

I created a Credit.hs that contains a dictionary that I want to inject into the JATS output process. Where should I look to do that? you can see in the diff what I am currently trying in the JATS writer

Instead of creating a separate module, I'd prefer if you just included this map in the JATS writer module itself.

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 24, 2024

@jgm thanks for that feedback, I addressed both of your points. Now, the only thing left is how to make this dictionary (renamed to creditNames) available inside the template article.jats_publishing.

My first try was $ defField "creditNames" creditNames because this group is also where some other things that appear in the template are referenced, but I am unfortunately not familiar with Haskell and this is a very complex codebase! Any help appreciated :)

https://github.com/cthoyt/pandoc/blob/71d53c6e2635ab23929287a94087bb50ee56b79b/src/Text/Pandoc/Writers/JATS.hs#L183

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

My first try was $ defField "creditNames" creditNames

what happens when you try that?

By the way, our usual convention for variable names would favor credit-names.

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 24, 2024

It seems the issue was the definition of creditNames :: M.Map String String - it didn't seem to have a way to handle strings and gave this error:

src/Text/Pandoc/Writers/JATS.hs:182:17: error: [GHC-39999]
    • Could not deduce ‘Text.DocTemplates.Internal.ToContext
                          Text (M.Map String String)’
        arising from a use of ‘defField’
      from the context: PandocMonad m
        bound by the type signature for:
                   docToJATS :: forall (m :: * -> *).
                                PandocMonad m =>
                                WriterOptions -> Pandoc -> JATS m Text
        at src/Text/Pandoc/Writers/JATS.hs:124:1-68

Switching the signature to use Text instead of String in creditNames :: M.Map Text Text and finally got it to compile, but it seems like the Text values in the dictionary don't get displayed in the template and instead only evaluate to true. Is there either:

  1. A way to add handling for regular dictionaries
  2. Unpack the Pandoc Text to be a regular string inside the template?

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

I don't understand why you'd be getting this behavior. Nested Maps seem to work fine in ToContext: here I explore using cabal repl in jgm/doctemplates:

ghci> toContext $ fromList [("a"::Text, fromList [("b"::Text, fromList [("c"::Text,"d"::Text)])])] :: Context Text
Context {unContext = fromList [("a",MapVal (Context {unContext = fromList [("b",MapVal (Context {unContext = fromList [("c",SimpleVal (Text 1 "d"))]}))]}))]}
ghci> Right (tpl :: Template Text) <- compileTemplate "." "$a.b.c$" 
ghci> renderTemplate tpl context
Text 1 "d"

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

Btw, you can see what instances are defined for ToContext here:
https://hackage.haskell.org/package/doctemplates-0.11.0.1/docs/Text-DocTemplates.html

As you can see, it's only defined for Map with Text keys.

ToContext typeclass is relevant because of the type of defField:

defField   :: ToContext a b => Text -> b -> Context a -> Context a

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

Are you saying that $credit-names.conceptualization$ is coming out as true?

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 24, 2024

Are you saying that $credit-names.conceptualization$ is coming out as true?

Yes! This is what I'm experiencing

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

I'm a bit confused because originally you had a nested map and now it seems not to be nested. With a nested map, in which the value of $credit-names.conceptualization$ is itself a map, it would render as true in the template. But with the non-nested map it should render as a string value.

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 24, 2024

I thought that the nested map was a bit over-complicated and unnecessary so I simplified it to a regular map. This means that the $credit-names[it.type]$ becomes true and the testing suite gives the error:

--- test/command/10152.md
+++ pandoc -s -t jats
+  25       vocab-term="true"
-  25       vocab-term="Software"
+  28     true
-  28     Software
+  33       vocab-term="true"
-  33       vocab-term="Methodology"
+  36     true
-  35     Methodology
+  45       vocab-term="true"
-  44       vocab-term="Software"
+  48     true
-  47     Software

ref: https://github.com/jgm/pandoc/actions/runs/11021333152/job/30608170051?pr=10153#step:9:1926

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

Ah, I see what you are trying to do. I don't think variable[field] is a valid syntax for doctemplates.

The way to do this is to resolve these names in the writer. Probably the simplest approach would be to use your Map to add a name field to the role in the Context, before you pass it to the template. (You could do this only if a name field isn't already present, which would give the user the ability to override the name, or just do it always.)

@estedeahora
Copy link

estedeahora commented Sep 24, 2024

Ah, I see what you are trying to do. I don't think variable[field] is a valid syntax for doctemplates.

The way to do this is to resolve these names in the writer. Probably the simplest approach would be to use your Map to add a name field to the role in the Context, before you pass it to the template. (You could do this only if a name field isn't already present, which would give the user the ability to override the name, or just do it always.)

I believe that it may be useful to re-iterate this comment regarding this issue.

@jgm
Copy link
Owner

jgm commented Sep 24, 2024

Yes, overriding is actually important for people who don't want it to be in English!

@cthoyt cthoyt marked this pull request as ready for review September 25, 2024 09:23
@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 25, 2024

Thanks @jgm and @estedeahora for the feedback. I've opted to make the input YAML format a bit more explicit, so you have to say exactly what the CRediT name is. You also have the option to override, to support the multi-language things.

I also improved the docs to give all of the different possibilities and added to the tests one for each. The tests (with narrative documentation) can be found in https://github.com/cthoyt/pandoc/blob/patch-1/test/command/10152.md.

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 25, 2024

cc @sneakers-the-rat - your input is also greatly appreciated!

@estedeahora
Copy link

It seems to me that this project is a significant and good modification for Pandoc. However, I don't seem to modify Pandoc without modifying the code in JATS.hs to generate vocab-term-identifier attributes from vocab-term (or vice versa) automatically. This change should leave the possibility to generate content automatically or custom.

Personally I think that if this change is not possible, it is better not to generate the change. If someone needs to implement it, they can read these discussions with alternative solutions with Lua filters and template modifications.

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 25, 2024

It seems to me that this project is a significant and good modification for Pandoc. However, I don't seem to modify Pandoc without modifying the code in JATS.hs to generate vocab-term-identifier attributes from vocab-term (or vice versa) automatically. This change should leave the possibility to generate content automatically or custom.

Personally I think that if this change is not possible, it is better not to generate the change. If someone needs to implement it, they can read these discussions with alternative solutions with Lua filters and template modifications.

Apologies @estedeahora, but I'm not quite sure what you're asking for.

@estedeahora
Copy link

I see that in the last commits you opted to generate the changes by incorporating everything manually in the yaml header and then these in the template (leaving aside the modification of JATS.hs). This is a “correct” solution. However, as you pointed out in #10152 it is an inefficient solution because writing three times variations of the same information can generate errors. I think it would be a contribution to generate this in Haskell, but I don't know how to do it yet. On the other hand, I don't think it is advisable to add this functionality to Pandoc if this point is not solved, because it generates too many fields making it not very concise.

As an alternative and provisional solution I think that users can use a lua filter and “inject” these fields through a dictionary. Then this can be placed in a custom template. I think our discussion provides elements for others to do this if they need to.

@jgm
Copy link
Owner

jgm commented Sep 25, 2024

It should be completely feasible to add the credit-name field automatically, from a standard map, if it is missing. @cthoyt if you can't see how to do this I may be able to help.

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 25, 2024

I am feeling like figuring out haskell and the specifics of this codebase might be too much of a lift for me at the moment. @jgm if you are willing to give a hand, then I would be very grateful.

I had squashed the commits earlier, but just added back the name lookup dictionary in JATS.hs

@jgm
Copy link
Owner

jgm commented Sep 25, 2024

Here are some tips. In Text.Pandoc.Writers.Shared you'll find

-- | Retrieve a field value from a template context.
getField   :: FromContext a b => Text -> Context a -> Maybe b

and

-- | Reset a field of a template context.  If the field already has a
-- value, the new value replaces it.
-- This is a utility function to be used in preparing template contexts.
resetField :: ToContext a b => Text -> b -> Context a -> Context a

So the basic template for adding these fields to the roles variable would be something like:

addCreditNames :: Context a -> Context a
addCreditNames context =
  case getField "roles" context of
    Nothing -> context
    Just roles -> resetField "roles" (map addCreditName roles) context

addCreditName :: M.Map Text Text -> M.Map Text Text
addCreditName rolemap =
  case M.lookup "credit-name" rolemap of
    Just _ -> rolemap
    Nothing -> maybe id (M.insert "credit-name")
      (M.lookup "credit-id" rolemap >>= flip M.lookup creditNames)

Then you'd put addCreditNames in the pipeline that currently has stuff like defField "blah" blah.

@jezcope
Copy link
Contributor

jezcope commented Sep 25, 2024

I'm happy to give this a look, since I'm 100% in favour of getting more useful metadata into the journal ecosystem and getting CRediT roles into more JATS seems like a good thing. Probably won't have time to get properly into it until the weekend though!

@cthoyt
Copy link
Contributor Author

cthoyt commented Sep 25, 2024

Then you'd put addCreditNames in the pipeline that currently has stuff like defField "blah" blah.

Do you mean this section?

let context = defField "body" main
$ defField "back" back
$ resetField "title" title'
$ resetField "date" date
$ defField "mathml" (case writerHTMLMathMethod opts of
MathML -> True
_ -> False) metadata

I tried sticking it in the middle like

let context = defField "body" main
              $ defField "back" back
              $ addCreditNames
              $ resetField "title" title'
              $ resetField "date" date
              $ defField "mathml" (case writerHTMLMathMethod opts of
                                        MathML -> True
                                        _      -> False) metadata

but this doesn't appear to work correctly - what's the right way to apply it?

@jgm
Copy link
Owner

jgm commented Sep 25, 2024

The thing you said doesn't appear to work should work.

Each line in this pipeline has type Context Text -> Context Text. What error did you get?

@cthoyt cthoyt requested a review from jgm January 27, 2025 15:49
@jezcope
Copy link
Contributor

jezcope commented Jan 27, 2025

Don't know if it's useful but I've now pushed up another commit with a small refactor and some doc comments

@cthoyt
Copy link
Contributor Author

cthoyt commented Jan 28, 2025

@jezcope it looks like the recent commit you pushed broke CI, can you check that please?

@jezcope
Copy link
Contributor

jezcope commented Jan 28, 2025

@cthoyt Oh yeah, I missed that warning, fixed now (and it makes the code clearer too, good compiler). I should have remembered to build with -Werror before committing. Funny that it only breaks the Windows build though.

@cthoyt
Copy link
Contributor Author

cthoyt commented Jan 29, 2025

@tarleb this might also be interesting for you to take a look at

changelog.md Outdated
Comment on lines 3 to 11
## UNRELEASED

* JATS writer:

+ Allow specification of author roles using the CRediT Taxonomy
and export this information in conformant JATS (#10153, Charles
Tapley Hoyt and Jez Cope).

## pandoc 3.6.2 (2025-01-12)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't modify the changelog.md until right before a release. Ideally all the information that
would go in the changelog should be in your commit message. I'd also suggest squashing
your commits into a single one.

@@ -109,6 +109,22 @@ $elseif(author.name)$
$else$
<string-name>$author$</string-name>
$endif$
$if(author.roles)$
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this outer if. If author.roles is not defined, $for(author.roles)$...$endfor$ just won't do anything (iirc).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I addressed this!

Comment on lines 53 to 55
>
Software
</role>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: is it okay to have this whitespace around "Software"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answer is yes, since I ran the output through a JATS validator, but just to be safe, I got rid of this whitespace

vocab-identifier="https://credit.niso.org/"
vocab-term-identifier="https://credit.niso.org/contributor-roles/software/"
vocab-term="Software"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this blank line weren't produced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I had some difficulty figuring out how to make this work nicely with leading whitespace and newlines inside the if/else, so I reorganized the order to make sure we don't get ugly new lines

@jgm
Copy link
Owner

jgm commented Feb 2, 2025

It looks good to me now. @tarleb do you want to have a look? You know more about JATS than I.

@cthoyt
Copy link
Contributor Author

cthoyt commented Feb 2, 2025

@jgm I am working through getting the whitespace fixed, but overall the idea is done!

@cthoyt cthoyt force-pushed the patch-1 branch 3 times, most recently from 2e71bcb to 187d171 Compare February 2, 2025 23:27
@cthoyt
Copy link
Contributor Author

cthoyt commented Feb 2, 2025

Whitespace issues addressed, and commits have all been squashed.

Copy link
Collaborator

@tarleb tarleb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I only wonder whether the role descriptions should get translation keys to make it easier to switch languages. JATS documents in languages other than English are probably rare enough, plus there exists a work-around, so I think this is fine for now.

It might also be good to report a warning if the credit role isn't known, but the current behavior seems ok, too.

@jezcope
Copy link
Contributor

jezcope commented Feb 4, 2025

Cool, thanks @tarleb. I've been wanting to contribute more to pandoc in any case, so I'm happy to take a look at those suggestions as further pull requests. :)

@jgm
Copy link
Owner

jgm commented Feb 4, 2025

One more request: can you amend your commit message so that it includes a brief description of the change (wrapped to 80 columns) and Closes #10152.?
This is needed when I create the changelog.

@cthoyt
Copy link
Contributor Author

cthoyt commented Feb 4, 2025

@jgm done in b0aef22 (turns out CI checks for 78 character width)

Enable annotating author roles using the Contribution Role Taxonomy
(CRediT) and export this information in conformant JATS

Closes jgm#10152.

Co-Authored-By: Jez Cope <457628+jezcope@users.noreply.github.com>
@jgm jgm merged commit 1eed55f into jgm:main Feb 6, 2025
10 of 13 checks passed
@jgm
Copy link
Owner

jgm commented Feb 6, 2025

Thanks!

@cthoyt cthoyt deleted the patch-1 branch February 6, 2025 07:07
@cthoyt
Copy link
Contributor Author

cthoyt commented Feb 6, 2025

I'm so excited for this to be merged! Thank you again @jgm and @jezcope for all of the excellent feedback and assistance and @estedeahora for the helpful discussion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for CRediT roles in JATS export
5 participants