Citation parsing in Markdown depending on specific content leads to AuthorInText instead of NormalCitation #10584

@cderv

Description

We've received a report on the Quarto side about a puzzling citation-handling issue that occurs when a Span is used before a citation with two keys. For reference, this is the one:

Here is a minimal reproducible example I managed to come up with:

---
title: "Bug demo"
references:
- author: Jane
  id: some-very-long-id-26-chars
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(string of 28 characters long)]{key=value} [@some-very-long-id-26-chars; @id2].

The citation there is parsed as AuthorInText, with the surrounding brackets kept as strings (Str "[" and Str "].").

❯ pandoc -f markdown -t native --citeproc index.qmd
[ Para
    [ Span
        ( "" , [] , [ ( "key" , "value" ) ] )
        [ Str "(string"
        , Space
        , Str "of"
        , Space
        , Str "28"
        , Space
        , Str "characters"
        , Space
        , Str "long)"
        ]
    , Space
    , Str "["
    , Cite
        [ Citation
            { citationId = "some-very-long-id-26-chars"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 1
            , citationHash = 0
            }
        ]
        [ Str "Jane" , Space , Str "(2019)" ]
    , Str ";"
    , Space
    , Cite
        [ Citation
            { citationId = "id2"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = AuthorInText
            , citationNoteNum = 2
            , citationHash = 0
            }
        ]
        [ Str "John" , Space , Str "(2020)" ]
    , Str "]."
    ]
(...truncated...)

In HTML this looks like:

[Image: screenshot of the resulting HTML output]

This syntax is expected to produce NormalCitation, which is what you get with a more generic example:

---
title: "Bug demo"
references:
- author: Jane
  id: id1
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(some content)]{key=value} [@id1; @id2].
❯ pandoc -f markdown -t native --citeproc index.qmd
[ Para
    [ Span
        ( "" , [] , [ ( "key" , "value" ) ] )
        [ Str "(some" , Space , Str "content)" ]
    , Space
    , Cite
        [ Citation
            { citationId = "id1"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = NormalCitation
            , citationNoteNum = 1
            , citationHash = 0
            }
        , Citation
            { citationId = "id2"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = NormalCitation
            , citationNoteNum = 1
            , citationHash = 0
            }
        ]
        [ Str "(Jane"
        , Space
        , Str "2019;"
        , Space
        , Str "John"
        , Space
        , Str "2020)"
        ]
    , Str "."
    ]

What is puzzling is that:

  • Removing one character from the id leads to NormalCitation:

    ---
    title: "Bug demo"
    references:
    - author: Jane
      id: some-very-long-id-25-char
      issued: 2019-09
      title: Title 1
      type: article-journal
    - author: John
      id: id2
      issued: 2020-11
      title: Title 2
      type: article-journal
    ---
    
    [(string of 28 characters long)]{key=value} [@some-very-long-id-25-char; @id2].
    ❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
                , citationMode = NormalCitation
                , citationMode = NormalCitation
    
  • Removing one character from the span content between (...) also leads to NormalCitation:

    ---
    title: "Bug demo"
    references:
    - author: Jane
      id: some-very-long-id-26-chars
      issued: 2019-09
      title: Title 1
      type: article-journal
    - author: John
      id: id2
      issued: 2020-11
      title: Title 2
      type: article-journal
    ---
    
    [(string of 27 character long)]{key=value} [@some-very-long-id-26-chars; @id2].
    ❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
                , citationMode = NormalCitation
                , citationMode = NormalCitation
    
  • Adding a space after the opening [ (i.e. [ (...)]) also leads to NormalCitation:

    ---
    title: "Bug demo"
    references:
    - author: Jane
      id: some-very-long-id-26-chars
      issued: 2019-09
      title: Title 1
      type: article-journal
    - author: John
      id: id2
      issued: 2020-11
      title: Title 2
      type: article-journal
    ---
    
    [ (string of 28 characters long)]{key=value} [@some-very-long-id-26-chars; @id2].
    ❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
                , citationMode = NormalCitation
                , citationMode = NormalCitation
    

I don't have any more ideas for pinpointing what triggers this. It looks like a bug, or at least like something the parser could protect against.
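To narrow the trigger down further, the variants from this report can be run in one go. The Python sketch below is a hypothetical helper, not part of pandoc or Quarto: it rebuilds each test document from the front matter used in this report, runs `pandoc -f markdown -t json --citeproc` on it (pandoc is assumed to be on PATH), and collects the `citationMode` of every citation from the JSON AST. The names `build_doc`, `citation_modes`, `run_case` and `CASES` are illustrative only.

```python
import json
import subprocess

# Front matter copied from the reproduction in this report; only the first id varies.
FRONT_MATTER = """---
title: "Bug demo"
references:
- author: Jane
  id: {ref_id}
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

"""

# Variants from this report: (description, span text, reference id, key as written).
CASES = [
    ("original", "(string of 28 characters long)", "some-very-long-id-26-chars", "some-very-long-id-26-chars"),
    ("shorter id", "(string of 28 characters long)", "some-very-long-id-25-char", "some-very-long-id-25-char"),
    ("shorter span", "(string of 27 character long)", "some-very-long-id-26-chars", "some-very-long-id-26-chars"),
    ("leading space", " (string of 28 characters long)", "some-very-long-id-26-chars", "some-very-long-id-26-chars"),
    ("braced key", "(string of 28 characters long)", "some-very-long-id-26-chars", "{some-very-long-id-26-chars}"),
]


def build_doc(span_text: str, ref_id: str, cite_key: str) -> str:
    """Assemble one test document: front matter plus the span-then-citation paragraph."""
    body = f"[{span_text}]{{key=value}} [@{cite_key}; @id2].\n"
    return FRONT_MATTER.format(ref_id=ref_id) + body


def citation_modes(node) -> list:
    """Collect the citationMode tag of every Citation in a pandoc JSON AST, in order."""
    modes = []
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            # A Cite node's content is [list of citations, list of rendered inlines].
            citations, _inlines = node["c"]
            modes.extend(c["citationMode"]["t"] for c in citations)
        for value in node.values():
            modes.extend(citation_modes(value))
    elif isinstance(node, list):
        for item in node:
            modes.extend(citation_modes(item))
    return modes


def run_case(doc: str) -> list:
    """Run pandoc on one document and return its citation modes (needs pandoc on PATH)."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "json", "--citeproc"],
        input=doc, capture_output=True, text=True, check=True,
    )
    return citation_modes(json.loads(result.stdout)["blocks"])
```

For example, `[run_case(build_doc(*case[1:])) for case in CASES]` should, going by the results reported above for pandoc 3.6.2, show AuthorInText only for the first case. More span texts or id lengths can be appended to `CASES` to look for the exact threshold.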

Using the @{...} syntax helps, as it also leads to NormalCitation:

---
title: "Bug demo"
references:
- author: Jane
  id: some-very-long-id-26-chars
  issued: 2019-09
  title: Title 1
  type: article-journal
- author: John
  id: id2
  issued: 2020-11
  title: Title 2
  type: article-journal
---

[(string of 28 characters long)]{key=value} [@{some-very-long-id-26-chars}; @id2].
❯ pandoc -f markdown -t native --citeproc index.qmd | grep "NormalCitation"
            , citationMode = NormalCitation
            , citationMode = NormalCitation
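Since the braced form avoids the problem, one possible stopgap is to rewrite bracketed citation keys into `@{...}` form before pandoc sees the file. The Python function below is a hypothetical pre-processing sketch, not a pandoc or Quarto feature. Its key pattern only approximates pandoc's citation-key rules, it only handles keys written directly after `[` or `;` (so prefixes such as `[see @key]` are left alone), and it does not skip code blocks or link text that happens to start with `@`.

```python
import re

# Approximation (an assumption, not pandoc's exact grammar): a key is word
# characters, with single internal punctuation marks (:.#$%&-+?<>~/) allowed
# only when followed by more word characters, so a trailing "." is not taken.
KEY = r"\w+(?:[:.#$%&\-+?<>~/]\w+)*"

# "@key" directly after "[" or ";" (optional spaces), i.e. inside a bracketed
# citation group. Keys already written as "@{...}" do not match, since "{" is
# not a word character.
CITE_IN_BRACKETS = re.compile(r"(\[\s*|;\s*)@(" + KEY + r")")


def brace_citation_keys(text: str) -> str:
    """Rewrite e.g. [@a; @b] as [@{a}; @{b}] (hypothetical workaround)."""
    return CITE_IN_BRACKETS.sub(lambda m: f"{m.group(1)}@{{{m.group(2)}}}", text)
```

Applied to the reproduction line from this report, it produces `[@{some-very-long-id-26-chars}; @{id2}]`, a braced form like the one shown above to give NormalCitation.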

Original Context

For context, the original use case looks something like this:

---
title: "Bug demo"
---

[(Figures 1A and S1A, Table S1)]{fg="#4D4D0C" bg="#F0F352"} [@soto_perez_crispr_cas_2019; @gregory_gut_2020; @nayfach_metagenomic_2021].

where the Span sets attributes that are read by a Lua filter.

Information

This is on Windows with:

❯ pandoc --version
pandoc.exe 3.6.2
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: C:\Users\chris\AppData\Roaming\pandoc
Copyright (C) 2006-2024 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
