from yaml to csljson converted references show different output of variable note when colon is present #93

Open
maybegeek opened this issue Oct 13, 2021 · 20 comments

Comments

@maybegeek

Hi there,

given the references in yaml (bjork.yaml):

---
references:
- id: theid
  author:
    - literal: Björk
  issued:
    - year: 2019
  note: >-
    Bla: Blupp ... Foo: Bar
  title: The Title
  type: motion_picture
...

and converting this to csl json with:

pandoc -s bjork.yaml -f markdown -t csljson -o bjork.json

I get bjork.json with:

[
  {
    "author": [
      {
        "literal": "Björk"
      }
    ],
    "id": "theid",
    "issued": {
      "date-parts": [
        [
          2019
        ]
      ]
    },
    "note": "Bla: Blupp … Foo: Bar",
    "title": "The Title",
    "type": "motion_picture"
  }
]

my mwe csl style is (test.csl):

<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" class="note" version="1.0" demote-non-dropping-particle="sort-only" default-locale="de-DE">

<info>
  <title>test</title>
  <title-short>test</title-short>
  <id></id>
  <author>
    <name>Hans Dampf</name>
  </author>
  <category citation-format="note"/>
  <category field="humanities"/>
  <summary>test</summary>
  <updated>2021-10-13T11:00:00+02:00</updated>
</info>

<citation>
  <sort/>
  <layout/>
</citation>

<bibliography>
  <sort></sort>
  <layout>
    <group suffix="." delimiter=" /|\ ">
      <text variable="title" font-style="italic"/>
      <date variable="issued" prefix=" (" suffix=")">
        <date-part name="year" form="long"/>
      </date>
      <names variable="author">
        <name/>
      </names>
      <text variable="note"/>
    </group>
  </layout>
</bibliography>
</style>

With this I want to show a difference in the output of the note variable when the note value of the JSON reference contains a colon. The YAML reference is handled as expected:

my test.md file:

---
lang: de-DE
csl: test.csl
nocite: |
  @*
...

Test YAML and JSON with CSL

as command for pandoc I use:

pandoc test.md --citeproc --output=ref-json.htm -s --metadata title="test" --bibliography=bjork.json

and

pandoc test.md --citeproc --output=ref-yaml.htm -s --metadata title="test" --bibliography=bjork.yaml

For YAML I get:

The Title /|\ (2019) /|\ Björk /|\ Bla: Blupp … Foo: Bar.

For JSON I get:

The Title /|\ (2019) /|\ Björk.

If no colon is present, the note value gets output.

The YAML file gives the expected output; the CSL JSON file does not.

To make a long post longer: I use Zotero (BBT) and the extra field with some cheater syntax. The handling of the colon separator and of the key: value pairs should already be finished by the time YAML or JSON reference files exist.

thanks for looking into this,
best regards

@jgm
Owner

jgm commented Oct 13, 2021

Here's a short demonstration of the issue:

% pandoc -s -f csljson -t csljson
[
{ "id": "a",
  "note": "a: b c" }
]
^D
[
  {
    "a": "b c",
    "id": "a",
    "type": ""
  }
]

@jgm
Owner

jgm commented Oct 13, 2021

Pure citeproc repro:

λ>  decode "[{\"id\":\"a\",\"note\":\"a: b c\"}]" :: Maybe [Reference (CslJson Text)]
Just [Reference {referenceId = ItemId {unItemId = "a"}, referenceType = "", referenceDisambiguation = Nothing, referenceVariables = fromList [(Variable "a",FancyVal (CslConcat (CslText "b") (CslConcat (CslText " ") (CslText "c"))))]}]

@jgm
Owner

jgm commented Oct 13, 2021

OK, I see that this is due to the following code in Citeproc.Types (l. 872):

    | k == "note" = do
        t' <- parseJSON v
        let (kvs, rest) = parseNote t'
         in (if T.null rest
                then id
                else \(Reference i' t'' d' m') ->
                       Reference i' t'' d' (M.insert "note" (TextVal rest) m'))
             <$> foldM go (Reference i t d m) (consolidateNameVariables kvs)

where

parseNote :: Text
          -> ([(Variable, Text)], Text)
parseNote t =
  either (const ([],t)) id $
    P.parseOnly ((,) <$> P.many' pNoteField <*> P.takeText) t
 where
  pNoteField = pBracedField <|> pLineField
  pLineField = do
    name <- pVarname
    _ <- P.char ':'
    val <- P.takeWhile (/='\n')
    () <$ P.char '\n' <|> P.endOfInput
    return (Variable $ CI.mk name, T.strip val)
  pBracedField = do
    _ <- P.string "{:"
    name <- pVarname
    _ <- P.char ':'
    val <- P.takeWhile (/='}')
    _ <- P.char '}'
    return (Variable $ CI.mk name, T.strip val)
  pVarname = P.takeWhile1 (\c -> isLetter c || c == '-')

So, it's intentional. For background, see jgm/pandoc-citeproc#192
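
To see what this grammar does with the note from the original report, here is a minimal standalone sketch. It is not the citeproc source: `parseNoteSketch` is a hypothetical name, it returns plain `Text` keys instead of citeproc's `Variable` so that it only needs the `attoparsec` and `text` packages, and the note uses an ASCII `...` rather than the `…` character.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as P
import Data.Char (isLetter)
import Data.Text (Text)
import qualified Data.Text as T

-- Same grammar as parseNote above, but with Text keys (assumption:
-- this keeps the sketch independent of citeproc's Variable type).
parseNoteSketch :: Text -> ([(Text, Text)], Text)
parseNoteSketch t =
  either (const ([], t)) id $
    P.parseOnly ((,) <$> P.many' pNoteField <*> P.takeText) t
 where
  pNoteField = pBracedField <|> pLineField
  -- "name: value" up to the end of the line
  pLineField = do
    name <- pVarname
    _ <- P.char ':'
    val <- P.takeWhile (/= '\n')
    () <$ P.char '\n' <|> P.endOfInput
    return (name, T.strip val)
  -- "{:name: value}"
  pBracedField = do
    _ <- P.string "{:"
    name <- pVarname
    _ <- P.char ':'
    val <- P.takeWhile (/= '}')
    _ <- P.char '}'
    return (name, T.strip val)
  pVarname = P.takeWhile1 (\c -> isLetter c || c == '-')

main :: IO ()
main = do
  -- The whole line is consumed as a field named "Bla"; nothing is left for "note".
  print (parseNoteSketch "Bla: Blupp ... Foo: Bar")
  -- prints ([("Bla","Blupp ... Foo: Bar")],"")

  -- Without a leading "name:" the text is returned untouched as the rest.
  print (parseNoteSketch "no colon here")
  -- prints ([],"no colon here")
```

Since the remaining text is empty in the first case, the `T.null rest` branch in the `Reference` parser leaves `note` unset, which matches the missing note in the JSON output reported above.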

@jgm
Owner

jgm commented Oct 13, 2021

Also https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html#cheater-syntax-for-odd-fields
If we haven't implemented this correctly, we can revisit.

@maybegeek
Author

maybegeek commented Oct 13, 2021

thanks @jgm , definitely shorter : )

but is it a bug?

citation-style-language/schema#277

If we have "old" yaml or csljson files in the future, perhaps there will need to be a way of splitting key:values embedded in note into future custom fields?

What I do not understand at the moment is the difference in handling between yaml and csljson.

@jgm
Owner

jgm commented Oct 13, 2021

Maybe @denismaier or @bwiernik or @bdarcus can comment.
From the linked issue, it sounds as if CSL has added support for something like

"custom": {"a": "one", "b": "two"}

but I'm not sure what version it's in, and I'm not sure whether this change is supposed to go along with no longer parsing fields in the "note" field...
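
For illustration only, here is a rough sketch of what reading such a `custom` object could look like when the `note` string is left alone. It assumes the `aeson`, `containers` and `text` packages; the `Item` type and its field names are hypothetical and are not citeproc's `Reference` type or its actual decoder.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson (FromJSON (..), decode, withObject, (.!=), (.:), (.:?))
import qualified Data.Map as M
import Data.Text (Text)

-- Hypothetical, heavily reduced reference item.
data Item = Item
  { itemId     :: Text
  , itemNote   :: Maybe Text        -- kept verbatim, never parsed
  , itemCustom :: M.Map Text Text   -- extra variables from "custom"
  } deriving (Show, Eq)

instance FromJSON Item where
  parseJSON = withObject "Item" $ \o ->
    Item <$> o .: "id"
         <*> o .:? "note"
         <*> o .:? "custom" .!= M.empty  -- a missing "custom" means no extra fields

main :: IO ()
main = do
  let input = "[{\"id\":\"a\",\"note\":\"a: b c\",\"custom\":{\"a\":\"one\",\"b\":\"two\"}}]"
  -- The note survives as-is; "a" and "b" come only from the custom object.
  print (decode input :: Maybe [Item])
```

With this shape a colon in `note` has no special meaning, because extra variables have their own place in the JSON.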

@bdarcus

bdarcus commented Oct 13, 2021

> I'm not sure what version it's in ...

It's in the 1.1 branch.

> and I'm not sure whether this change is supposed to go along with no longer parsing fields in the "note" field...

It's definitely intended as a better solution to the same requirement. But I don't think parsing the note field was anything "official"?

@bwiernik

Yeah, parsing from note was always a citeproc-js hack that we have wanted to phase out.

@jgm BetterBibTeX moves CSL fields out of note when it generates CSL JSON or YAML, so Zotero users have an option to get cleaned note fields when outputting to pandoc. I am not sure if RStudio does similar cleaning, but I think they might or at least would likely be responsive to adding that. Those would probably be the major ways that pandoc would encounter CSL variables in note. So, it might be possible to retire parsing note from pandoc or at least move it to an optional flag.

@denismaier

denismaier commented Oct 14, 2021

> I am not sure if RStudio does similar cleaning, but I think they might or at least would likely be responsive to adding that. Those would probably be the major ways that pandoc would encounter CSL variables in note.

I'm not sure, but won't RStudio users create their csl json or yaml files using some external tool, e.g. Zotero? If so, then that wouldn't be much of an RStudio issue, right?

@maybegeek
Author

Hi there,

if I'm allowed to sum up:

A

Running pandoc with a *.yaml or a *.json file of references should output the same result

for

note: >-
  Bla: Blupp ... Foo: Bar

and

"note": "Bla: Blupp … Foo: Bar",

B

An optional flag for converting and parsing

"note": "Bla: Blupp … Foo: Bar",

to custom key:values could be a way of handling old exports of BBT json/yaml to whichever new structure there will be.

?

@bwiernik

@jgm RStudio's visual editor has native integration from Zotero and can create a bib, json, or YAML file as users add citations from Zotero or DOIs

@jgm
Owner

jgm commented Oct 15, 2021

Not sure what is best here. The cleanest option would be to add support for custom and disable note parsing. This would have the disadvantage that some people's existing workflows may break, and in ways that aren't obvious to them (a pretty big disadvantage).

We could think about adding an option to disable note parsing (maybe checking a metadata field). But people would have to know about this to use it. It's only going to affect people who want to use colons in note fields, and such people aren't going to know about it, in most cases.
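
As a rough illustration of what such a switch would change, here is a small sketch. Everything in it is hypothetical: `NoteMode`, `splitField` and `addNote` are made-up names, no such option exists in pandoc or citeproc, and `splitField` is a reduced stand-in that only handles a single-line `name: value` note.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (isLetter)
import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical switch, e.g. driven by a metadata field.
data NoteMode = ParseNoteFields | KeepNoteVerbatim

-- Reduced stand-in for the note parser: one "name: value" line only.
splitField :: Text -> Maybe (Text, Text)
splitField t =
  case T.breakOn ":" t of
    (name, rest)
      | not (T.null name)
      , T.all (\c -> isLetter c || c == '-') name
      , Just val <- T.stripPrefix ":" rest -> Just (name, T.strip val)
    _ -> Nothing

-- Add a note to a reference's variables according to the chosen mode.
addNote :: NoteMode -> Text -> M.Map Text Text -> M.Map Text Text
addNote KeepNoteVerbatim t m = M.insert "note" t m
addNote ParseNoteFields t m =
  case splitField t of
    Just (k, v) -> M.insert k v m       -- current behavior: the note becomes a variable
    Nothing     -> M.insert "note" t m  -- no field syntax found: keep the note

main :: IO ()
main = do
  print (M.toList (addNote ParseNoteFields "a: b c" M.empty))
  -- prints [("a","b c")]
  print (M.toList (addNote KeepNoteVerbatim "a: b c" M.empty))
  -- prints [("note","a: b c")]
```

The open question from the thread remains which of the two modes should be the default.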

@jgm
Owner

jgm commented Oct 15, 2021

Wish I knew how common it was for pandoc users to use this note-parsing trick, and how common it is to want to use a note field for other purposes.

@maybegeek
Author

People using reference managers already have their tooling, and thereby the parsing, with Zotero or Zotero and BBT, where the cheater syntax is the way to get their key:values stored.

If, on the other hand, someone writes the bibliographic data by hand in yaml or csl json, they would have their data as written. Writing Original date: in Zotero's extra field is not by choice; writing in note in yaml/json would be by choice, where one could write the actual csl-usable key:value instead.

Parsing once, in the reference manager / Zotero export, would be enough. If one wanted to parse in pandoc again, we could make that optional ... -t csljson+custom -o happy-new.json or ... -t csljson+cheater -o happy-new.json perhaps?

On the matter of different output from yaml/json bibliography files with : and the same CSL, well, I was surprised that the same data (just in a different structure) resulted in different handling of the note.

tough choice : )

@bdarcus

bdarcus commented Oct 15, 2021 via email

@bwiernik

I don't think there's a problem adding it to 1.0

@bdarcus

bdarcus commented Nov 12, 2021

> I don't think there's a problem adding it to 1.0

Let's do it then?

@jgm
Owner

jgm commented Jul 28, 2022

Following up on this: was "custom" ever added to 1.0?

@bdarcus

bdarcus commented Jul 28, 2022

No; I just created a linked issue to make sure we don't cause any problems if we do.

I can't imagine we do, but just in case.

@bdarcus

bdarcus commented Aug 6, 2022

Turns out it's been there for awhile! Not sure how I missed that.

citation-style-language/schema@fde9bd6


5 participants