Convert a key:value file w/ comments into JSON document with UNIX tools


I have a file in a subset of YAML, with data such as the following:



# This is a comment
# This is another comment


spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
yarn:yarn.nodemanager.log.retain-seconds: '259200'




I need to convert that into a JSON document like the one below (note that values that look like booleans or integers must remain strings):




{
  "spark:spark.ui.enabled": "false",
  "spark:spark.sql.adaptive.enabled": "true",
  "yarn:yarn.nodemanager.log.retain-seconds": "259200"
}



The closest I got was this:



cat << EOF > ./file.yaml
# This is a comment
# This is another comment


spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
yarn:yarn.nodemanager.log.retain-seconds: '259200'
EOF
echo grep -o '^[^#]*'


which, apart from looking rather gnarly, doesn't give the correct answer. It returns:



"spark:spark.ui.enabled": 'false',"spark:spark.sql.adaptive.enabled": 'true',"dataproc:dataproc.monitoring.stackdriver.enable": 'true',"spark:spark.submit.deployMode": 'cluster'


which causes a parse error if I pipe it to jq.



I'm hoping I'm missing a much easier way of doing this, but I can't figure it out. Can anyone help?


































  • Forgive my naivety; I think of those tools as being part of bash, but I'm happy to be corrected. I am limited by the tools available in the Docker image I'm running this in, which is built from a Debian base.

    – jamiet
    Mar 8 at 18:43











  • Build a new docker image that includes the tools you need to work with YAML and JSON.

    – chepner
    Mar 8 at 18:43











  • I do just want to deal with that specific input; perhaps the mention of YAML was a bum steer. I do have jq available.

    – jamiet
    Mar 8 at 18:44












  • There are integers too. I have edited the sample above to reflect that.

    – jamiet
    Mar 8 at 18:50











  • If you're used to jq then there's also a wrapper for it called yq that handles YAML: yq.readthedocs.io/en/latest

    – match
    Mar 8 at 18:51

















bash shell awk jq






edited Mar 8 at 23:47 by John Kugelman

asked Mar 8 at 18:34 by jamiet
























3 Answers














Implemented in pure jq (tested with version 1.6):



#!/usr/bin/env bash

jq_script=$(cat <<'EOF'
def content_for_line:
  "^[[:space:]]*([#]|$)" as $ignore_re |          # regex for comments, blank lines
  "^(?<key>.*): (?<value>.*)$" as $content_re |    # regex for actual k/v pairs
  "^'(?<value>.*)'$" as $quoted_re |               # regex for values in single quotes
  if test($ignore_re) then {} else                 # comments and empty lines add nothing to the data
    if test($content_re) then (                    # non-empty: match against $content_re
      capture($content_re) as $content |           # ...and put the groups into $content
      $content.key as $key |                       # string before the last ": " becomes $key
      (if ($content.value | test($quoted_re)) then # if value contains literal quotes...
         ($content.value | capture($quoted_re)).value  # ...take string from inside quotes
       else
         $content.value                            # no quotes to strip
       end) as $value |                            # result of the above block becomes $value
      {"\($key)": "\($value)"}                     # and return a map from one key to one value
    ) else
      # we get here if a line didn't match $ignore_re *or* $content_re
      error("Line \(.) is not recognized as a comment, empty, or valid content")
    end
  end;

# iterate over our input lines, passing each one to content_for_line and merging the result
# into the object we're building, which we eventually return as our result.
reduce inputs as $item ({}; . + ($item | content_for_line))
EOF
)

# jq -R: read input as raw strings
# jq -n: don't read from stdin until requested with "input" or "inputs"
jq -Rn "$jq_script" <file.yaml >file.json


Unlike syntax-unaware tools, this can never generate output that isn't valid JSON; and it can easily be extended with application-specific logic (f/e, to emit some values but not others as numeric literals rather than string literals) by adding an additional filter stage to inspect and modify the output of content_for_line.
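As one concrete, hypothetical instance of such an extra stage, the sketch below turns string values that look like integers into JSON numbers and leaves everything else (including "true" and "false") as strings. It is not part of the answer above: the function name numify_json is made up, it assumes jq is on PATH, and it assumes every value is a string, as content_for_line emits.

```shell
#!/usr/bin/env bash
# Hypothetical extra stage: convert integer-looking string values to JSON numbers.
# Assumes jq is installed and every value is a string (as content_for_line emits).
numify_json() {
  jq -c 'with_entries(
    if (.value | test("^-?[0-9]+$"))   # whole value is an optional "-" plus digits
    then .value |= tonumber            # ...so emit it as a number
    else .                             # anything else (e.g. "false") stays a string
    end
  )'
}

# Example:
#   numify_json <<<'{"yarn:yarn.nodemanager.log.retain-seconds": "259200"}'
#   -> {"yarn:yarn.nodemanager.log.retain-seconds":259200}
```

To apply it per line inside the script instead, the same with_entries(...) expression could be appended after content_for_line in the final reduce, i.e. ($item | content_for_line | with_entries(...)).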































  • Oh my word, I don't know how that worked, but it did. Thank you so much, Charles. I shall now spend the rest of the day poring over it and trying to understand it :) Thanks again.

    – jamiet
    Mar 8 at 19:14











  • (Also, I just tested it with my real-world YAML document, which is much larger than the sample I posted here, and it worked great.)

    – jamiet
    Mar 8 at 19:18











    Just added some comments; hopefully they make it easier to follow. Let me know if there are other places clarification is needed.

    – Charles Duffy
    Mar 8 at 19:27
































Here's a no-frills but simple solution:



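# tidy: strip leading spaces plus an optional opening quote, and trailing spaces plus an optional closing quote
def tidy: sub("^ *'?";"") | sub(" *'?$";"");
# kv: everything before the last ":" is the key; the tidied last piece is the value
def kv: split(":") | [ (.[:-1] | join(":")), (.[-1]|tidy)];

# skip comment and blank lines, then fold every [key, value] row into one object
reduce (inputs | select( test("^ *#|^ *$") | not) | kv) as $row ({};
  .[$row[0]] = $row[1] )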


Invocation



jq -n -R -f tojson.jq input.txt
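One caveat with kv: it splits on every ":" and treats only the last piece as the value. That works for the sample, but a value that itself contains a colon would be cut short; for example, k: 'hdfs://h:1/p' would produce the key "k: 'hdfs://h" and the value "1/p". Splitting at the first ": " avoids this, provided keys never contain ": " (true of the sample keys, whose colons are never followed by a space). A minimal bash sketch of that split; split_kv is a hypothetical helper that prints a tab-separated pair rather than JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from either answer): split a line at the FIRST ": ".
# Assumes keys never contain ": " (true for the sample, whose keys use ":" with no space).
split_kv() {
  local line=$1 key value
  key=${line%%: *}     # %% drops the longest suffix matching ": *", i.e. from the first ": " on
  value=${line#*: }    # #  drops the shortest prefix matching "*: ", i.e. up to the first ": "
  value=${value#"'"}   # strip one leading single quote, if present
  value=${value%"'"}   # strip one trailing single quote, if present
  printf '%s\t%s\n' "$key" "$value"
}

split_kv "spark:spark.ui.enabled: 'false'"
# prints: spark:spark.ui.enabled<TAB>false
split_kv "spark:spark.eventLog.dir: 'hdfs://namenode:8020/logs'"   # made-up key and value
# prints: spark:spark.eventLog.dir<TAB>hdfs://namenode:8020/logs
```

The same idea carries over to jq by capturing with a non-greedy key, but the bash version keeps the splitting rule easy to see.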

















































    You can do it all in awk using gsub and sprintf, for example:



    (edit to add "," separating json records)



    awk 'BEGIN { ol=0; print "{" }
    /^[^#]/ {
        if (ol) print ","
        gsub ("\047", "\042")
        $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
        printf "%s %s", $1, $2
        ol++
    }
    END { print "\n}" }' file.yaml


    (note: though jq is the proper tool for json formatting)
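The gsub call in the program above relies on octal escapes: in an awk string, \047 is the single-quote character and \042 the double quote. Writing them that way sidesteps having to embed a literal ' inside a shell single-quoted awk program. A tiny sketch isolating just that step (swap_quotes is a hypothetical name):

```shell
#!/usr/bin/env bash
# "\047" is the octal escape for ' and "\042" for " in awk string literals,
# so a single-quoted awk program can still refer to single quotes.
swap_quotes() {
  awk '{ gsub("\047", "\042"); print }'
}

printf '%s\n' "spark:spark.ui.enabled: 'false'" | swap_quotes
# prints: spark:spark.ui.enabled: "false"
```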



    Explanation




    • awk 'BEGIN { ol=0; print "{" } calls awk, setting the output-line variable ol=0 (used to control "," output) and printing the header "{",

    • /^[^#]/ { only matches non-comment lines (and, since it needs at least one character, skips blank lines too),

    • if (ol) print "," outputs a "," ending the previous record if the output-line count ol is greater than zero,

    • gsub ("\047", "\042") replaces all single quotes with double quotes,

    • $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1)) wraps the first field (minus its trailing ':') in double quotes, adds 2 leading spaces, and appends a ':' at the end,

    • printf "%s %s", $1, $2 outputs the reformatted fields,

    • ol++ increments the output-line count, and

    • END { print "\n}" }' closes by printing a newline and the "}" footer.

    Example Use/Output



    Just select/paste the awk command above (changing the filename as needed)



    $ awk 'BEGIN { ol=0; print "{" }
    > /^[^#]/ {
    >     if (ol) print ","
    >     gsub ("\047", "\042")
    >     $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
    >     printf "%s %s", $1, $2
    >     ol++
    > }
    > END { print "\n}" }' file.yaml
    {
      "spark:spark.ui.enabled": "false",
      "spark:spark.sql.adaptive.enabled": "true"
    }






























    • Thanks David, that's close, but it doesn't place a comma after the first element, so it's not valid JSON.

      – jamiet
      Mar 8 at 19:10











      Oh -- let me work on that -- I totally forgot :)

      – David C. Rankin
      Mar 8 at 19:12











    • @jamiet - updated

      – David C. Rankin
      Mar 8 at 19:28
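The awk answer above copies each value through verbatim apart from swapping quote characters, so a value containing a double quote or a backslash would still produce invalid JSON, and because fields are split on whitespace, a value containing a space would be cut short. That is one concrete reason behind the note that jq is the proper tool. If jq really is unavailable, a somewhat more defensive awk variant might look like the sketch below. It is not from any of the answers; the function name kv_to_json is hypothetical, and it assumes keys never contain ": " and values contain no control characters.

```shell
#!/usr/bin/env bash
# Hypothetical, more defensive awk variant (a sketch, not the code from the answer).
# Assumptions: keys never contain ": ", values contain no control characters, and a
# value wrapped in one pair of single quotes should lose those quotes.
kv_to_json() {
  awk -v q="'" '
    # JSON-escape backslashes and double quotes, one character at a time
    function esc(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
    {
      i = index($0, ": ")               # split at the FIRST ": "
      if (i == 0) next                  # not a key/value line: ignore it
      key = substr($0, 1, i - 1)
      val = substr($0, i + 2)
      if (length(val) >= 2 && substr(val, 1, 1) == q && substr(val, length(val), 1) == q)
        val = substr(val, 2, length(val) - 2)   # drop the surrounding single quotes
      printf "%s  \"%s\": \"%s\"", (n++ ? ",\n" : "{\n"), esc(key), esc(val)
    }
    END { print (n ? "\n}" : "{}") }
  ' "$@"
}
```

Usage would be along the lines of kv_to_json file.yaml > file.json; an input with no key/value lines yields {}.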











    Answer "Implemented in pure jq": answered Mar 8 at 19:08 by Charles Duffy, edited Mar 10 at 3:30












    • oh my word, I don't know how that worked, but it did. Thank you so much Charles. I shall now spend the rest of the day pouring over it and trying to understand it :) thx again

      – jamiet
      Mar 8 at 19:14











    • (Also, I just tested it with my my real-world yaml document that is much larger than the sample I posted here and it worked great)

      – jamiet
      Mar 8 at 19:18






    • 1





      Just added some comments; hopefully they make it easier to follow. Let me know if there are other places clarification is needed.

      – Charles Duffy
      Mar 8 at 19:27


















    Here's a simple, no-frills solution:



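    # tidy: strip leading/trailing spaces and an optional single quote
    def tidy: sub("^ *'?";"") | sub(" *'?$";"");
    # kv: the key is everything before the last ":", the value is the (tidied) rest
    def kv: split(":") | [ (.[:-1] | join(":")), (.[-1]|tidy)];

    # skip comment and blank lines, then fold the [key, value] pairs into one object
    reduce (inputs | select(test("^ *#|^ *$") | not) | kv) as $row ({};
      .[$row[0]] = $row[1])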


    Invocation



    jq -n -R -f tojson.jq input.txt
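    A minimal end-to-end run of the jq program above might look like the sketch below. The sample input.txt is hypothetical (a comment line, a blank line and two key: 'value' pairs like the ones in the question); the program is the same one, with the reduce seeded with an empty object {}. One caveat: kv keeps only the text after the last ":" as the value, so a value that itself contains a colon (a URL, say) would be truncated.

```shell
#!/bin/sh
# Hypothetical sample input: a comment, a blank line, and two
# key: 'value' pairs whose keys themselves contain a ":".
cat > input.txt <<'EOF'
# Spark properties

spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
EOF

# The jq program from the answer, with the reduce seeded with {}.
cat > tojson.jq <<'EOF'
def tidy: sub("^ *'?";"") | sub(" *'?$";"");
def kv: split(":") | [ (.[:-1] | join(":")), (.[-1]|tidy)];

reduce (inputs | select(test("^ *#|^ *$") | not) | kv) as $row ({};
  .[$row[0]] = $row[1])
EOF

# -R reads each line as a raw string; -n lets "inputs" consume all of them.
jq -n -R -f tojson.jq input.txt
```

    With that sample it prints a single object containing "spark:spark.ui.enabled": "false" and "spark:spark.sql.adaptive.enabled": "true", pretty-printed by jq.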






        answered Mar 9 at 0:34









        peak


            You can do it all in awk using gsub and sprintf, for example:



            (edited to add the "," separating the JSON records)



            awk 'BEGIN { ol=0; print "{" }
            /^[^#]/ {
                if (ol) print ","
                gsub ("\047", "\042")
                $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
                printf "%s %s", $1, $2
                ol++
            }
            END { print "\n}" }' file.yaml


            (note: jq is really the proper tool for JSON formatting, though)



            Explanation




            • awk 'BEGIN { ol=0; print "{" } call awk, setting the output-line counter ol=0 (used to control the "," output) and printing the "{" header,

            • /^[^#]/ { only match non-comment lines,

            • if (ol) print "," if the output-line count ol is greater than zero, output the "," that ends the previous line,

            • gsub ("\047", "\042") replace all single quotes with double quotes (\047 and \042 are the octal escapes for ' and "; a literal ' cannot appear inside the single-quoted program),

            • $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1)) add 2 leading spaces and wrap the first field (minus its trailing ':') in double quotes, then re-append the ':',

            • printf "%s %s", $1, $2 output the reformatted fields with no trailing newline, so the next "," can follow on the same line,

            • ol++ increment the output-line count, and

            • END { print "\n}" }' close by printing a newline and the "}" footer.

            Example Use/Output



            Just select/paste the awk command above (changing the filename as needed)



            $ awk 'BEGIN { ol=0; print "{" }
            > /^[^#]/ {
            >     if (ol) print ","
            >     gsub ("\047", "\042")
            >     $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
            >     printf "%s %s", $1, $2
            >     ol++
            > }
            > END { print "\n}" }' file.yaml
            {
              "spark:spark.ui.enabled": "false",
              "spark:spark.sql.adaptive.enabled": "true"
            }
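            The same steps can also live in a standalone awk script, which leaves room for a comment on each line. This is a hedged variant, not the exact one-liner: the file names yaml2json.awk and file.yaml and the sample lines are hypothetical, and instead of /^[^#]/ it skips blank lines and any line whose first field starts with "#", so indented comments are dropped too. Like the one-liner, it prints only the first word of each value ($2), so values containing spaces would need more work.

```shell
#!/bin/sh
# Hypothetical script name; the awk program mirrors the one-liner in the answer.
cat > yaml2json.awk <<'EOF'
BEGIN { ol = 0; print "{" }                 # header; ol counts pairs printed so far
NF == 0 || substr($1, 1, 1) == "#" { next } # skip blank lines and (indented) comments
{
    if (ol) print ","                       # end the previous pair with ","
    gsub(/'/, "\"")                         # single quotes -> double quotes
    # wrap the key (minus its trailing ":") in double quotes, then re-append ":"
    $1 = sprintf("  \"%s\":", substr($1, 1, length($1) - 1))
    printf "%s %s", $1, $2                  # no newline, so a "," can follow
    ol++
}
END { print "\n}" }                         # newline, then close the object
EOF

# Hypothetical sample input, including a blank line and an indented comment.
printf '%s\n' "# comment" "" "spark:spark.ui.enabled: 'false'" \
    "  # indented comment" "spark:spark.sql.adaptive.enabled: 'true'" > file.yaml

awk -f yaml2json.awk file.yaml
```

            With that sample it prints the same two pairs, wrapped in "{" and "}".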







            • thanks David, that's close, but it doesn't place a comma after the first element, so it's not valid JSON

              – jamiet
              Mar 8 at 19:10






            • Oh -- let me work on that -- I totally forgot :)

              – David C. Rankin
              Mar 8 at 19:12











            • @jamiet - updated

              – David C. Rankin
              Mar 8 at 19:28
















            edited Mar 8 at 19:28

























            answered Mar 8 at 19:08









            David C. Rankin
