Convert a key:value file w/ comments into JSON document with UNIX tools
I have a file in a subset of YAML with data such as the below:
# This is a comment
# This is another comment
spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
yarn:yarn.nodemanager.log.retain-seconds: '259200'
I need to convert that into a JSON document looking like this (note that strings containing booleans and integers still remain strings):
{
  "spark:spark.ui.enabled": "false",
  "spark:spark.sql.adaptive.enabled": "true",
  "yarn:yarn.nodemanager.log.retain-seconds": "259200"
}
The closest I got was this:
cat << EOF > ./file.yaml
> # This is a comment
> # This is another comment
>
>
> spark:spark.ui.enabled: 'false'
> spark:spark.sql.adaptive.enabled: 'true'
> yarn:yarn.nodemanager.log.retain-seconds: '259200'
> EOF
echo grep -o '^[^#]*'
which, apart from looking rather gnarly, doesn't give the correct answer. It returns:
"spark:spark.ui.enabled": 'false',"spark:spark.sql.adaptive.enabled": 'true',"dataproc:dataproc.monitoring.stackdriver.enable": 'true',"spark:spark.submit.deployMode": 'cluster'
which causes a parse error if I pipe it to jq.
I'm hoping I'm missing a much easier way of doing this, but I can't figure it out. Can anyone help?
bash shell awk jq
edited Mar 8 at 23:47
John Kugelman
asked Mar 8 at 18:34
jamiet
forgive my naivety, I think of those tools as being part of bash, happy to be corrected. I am limited by the tools available to me in the docker image in which I'm running this, that image is built from a debian base.
– jamiet
Mar 8 at 18:43
Build a new docker image that includes the tools you need to work with YAML and JSON.
– chepner
Mar 8 at 18:43
I do just want to deal with that specific input, perhaps the mention of yaml was a bum steer. I do have jq available
– jamiet
Mar 8 at 18:44
there are integers too. I have edited the sample above to reflect that
– jamiet
Mar 8 at 18:50
If you're used to jq then there's also a wrapper for it called yq that handles YAML: yq.readthedocs.io/en/latest
– match
Mar 8 at 18:51
3 Answers
Implemented in pure jq (tested with version 1.6):
#!/usr/bin/env bash
jq_script=$(cat <<'EOF'
def content_for_line:
  "^[[:space:]]*([#]|$)" as $ignore_re |           # regex for comments, blank lines
  "^(?<key>.*): (?<value>.*)$" as $content_re |    # regex for actual k/v pairs
  "^'(?<value>.*)'$" as $quoted_re |               # regex for values in single quotes
  if test($ignore_re) then {} else                 # empty lines add nothing to the data
    if test($content_re) then (                    # non-empty: match against $content_re
      capture($content_re) as $content |           # ...and put the groups into $content
      $content.key as $key |                       # string before ": " becomes $key
      (if ($content.value | test($quoted_re)) then # if value contains literal quotes...
         ($content.value | capture($quoted_re)).value # ...take string from inside quotes
       else
         $content.value                            # no quotes to strip
       end) as $value |                            # result of the above block becomes $value
      {"\($key)": "\($value)"}                     # and return a map from one key to one value
    ) else
      # we get here if a line didn't match $ignore_re *or* $content_re
      error("Line \(.) is not recognized as a comment, empty, or valid content")
    end
  end;
# iterate over our input lines, passing each one to content_for_line and merging the result
# into the object we're building, which we eventually return as our result.
reduce inputs as $item ({}; . + ($item | content_for_line))
EOF
)
# jq -R: read input as raw strings
# jq -n: don't read from stdin until requested with "input" or "inputs"
jq -Rn "$jq_script" <file.yaml >file.json
Unlike syntax-unaware tools, this can never generate output that isn't valid JSON; and it can easily be extended with application-specific logic (for example, to emit some values but not others as numeric literals rather than string literals) by adding an additional filter stage to inspect and modify the output of content_for_line.
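As a rough illustration of the "additional filter stage" idea, the sketch below post-processes the generated file.json with a second jq call. The function name numeric_values and the rule it applies (a value made up only of digits becomes a JSON number) are assumptions made for this example, not part of the answer above; swap in whatever test your application needs.

```shell
#!/bin/sh
# numeric_values is a hypothetical helper name, not something jq provides.
# Assumption: every value in the input object is a string, and only values
# consisting entirely of digits should be emitted as JSON numbers.
numeric_values() {
    jq 'with_entries(
            if (.value | test("^[0-9]+$"))   # digits only?
            then .value |= tonumber          # yes: emit a numeric literal
            else .                           # no: keep the string as-is
            end)' "$@"
}
```

Called as numeric_values file.json, this should turn "259200" into 259200 while leaving "false" and "true" as strings. Note that tonumber drops leading zeros, so a value such as "007" would come out as 7.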
edited Mar 10 at 3:30
answered Mar 8 at 19:08
Charles Duffy
oh my word, I don't know how that worked, but it did. Thank you so much Charles. I shall now spend the rest of the day poring over it and trying to understand it :) thx again
– jamiet
Mar 8 at 19:14
(Also, I just tested it with my real-world yaml document that is much larger than the sample I posted here and it worked great)
– jamiet
Mar 8 at 19:18
Just added some comments; hopefully they make it easier to follow. Let me know if there are other places clarification is needed.
– Charles Duffy
Mar 8 at 19:27
Here's a no-frills but simple solution:
def tidy: sub("^ *'?";"") | sub(" *'?$";"");
def kv: split(":") | [ (.[:-1] | join(":")), (.[-1]|tidy)];
reduce (inputs| select( test("^ *#|^ *$")|not) | kv) as $row ({};
  .[$row[0]] = $row[1] )
Invocation
jq -n -R -f tojson.jq input.txt
answered Mar 9 at 0:34
peak
You can do it all in awk using gsub and sprintf, for example:
(edit to add "," separating json records)
awk 'BEGIN { ol=0; print "{" }
/^[^#]/ {
    if (ol) print ","
    gsub ("\047", "\042")
    $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
    printf "%s %s", $1, $2
    ol++
}
END { print "\n}" }' file.yaml
(note: though jq is the proper tool for json formatting)
Explanation
awk 'BEGIN { ol=0; print "{" }
    call awk, setting the output line variable ol=0 for "," output control and printing the header "{",
/^[^#]/ {
    only match non-comment lines,
if (ol) print ","
    if the output line ol is greater than zero, output a trailing ",",
gsub ("\047", "\042")
    replace all single-quotes with double-quotes,
$1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
    add 2 leading spaces and double-quotes around the first field (except for the last character) and then append a ':' at the end,
printf "%s %s", $1, $2
    output the reformatted fields,
ol++
    increment the output line count, and
END { print "\n}" }'
    close by printing the "}" footer.
Example Use/Output
Just select/paste the awk command above (changing the filename as needed)
$ awk 'BEGIN { ol=0; print "{" }
> /^[^#]/ {
>     if (ol) print ","
>     gsub ("\047", "\042")
>     $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
>     printf "%s %s", $1, $2
>     ol++
> }
> END { print "\n}" }' file.yaml
{
  "spark:spark.ui.enabled": "false",
  "spark:spark.sql.adaptive.enabled": "true"
}
thanks David, that's close, but it doesn't place a comma after the first element thus not valid JSON
– jamiet
Mar 8 at 19:10
1
Oh -- let me work on that -- I totally forgot:)
– David C. Rankin
Mar 8 at 19:12
@jamiet - updated
– David C. Rankin
Mar 8 at 19:28
add a comment |
You can do it all in awk using gsub and sprintf, for example:
(edit to add "," separating json records)
awk 'BEGIN ol=0; print ""
/^[^#]/
if (ol) print ","
gsub ("47", "42")
$1 = sprintf (" "%s":", substr ($1, 1, length ($1) - 1))
printf "%s %s", $1, $2
ol++
END print "n" ' file.yaml
(note: though jq is the proper tool for json formatting)
Explanation
awk 'BEGIN { ol=0; print ""callawksetting the output line variableol=0for","output control and printing the header"",/^[^#]/only match non-comment lines,if (ol) print ","if the output lineolis greater than zero, output a trailing","gsub ("47", "42")replace all single-quotes with double-quotes,$1 = sprintf (" "%s":", substr ($1, 1, length ($1) - 1))add 2 leading spaces and double-quotes around the first field (except for the last character) and then append a':'at the end.print $1, $2output the reformatted fields,ol++increment the output line count, andEND print "" 'close by printing the""footer
Example Use/Output
Just select/paste the awk command above (changing the filename as needed)
$ awk 'BEGIN ol=0; print ""
> /^[^#]/
> if (ol) print ","
> gsub ("47", "42")
> $1 = sprintf (" "%s":", substr ($1, 1, length ($1) - 1))
> printf "%s %s", $1, $2
> ol++
>
> END print "n" ' file.yaml
"spark:spark.ui.enabled": "false",
"spark:spark.sql.adaptive.enabled": "true"
You can do it all in awk using gsub and sprintf, for example:
(edit to add "," separating json records)
awk 'BEGIN ol=0; print ""
/^[^#]/
if (ol) print ","
gsub ("47", "42")
$1 = sprintf (" "%s":", substr ($1, 1, length ($1) - 1))
printf "%s %s", $1, $2
ol++
END print "n" ' file.yaml
(note: though jq is the proper tool for json formatting)
Explanation
awk 'BEGIN { ol=0; print ""callawksetting the output line variableol=0for","output control and printing the header"",/^[^#]/only match non-comment lines,if (ol) print ","if the output lineolis greater than zero, output a trailing","gsub ("47", "42")replace all single-quotes with double-quotes,$1 = sprintf (" "%s":", substr ($1, 1, length ($1) - 1))add 2 leading spaces and double-quotes around the first field (except for the last character) and then append a':'at the end.print $1, $2output the reformatted fields,ol++increment the output line count, andEND print "" 'close by printing the""footer
Example Use/Output
Just select/paste the awk command above (changing the filename as needed)
$ awk 'BEGIN ol=0; print ""
> /^[^#]/
> if (ol) print ","
> gsub ("47", "42")
> $1 = sprintf (" "%s":", substr ($1, 1, length ($1) - 1))
> printf "%s %s", $1, $2
> ol++
>
> END print "n" ' file.yaml
"spark:spark.ui.enabled": "false",
"spark:spark.sql.adaptive.enabled": "true"
edited Mar 8 at 19:28
answered Mar 8 at 19:08
David C. Rankin
forgive my naivety, I think of those tools as being part of bash, happy to be corrected. I am limited by the tools available to me in the docker image in which I'm running this, that image is built from a debian base. – jamiet Mar 8 at 18:43
Build a new docker image that includes the tools you need to work with YAML and JSON. – chepner Mar 8 at 18:43
I do just want to deal with that specific input, perhaps the mention of yaml was a bum steer. I do have jq available – jamiet Mar 8 at 18:44
there are integers too. I have edited the sample above to reflect that – jamiet Mar 8 at 18:50
If you're used to jq then there's also a wrapper for it called yq that handles YAML: yq.readthedocs.io/en/latest – match Mar 8 at 18:51
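On the point raised in the comments that the values include integers too: an approach that only swaps quote characters should need no special case for them, since an unquoted integer is left untouched and so appears as a bare JSON number. A small sketch to check this, where quote_swap is a hypothetical function name and the second key and its value are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads key: value lines on stdin, writes JSON to stdout.
quote_swap() {
    awk 'BEGIN { ol = 0; print "{" }
    /^[^#]/ {
        if (ol) print ","
        gsub ("\047", "\042")      # only quote characters are rewritten
        $1 = sprintf ("  \"%s\":", substr ($1, 1, length ($1) - 1))
        printf "%s %s", $1, $2     # an unquoted integer passes through as-is
        ol++
    }
    END { print "\n}" }'
}

# Assumed input: one quoted string value and one unquoted integer value.
printf '%s\n' "spark:spark.ui.enabled: 'false'" "example:retain.seconds: 259200" |
    quote_swap
```

The quoted value comes out as a JSON string and the integer as a JSON number. Nothing here validates the values, so any unquoted value that is not a number would produce invalid JSON.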