Split SQL statements on function name but keep delimiter in Python
Assuming that I have the following string that contains SQL statements extracted from a SELECT clause (in reality this is a huge SQL statement with hundreds of such statements):
SUM(case when(A.money-B.money>1000
and A.unixtime-B.unixtime<=890769
and B.col10 = "A"
and B.col11 = "12"
and B.col12 = "V") then 10
end) as finalCond0,
MAX(case when(A.money-B.money<0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "4321"
and B.cond3 in ("E", "F", "G")) then A.col10
end) as finalCond1,
SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2
How can I split this query on the function names (i.e. SUM, MAX, MIN, MEAN, etc.) so that I can extract the last expression without removing the delimiter (which in this case is SUM)?
So the desired output would be a string like the one below:
SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2
PS: The indentation above is for presentation purposes only. In reality these statements are separated just by a comma, meaning that no whitespace or newlines appear between them in the original form.
python sql regex split
Have you tried to split by comma (,)?
– Ralf
Mar 7 at 11:05
@Ralf It won't work in this scenario. A split on , (sql.split(',').pop()) would give "C")) then 2 end) as finalCond2
– Old-School
Mar 7 at 11:07
Hm... and .split(',\n')?
– Ralf
Mar 7 at 11:09
@Ralf Won't work either. For presentation purposes I have provided some sort of indentation but in reality these statements are just separated by a comma meaning that no whitespaces or new lines appear in the original form.
– Old-School
Mar 7 at 11:11
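To make the problem from these comments concrete, here is a minimal sketch of why a plain split(',') cuts inside the IN (...) lists, next to a small scanner that only splits on commas outside parentheses and quotes. The helper name split_top_level and the shortened sample string are assumptions made for illustration; the scanner does not handle SQL comments or backslash-escaped quotes, so treat it as a demonstration rather than a full parser.

```python
def split_top_level(sql):
    """Split on commas that are outside parentheses and outside quotes."""
    parts, current = [], []
    depth = 0     # current parenthesis nesting level
    quote = None  # the quote character we are inside, if any
    for ch in sql:
        if quote:
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch == ',' and depth == 0:
            # top-level comma: close the current part and skip the comma
            parts.append(''.join(current).strip())
            current = []
            continue
        current.append(ch)
    parts.append(''.join(current).strip())
    return parts


# Shortened, hypothetical version of the string from the question (no newlines).
sql = ('SUM(case when(A.money-B.money>1000 and B.col10 = "A") then 10 end) as finalCond0,'
       'MAX(case when(B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1,'
       'SUM(case when(B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2')

print(sql.split(',').pop())      # naive split: cuts inside the IN (...) list
print(split_top_level(sql)[-1])  # depth-aware split: keeps the whole expression
```

Because the split happens on the comma rather than on the function name, the leading SUM, MAX or MIN is never consumed, which sidesteps the "keep the delimiter" problem for inputs this simple.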
Question asked Mar 7 at 11:01 by Old-School (new contributor)
edited Mar 7 at 12:27 by Martijn Pieters♦
3 Answers
You can't use a regular expression here, because SQL syntax does not form regular patterns you could match with the Python re engine. You'd have to actually parse the string into a token stream or syntax tree; your SUM(...) can contain a wide array of syntax, including sub-selects, after all.
The sqlparse library can do this, even though it is a bit underdocumented and not that friendly to external uses.
Re-using the walk_tokens function I defined in the other post I linked to:
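from collections import deque
from sqlparse.sql import TokenList

def walk_tokens(token):
    # breadth-first walk over the token tree, yielding every token
    queue = deque([token])
    while queue:
        token = queue.popleft()
        if isinstance(token, TokenList):
            # a TokenList is a group; queue its children too
            queue.extend(token)
        yield token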
Extracting the last element from the SELECT identifier list is then:
import sqlparse
from sqlparse.sql import IdentifierList

tokens = sqlparse.parse(sql)[0]
for tok in walk_tokens(tokens):
    if isinstance(tok, IdentifierList):
        # iterate to leave the last assigned to `identifier`
        for identifier in tok.get_identifiers():
            pass
        break

print(identifier)
Demo:
>>> sql = '''
... SUM(case when(A.money-B.money>1000
... and A.unixtime-B.unixtime<=890769
... and B.col10 = "A"
... and B.col11 = "12"
... and B.col12 = "V") then 10
... end) as finalCond0,
... MAX(case when(A.money-B.money<0
... and A.unixtime-B.unixtime<=6786000
... and B.cond1 = "A"
... and B.cond2 = "4321"
... and B.cond3 in ("E", "F", "G")) then A.col10
... end) as finalCond1,
... SUM(case when(A.money-B.money>0
... and A.unixtime-B.unixtime<=6786000
... and B.cond1 = "A"
... and B.cond2 = "1234"
... and B.cond3 in ("A", "B", "C")) then 2
... end) as finalCond2
... '''
>>> tokens = sqlparse.parse(sql)[0]
>>> for tok in walk_tokens(tokens):
...     if isinstance(tok, IdentifierList):
...         # iterate to leave the last assigned to `identifier`
...         for identifier in tok.get_identifiers():
...             pass
...         break
...
>>> print(identifier)
SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2
identifier is a sqlparse.sql.Identifier instance, but converting it to a string again (which print() does, or you can just use str()) gives you the input SQL string again for that section.
Wouldn't it be much easier with a regular expression?
– Old-School
Mar 7 at 11:35
@Old-School: no, because Python's regular expressions can't be used to parse nested structures.
– Martijn Pieters♦
Mar 7 at 11:36
@Old-School: SQL is not 'regular'; you can't predict the number of parentheses, or where commas are going to be used, etc.
– Martijn Pieters♦
Mar 7 at 11:37
Would this solution work even if I don't have complete SQL statements? Meaning that they are just identifiers from a SELECT statement and not the actual statement per se.
– Old-School
Mar 7 at 11:40
@Old-School: I used your literal input, it's in the demonstration. No SELECT or FROM or JOIN in sight.
– Martijn Pieters♦
Mar 7 at 11:55
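The point about nesting in the comments above can be sketched with a small experiment. A zero-width lookahead lets re.split() cut before each function name so the name is kept, and that works for a flat list, but it also cuts inside an expression as soon as one aggregate appears within another. The sample strings and variable names here are made up for illustration, and splitting on an empty match assumes Python 3.7 or newer.

```python
import re

# Zero-width lookahead: split *before* each function name, so the name is kept.
SPLIT_BEFORE_FUNC = re.compile(r'(?=\b(?:SUM|MAX|MIN)\()')

flat = 'SUM(a) as x,MAX(b) as y,SUM(c) as z'
nested = 'SUM(a) as x,SUM(case when MAX(b)>0 then 1 end) as y'

# Drop the empty string produced by the match at position 0.
flat_parts = [p for p in SPLIT_BEFORE_FUNC.split(flat) if p]
nested_parts = [p for p in SPLIT_BEFORE_FUNC.split(nested) if p]

print(flat_parts)    # three parts, one per select item
print(nested_parts)  # also three parts, although there are only two select items
```

The second result splits the middle of the SUM(case when MAX(...) ...) expression, which is the kind of failure a tokenizing parser such as sqlparse avoids.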
I have a solution, but it is a bit much code. This is without using regex, just many iterations of splitting on keywords.
s = """
SUM(case when(A.money-B.money>1000
and A.unixtime-B.unixtime<=890769
and B.col10 = "A"
and B.col11 = "12"
and B.col12 = "V") then 10
end) as finalCond0,
MAX(case when(A.money-B.money<0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "4321"
and B.cond3 in ("E", "F", "G")) then A.col10
end) as finalCond1,
SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2
"""
# remove newlines and double spaces
s = s.replace('\n', ' ')
while '  ' in s:
    s = s.replace('  ', ' ')
s = s.strip()

# split on keywords, starting with the original string
current_parts = [s]
for kw in ['SUM', 'MAX', 'MIN']:
    new_parts = []
    for part in current_parts:
        for i, new_part in enumerate(part.split(kw)):
            if i > 0:
                # add keyword to the start of this substring
                new_part = '{}{}'.format(kw, new_part)
            new_part = new_part.strip()
            if len(new_part) > 0:
                new_parts.append(new_part)
    current_parts = new_parts

print()
print('current_parts:')
for s in current_parts:
    print(s)
The output I get is:
current_parts:
SUM(case when(A.money-B.money>1000 and A.unixtime-B.unixtime<=890769 and B.col10 = "A" and B.col11 = "12" and B.col12 = "V") then 10 end) as finalCond0,
MAX(case when(A.money-B.money<0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "4321" and B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1,
SUM(case when(A.money-B.money>0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "1234" and B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2
Does it work for you? It seems to work for the example string you put in the question.
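As a side note on the cleanup step in this answer, the replace-in-a-loop approach can probably be condensed with str.split(), which treats any run of whitespace (including newlines) as a single separator. The function name normalize_whitespace and the sample input are assumptions for illustration; like the original loop, this would also collapse whitespace inside quoted SQL literals.

```python
def normalize_whitespace(text):
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace, so joining with a single
    # space collapses newlines and repeated spaces in one step.
    return ' '.join(text.split())


raw = 'SUM(a)\n   as x,\n\n  MAX(b)   as y '
print(normalize_whitespace(raw))
```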
But I have already told you that there are no newlines (\n) in the string.
– Old-School
Mar 7 at 11:28
The code I posted will work whether there are newlines in the string or not. I just make sure that no newlines are present at the start, but that won't affect the outcome if the string does not have any newlines in it.
– Ralf
Mar 7 at 11:29
Right OK. Thanks for the attempt.
– Old-School
Mar 7 at 11:30
You could use something like:
import re

# named `sql` rather than `str` to avoid shadowing the built-in type
sql = 'SUM(case when(A.money-B.money>1000 and A.unixtime-B.unixtime<=890769 and B.col10 = "A" and B.col11 = "12" and B.col12 = "V") then 10 end) as finalCond0, MAX(case when(A.money-B.money<0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "4321" and B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1, SUM(case when(A.money-B.money>0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "1234" and B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2'

# find every "as <alias>" occurrence
result = re.finditer(r'as\s+[a-zA-Z0-9]+', sql)
commas = []
parts = []
for reg in result:
    end = reg.end()
    # remember the position of a comma that directly follows an alias
    if len(sql) > end and sql[end] == ',':
        commas.append(end)

# slice the string at the recorded comma positions
idx = 0
for comma in commas:
    parts.append(sql[idx:comma])
    idx = comma + 1
parts.append(sql[idx:])
print(parts)
The commas list will hold the positions of the commas that need to be split on. Its content will be:
[151, 322]
In parts you'll have the final list with the parts (not sure if this implementation is the best way):
[
'SUM(case when(A.money-B.money>1000 and A.unixtime-B.unixtime<=890769 and B.col10 = "A" and B.col11 = "12" and B.col12 = "V") then 10 end) as finalCond0',
' MAX(case when(A.money-B.money<0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "4321" and B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1',
' SUM(case when(A.money-B.money>0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "1234" and B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2'
]
Is the output of this code even close to the desired output I've posted in my question?
– Old-School
Mar 7 at 11:47
Edited for you, check now!
– ALFA
Mar 7 at 11:58
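The alias-based idea from this answer could also be wrapped in a small function, sketched below. The name split_after_alias and the sample input are hypothetical. It assumes every select item ends in "as <alias>" and that no "as <alias>," sequence appears inside an expression (for example in a sub-select), so it is a heuristic and not a parser.

```python
import re

# "as <alias>" followed by a comma; assumed to occur only between select items
ALIAS_THEN_COMMA = re.compile(r'\bas\s+\w+\s*,', flags=re.IGNORECASE)


def split_after_alias(sql):
    """Split a select list on the commas that directly follow an alias."""
    parts, start = [], 0
    for match in ALIAS_THEN_COMMA.finditer(sql):
        # keep everything up to, but not including, the comma
        parts.append(sql[start:match.end() - 1].strip())
        start = match.end()
    parts.append(sql[start:].strip())
    return parts


example = 'SUM(a) as x, MAX(f(b, c)) as y, SUM(d) as z'
print(split_after_alias(example)[-1])
```

Commas inside function calls are left alone here because they are not preceded by an alias, which is what makes this work on the string from the question.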
sqlparse answer: answered Mar 7 at 11:34 by Martijn Pieters♦ (edited Mar 7 at 11:57)
I have a solution, but it is quite a bit of code. It doesn't use regex, just many iterations of splitting on keywords.
s = """
SUM(case when(A.money-B.money>1000
and A.unixtime-B.unixtime<=890769
and B.col10 = "A"
and B.col11 = "12"
and B.col12 = "V") then 10
end) as finalCond0,
MAX(case when(A.money-B.money<0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "4321"
and B.cond3 in ("E", "F", "G")) then A.col10
end) as finalCond1,
SUM(case when(A.money-B.money>0
and A.unixtime-B.unixtime<=6786000
and B.cond1 = "A"
and B.cond2 = "1234"
and B.cond3 in ("A", "B", "C")) then 2
end) as finalCond2
"""
# remove newlines and double spaces
s = s.replace('\n', ' ')
while '  ' in s:
    s = s.replace('  ', ' ')
s = s.strip()
# split on keywords, starting with the original string
current_parts = [s]
for kw in ['SUM', 'MAX', 'MIN']:
    new_parts = []
    for part in current_parts:
        for i, new_part in enumerate(part.split(kw)):
            if i > 0:
                # add keyword to the start of this substring
                new_part = '{}{}'.format(kw, new_part)
            new_part = new_part.strip()
            if len(new_part) > 0:
                new_parts.append(new_part)
    current_parts = new_parts
print()
print('current_parts:')
for part in current_parts:
    print(part)
The output I get is:
current_parts:
SUM(case when(A.money-B.money>1000 and A.unixtime-B.unixtime<=890769 and B.col10 = "A" and B.col11 = "12" and B.col12 = "V") then 10 end) as finalCond0,
MAX(case when(A.money-B.money<0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "4321" and B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1,
SUM(case when(A.money-B.money>0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "1234" and B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2
Does it work for you? It seems to work for the example string you put in the question.
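For comparison, the same idea (cut in front of a function name and keep the name) can be sketched with re.split and a lookahead. This is an illustrative alternative rather than the code above: FUNCTION_NAMES and split_on_functions are hypothetical names, the list of functions is an assumption about the input, and the separating commas are dropped here instead of being kept at the end of each piece.

```python
import re

# Assumed set of aggregate names; extend it to whatever your input uses.
FUNCTION_NAMES = ("SUM", "MAX", "MIN")

# Match a comma (plus surrounding whitespace) only when a function call follows.
# The lookahead (?=...) does not consume the name, so it stays in the next piece.
SPLIT_RE = re.compile(r"\s*,\s*(?=(?:%s)\s*\()" % "|".join(FUNCTION_NAMES))


def split_on_functions(sql):
    """Split sql before each known function name, dropping the separating comma."""
    return [piece.strip() for piece in SPLIT_RE.split(sql) if piece.strip()]


sql = ('SUM(case when(B.col12 = "V") then 10 end) as finalCond0, '
       'MAX(case when(B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1,'
       'SUM(case when(B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2')
for piece in split_on_functions(sql):
    print(piece)
```

The commas inside in (...) are not followed by a function name, so they are left alone. The pattern is case-sensitive and would still cut in the wrong place if one of the listed functions appeared as a later argument of another call, for example SUM(x, MAX(y)).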
edited Mar 7 at 11:31
answered Mar 7 at 11:27
Ralf
But I have already told you that there are no newlines (\n) in the string.
– Old-School
Mar 7 at 11:28
The code I posted will work whether there are newlines in the string or not. I just make sure that there are no newlines present at the start, but that won't affect the outcome if the string does not have any newlines in it.
– Ralf
Mar 7 at 11:29
Right OK. Thanks for the attempt.
– Old-School
Mar 7 at 11:30
You could use something like:
import re

sql = 'SUM(case when(A.money-B.money>1000 and A.unixtime-B.unixtime<=890769 and B.col10 = "A" and B.col11 = "12" and B.col12 = "V") then 10 end) as finalCond0, MAX(case when(A.money-B.money<0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "4321" and B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1, SUM(case when(A.money-B.money>0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "1234" and B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2'
# find every "as <alias>" occurrence
result = re.finditer(r'as\s+[a-zA-Z0-9]+', sql)
commas = []
parts = []
for reg in result:
    end = reg.end()
    # remember the position of a comma that directly follows an alias
    if len(sql) > end and sql[end] == ',':
        commas.append(end)
idx = 0
for comma in commas:
    parts.append(sql[idx:comma])
    idx = comma + 1
parts.append(sql[idx:])
print(parts)
The commas list holds the positions of the commas to split on. Printing it gives:
[151, 322]
parts holds the final list of pieces (I'm not sure this implementation is the best way):
[
'SUM(case when(A.money-B.money>1000 and A.unixtime-B.unixtime<=890769 and B.col10 = "A" and B.col11 = "12" and B.col12 = "V") then 10 end) as finalCond0',
' MAX(case when(A.money-B.money<0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "4321" and B.cond3 in ("E", "F", "G")) then A.col10 end) as finalCond1',
' SUM(case when(A.money-B.money>0 and A.unixtime-B.unixtime<=6786000 and B.cond1 = "A" and B.cond2 = "1234" and B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2'
]
edited Mar 7 at 11:57
answered Mar 7 at 11:45
ALFA
Is the output of this code even close to the desired output I've posted in my question?
– Old-School
Mar 7 at 11:47
Edited for you, check now!
– ALFA
Mar 7 at 11:58
Have you tried to split by comma (,)?
– Ralf
Mar 7 at 11:05
@Ralf It won't work in this scenario. A split on , (sql.split(',').pop()) would give "C")) then 2 end) as finalCond2
– Old-School
Mar 7 at 11:07
Hm... and .split(',\n')?
– Ralf
Mar 7 at 11:09
@Ralf Won't work either. For presentation purposes I have provided some sort of indentation but in reality these statements are just separated by a comma meaning that no whitespaces or new lines appear in the original form.
– Old-School
Mar 7 at 11:11
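To make the exchange above concrete, here is a small sketch of what a plain comma split does. The input is a shortened, hypothetical version of the string from the question with only two expressions.

```python
sql = ('SUM(case when(B.col12 = "V") then 10 end) as finalCond0, '
       'SUM(case when(B.cond3 in ("A", "B", "C")) then 2 end) as finalCond2')

pieces = sql.split(',')
# The commas inside in (...) are split on as well, so the two
# expressions come back as four fragments.
print(len(pieces))
# The last fragment is only the tail of the second expression.
print(pieces[-1].strip())
```

Because str.split has no notion of parentheses, the commas inside the in (...) list cut the second expression apart, which is why the suggestions in these comments do not work for this input.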