Converting complex SQL join to Pandas merge
I have the following SQL query for finding overlaps between begin and end for a particular note_id:
select a.*, b.*
from test.analytical_cui_mipacq_concepts_new a
inner join test.analytical_cui_mipacq_concepts_new b on (
( b.begin>=a.begin and b.begin<=a.end )
or
( b.begin<=a.begin and b.end>=a.begin )
)
where ((a.system='metamap' and b.system!=a.system) or (a.system='metamap' and b.system=a.system and a.id_ != b.id_ and a.note_id = b.note_id))
that is taking forever and a day to run. I am trying to follow this thread to convert it to a pandas merge: pandas-join-dataframe-with-condition. So far I have come up with the following (new is my original dataframe, note_id is how I identify a particular individual, and id_ is the pk from the db table):
a = new.copy()
b = new.copy()
b.columns
b = b.rename(index=str, columns={'end': 'end_x', 'begin': 'begin_x', 'cui': 'cui_x',
                                 'old_cui': 'old_cui_x', 'type': 'type_x',
                                 'polarity': 'polarity_x', 'id_': 'id_x'})
c = a.merge(b, how='inner', on=['note_id'])
print(len(a), len(b), len(c))
c.loc[(((c.begin >= c.begin_x) & (c.begin <= c.end_x))
| ((c.begin<=b.begin_x) & (c.end>=c.begin_x))) &
(((c.system=='metamap') & (c.system!=c.system_x))
| ((c.system_x=='metamap') & (c.system==c.system_x)
& (c.id_ != c.id_x) & (c.note_id == c.note_id_x)))]
When I run this, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-e8c0d060f2a0> in <module>()
32 print(len(a), len(b), len(c))
33 c.loc[(((c.begin >= c.begin_x) & (c.begin <= c.end_x))
---> 34 | ((c.begin<=b.begin_x) & (c.end>=c.begin_x))) &
35 (((c.system=='metamap') & (c.system!=c.system_x))
36 | ((c.system_x=='metamap') & (c.system==c.system_x)
/anaconda3/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
1674
1675 elif isinstance(other, ABCSeries) and not self._indexed_same(other):
-> 1676 raise ValueError("Can only compare identically-labeled "
1677 "Series objects")
1678
ValueError: Can only compare identically-labeled Series objects
Not exactly sure what this means, even after Googling around for it.
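For reference, pandas raises this message when two Series whose indexes differ are compared element-wise. The following is only a minimal sketch with made-up values: the toy frames `b` and `c` stand in for the pre-merge and post-merge frames above and are not the real table. It compares a column of `c` against a column of `b`, as line 34 of the traceback does, and then compares two columns of `c` against each other.

```python
import pandas as pd

# Toy stand-ins (hypothetical values): b has 3 rows, the merged frame c has 4.
b = pd.DataFrame({"begin_x": [31, 63, 81]})
c = pd.DataFrame({"begin": [31, 31, 63, 63], "begin_x": [31, 63, 31, 63]})

# Comparing a column of c with a column of b: the two Series have
# different indexes, so pandas refuses to compare them.
try:
    c["begin"] <= b["begin_x"]
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Comparing two columns of the same frame works, because they share one index.
mask = c["begin"] <= c["begin_x"]
print(raised, mask.tolist())
```

Both operands of each comparison in a `.loc` filter on `c` need to come from `c` itself for the row labels to line up.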
The data look like:
begin,polarity,end,note_id,type,system,cui,id_
31,1,37,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0004352,1
63,1,71,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0574032,2
81,1,86,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0039869,3
96,1,100,527982345,biomedicus.v2.UmlsConcept,biomedicus,C1123023,4
96,1,105,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0015230,5
101,1,105,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0015230,6
130,1,138,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0574032,7
143,1,144,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0184661,8
156,1,162,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0026591,9
176,1,185,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0004268,10
201,1,209,527982345,biomedicus.v2.UmlsConcept,biomedicus,C0574032,11
101,-1,116,527982345,org.metamap.uima.ts.Candidate,metamap,C0445223,168094
100,-1,116,527982345,org.metamap.uima.ts.Candidate,metamap,C0445223,168095
109,-1,116,527982345,org.metamap.uima.ts.Candidate,metamap,C0445223,168096
124,1,129,527982345,org.metamap.uima.ts.Candidate,metamap,C0205435,168097
124,1,129,527982345,org.metamap.uima.ts.Candidate,metamap,C1279901,168098
130,1,138,527982345,org.metamap.uima.ts.Candidate,metamap,C0574032,168099
130,1,138,527982345,org.metamap.uima.ts.Candidate,metamap,C1827465,168100
143,1,144,527982345,org.metamap.uima.ts.Candidate,metamap,C0021966,168101
143,1,144,527982345,org.metamap.uima.ts.Candidate,metamap,C0221138,168102
31,1,37,527982345,org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention,ctakes,C0004352,55414
599,1,603,527982345,org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention,ctakes,C0206655,55415
67,1,73,4069123471-4,org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention,ctakes,C3263723,55416
646,-1,650,527982345,org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention,ctakes,C0042109,55417
31,1,37,527982345,edu.uth.clamp.nlp.typesystem.ClampNameEntityUIMA,clamp,,32496
56,1,71,527982345,edu.uth.clamp.nlp.typesystem.ClampNameEntityUIMA,clamp,C0993666,32497
92,1,105,527982345,edu.uth.clamp.nlp.typesystem.ClampNameEntityUIMA,clamp,,32498
96,1,100,527982345,edu.uth.clamp.nlp.typesystem.ClampNameEntityUIMA,clamp,,32499
120,1,129,527982345,edu.uth.clamp.nlp.typesystem.ClampNameEntityUIMA,clamp,C2008415,32500
python pandas join merge
asked Mar 8 at 2:37 by horcle_buzz, edited Mar 8 at 3:43
That means the Series a and b have different indexes, and pandas does not define Series comparison in this case. The same error occurs with the test a = pd.Series([1, 2], index=[0, 1]); b = pd.Series([1, 2], index=[0, 2]); a == b. Could you post a few lines of example data? – Peter Leimbigler, Mar 8 at 2:48
Done. I'm basically trying to find overlaps in my begin and end columns across a single note_id instance. – horcle_buzz, Mar 8 at 3:11
can you post the data not as an image but as actual text so that we can paste it into our IDE's? thanks! – aws_apprentice, Mar 8 at 3:17
Done. Pasting from excel makes it an image, for some stupid reason. – horcle_buzz, Mar 8 at 3:25
you should probably sample your data given what you provided does not match some of the conditions you specify, such as system == 'metamap' – aws_apprentice, Mar 8 at 3:27