Preserve Dataframe column data type after outer merge
When you merge two indexed DataFrames on certain values using an 'outer' merge, pandas automatically fills the fields it could not match with null (NaN) values. This is normal behaviour, but it changes the data type (an int64 column that receives a NaN becomes float64), and you have to restate what data types the columns should have.
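A minimal reproduction of the upcast described above, assuming two small hypothetical frames that share a key column (the names `left`, `right`, `key`, `qty` and `price` are made up for illustration):

```python
import pandas as pd

# Two hypothetical frames whose non-key columns start out as int64
left = pd.DataFrame({'key': [1, 2, 3], 'qty': [10, 20, 30]})
right = pd.DataFrame({'key': [2, 3, 4], 'price': [5, 6, 7]})

merged = pd.merge(left, right, on='key', how='outer')

# key 1 has no price and key 4 has no qty, so those cells become NaN
# and the affected int64 columns are upcast to float64
print(left.dtypes)
print(merged.dtypes)
```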
Calling fillna() or dropna() immediately after the merge does not seem to restore the original data types. Do I need a table structure in place?
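The behaviour described above is expected: neither call converts a float64 column back to an integer dtype, so an explicit astype() is still needed afterwards. A sketch using the same kind of hypothetical frames:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'qty': [10, 20, 30]})
right = pd.DataFrame({'key': [2, 3, 4], 'price': [5, 6, 7]})
merged = pd.merge(left, right, on='key', how='outer')

filled = merged.fillna(-1)   # NaNs replaced, but the columns stay float64
dropped = merged.dropna()    # rows with NaN removed, columns still float64

# Only an explicit cast brings the integer dtype back
restored = filled.astype({'qty': 'int64', 'price': 'int64'})

print(filled.dtypes)
print(dropped.dtypes)
print(restored.dtypes)
```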
Typically I would use NumPy's np.where(field.isnull(), ...), but that means running it separately for every column.
Is there a workaround to this?
python pandas
edited Mar 10 at 9:54 by Max Shvartsman
asked Apr 20 '16 at 12:18 by Jeff
I think an example would help clarify what you want to achieve. Sometimes you can't change a type back, for example from float to int, because an int column can't contain NaN. And if all NaNs are immediately dropped, why use 'outer' at all?
– ptrj
Apr 20 '16 at 13:41
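To illustrate the point in the comment: with NumPy-backed integer dtypes, a column that still contains NaN cannot be cast back to int64, so the NaNs have to be filled (or dropped) first. A minimal sketch; the helper name `try_int_cast` is hypothetical:

```python
import numpy as np
import pandas as pd


def try_int_cast(series):
    """Return the series cast to int64, or None if pandas rejects the cast."""
    try:
        return series.astype('int64')
    except ValueError:  # NaN -> int raises a ValueError (a subclass of it in recent pandas)
        return None


s = pd.Series([1.0, np.nan, 3.0])  # what an int column looks like after an outer merge

print(try_int_cast(s))             # None: NaN has no int64 representation
print(try_int_cast(s.fillna(-1)))  # succeeds once the NaN is replaced
```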
4 Answers
I don't think there's any really elegant/efficient way to do it. You could do it by tracking the original data types and then casting the columns after the merge, like this:
import pandas as pd
# all types are originally ints
df = pd.DataFrame({'a': [1]*10, 'b': [1, 2] * 5, 'c': range(10)})
df2 = pd.DataFrame({'e': [1, 1], 'd': [1, 2]})
# track the original dtypes
orig = df.dtypes.to_dict()
orig.update(df2.dtypes.to_dict())
# join the dataframes (on their indexes)
joined = df.join(df2, how='outer')
# columns with NaNs are now float64
print(joined.dtypes)
# replace NaNs with a suitable int sentinel value
joined.fillna(-1, inplace=True)
# re-cast each column to its original dtype
joined_orig_types = joined.apply(lambda x: x.astype(orig[x.name]))
print(joined_orig_types.dtypes)
answered Apr 20 '16 at 14:21 by hume
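The track-then-recast idea in the answer above can be wrapped in a small reusable helper around pd.merge. This is a sketch, not part of the answer: the function name `merge_keep_dtypes` and its `fill_value` parameter are hypothetical, it assumes one sentinel value is acceptable for every affected column, and it only restores integer columns (casting a filled bool column back would turn the sentinel into True).

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def merge_keep_dtypes(left, right, fill_value=-1, **merge_kwargs):
    """Merge two frames, then restore integer dtypes lost to NaN upcasting.

    Hypothetical helper; extra keyword arguments are passed to pd.merge.
    """
    # Remember each input column's dtype (left wins for shared key columns)
    orig = right.dtypes.to_dict()
    orig.update(left.dtypes.to_dict())

    merged = pd.merge(left, right, **merge_kwargs)

    # Integer columns whose dtype changed during the merge; suffixed overlap
    # columns (e.g. 'x_y') are not in orig and are left untouched
    upcast = {
        col: orig[col]
        for col in merged.columns
        if col in orig
        and is_integer_dtype(orig[col])
        and merged[col].dtype != orig[col]
    }
    if not upcast:
        return merged
    # Fill only those columns, then cast them back in one astype() call
    return merged.fillna({col: fill_value for col in upcast}).astype(upcast)


left = pd.DataFrame({'key': [1, 2, 3], 'qty': [10, 20, 30]})
right = pd.DataFrame({'key': [2, 3, 4], 'price': [5, 6, 7]})
result = merge_keep_dtypes(left, right, on='key', how='outer')
print(result)
print(result.dtypes)
```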
This should really only be an issue with bool or int dtypes. Columns of dtype float, object and datetime64[ns] can already hold NaN or NaT without changing type.
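A quick check of the claim above, using a hypothetical right-hand frame with one column of each kind; only the int and bool columns change dtype once the outer join introduces missing rows:

```python
import pandas as pd

left = pd.DataFrame({'x': [1.5, 2.5, 3.5]})  # index 0, 1, 2
right = pd.DataFrame(
    {
        'as_int': [10, 20],
        'as_float': [0.5, 1.5],
        'as_bool': [True, False],
        'as_datetime': pd.to_datetime(['2020-01-01', '2020-01-02']),
    },
    index=[0, 5],  # indexes 1 and 2 have no match here
)

# Rows 1 and 2 get missing values in all of right's columns
joined = left.join(right, how='outer')

print(right.dtypes)
print(joined.dtypes)
# as_int: int64 -> float64, as_bool: bool -> object;
# as_float stays float64 and as_datetime stays datetime64 (holding NaT)
```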
Because of this, I'd recommend using the nullable Int64 type (added in pandas 0.24) for your integer or bool columns, since it can store missing values. Boolean columns need to be converted to 1 or 0 instead of True or False first, then to Int64. You should do this for all int and bool columns before the join, but I'll just illustrate on df2, whose columns get NaN rows after the join:
import pandas as pd
df = pd.DataFrame({'a': [1]*6, 'b': [1, 2]*3, 'c': range(6)})
df2 = pd.DataFrame({'d': [1, 2], 'e': [True, False]})
# bool -> int (1/0) -> nullable Int64
df2 = df2.astype('int').astype('Int64')
df2.dtypes
#d    Int64
#e    Int64
#dtype: object
df.join(df2)
#   a  b  c    d    e
#0  1  1  0    1    1
#1  1  2  1    2    0
#2  1  1  2  NaN  NaN
#3  1  2  3  NaN  NaN
#4  1  1  4  NaN  NaN
#5  1  2  5  NaN  NaN
# (pandas >= 1.0 displays these missing Int64 values as <NA>)
df.join(df2).dtypes
#a    int64
#b    int64
#c    int64
#d    Int64
#e    Int64
#dtype: object
The benefit here is that nothing is upcast until it needs to be. For instance, with the other solutions, if you do .fillna(-1.72), the subsequent cast back to int silently truncates the fill value to -1, just as int(-1.72) does. This could be useful in some situations, but dangerous in others.
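The truncation risk described above can be seen directly: once a column has been upcast to float64, a non-integer fill value survives fillna() but is silently truncated by the cast back to int64. A minimal sketch with a hypothetical series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])  # an int column after an outer join

filled = s.fillna(-1.72)           # still float64: [1.0, -1.72]
restored = filled.astype('int64')  # the cast truncates toward zero: [1, -1]

print(filled.tolist())
print(restored.tolist())
```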
With Int64, the fill value is not silently truncated: it remains what you specify, and the column is only upcast if you fill with a non-integer. It also never tries to typecast a string such as .fillna('Missing') to an int. (Recent pandas versions may instead raise a TypeError when an Int64 column is filled with a non-integer value.)
edited Mar 9 at 19:52
answered Mar 9 at 17:49 by ALollz
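A sketch of applying the Int64 suggestion to every integer column of both frames before an outer merge. Assumptions: pandas 1.0 or later, where the nullable 'boolean' dtype also exists, so bool columns are converted to it here rather than going through 1/0 as the answer describes; the helper name `to_nullable` and the frames are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_integer_dtype


def to_nullable(df):
    """Convert NumPy int and bool columns to pandas' nullable dtypes."""
    mapping = {}
    for col, dtype in df.dtypes.items():
        if is_bool_dtype(dtype):
            mapping[col] = 'boolean'
        elif is_integer_dtype(dtype):
            mapping[col] = 'Int64'
    return df.astype(mapping)


left = pd.DataFrame({'key': [1, 2, 3], 'qty': [10, 20, 30]})
right = pd.DataFrame({'key': [2, 3, 4], 'in_stock': [True, False, True]})

merged = pd.merge(to_nullable(left), to_nullable(right), on='key', how='outer')

# Missing cells are <NA> and the nullable dtypes survive the merge
print(merged)
print(merged.dtypes)
```

pandas 1.0+ also provides DataFrame.convert_dtypes(), which performs a similar conversion automatically (and also converts other columns, for example strings to the string dtype).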
Or you can just concatenate the dtypes of both DataFrames and pass the result to astype() (df and df2 as in @hume's answer):
joined = df.join(df2, how='outer').fillna(-1).astype(pd.concat([df.dtypes, df2.dtypes]))
# or, on pandas < 2.0 only (Series.append was removed in pandas 2.0):
# joined = df.join(df2, how='outer').fillna(-1).astype(df.dtypes.append(df2.dtypes))
print(joined)
a b c e d
0 1 1 0 1 1
1 1 2 1 1 2
2 1 1 2 -1 -1
3 1 2 3 -1 -1
4 1 1 4 -1 -1
5 1 2 5 -1 -1
6 1 1 6 -1 -1
7 1 2 7 -1 -1
8 1 1 8 -1 -1
9 1 2 9 -1 -1
edited Mar 9 at 6:47
answered Mar 9 at 6:39 by anky_91
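One caveat when adapting the concatenated-dtypes approach above to pd.merge on a shared key column: that key appears in both dtypes Series, and the duplicated label may make astype() raise an error. Keeping only the first occurrence of each label avoids this. A sketch with hypothetical frames:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'qty': [10, 20, 30]})
right = pd.DataFrame({'key': [2, 3, 4], 'price': [5, 6, 7]})

# Collect the original dtypes; 'key' appears twice at this point
dtypes = pd.concat([left.dtypes, right.dtypes])
# Keep the first occurrence of each column name
dtypes = dtypes[~dtypes.index.duplicated()]

merged = pd.merge(left, right, on='key', how='outer').fillna(-1).astype(dtypes)
print(merged)
print(merged.dtypes)
```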
A simpler version of @hume's answer (using the same df and df2): get the original dtypes directly, then restore them all with a single astype() call. Here is the code:
orig = df.dtypes.to_dict()
orig.update(df2.dtypes.to_dict())
joined = df.join(df2, how='outer')
new_joined = joined.fillna(-1).astype(orig)
print(new_joined)
print(new_joined.dtypes)
Output:
a b c d e
0 1 1 0 1 1
1 1 2 1 2 1
2 1 1 2 -1 -1
3 1 2 3 -1 -1
4 1 1 4 -1 -1
5 1 2 5 -1 -1
6 1 1 6 -1 -1
7 1 2 7 -1 -1
8 1 1 8 -1 -1
9 1 2 9 -1 -1
a int64
b int64
c int32
d int64
e int64
dtype: object
answered Mar 9 at 3:16 by U9-Forward
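If a single sentinel such as -1 is not appropriate for every column, fillna() also accepts a dict of per-column fill values, which pairs naturally with the one-shot astype() in the answer above. A sketch with hypothetical frames and fill values:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'qty': [10, 20, 30]})
right = pd.DataFrame({'key': [2, 3, 4], 'price': [5, 6, 7]})

# Track the original dtypes of both frames
orig = left.dtypes.to_dict()
orig.update(right.dtypes.to_dict())

joined = pd.merge(left, right, on='key', how='outer')

# Choose a fill value per column, then restore every dtype in one call
fill_values = {'qty': 0, 'price': -1}
restored = joined.fillna(fill_values).astype(orig)
print(restored)
print(restored.dtypes)
```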