Preserve Dataframe column data type after outer merge








15















When you merge two indexed DataFrames using an 'outer' merge, pandas automatically fills the fields it could not match with null (NaN) values. This is normal behaviour, but it changes the data type of the affected columns, so you have to restate what data types they should have.



fillna() and dropna() do not seem to restore the data types when applied immediately after the merge. Do I need a table structure in place?



Typically I would run something like NumPy's np.where(field.isnull(), ...), but that means running it for every column.



Is there a workaround for this?
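The behaviour described in the question can be reproduced with a small sketch. The frame and column names (`left`, `right`, `a`, `b`) are made up for illustration; the point is only that an integer column picks up NaN in an outer join, becomes float64, and stays float64 after fillna().

```python
import pandas as pd

# Hypothetical frames: "right" has no row for index 2.
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [10, 20]}, index=[0, 1])

# The outer join inserts NaN into "b" for index 2.
joined = left.join(right, how="outer")
print(joined.dtypes)  # "a" stays int64, "b" is now float64

# Filling the gap does not bring the integer dtype back.
filled = joined.fillna(0)
print(filled.dtypes)  # "b" is still float64
```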


































  • I think an example would help clarify what you want to achieve. Sometimes you can't change a type back, for example from float to int, because an int column can't contain NaN. And if all NaNs are immediately dropped, then why use 'outer' at all?

    – ptrj
    Apr 20 '16 at 13:41
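The comment's point that a plain int column cannot hold NaN can be checked directly. This is a minimal sketch with a made-up Series named `floats`; it assumes only that pandas raises a ValueError (or a subclass of it) when asked to cast NaN to int64.

```python
import numpy as np
import pandas as pd

floats = pd.Series([1.0, np.nan])

# Casting while NaN is still present fails.
try:
    floats.astype("int64")
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("cannot cast:", exc)

# Once the NaN is dropped (or filled), the cast goes through.
dropped = floats.dropna().astype("int64")
print(dropped.tolist())  # [1]
```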
























python pandas














asked Apr 20 '16 at 12:18 by Jeff

edited Mar 10 at 9:54 by Max Shvartsman
























4 Answers


















4














I don't think there's any really elegant/efficient way to do it. You could do it by tracking the original datatypes and then casting the columns after the merge, like this:



import pandas as pd

# all columns start out as integer dtype
df = pd.DataFrame({'a': [1]*10, 'b': [1, 2] * 5, 'c': range(10)})
df2 = pd.DataFrame({'e': [1, 1], 'd': [1, 2]})

# track the original dtypes
orig = df.dtypes.to_dict()
orig.update(df2.dtypes.to_dict())

# join the dataframes
joined = df.join(df2, how='outer')

# columns that received NaNs are now float dtype
print(joined.dtypes)

# replace NaNs with a suitable int value
joined.fillna(-1, inplace=True)

# re-cast each column to its original dtype
joined_orig_types = joined.apply(lambda x: x.astype(orig[x.name]))

print(joined_orig_types.dtypes)



































    4





    +50









    This should really only be an issue with bool or int dtypes. float, object and datetime64[ns] can already hold NaN or NaT without changing the type.
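    To see which dtypes are affected, here is a small sketch that outer-joins one column of each kind and compares the dtype before and after. The frames and the single-letter column names are hypothetical; exact dtype names for the string and datetime columns may vary by pandas version, so the code only compares before and after.

```python
import pandas as pd

left = pd.DataFrame({"k": [0, 0, 0]}, index=[0, 1, 2])
right = pd.DataFrame(
    {
        "i": [1, 2],                  # int
        "b": [True, False],           # bool
        "f": [1.5, 2.5],              # float
        "o": ["x", "y"],              # strings
        "t": pd.to_datetime(["2016-04-20", "2016-04-21"]),
    },
    index=[0, 1],
)

# Index 2 has no match in "right", so every right-hand column gets a gap.
joined = left.join(right, how="outer")

# Map each column to (dtype before the join, dtype after the join).
changed = {
    col: (str(right[col].dtype), str(joined[col].dtype))
    for col in right.columns
}
for col, (before, after) in changed.items():
    print(col, before, "->", after)
```

    In this sketch only the int and bool columns come back with a different dtype (float64 and object respectively); the float, string and datetime columns keep theirs.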



    Because of this, I'd recommend using the nullable Int64 type (new in pandas 0.24) for your integer or bool columns, which is capable of storing NaN. Booleans need to be converted to 1 or 0 instead of True or False, and then to Int64. You should do this for all int and bool columns before the join, but I'll just illustrate it on df2, whose columns get NaN rows after the join:



    import pandas as pd

    df = pd.DataFrame({'a': [1]*6, 'b': [1, 2]*3, 'c': range(6)})
    df2 = pd.DataFrame({'d': [1,2], 'e': [True, False]})

    df2 = df2.astype('int').astype('Int64')
    df2.dtypes
    #d Int64
    #e Int64
    #dtype: object

    df.join(df2)
    # a b c d e
    #0 1 1 0 1 1
    #1 1 2 1 2 0
    #2 1 1 2 NaN NaN
    #3 1 2 3 NaN NaN
    #4 1 1 4 NaN NaN
    #5 1 2 5 NaN NaN

    #a int64
    #b int64
    #c int64
    #d Int64
    #e Int64
    #dtype: object



    The benefit here is that nothing is upcast until it needs to be. For instance, with the other solutions, .fillna(-1.72) followed by a cast back to int silently truncates the fill value to -1, which may not be what you want. This could be useful in some situations, but dangerous in others.
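    The truncation mentioned above is easy to demonstrate. This is a minimal sketch on a made-up float Series standing in for a column that received NaN from a join:

```python
import numpy as np
import pandas as pd

# A column that was int before the join and is float64 with a gap after it.
s = pd.Series([1.0, np.nan])

filled = s.fillna(-1.72)
print(filled.tolist())  # [1.0, -1.72]

# Casting back to int truncates toward zero, so -1.72 becomes -1.
cast = filled.astype("int64")
print(cast.tolist())  # [1, -1]
```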



    With Int64 the fill value remains true to what you specify and the column is only upcast if you fill with a non-int. Also it will not throw an error if you do something like .fillna('Missing'), as it never tries to typecast a string to an int.
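    As a small illustration of the integer case, the sketch below fills a nullable Int64 Series (a made-up example, requiring pandas 0.24 or later) with an integer and checks that the dtype is unchanged. Filling with a non-integer is deliberately left out, since how pandas handles that may depend on the version you are running.

```python
import pandas as pd

# A nullable integer column with one missing value.
s = pd.Series([1, 2, None], dtype="Int64")

# Filling with an integer keeps the Int64 dtype and the exact fill value.
filled = s.fillna(-1)
print(filled.dtype)     # Int64
print(filled.tolist())  # [1, 2, -1]
```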






































      2














      Or you can just concat/append the dtypes of both DataFrames and apply astype():



      joined = df.join(df2, how='outer').fillna(-1).astype(pd.concat([df.dtypes, df2.dtypes]))
      # or, on pandas versions before 2.0 (Series.append was removed in 2.0):
      # joined = df.join(df2, how='outer').fillna(-1).astype(df.dtypes.append(df2.dtypes))
      print(joined)

      a b c e d
      0 1 1 0 1 1
      1 1 2 1 1 2
      2 1 1 2 -1 -1
      3 1 2 3 -1 -1
      4 1 1 4 -1 -1
      5 1 2 5 -1 -1
      6 1 1 6 -1 -1
      7 1 2 7 -1 -1
      8 1 1 8 -1 -1
      9 1 2 9 -1 -1





































        0














        A simpler version of @hume's answer: collect the original dtypes, then restore them all in a single astype call. Here is the code:



        orig = df.dtypes.to_dict()
        orig.update(df2.dtypes.to_dict())
        joined = df.join(df2, how='outer')
        new_joined = joined.fillna(-1).astype(orig)
        print(new_joined)
        print(new_joined.dtypes)


        Output:



         a b c d e
        0 1 1 0 1 1
        1 1 2 1 2 1
        2 1 1 2 -1 -1
        3 1 2 3 -1 -1
        4 1 1 4 -1 -1
        5 1 2 5 -1 -1
        6 1 1 6 -1 -1
        7 1 2 7 -1 -1
        8 1 1 8 -1 -1
        9 1 2 9 -1 -1
        a int64
        b int64
        c int32
        d int64
        e int64
        dtype: object































































              answered Apr 20 '16 at 14:21 by hume (first answer above)































                      answered Mar 9 at 17:49 by ALollz, edited Mar 9 at 19:52 (second answer above)





























                              answered Mar 9 at 6:39 by anky_91, edited Mar 9 at 6:47 (third answer above)





















                                  0














                                  A simpler version of @hume's answer, directly get the original types, then use astype with one shot and get the data-types back, here is the code:



                                  orig = df.dtypes.to_dict()
                                  orig.update(df2.dtypes.to_dict())
                                  joined = df.join(df2, how='outer')
                                  new_joined = joined.fillna(-1).astype(orig)
                                  print(new_joined)
                                  print(new_joined.dtypes)


                                  Output:



                                   a b c d e
                                  0 1 1 0 1 1
                                  1 1 2 1 2 1
                                  2 1 1 2 -1 -1
                                  3 1 2 3 -1 -1
                                  4 1 1 4 -1 -1
                                  5 1 2 5 -1 -1
                                  6 1 1 6 -1 -1
                                  7 1 2 7 -1 -1
                                  8 1 1 8 -1 -1
                                  9 1 2 9 -1 -1
                                  a int64
                                  b int64
                                  c int32
                                  d int64
                                  e int64
                                  dtype: object
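If this pattern is needed in several places, it could be wrapped in a small function. The sketch below is only an illustration: `outer_join_keep_dtypes` and its `fill_value` parameter are hypothetical names, and it assumes the two frames share no column names and that the fill value is representable in every original dtype.

```python
import pandas as pd

def outer_join_keep_dtypes(left, right, fill_value=-1):
    """Outer-join two frames on their index and restore the original dtypes.

    Hypothetical helper: assumes no overlapping column names and that
    fill_value is valid for every column's original dtype.
    """
    # Remember each column's dtype before the join can upcast it.
    orig = left.dtypes.to_dict()
    orig.update(right.dtypes.to_dict())
    # Fill the NaNs created by the join, then cast every column back in one call.
    return left.join(right, how="outer").fillna(fill_value).astype(orig)

left = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 1]})
right = pd.DataFrame({"d": [1, 2]})
result = outer_join_keep_dtypes(left, right)
print(result.dtypes)
```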
















































































                                      answered Mar 9 at 3:16









                                      U9-Forward


























