delete observations with less than X consecutive dates

I have the following dataframe, which contains data for the same company (column ID) at different dates (column date). I would like to delete the observations for which there are fewer than 3 consecutive days.

The starting dataset is:

    import pandas as pd

    df = pd.DataFrame({
        "ID": {"0":1,"1":1,"2":1,"3":1,"4":4,"5":4,"6":4,"7":2,"8":2,"9":3,"10":3},
        "date": {"0":1421020800000,"1":1421193600000,"2":1422489600000,"3":1423353600000,"4":1421020800000,"5":1421107200000,"6":1421193600000,"7":1421020800000,"8":1421107200000,"9":1421452800000,"10":1421539200000},
        "variable": {"0":28,"1":62,"2":60,"3":72,"4":28,"5":61,"6":62,"7":23,"8":70,"9":32,"10":55},
    })
    df.date = pd.to_datetime(df.date, unit='ms')  # timestamps are in milliseconds
    df.sort_values(by=["ID", "date"], inplace=True)

In the above dataframe, only the company with ID = 4 satisfies the requirement, and I would like to delete the others.

I wrote the following code, but it has an obvious problem and I can't figure out how to fix it:

    df['delete'] = 0
    for name, group in df.groupby(by="ID"):
        # flags an ID by its row count only, not by consecutive days
        if group.shape[0] < 3:
            df.loc[df['ID'] == name, 'delete'] = 1
    df = df.loc[df['delete'] == 0, :]

The above code keeps both companies with ID=1 and ID=4; ID=1 should be removed because it contains 4 data points but at most two of them fall on consecutive days (while I want to impose a minimum of 3).

Any help would be much appreciated. Thank you.
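To make the requirement concrete, here is a minimal plain-Python sketch (not part of the original question) that reports the longest run of consecutive calendar days per ID. The helper name `longest_consecutive_run` and the `sample` dictionary are illustrative assumptions; the dates are the millisecond timestamps from the sample data decoded by hand.

```python
from datetime import date, timedelta

def longest_consecutive_run(days):
    """Length of the longest run of consecutive calendar days in `days`."""
    best = current = 0
    previous = None
    for day in sorted(set(days)):
        if previous is not None and day - previous == timedelta(days=1):
            current += 1   # extends the current run
        else:
            current = 1    # a gap (or the first day) starts a new run
        best = max(best, current)
        previous = day
    return best

# Dates from the sample data, grouped by ID (hypothetical container).
sample = {
    1: [date(2015, 1, 12), date(2015, 1, 14), date(2015, 1, 29), date(2015, 2, 8)],
    2: [date(2015, 1, 12), date(2015, 1, 13)],
    3: [date(2015, 1, 17), date(2015, 1, 18)],
    4: [date(2015, 1, 12), date(2015, 1, 13), date(2015, 1, 14)],
}
runs = {company: longest_consecutive_run(days) for company, days in sample.items()}
print(runs)  # {1: 1, 2: 2, 3: 2, 4: 3}
```

Only ID 4 reaches a run of 3, which matches the expected result stated above, while a plain row count would also let ID 1 through.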







Tags: python, pandas






asked Mar 7 at 23:49 by cycler10






















3 Answers






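IIUC, use diff + cumsum on the date column to create the group key New, then use groupby + filter to drop the unwanted groups:

    # New goes up by 1 whenever the gap to the previous row (within an ID) is not exactly one day.
    # group_keys=False keeps the original row index so the result can be assigned back
    # (needed on newer pandas versions).
    df['New'] = df.groupby('ID', group_keys=False).date.apply(lambda x: x.diff().dt.days.ne(1).cumsum())
    # keep only the (ID, New) runs that have at least 3 rows
    yourdf = df.groupby(['ID', 'New']).filter(lambda x: len(x) >= 3)
    yourdf
    Out[809]:
       ID       date  variable  New
    4   4 2015-01-12        28    1
    5   4 2015-01-13        61    1
    6   4 2015-01-14        62    1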
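The diff + cumsum trick can be hard to read at first, so here is a rough step-by-step sketch of the same idea in plain Python, without pandas. The `days` list is hypothetical data chosen for illustration, and the variable names are assumptions rather than anything from the answer.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# Hypothetical, already sorted dates for a single ID.
days = [date(2015, 1, 12), date(2015, 1, 13), date(2015, 1, 14),
        date(2015, 1, 20), date(2015, 1, 21)]

# "diff": gap in days to the previous row (None for the first row).
gaps = [None] + [(b - a).days for a, b in zip(days, days[1:])]
# "ne(1)": a new run starts wherever the gap is not exactly one day.
starts_run = [gap != 1 for gap in gaps]
# "cumsum": the running total of run starts gives each row its run label.
labels = list(accumulate(int(flag) for flag in starts_run))
# "filter": keep only rows whose run has at least 3 members.
sizes = Counter(labels)
kept = [day for day, label in zip(days, labels) if sizes[label] >= 3]

print(labels)  # [1, 1, 1, 2, 2]
print(kept)    # the three days from 2015-01-12 to 2015-01-14
```

Because the label only changes where a gap occurs, every unbroken stretch of days shares one label, and filtering on the size of each label group removes the short stretches while leaving longer ones from the same ID in place.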






answered Mar 8 at 0:10 by Wen-Ben























I think you could replace the "group.shape[0]" check by applying a moving window of 3 days and counting the items in it.

    import pandas as pd

    df = pd.DataFrame({
        "ID": {"0":1,"1":1,"2":1,"3":1,"4":4,"5":4,"6":4,"7":2,"8":2,"9":3,"10":3},
        "date": {"0":1421020800000,"1":1421193600000,"2":1422489600000,"3":1423353600000,"4":1421020800000,"5":1421107200000,"6":1421193600000,"7":1421020800000,"8":1421107200000,"9":1421452800000,"10":1421539200000},
        "variable": {"0":28,"1":62,"2":60,"3":72,"4":28,"5":61,"6":62,"7":23,"8":70,"9":32,"10":55},
    })
    df.date = pd.to_datetime(df.date, unit='ms')
    df.sort_values(by=["ID", "date"], inplace=True)

    df['delete'] = 0
    for name, group in df.groupby(by="ID"):
        # index by date so that rolling() can use a time-based window
        group = group.set_index('date')
        # largest number of rows found in any 3-day window for this ID
        if group.rolling(window='3D', min_periods=0).count()['delete'].max() < 3:
            df.loc[df['ID'] == name, 'delete'] = 1
    df = df.loc[df['delete'] == 0, :]
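As a rough illustration of what the moving-window count measures, the sketch below does the same bookkeeping in plain Python. It assumes the window covers the interval (t - 3 days, t] ending at each row, which is my understanding of the default for a time-based pandas window; the helper name `max_rows_in_window` is hypothetical.

```python
from collections import deque
from datetime import date, timedelta

def max_rows_in_window(days, window_days=3):
    """Largest number of rows that fall in any window (t - window_days, t]."""
    span = timedelta(days=window_days)
    window = deque()
    best = 0
    for day in sorted(days):
        window.append(day)
        # Drop rows that are window_days or more older than the current row.
        while day - window[0] >= span:
            window.popleft()
        best = max(best, len(window))
    return best

id_4 = [date(2015, 1, 12), date(2015, 1, 13), date(2015, 1, 14)]
id_1 = [date(2015, 1, 12), date(2015, 1, 14), date(2015, 1, 29), date(2015, 2, 8)]
print(max_rows_in_window(id_4))  # 3, so ID 4 would be kept
print(max_rows_in_window(id_1))  # 2, so ID 1 would be flagged for deletion
```

Two caveats about this sketch: it counts rows rather than distinct days, so repeated dates would inflate the count, and the loop in the answer flags a whole ID at once, so an ID that passes keeps all of its rows, including any that lie outside the 3-day run.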






answered Mar 8 at 6:44 by Giuseppe Salerno





















    df['delete'] = 0
    for name, group in df.groupby(by="ID"):
        if group.shape[0] != 3:
            df.loc[df['ID'] == name, 'delete'] = 1
    df = df.loc[df['delete'] == 0, :]

You may have set the condition wrong; try if group.shape[0] != 3 instead.







answered Mar 8 at 6:59 by Tom.chen.kang


























