delete observations with less than X consecutive dates
I have the following dataframe, which contains data for the same company (column `ID`) at different dates (column `date`). I would like to delete the observations of companies that do not have at least 3 consecutive days of data.
The starting dataset is:

```python
import pandas as pd

df = pd.DataFrame({"ID":{"0":1,"1":1,"2":1,"3":1,"4":4,"5":4,"6":4,"7":2,"8":2,"9":3,"10":3},
                   "date":{"0":1421020800000,"1":1421193600000,"2":1422489600000,"3":1423353600000,"4":1421020800000,"5":1421107200000,"6":1421193600000,"7":1421020800000,"8":1421107200000,"9":1421452800000,"10":1421539200000},
                   "variable":{"0":28,"1":62,"2":60,"3":72,"4":28,"5":61,"6":62,"7":23,"8":70,"9":32,"10":55}})
df.date = pd.to_datetime(df.date, unit='ms')  # epoch milliseconds -> datetime64
df.sort_values(by=["ID", "date"], inplace=True)
```
In the dataframe above, only the company with ID = 4 satisfies the requirement, and I would like to delete the others.
I wrote the following code, but it has an obvious problem and I can't figure out how to fix it:
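```python
df['delete'] = 0
for name, group in df.groupby(by="ID"):
    # flags companies with fewer than 3 rows, regardless of whether the dates are consecutive
    if group.shape[0] < 3:
        df.loc[df['ID'] == name, 'delete'] = 1
df = df.loc[df['delete'] == 0, :]
```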
The code above keeps both companies, ID=1 and ID=4. ID=1 should be removed because, although it contains 4 data points, none of them fall on consecutive days (the closest pair, 2015-01-12 and 2015-01-14, is two days apart), while I want to impose a minimum of 3 consecutive days.
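To make the requirement concrete, the sketch below computes, for each `ID`, the length of the longest streak of dates that are exactly one day apart; only companies whose streak is at least 3 should survive. It assumes one row per company per day with no time-of-day component, and `longest_run` is a hypothetical helper name.

```python
import pandas as pd

# Same IDs and timestamps (ms since the Unix epoch) as the question's data
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 4, 4, 4, 2, 2, 3, 3],
    "date": pd.to_datetime([1421020800000, 1421193600000, 1422489600000,
                            1423353600000, 1421020800000, 1421107200000,
                            1421193600000, 1421020800000, 1421107200000,
                            1421452800000, 1421539200000], unit="ms"),
})
df = df.sort_values(["ID", "date"])


def longest_run(dates: pd.Series) -> int:
    """Length of the longest streak of sorted dates exactly one day apart."""
    # A new streak starts wherever the gap to the previous date is not 1 day;
    # the first gap is NaT, which also counts as "not 1 day"
    streak_id = dates.diff().ne(pd.Timedelta(days=1)).cumsum()
    return int(streak_id.value_counts().max())


runs = df.groupby("ID")["date"].apply(longest_run)
print(runs.to_dict())  # {1: 1, 2: 2, 3: 2, 4: 3}

# Companies with a streak of at least 3 consecutive days
keep = df[df["ID"].isin(runs[runs >= 3].index)]
```

This shows why counting rows is not enough: ID 1 has the most rows but a longest streak of only 1.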
Any help would be much appreciated. Thank you.
python pandas
asked Mar 7 at 23:49 by cycler10
3 Answers
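IIUC, use `diff` + `cumsum` on the `date` column to create a group key `New` (a new key starts whenever the gap to the previous date of the same `ID` is not exactly one day), then use `groupby` + `filter` to drop the unwanted groups.

```python
# transform keeps the original row index, so the result aligns with df
# (with groupby.apply, recent pandas versions prepend the group key to the index)
df['New'] = df.groupby('ID')['date'].transform(lambda x: x.diff().dt.days.ne(1).cumsum())
yourdf = df.groupby(['ID', 'New']).filter(lambda x: len(x) >= 3)
yourdf
```

```
Out[809]:
   ID       date  variable  New
4   4 2015-01-12        28    1
5   4 2015-01-13        61    1
6   4 2015-01-14        62    1
```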
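The same `diff` + `cumsum` idea can be wrapped in a reusable function with the minimum run length as a parameter. This is a sketch assuming one row per `(ID, date)` and dates without a time-of-day component; `keep_consecutive_runs`, `min_days` and the temporary `run` column are hypothetical names.

```python
import pandas as pd


def keep_consecutive_runs(df: pd.DataFrame, min_days: int = 3) -> pd.DataFrame:
    """Keep the rows that belong to a run of >= min_days consecutive dates per ID."""
    df = df.sort_values(["ID", "date"])
    # Run key: increments whenever the gap to the previous date of the same ID
    # is not exactly one day (the first gap is NaT, so it also starts a run)
    run_key = df.groupby("ID")["date"].transform(
        lambda s: s.diff().dt.days.ne(1).cumsum()
    )
    kept = (df.assign(run=run_key)
              .groupby(["ID", "run"])
              .filter(lambda g: len(g) >= min_days))
    return kept.drop(columns="run")


# The question's data, written with date strings instead of epoch milliseconds
sample = pd.DataFrame({
    "ID": [1, 1, 1, 1, 4, 4, 4, 2, 2, 3, 3],
    "date": pd.to_datetime(["2015-01-12", "2015-01-14", "2015-01-29", "2015-02-08",
                            "2015-01-12", "2015-01-13", "2015-01-14",
                            "2015-01-12", "2015-01-13",
                            "2015-01-17", "2015-01-18"]),
    "variable": [28, 62, 60, 72, 28, 61, 62, 23, 70, 32, 55],
})

print(keep_consecutive_runs(sample))              # the three ID 4 rows
print(keep_consecutive_runs(sample, min_days=2))  # rows of IDs 2, 3 and 4
```

Note that this keeps only the rows inside a qualifying run: a company that also has isolated dates elsewhere loses those rows. To keep whole companies instead, take the surviving IDs and select them from the original frame with `isin`.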
answered Mar 8 at 0:10 by Wen-Ben
I think you could replace the `group.shape[0]` check with a rolling 3-day window and count the items inside it: a company is kept only if some 3-day window contains at least 3 observations.

```python
import pandas as pd

df = pd.DataFrame({"ID":{"0":1,"1":1,"2":1,"3":1,"4":4,"5":4,"6":4,"7":2,"8":2,"9":3,"10":3},
                   "date":{"0":1421020800000,"1":1421193600000,"2":1422489600000,"3":1423353600000,"4":1421020800000,"5":1421107200000,"6":1421193600000,"7":1421020800000,"8":1421107200000,"9":1421452800000,"10":1421539200000},
                   "variable":{"0":28,"1":62,"2":60,"3":72,"4":28,"5":61,"6":62,"7":23,"8":70,"9":32,"10":55}})
df.date = pd.to_datetime(df.date, unit='ms')
df.sort_values(by=["ID", "date"], inplace=True)
df['delete'] = 0
for name, group in df.groupby(by="ID"):
    group = group.set_index('date')  # time-based rolling windows need a DatetimeIndex
    # largest number of rows falling inside any 3-day window for this company
    if group.rolling(window='3D', min_periods=0).count()['delete'].max() < 3:
        df.loc[df['ID'] == name, 'delete'] = 1
df = df.loc[df['delete'] == 0, :]
```
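The rolling-window check can also be written as a function that returns the qualifying IDs instead of mutating `df` inside the loop. This is a sketch assuming one row per `(ID, date)` at midnight, in which case "at least N rows inside an N-day window" is the same as "N consecutive days"; `ids_with_streak` and `min_days` are hypothetical names.

```python
import pandas as pd


def ids_with_streak(df: pd.DataFrame, min_days: int = 3) -> list:
    """IDs with at least min_days observations inside some min_days-day window."""
    keep = []
    for name, group in df.sort_values(["ID", "date"]).groupby("ID"):
        # One marker per observation, indexed by its (sorted) date
        ones = pd.Series(1, index=pd.DatetimeIndex(group["date"]))
        # Time-based windows are closed on the right: (t - min_days days, t]
        if ones.rolling(f"{min_days}D").count().max() >= min_days:
            keep.append(name)
    return keep


# The question's data, written with date strings instead of epoch milliseconds
sample = pd.DataFrame({
    "ID": [1, 1, 1, 1, 4, 4, 4, 2, 2, 3, 3],
    "date": pd.to_datetime(["2015-01-12", "2015-01-14", "2015-01-29", "2015-02-08",
                            "2015-01-12", "2015-01-13", "2015-01-14",
                            "2015-01-12", "2015-01-13",
                            "2015-01-17", "2015-01-18"]),
    "variable": [28, 62, 60, 72, 28, 61, 62, 23, 70, 32, 55],
})

filtered = sample[sample["ID"].isin(ids_with_streak(sample))]
print(filtered)
```

Counting a column of ones rather than `variable` or `delete` means missing values in the data cannot lower the count.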
answered Mar 8 at 6:44 by Giuseppe Salerno
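```python
df['delete'] = 0
for name, group in df.groupby(by="ID"):
    if group.shape[0] != 3:
        df.loc[df['ID'] == name, 'delete'] = 1
df = df.loc[df['delete'] == 0, :]
```

You may have the wrong condition in the `if`; try `if group.shape[0] != 3`, which keeps only the companies with exactly 3 rows.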
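A row-count condition and a consecutive-day condition agree on the question's sample, but they can diverge. The sketch below uses hypothetical data: ID 4 as in the question plus an invented ID 5 with three observations that never fall on consecutive days. `longest_run` is a hypothetical helper, and the dates are assumed sorted within each ID.

```python
import pandas as pd

# Hypothetical data: ID 4 from the question, plus an invented ID 5
sample = pd.DataFrame({
    "ID": [4, 4, 4, 5, 5, 5],
    "date": pd.to_datetime(["2015-01-12", "2015-01-13", "2015-01-14",
                            "2015-01-01", "2015-01-05", "2015-01-09"]),
})


def longest_run(dates: pd.Series) -> int:
    """Longest streak of sorted dates exactly one day apart."""
    return int(dates.diff().ne(pd.Timedelta(days=1)).cumsum().value_counts().max())


# Condition from this answer: keep IDs with exactly 3 rows
by_count = sample.groupby("ID")["date"].transform("count") == 3
# Condition from the question: keep IDs with at least 3 consecutive days
by_streak = sample.groupby("ID")["date"].transform(longest_run) >= 3

print(sample.loc[by_count, "ID"].unique())   # IDs 4 and 5
print(sample.loc[by_streak, "ID"].unique())  # ID 4 only
```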
answered Mar 8 at 6:59 by Tom.chen.kang