Python pandas groupby object apply method duplicates first group
My first SO question:
I am confused by the behavior of the apply method of groupby in pandas (0.12.0-4): it appears to apply the function TWICE to the first group of a data frame (here, the first row). For example:
>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2
I first check that the groupby function works ok, and it seems to be fine:
>>> for group in df.groupby('class', group_keys=True):
...     print(group)
...
('A',   class  count
0     A      1)
('B',   class  count
1     B      0)
('C',   class  count
2     C      2)
Then I try to do something similar using apply on the groupby object, and the first group is printed twice:
>>> def checkit(group):
...     print(group)
...
>>> df.groupby('class', group_keys=True).apply(checkit)
  class  count
0     A      1
  class  count
0     A      1
  class  count
1     B      0
  class  count
2     C      2
Any help would be appreciated! Thanks.
Edit: @Jeff provides the answer below. I am dense and did not understand it immediately, so here is a simple example showing that, despite the double printout of the first group above, the result reflects a single application to the first group, and the original data frame is not mutated:
>>> def addone(group):
...     group['count'] += 1
...     return group
...
>>> df.groupby('class', group_keys=True).apply(addone)
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2
But by assigning the return value of the method to a new object, we see that it works as expected:
>>> df2 = df.groupby('class', group_keys=True).apply(addone)
>>> print(df2)
  class  count
0     A      2
1     B      1
2     C      3
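The same check can be written so that it does not depend on how many times pandas calls the function. This is a minimal sketch rather than the code from the question: `add_one` is a hypothetical helper that returns a new Series instead of modifying the group in place, and selecting the `count` column with `group_keys=False` is my own choice to keep the result aligned with the original index.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
original = df.copy()  # snapshot to compare against afterwards

def add_one(counts):
    # Hypothetical helper: returns a new Series, leaves its input untouched.
    return counts + 1

# Apply per group on the 'count' column only; group_keys=False keeps
# the original row index on the result so it can be assigned back.
new_counts = df.groupby('class', group_keys=False)['count'].apply(add_one)
df2 = df.assign(count=new_counts)

print(df.equals(original))    # the source frame is unchanged
print(df2['count'].tolist())  # incremented values live only in df2
```

Because `add_one` has no side effects, an extra call on the first group would cost time but could not change either frame.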
python-2.7 pandas group-by
edited Jun 17 '16 at 12:48 by Merlin
asked Jan 27 '14 at 19:37 by NC maize breeding Jim
This is checking whether you are mutating the data in the apply. If you are then it has to take a slower path than otherwise. It doesn't change the results.
– Jeff
Jan 27 '14 at 19:40
@Jeff: Could the result of the first call be saved so it is not called again? This might help if the function called by apply takes a long time... (along with being more intuitive, since this question comes up a lot.)
– unutbu
Jan 27 '14 at 19:43
@Jeff: Or maybe the function could be wrapped in a memoizer...
– unutbu
Jan 27 '14 at 19:48
It's actually tricky; the fast path is in Cython (usually), so right now it doesn't pass the result back to Python space (it could, I suppose). transform DOES do this, however (it chooses a path and then uses that result to move on). It's just a little bit tricky in code. Welcome to do a PR!
– Jeff
Jan 27 '14 at 19:51
Wouldn't it make more sense to just bite the bullet and make an explicit mutating/non-mutating parameter, defaulting to non-mutating? [Somewhat silly additional comment deleted, but my first question stands.]
– DSM
Jan 27 '14 at 19:54
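unutbu's memoizer suggestion in the comments above can be sketched in plain Python. Everything here is an assumption made for illustration: `memoize_by_key`, `slow_summary` and `fake_apply` are hypothetical names, and `fake_apply` only imitates the double call on the first group described in the question. It is not pandas code.

```python
import functools

def memoize_by_key(func):
    """Cache func's result per group key so a repeated call is free."""
    cache = {}

    @functools.wraps(func)
    def wrapper(key, rows):
        if key not in cache:
            cache[key] = func(key, rows)
        return cache[key]

    return wrapper

calls = []  # keys for which the wrapped function really ran

@memoize_by_key
def slow_summary(key, rows):
    calls.append(key)
    return sum(rows)

def fake_apply(groups, func):
    # Hypothetical stand-in for the behaviour in the question:
    # probe the first group once, then evaluate every group.
    first_key = next(iter(groups))
    func(first_key, groups[first_key])
    return {key: func(key, rows) for key, rows in groups.items()}

groups = {'A': [1], 'B': [0], 'C': [2]}
result = fake_apply(groups, slow_summary)
print(result)
print(calls)
```

The cache is keyed on the group label rather than on the data, since data frames are not hashable; that is only safe when each label maps to exactly one group.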
2 Answers
This is by design, as described here and here.
The apply function needs to know the shape of the returned data to intelligently figure out how it will be combined. To do this, it calls the function (checkit in your case) twice on the first group.
Depending on your actual use case, you can replace the call to apply with aggregate, transform or filter, as described in detail here. These functions require the return value to be a particular shape, and so don't call the function twice.
However, if the function you are calling does not have side effects, it most likely does not matter that the function is being called twice on the first value.
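As a rough illustration of the three alternatives named above, the sketch below runs each one on the data frame from the question. The variable names and the particular lambdas are my own choices, not part of the answer, and the example makes no claim about how many times each method calls its function.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
grouped = df.groupby('class')

# aggregate: reduces each group to a single value
totals = grouped['count'].agg('sum')

# transform: returns a result with the same length and index as the input
plus_one = grouped['count'].transform(lambda s: s + 1)

# filter: keeps or drops whole groups based on one boolean per group
non_zero = grouped.filter(lambda g: g['count'].sum() > 0)

print(totals.to_dict())
print(plus_one.tolist())
print(non_zero['count'].tolist())
```

Which of the three fits depends on the shape you want back: one row per group, one row per input row, or a subset of the input rows.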
answered Sep 8 '14 at 1:39 by Zero
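You can use a for loop over the groups instead; unlike groupby.apply, it visits the first group only once.
log_sample.csv:
guestid,keyword
1,null
2,null
2,null
3,null
3,null
3,null
4,null
4,null
4,null
4,null
My code snippet:
import pandas as pd

df = pd.read_csv("log_sample.csv")
grouped = df.groupby("guestid")
# Iterating yields one (key, sub-frame) pair per group, each exactly once
for guestid, df_group in grouped:
    print(list(df_group['guestid']))
df.head(100)  # shows the frame in a notebook; prints nothing in a script
Output:
[1]
[2, 2]
[3, 3, 3]
[4, 4, 4, 4]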
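A self-contained variant of the loop above, for readers who want to run it without the CSV file on disk. It is a sketch under assumptions: the file contents are inlined through `io.StringIO`, and `seen` and `sizes` are hypothetical names used to record what the loop body did.

```python
import io
import pandas as pd

# Same rows as log_sample.csv, inlined so no file is needed.
csv_text = (
    "guestid,keyword\n"
    "1,null\n2,null\n2,null\n3,null\n3,null\n3,null\n"
    "4,null\n4,null\n4,null\n4,null\n"
)
df = pd.read_csv(io.StringIO(csv_text))

seen = []   # group keys, in the order the loop body ran
sizes = {}  # number of rows per group

for guestid, df_group in df.groupby("guestid"):
    key = int(guestid)  # plain int instead of a NumPy integer
    seen.append(key)
    sizes[key] = len(df_group)

print(seen)
print(sizes)
```

The trade-off is that the loop leaves combining the per-group results to you, whereas apply assembles them into a single object.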
answered Apr 4 '18 at 3:17 by geosmart