Python pandas groupby object apply method duplicates first group

My first SO question:
I am confused by the behavior of the apply method of groupby in pandas (0.12.0-4): it appears to apply the function TWICE to the first group of a data frame. For example:

>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

I first check that iterating over the groupby object works, and it seems to be fine:

>>> for group in df.groupby('class', group_keys=True):
...     print(group)
...
('A',   class  count
0     A      1)
('B',   class  count
1     B      0)
('C',   class  count
2     C      2)

Then I try to do something similar using apply on the groupby object and I get the first row output twice:

>>> def checkit(group):
...     print(group)
...
>>> df.groupby('class', group_keys=True).apply(checkit)
  class  count
0     A      1
  class  count
0     A      1
  class  count
1     B      0
  class  count
2     C      2

Any help would be appreciated! Thanks.

Edit: @Jeff provides the answer below. I am dense and did not understand it immediately, so here is a simple example showing that, despite the double printout of the first group in the example above, the result for the first group is applied only once and the original data frame is not mutated:

>>> def addone(group):
...     group['count'] += 1
...     return group
...
>>> df.groupby('class', group_keys=True).apply(addone)
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

But by assigning the return value of the method to a new object, we see that it works as expected:

>>> df2 = df.groupby('class', group_keys=True).apply(addone)
>>> print(df2)
  class  count
0     A      2
1     B      1
2     C      3

edited Jun 17 '16 at 12:48 by Merlin
asked Jan 27 '14 at 19:37 by NC maize breeding Jim

The extra call checks whether you are mutating the data in the apply. If you are, it has to take a slower path than otherwise. It doesn't change the results.

– Jeff
Jan 27 '14 at 19:40

@Jeff: Could the result of the first call be saved so it is not called again? This might help if the function called by apply takes a long time... (along with being more intuitive, since this question comes up a lot.)

– unutbu
Jan 27 '14 at 19:43

@Jeff: Or maybe the function could be wrapped in a memoizer...

– unutbu
Jan 27 '14 at 19:48

It's actually tricky; the fast path is in Cython (usually), so right now it doesn't pass the result back to Python space (it could, I suppose). transform DOES do this, however (it chooses a path and then uses that result to move on). It's just a little bit tricky in code. You're welcome to do a PR!

– Jeff
Jan 27 '14 at 19:51

Wouldn't it make more sense to just bite the bullet and make an explicit mutating/non-mutating parameter, defaulting to non-mutating? [Somewhat silly additional comment deleted, but my first question stands.]

– DSM
Jan 27 '14 at 19:54

python-2.7 pandas group-by

2 Answers

This is by design, as described here and here

The apply function needs to know the shape of the returned data to intelligently figure out how it will be combined. To do this, it calls the function (checkit in your case) twice on the first group.
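One way to see how often the function really runs is to record each call instead of printing. The following is a minimal sketch, not the library's internals: the `calls` list is a hypothetical helper, and the frame is the one from the question. The number of calls depends on the pandas version. On the version in the question it would be one more than the number of groups, while newer releases may evaluate the first group only once.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

calls = []  # hypothetical helper: records the 'count' value of every group seen

def checkit(group):
    # The append is a side effect used only for counting calls
    calls.append(int(group['count'].iloc[0]))
    # Return one scalar per group so apply has something to combine
    return group['count'].sum()

# Selecting [['count']] explicitly is a choice made for this sketch, so that it
# does not depend on how a given pandas release treats the grouping column.
result = df.groupby('class')[['count']].apply(checkit)

print(len(calls), "calls for", df['class'].nunique(), "groups")
print(result)
```

Whatever the call count turns out to be, `result` holds exactly one entry per group.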

Depending on your actual use case, you can replace the call to apply with aggregate, transform or filter, as described in detail here. These functions require the return value to be a particular shape, and so don't call the function twice.
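As a rough illustration of the three alternatives, here is a sketch on the frame from the question. The variable names (`totals`, `plus_one`, `nonzero`) and the lambdas are made up for the example; which method fits depends on the shape of result you need.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
counts = df.groupby('class')['count']

# aggregate: reduces each group to a single value
totals = counts.agg('sum')

# transform: returns a result with the same length and index as the input
plus_one = counts.transform(lambda s: s + 1)

# filter: keeps or drops whole groups based on a boolean per group
nonzero = df.groupby('class').filter(lambda g: g['count'].sum() > 0)

print(totals)
print(plus_one)
print(nonzero)
```

Here `plus_one` reproduces what addone in the question computes, without relying on apply.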

However, if the function you are calling does not have side effects, it most likely does not matter that it is called twice on the first group.
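If the function does need to change values, one way to keep it free of side effects is to work on a copy of the group. This is a sketch under that assumption, reusing the addone idea from the question. The explicit `[['count']]` selection and `group_keys=False` are choices made for this example, not something the answer prescribes.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

def addone(group):
    # Work on a copy so an extra call on the first group cannot
    # change the input data or accumulate increments
    out = group.copy()
    out['count'] += 1
    return out

result = df.groupby('class', group_keys=False)[['count']].apply(addone)

print(result)
print(df)  # the original frame is unchanged
```

Because addone only returns new data, calling it more than once on the same group gives the same answer each time.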

answered Sep 8 '14 at 1:39 by Zero

You can use a for loop to avoid groupby.apply processing the first group twice.

log_sample.csv:

guestid,keyword
1,null
2,null
2,null
3,null
3,null
3,null
4,null
4,null
4,null
4,null

My code snippet:

import pandas as pd

df = pd.read_csv("log_sample.csv")
grouped = df.groupby("guestid")

# Iterate over (key, sub-frame) pairs; each group is visited exactly once
for guestid, df_group in grouped:
    print(list(df_group['guestid']))

df.head(100)

Output:

[1]
[2, 2]
[3, 3, 3]
[4, 4, 4, 4]
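The loop can also collect one result per group and combine them afterwards, which is roughly what apply does for you. The sketch below makes assumptions of its own: the CSV is inlined through `io.StringIO` so that it runs without a file, and `summarize` and `calls` are hypothetical names used only for illustration.

```python
import io
import pandas as pd

# Same content as log_sample.csv above, inlined for a self-contained run
csv_text = """guestid,keyword
1,null
2,null
2,null
3,null
3,null
3,null
4,null
4,null
4,null
4,null
"""
df = pd.read_csv(io.StringIO(csv_text))

calls = []  # records every group key the function is called with

def summarize(guestid, group):
    calls.append(int(guestid))
    return {'guestid': int(guestid), 'rows': len(group)}

# A plain loop calls the function exactly once per group
summary = pd.DataFrame([summarize(key, group) for key, group in df.groupby('guestid')])

print(calls)
print(summary)
```

The trade-off is that you have to combine the per-group results yourself, here by building a new DataFrame from a list of dicts.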

answered Apr 4 '18 at 3:17 by geosmart
