Check for Duplicate values and Pull Info to New Dataframe

I have a DataFrame (df_data) with 14 columns covering one month of data. I pulled out one week's data (df1) and made a list of all the account numbers in it (accounts1).

I want to go through each value in accounts1, check whether it occurs more than once in df_data, and if so, save that account number to a new list of repeats only.

Then I want to use that repeats list to pull rows from the original df_data, so that I have all 14 columns for every occurrence of each repeated account number.

I'm getting stuck on the list of repeated account numbers. I used the following code, which seems to have worked to create a list with results:

cnt = collections.Counter(accounts1)
repeats.append([k for k, v in cnt.items() if v > 1])
print((repeats).count)

However, the number of elements in that list is just under 3,000. When I used .unique and checked the difference, it should be a little over 5,000. What am I doing wrong? And how can I then use those elements to pull the columns from the original dataframe?
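One possible reading of the shortfall, offered as a sketch rather than a confirmed diagnosis: `Counter(accounts1)` counts occurrences inside the weekly list only, so an account that appears once in the week but again elsewhere in the month is never flagged. Separately, `repeats.append([...])` stores the whole comprehension as a single nested element, and `repeats.count` is a method, not the length. The example below counts over the whole month instead. The data is made up, and `month_accounts` is a hypothetical stand-in for `df_data['accntnum'].tolist()`.

```python
from collections import Counter

# Hypothetical toy data: month_accounts stands in for df_data['accntnum'].tolist().
month_accounts = [0, 1, 2, 5, 8, 2, 5, 0, 0, 7, 7, 3]
# The week's accounts, matching the example list in the question.
accounts1 = month_accounts[:10]

# Count occurrences across the whole month rather than within the week.
cnt = Counter(month_accounts)

# Build a flat, de-duplicated list of weekly accounts that repeat in the month.
repeats = sorted({k for k in accounts1 if cnt[k] > 1})

print(repeats)
# len() gives the number of elements; list.count is a method that needs an argument.
print(len(repeats))
```

With this toy data, account 7 is picked up even though it occurs only once in the weekly list, because its second occurrence is elsewhere in the month.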

Basically, say I had:

accounts1 = df1['accntnum'] = [0,1,2,5,8,2,5,0,0,7]

I would want it to cycle through, pull out each repeat from df_data, and return a list of them like:

repeats = [0, 2, 5, 7]
(Some account numbers appear only once in the weekly df1 but are repeated elsewhere in the monthly df_data.)

Then I'd like to use that list to filter on df_data['accntnum'], thinking something like:

df_repeats = df_data[df_data['accntnum'].isin(repeats)]

Oh also, I'm really only interested in the first occurrence of a repeat. There is a date and time column that can help sort those out in the end though. Thank you in advance!
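The filtering and "first occurrence" steps described above could look roughly like the following sketch. Only `accntnum` comes from the question; the `date` and `amount` columns and all values are hypothetical placeholders for the real date/time and data columns.

```python
import pandas as pd

# Hypothetical toy frame; 'date' and 'amount' stand in for the real columns.
df_data = pd.DataFrame({
    "accntnum": [0, 1, 2, 0, 2, 7],
    "date": pd.to_datetime([
        "2019-03-03", "2019-03-01", "2019-03-02",
        "2019-03-01", "2019-03-05", "2019-03-04",
    ]),
    "amount": [10, 20, 30, 40, 50, 60],
})
repeats = [0, 2, 7]

# Keep every row (all columns) whose account number is in the repeats list.
df_repeats = df_data[df_data["accntnum"].isin(repeats)]

# Sort chronologically, then keep the earliest row for each account.
first_rows = (
    df_repeats
    .sort_values("date")
    .drop_duplicates(subset="accntnum", keep="first")
)

print(first_rows)
```

Passing `subset="accntnum"` matters here: without it, `drop_duplicates` only removes rows that are identical in every column.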

asked Mar 8 at 23:40

RiverVal

df_data[df_data['accntnum'].isin(repeats)].sort_values(['date', 'time']).drop_duplicates(keep='first') ?

– panktijk
Mar 8 at 23:53

Didn't quite work, since it only kept the initial occurrence and not the first repetition as well, which I needed, but it put me on the right track, thank you! I ended up sorting df_data and creating a new df_first with drop_duplicates(keep = 'first'), as you suggested, to get the initial occurrence of each account number. Then I used drop() to remove those first occurrences from df_data, repeated the process to get the second occurrences in their own dataframe, and combined the two. Thank you!

– RiverVal
Mar 11 at 22:41
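The two-pass procedure described in the comment above (first occurrences, then second occurrences, then combine) can probably be collapsed into a single step. The sketch below uses `groupby(...).head(2)`, which is a swapped-in alternative and not what the commenter ran. The `date` column and the data are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; only the 'accntnum' column name comes from the question.
df_data = pd.DataFrame({
    "accntnum": [0, 2, 0, 7, 0, 2],
    "date": pd.to_datetime([
        "2019-03-01", "2019-03-02", "2019-03-03",
        "2019-03-04", "2019-03-05", "2019-03-06",
    ]),
})

# Sort chronologically so "first" and "second" mean earliest and next-earliest.
df_sorted = df_data.sort_values("date")

# Take at most the first two rows for each account number.
first_two = df_sorted.groupby("accntnum").head(2)

# Drop accounts that never repeat (they contribute only one row).
first_two = first_two[first_two.duplicated("accntnum", keep=False)]

print(first_two)
```

Here account 0 occurs three times, so its third row is left out, and account 7 occurs once, so it is dropped entirely.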

python pandas
