Check for Duplicate values and Pull Info to New Dataframe


I have a dataframe (df_data) with 14 columns covering one month of data. I pulled out one week's data (df1) and made a list of all the account numbers in it (accounts1).

What I am trying to do is go through each value in accounts1, check whether it occurs more than once in df_data, and if so, save that account number to a new list of repeats only.

Then I want to use that repeats list to pull every row, with all 14 columns, from the original df_data for each occurrence of those account numbers.







I'm getting stuck on the list of repeated account numbers. I used the following code, which seems to have worked to create a list with results:



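import collections
# Count how often each account number appears in the weekly list
cnt = collections.Counter(accounts1)
# Keep only the account numbers that appear more than once
repeats = [k for k, v in cnt.items() if v > 1]
print(len(repeats))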


but the number of elements in that list is just under 3,000. When I used .unique() and checked the difference, it should be a little over 5,000. What am I doing wrong? And how can I then use those elements to pull the matching rows from the original dataframe?







Basically, say I had



accounts1 = df1['accntnum'] = [0,1,2,5,8,2,5,0,0,7]


I would want it to cycle through and pull out each repeat from df_data and return a list of them like



repeats = [0, 2, 5, 7]
(Some numbers in the weekly df1 may not be repeated there yet, but are repeated in the monthly df_data.)


Then I'd like to use that list to pull from df_data['accntnum'], thinking something like



df_repeats = df_data[df_data['accntnum'].isin(repeats)]





Oh also, I'm really only interested in the first occurrence of a repeat. There is a date and time column that can help sort those out in the end though. Thank you in advance!
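
A minimal sketch of the steps described above, using the toy account numbers from the question. The contents of `df_data`, the extra account numbers in it, and the `date` values are made up for illustration; only the column name `accntnum` and the variable names `df_data`, `df1`, `accounts1`, `repeats`, and `df_repeats` come from the question. It counts occurrences in the monthly frame rather than in the weekly list, which is one plausible reason the `Counter` result came out smaller than expected.

```python
import pandas as pd

# Assumption: hypothetical monthly data; only the column name 'accntnum'
# comes from the question.
df_data = pd.DataFrame({
    "accntnum": [0, 1, 2, 5, 8, 2, 5, 0, 0, 7, 7, 9, 9, 3],
    "date": pd.date_range("2019-03-01", periods=14, freq="D"),
})

# Assumption: the weekly subset is the first ten rows, which reproduces
# the example list [0, 1, 2, 5, 8, 2, 5, 0, 0, 7] from the question.
df1 = df_data.iloc[:10]
accounts1 = df1["accntnum"]

# Count occurrences in the monthly frame, not in the weekly list.
counts = df_data["accntnum"].value_counts()

# Accounts seen this week that occur more than once anywhere in the month.
repeats = sorted(int(acct) for acct in accounts1.unique() if counts[acct] > 1)

# All rows, with every column, for the repeated accounts.
df_repeats = df_data[df_data["accntnum"].isin(repeats)]

print(repeats)
print(len(df_repeats))
```

Here account 7 appears only once in the weekly subset but twice in the month, so it ends up in `repeats`, while account 9 is repeated in the month but never appears in the weekly subset and is left out.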










  • df_data[df_data['accntnum'].isin(repeats)].sort_values(['date', 'time']).drop_duplicates(keep='first') ?

    – panktijk
    Mar 8 at 23:53











  • Didn't quite work, since it kept only the initial occurrence and not the first repetition as well, which I needed, but it put me on the right track, thank you! I ended up sorting my df_data and creating a new df_first using drop_duplicates(keep='first'), as you suggested, for the initial occurrence of each account number. Then I used drop() to remove those first occurrences from df_data and repeated the process to get the second occurrences in their own dataframe, then combined the two. Thank you!

    – RiverVal
    Mar 11 at 22:41
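
The two-pass approach described in the comment above could be sketched roughly as follows. The sample rows and the single `date` column are invented for illustration (the question mentions separate date and time columns), and `df_sorted`, `df_rest`, `df_second`, `df_first_repeated`, and `df_first_two` are hypothetical names; `df_data`, `df_first`, and `accntnum` come from the thread. Restricting the initial occurrences to accounts that actually repeat is an assumption about the intended result, and `subset="accntnum"` is assumed so that duplicates are judged by account number only.

```python
import pandas as pd

# Assumption: hypothetical sample; the real df_data has 14 columns,
# including date and time.
df_data = pd.DataFrame({
    "accntnum": [0, 1, 2, 0, 2, 0, 7],
    "date": pd.to_datetime([
        "2019-03-01", "2019-03-01", "2019-03-02", "2019-03-03",
        "2019-03-04", "2019-03-05", "2019-03-06",
    ]),
})

# Sort chronologically so that "first" means earliest.
df_sorted = df_data.sort_values("date")

# Pass 1: the initial occurrence of each account number.
df_first = df_sorted.drop_duplicates(subset="accntnum", keep="first")

# Remove those rows, then pass 2: the first repetition of each account number.
df_rest = df_sorted.drop(df_first.index)
df_second = df_rest.drop_duplicates(subset="accntnum", keep="first")

# Assumption: keep initial rows only for accounts that repeat, then combine.
df_first_repeated = df_first[df_first["accntnum"].isin(df_second["accntnum"])]
df_first_two = pd.concat([df_first_repeated, df_second]).sort_values(
    ["accntnum", "date"]
)

print(df_first_two)
```

A single-pass alternative would be `df_sorted[df_sorted.groupby("accntnum").cumcount() < 2]`, which keeps the first two rows per account, although it also keeps accounts that occur only once.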


















python pandas






asked Mar 8 at 23:40









RiverVal











