Check for Duplicate values and Pull Info to New Dataframe
I have a dataframe (df_data) with 14 columns covering one month of data. I pulled out one week's data (df1) and made a list of all the account numbers in it (accounts1).
What I am trying to do is go through each value in accounts1, check whether it occurs more than once in df_data and, if so, save that account number to a new list of repeats only.
Then I want to use that repeats list to pull rows from the original df_data, so that I have all 14 columns for every occurrence of each repeated account number.
I'm getting stuck on the list of repeated account numbers. I used the following code, which seems to have produced a list of results:
cnt = collections.Counter(accounts1)
repeats.append([k for k, v in cnt.items() if v > 1])
print((repeats).count)
but the number of elements in that list is just under 3,000. When I used .unique and checked the difference, it should be a little over 5,000. What am I doing wrong? And how can I then use those elements to pull the rows from the original dataframe?
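A minimal sketch of two things that may be going on in the snippet above, using the toy account numbers from this question. It assumes repeats started out as an empty list (the name nested is introduced here only for illustration). Note also that Counter(accounts1) counts occurrences within the weekly list only, not within df_data, which could be one reason the total comes out lower than expected.

```python
import collections

# Toy data taken from the example in the question.
accounts1 = [0, 1, 2, 5, 8, 2, 5, 0, 0, 7]

cnt = collections.Counter(accounts1)

# append() stores the whole comprehension as ONE nested element.
nested = []
nested.append([k for k, v in cnt.items() if v > 1])

# Plain assignment (or extend()) gives a flat list of account numbers.
repeats = [k for k, v in cnt.items() if v > 1]

# list.count is a method, so print(repeats.count) shows the method object;
# len() is what reports the number of elements.
print(len(nested))   # size of the outer list only
print(len(repeats))  # number of account numbers repeated within the week
```

Here 7 is not reported, because it appears only once in the weekly list.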
Basically, say I had
accounts1 = df1['accntnum'] = [0,1,2,5,8,2,5,0,0,7]
I would want it to cycle through, find each repeat in df_data, and return a list of them like
repeats = [0, 2, 5, 7]
(Some account numbers in the weekly df1 may not be repeated there yet, but are repeated in the monthly df_data.)
Then I'd like to use that list to filter on df_data['accntnum'], thinking something like
df_repeats = df_data[df_data['accntnum'] isin repeats]]
Also, I'm really only interested in the first occurrence of a repeat. There is a date and time column that can help sort those out in the end. Thank you in advance!
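One way the workflow described above could be sketched, under stated assumptions: the small df_data below is a hypothetical stand-in (two columns instead of 14, a single date column), and df1 is simply taken as its first ten rows so that the weekly account numbers match the example. Counting is done over the whole month, then restricted to accounts seen in the week.

```python
import pandas as pd

# Hypothetical stand-in for the monthly frame (only 2 of the 14 columns).
df_data = pd.DataFrame({
    "accntnum": [0, 1, 2, 5, 8, 2, 5, 0, 0, 7, 7, 3, 3],
    "date": pd.date_range("2019-02-01", periods=13, freq="D"),
})
# Assumption: the first ten rows play the role of the weekly slice.
df1 = df_data.iloc[:10]

# Count every account number across the whole month.
month_counts = df_data["accntnum"].value_counts()

# Accounts seen in the week that occur more than once in the month.
weekly_accounts = df1["accntnum"].unique()
repeats = sorted(int(a) for a in weekly_accounts if month_counts[a] > 1)

# .isin() is a Series method; it builds a boolean mask over df_data.
df_repeats = df_data[df_data["accntnum"].isin(repeats)]

print(repeats)
print(df_repeats)
```

Account 3 is repeated in the month but never appears in the weekly slice, so it is left out; account 7 is included because its second occurrence falls outside the week.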
python pandas
asked Mar 8 at 23:40
RiverVal
df_data[df_data['accntnum'].isin(repeats)].sort_values(['date', 'time']).drop_duplicates(keep='first') ?
– panktijk
Mar 8 at 23:53
Didn't quite work, since it kept only the initial occurrence and not the first repetition as well, which I needed, but it put me on the right track, thank you! I ended up sorting df_data and creating a new df_first with drop_duplicates(keep='first'), as you suggested, to get the initial occurrence of each account number. Then I used drop() to remove those first occurrences from df_data, repeated the process to get the second occurrences in their own dataframe, and combined the two. Thank you!
– RiverVal
Mar 11 at 22:41
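The two-pass approach described in the comment above (drop_duplicates, drop, repeat, combine) could possibly be collapsed into one step with groupby().cumcount(). This is a different technique from the one the comment used, sketched on a hypothetical frame with a single date column in place of the separate date and time columns.

```python
import pandas as pd

# Hypothetical stand-in for df_data, using the toy account numbers.
df_data = pd.DataFrame({
    "accntnum": [0, 1, 2, 5, 8, 2, 5, 0, 0, 7],
    "date": pd.date_range("2019-02-01", periods=10, freq="D"),
})

# Sort chronologically so "first" and "second" are well defined.
df_sorted = df_data.sort_values("date")

# True for every row whose account number appears more than once.
is_repeat = df_sorted.duplicated(subset="accntnum", keep=False)

# Number the rows within each account: 0 = initial occurrence, 1 = first repetition.
occurrence = df_sorted.groupby("accntnum").cumcount()

# Keep the initial occurrence and the first repetition of each repeated account.
first_two = df_sorted[is_repeat & (occurrence < 2)]

print(first_two)
```

The third occurrence of account 0 is dropped, and accounts that appear only once (1, 8, 7) are excluded by the is_repeat mask.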