Merge (join) data frames - too many rows in result
I have two data frames (df1 and df2) that I want to join with the merge function.
df1 has 3903 rows and df2 has 351 rows.
I want to left join df2 onto df1 by a common column (column1). My code is:
dfjoin <- merge(df1, df2, by = "column1", all.x = TRUE)
I expected dfjoin to have 3903 rows, the same as df1, but it returns 4010 rows.
Why does it return more rows than expected? Any help is appreciated.
r merge
asked Mar 12 '16 at 9:51 by oercim; edited Jul 31 '18 at 0:33 by Gregor
This may be because the values in column1 of df2 are not a one-to-one mapping: a single value of column1 may appear in more than one row of df2. You can check this with table(df2$column1). If any value of column1 has a count > 1, that is the reason.
– RDizzl3, Mar 12 '16 at 10:12
If you are more comfortable with SQL, I would also recommend the sqldf package, which lets you run SQL queries on your data frames.
– RDizzl3, Mar 12 '16 at 10:16
Not reproducible.
– jangorecki, Mar 12 '16 at 12:05
3 Answers
This may be because the values in column1 of df2 are not a one-to-one mapping: a single value of column1 may appear in more than one row of df2. You can check this with table(df2$column1). If any value of column1 has a count > 1, that is the reason.
If you are more comfortable with SQL, I would also recommend the sqldf package, which lets you run SQL queries on your data frames.
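A minimal sketch of the check described above, using small made-up data frames (the values are hypothetical; only the key column name column1 comes from the question). It lists the keys that occur more than once in df2 and predicts how many rows a left merge will return: each df1 row is repeated once per matching df2 row, or kept once (with NA) if it has no match.

```r
# Hypothetical toy data; only the key column name comes from the question.
df1 <- data.frame(column1 = c("a", "b", "b", "c"), X = 1:4)
df2 <- data.frame(column1 = c("a", "a", "b"), Y = c(10, 20, 30))

# Count how often each key occurs in df2; any count > 1 will duplicate rows.
key_counts <- table(df2$column1)
dup_keys <- names(key_counts)[key_counts > 1]
print(dup_keys)

# Convert the table to a plain named integer vector for lookups.
count_vec <- setNames(as.integer(key_counts), names(key_counts))

# Number of df2 matches for every df1 row (NA means no match -> 0).
matches <- count_vec[as.character(df1$column1)]
matches[is.na(matches)] <- 0L

# A left merge keeps each df1 row max(matches, 1) times.
expected_rows <- sum(pmax(matches, 1L))

dfjoin <- merge(df1, df2, by = "column1", all.x = TRUE)
print(c(expected = expected_rows, actual = nrow(dfjoin)))
```

Applied to the real data, the same calculation should predict the 4010 rows if duplicated keys in df2 are the cause.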
answered Mar 12 '16 at 10:22 by RDizzl3
Thanks a lot, RDizzl3. As you said, the tables did not have a one-to-one mapping.
– oercim, Mar 12 '16 at 23:46
I can't be sure without seeing an example of your data, but the usual syntax is:
df <- merge(df1, df2, by = "name_of_column_in_common", all.x = TRUE)
However, if the column you are matching on has duplicated values, R will match all possible combinations. For example:
df1 <- data.frame(id = c("a", "a", "b", "c"), x1 = rnorm(4))
df2 <- data.frame(id = c("a", "a", "b"), x2 = rnorm(3))
df <- merge(df1, df2, by = "id", all.x = TRUE)
This gives a data frame with 6 rows and 3 columns: each "a" in df1 is matched with each "a" in df2 (2 x 2 = 4 combinations), plus one row each for "b" and "c".
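A runnable sketch of this example. The set.seed call is added only so the random values are reproducible, and the key is named explicitly with by (when by is omitted, merge() joins on all columns the two data frames share). The last step shows one possible remedy, assuming it is acceptable to keep only the first df2 row for each id; whether that is correct depends on your data.

```r
set.seed(1)  # only so the random x1/x2 values are reproducible

df1 <- data.frame(id = c("a", "a", "b", "c"), x1 = rnorm(4))
df2 <- data.frame(id = c("a", "a", "b"), x2 = rnorm(3))

# Left join: both "a" rows in df1 match both "a" rows in df2 (2 x 2 = 4 rows),
# "b" matches once, and "c" is kept once with x2 = NA.
df <- merge(df1, df2, by = "id", all.x = TRUE)
print(dim(df))

# One possible remedy (assumption: the first df2 row per id is the right one).
df2_unique <- df2[!duplicated(df2$id), ]
df_unique <- merge(df1, df2_unique, by = "id", all.x = TRUE)
print(nrow(df_unique) == nrow(df1))
```

Dropping duplicates silently discards data, so aggregating df2 to one row per key may be the better choice when the duplicated rows carry different values.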
answered Mar 12 '16 at 10:20 by gfgm
To make sure your second data frame is unique on the join column(s), you can use my package safejoin (a wrapper around dplyr's join functions), which gives you an explicit error if it is not.
Current situation:
df1 <- data.frame(column1 = c("a", "b", "b"), X = 1:3)
df2 <- data.frame(column1 = c("a", "b"), Y = 4:5)
df3 <- data.frame(column1 = c("a", "a", "b"), Y = 4:6)
merge(df1, df2, by = "column1", all.x = TRUE)
# column1 X Y
# 1 a 1 4
# 2 b 2 5
# 3 b 3 5
merge(df1, df3, by = "column1", all.x = TRUE)
# column1 X Y
# 1 a 1 4
# 2 a 1 5
# 3 b 2 6
# 4 b 3 6
In the second call, the "a" row of df1 is duplicated because column1 == "a" appears twice in df3, so the result has 4 rows instead of 3.
Using safejoin:
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
safe_left_join(df1, df2, check = "V")
# column1 X Y
# 1 a 1 4
# 2 b 2 5
# 3 b 3 5
safe_left_join(df1, df3, check = "V")
# Error: y is not unique on column1
# Call `rlang::last_error()` to see a backtrace
check = "V" checks that the join columns are unique on the right-hand side. (check = "U", for "unique", checks that they are unique on the left-hand side; "V" is simply the next letter in the alphabet.)
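If you would rather not install a package, the same right-hand uniqueness check can be sketched in base R. The helper merge_left_unique below is hypothetical (it is not part of base R or safejoin); it stops before merging when y has duplicated join keys and otherwise performs an ordinary left merge.

```r
# Hypothetical helper (not part of base R or safejoin): a left merge that
# refuses to run when the right-hand table has duplicated join keys.
merge_left_unique <- function(x, y, by) {
  # duplicated() on a data frame flags repeated combinations of the key columns
  dup <- duplicated(y[, by, drop = FALSE])
  if (any(dup)) {
    stop("y is not unique on ", paste(by, collapse = ", "), call. = FALSE)
  }
  merge(x, y, by = by, all.x = TRUE)
}

df1 <- data.frame(column1 = c("a", "b", "b"), X = 1:3)
df2 <- data.frame(column1 = c("a", "b"), Y = 4:5)
df3 <- data.frame(column1 = c("a", "a", "b"), Y = 4:6)

ok <- merge_left_unique(df1, df2, by = "column1")
print(nrow(ok))

# Capture the error message instead of aborting.
res <- tryCatch(merge_left_unique(df1, df3, by = "column1"),
                error = function(e) conditionMessage(e))
print(res)
```

If you already use dplyr, recent versions (1.1.0 and later, as far as I know) offer a comparable check through the relationship argument, e.g. left_join(df1, df3, by = "column1", relationship = "many-to-one"), which errors when a df1 row matches several df3 rows.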
answered Mar 8 at 21:31 by Moody_Mudskipper