generate bootstrap sample dependent on column
I have a data set like this:
    set.seed(1)
    df <- data.frame(ID = rep(1:4, each = 3),
                     x = c(1,2,3,2,3,4,1,2,3,3,4,5),
                     V1 = rnorm(12))
    > df
       ID x         V1
    1   1 1 -0.6264538
    2   1 2  0.1836433
    3   1 3 -0.8356286
    4   2 2  1.5952808
    5   2 3  0.3295078
    6   2 4 -0.8204684
    7   3 1  0.4874291
    8   3 2  0.7383247
    9   3 3  0.5757814
    10  4 3 -0.3053884
    11  4 4  1.5117812
    12  4 5  0.3898432
This example contains 4 individuals, identified by ID.
Each individual has an observation period x. For example, ID 1 is observed at time points 1, 2, 3.
In this example I have 2 observations at time point 1 (ID 1 and ID 3),
and 3 observations at time point 2 (IDs 1, 2, 3).
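The counts described above can be checked directly. The following is a minimal sketch using base R's table and split on the example data; the names obs_per_x and ids_per_x are chosen here for illustration and are not part of the original post.

```r
# Rebuild the example data from the question.
set.seed(1)
df <- data.frame(ID = rep(1:4, each = 3),
                 x  = c(1,2,3,2,3,4,1,2,3,3,4,5),
                 V1 = rnorm(12))

# Number of observations at each time point x.
# Time points 1, 2, 3, 4, 5 occur 2, 3, 4, 2, 1 times respectively.
obs_per_x <- table(df$x)
print(obs_per_x)

# Which IDs are observed at each time point (illustrative helper).
ids_per_x <- split(df$ID, df$x)
print(ids_per_x)
```

The largest count (4, at time point 3) is the natural target size if every time point should end up with the same number of rows.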
I now want a bootstrapped (sample with replacement) data set that contains the same number of observations at each time point.
In this example the data set could look like this:
    > df
       ID x         V1
    1   1 1 -0.6264538
    1   1 1 -0.6264538
    2   1 2  0.1836433
    2   1 2  0.1836433
    3   1 3 -0.8356286
    4   2 2  1.5952808
    5   2 3  0.3295078
    6   2 4 -0.8204684
    6   2 4 -0.8204684
    7   3 1  0.4874291
    7   3 1  0.4874291
    8   3 2  0.7383247
    9   3 3  0.5757814
    10  4 3 -0.3053884
    11  4 4  1.5117812
    11  4 4  1.5117812
    12  4 5  0.3898432
    12  4 5  0.3898432
    12  4 5  0.3898432
    12  4 5  0.3898432
This data set now has 4 observations at each time point.
r dplyr sampling resampling
asked Mar 8 at 7:50
spore234
1 Answer
We could first find the maximum number of times any value of x occurs, then use sample_n on each x group with replace = TRUE to draw that many rows, so that every x ends up with an equal number of rows.
    library(dplyr)
    # Largest number of observations at any single time point
    max_sample <- max(table(df$x))
    df %>%
      group_by(x) %>%
      # Draw max_sample rows per time point, with replacement
      sample_n(max_sample, replace = TRUE) %>%
      arrange(x)
    #      ID     x     V1
    #   <int> <dbl>  <dbl>
    # 1     3     1  0.487
    # 2     1     1 -0.626
    # 3     1     1 -0.626
    # 4     1     1 -0.626
    # 5     3     2  0.738
    # 6     2     2  1.60
    # 7     2     2  1.60
    # 8     3     2  0.738
    # 9     4     3 -0.305
    #10     2     3  0.330
    #11     2     3  0.330
    #12     4     3 -0.305
    #13     4     4  1.51
    #14     4     4  1.51
    #15     4     4  1.51
    #16     4     4  1.51
    #17     4     5  0.390
    #18     4     5  0.390
    #19     4     5  0.390
    #20     4     5  0.390
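As a possible variation, the same idea can be sketched with slice_sample, which supersedes sample_n in dplyr 1.0.0 and later. This assumes such a dplyr version is installed; the names boot and boot_counts are introduced here for illustration only. The final count is a quick check that each time point really has the same number of rows.

```r
library(dplyr)

# Example data from the question.
set.seed(1)
df <- data.frame(ID = rep(1:4, each = 3),
                 x  = c(1,2,3,2,3,4,1,2,3,3,4,5),
                 V1 = rnorm(12))

# Largest number of observations at any single time point.
max_sample <- max(table(df$x))

# Resample each time point with replacement up to max_sample rows.
boot <- df %>%
  group_by(x) %>%
  slice_sample(n = max_sample, replace = TRUE) %>%
  ungroup() %>%
  arrange(x)

# Check: every time point should now appear max_sample times.
boot_counts <- count(boot, x)
print(boot_counts)
```

Which rows are drawn depends on the random number generator state, so the sampled rows will generally differ from the output shown above; only the per-group row counts are fixed.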
Thanks. I should add that x does not always start at 1; it ranges from -20 to +20.
– spore234
Mar 8 at 8:06
@spore234 Umm... it should not matter, I think, because we are counting the frequency of x with table irrespective of what its value is.
– Ronak Shah
Mar 8 at 8:08
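The point made in this comment can be illustrated with a small sketch. The data frame df_neg below is hypothetical (it is not from the question) and simply uses time points between -20 and +20; table counts each distinct value of x regardless of sign or starting point, so the same pipeline applies unchanged.

```r
library(dplyr)

set.seed(42)
# Hypothetical data: time points span negative and positive values.
df_neg <- data.frame(ID = c(1, 1, 1, 2, 2, 3),
                     x  = c(-20, -19, 0, -20, 0, 20),
                     V1 = rnorm(6))

# table() counts occurrences of each distinct x value.
counts <- table(df_neg$x)
print(counts)
max_sample <- max(counts)

# Same approach as in the answer: resample each x group to max_sample rows.
boot_neg <- df_neg %>%
  group_by(x) %>%
  sample_n(max_sample, replace = TRUE) %>%
  ungroup()

neg_counts <- count(boot_neg, x)
print(neg_counts)
```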
answered Mar 8 at 8:02
Ronak Shah