


generate bootstrap sample dependent on column






























I have a data set like this:

set.seed(1)
df <- data.frame(ID = rep(1:4, each = 3),
                 x  = c(1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5),
                 V1 = rnorm(12))

> df
   ID x         V1
1   1 1 -0.6264538
2   1 2  0.1836433
3   1 3 -0.8356286
4   2 2  1.5952808
5   2 3  0.3295078
6   2 4 -0.8204684
7   3 1  0.4874291
8   3 2  0.7383247
9   3 3  0.5757814
10  4 3 -0.3053884
11  4 4  1.5117812
12  4 5  0.3898432


This example contains 4 individuals, identified by ID.
Each individual is observed over a period of time points x. For example, ID 1 is observed at time points 1, 2 and 3.



In this example there are 2 observations at time point 1 (IDs 1 and 3)
and 3 observations at time point 2 (IDs 1, 2 and 3).
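These per-time-point counts can be read off directly with base R's table(). The sketch below rebuilds the example data and counts the rows at each value of x; the largest count is the size every time point would need in a balanced resample.

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(ID = rep(1:4, each = 3),
                 x  = c(1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5),
                 V1 = rnorm(12))

# Number of observations (rows) at each time point x:
# for x = 1, 2, 3, 4, 5 the counts are 2, 3, 4, 2, 1
counts <- table(df$x)
counts

# The largest count (here 4) is the target size per time point
max(counts)
```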



I now want a bootstrapped data set (sampled with replacement) that contains the same number of observations at each time point.



In this example, the resampled data set could look like this:



> df
   ID x         V1
1   1 1 -0.6264538
1   1 1 -0.6264538
2   1 2  0.1836433
2   1 2  0.1836433
3   1 3 -0.8356286
4   2 2  1.5952808
5   2 3  0.3295078
6   2 4 -0.8204684
6   2 4 -0.8204684
7   3 1  0.4874291
7   3 1  0.4874291
8   3 2  0.7383247
9   3 3  0.5757814
10  4 3 -0.3053884
11  4 4  1.5117812
11  4 4  1.5117812
12  4 5  0.3898432
12  4 5  0.3898432
12  4 5  0.3898432
12  4 5  0.3898432


This data set now has 4 observations at each time point.











Tags: r, dplyr, sampling, resampling
















asked Mar 8 at 7:50

spore234






















1 Answer














We could first find the maximum number of times any single value of x occurs, and then use sample_n() within each x with replace = TRUE, so that every x ends up with the same number of rows.



library(dplyr)

# Largest number of rows sharing one value of x; every x is resampled to this size
max_sample <- max(table(df$x))

df %>%
  group_by(x) %>%
  sample_n(max_sample, replace = TRUE) %>%  # sample with replacement within each x
  arrange(x)

# One possible result (the draw is random, so yours will differ):
# ID x V1
# <int> <dbl> <dbl>
# 1 3 1 0.487
# 2 1 1 -0.626
# 3 1 1 -0.626
# 4 1 1 -0.626
# 5 3 2 0.738
# 6 2 2 1.60
# 7 2 2 1.60
# 8 3 2 0.738
# 9 4 3 -0.305
#10 2 3 0.330
#11 2 3 0.330
#12 4 3 -0.305
#13 4 4 1.51
#14 4 4 1.51
#15 4 4 1.51
#16 4 4 1.51
#17 4 5 0.390
#18 4 5 0.390
#19 4 5 0.390
#20 4 5 0.390
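Since dplyr 1.0.0, sample_n() is superseded by slice_sample(). A minimal sketch of the same approach with slice_sample(n = ..., replace = TRUE) follows; it rebuilds df from the question so it runs on its own. The seed value 42 is an arbitrary choice that only makes this draw reproducible, so the rows drawn will not match the output above.

```r
library(dplyr)

set.seed(1)
df <- data.frame(ID = rep(1:4, each = 3),
                 x  = c(1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5),
                 V1 = rnorm(12))

# Largest number of rows sharing a single value of x
max_sample <- max(table(df$x))

set.seed(42)  # arbitrary seed so this bootstrap draw can be reproduced
boot <- df %>%
  group_by(x) %>%
  slice_sample(n = max_sample, replace = TRUE) %>%  # max_sample rows per x, with replacement
  ungroup() %>%
  arrange(x)

# Every time point now has the same number of rows
count(boot, x)
```

With replace = TRUE, n may exceed the group size, which is what lets a time point with a single row (x = 5 here) be filled up to max_sample copies.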





answered Mar 8 at 8:02

Ronak Shah












• Thanks. I should add that x does not always start at 1; it ranges from -20 to +20.

  – spore234
  Mar 8 at 8:06











• @spore234 I think that should not matter, because table() counts the frequency of each value of x irrespective of what that value is.

  – Ronak Shah
  Mar 8 at 8:08
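To illustrate that negative time points are handled the same way, here is a base-R sketch (no dplyr) on a hypothetical data set whose x values lie within -20..20. The helper names make_id() and balanced_boot() are made up for this illustration. It resamples row indices within each x and uses i[sample.int(length(i), ...)] rather than sample(i, ...), because sample() treats a single number i >= 1 as 1:i, which would silently draw wrong rows for a time point observed only once.

```r
# Hypothetical data: 30 IDs, each observed on a window of consecutive
# time points that stays inside -20..20 (illustration only)
set.seed(2)
make_id <- function(id) {
  start <- sample(-20:15, 1)   # first observed time point
  len   <- sample(1:5, 1)      # number of observed time points
  data.frame(ID = id, x = start + seq_len(len) - 1)
}
df2 <- do.call(rbind, lapply(1:30, make_id))
df2$V1 <- rnorm(nrow(df2))

# Resample every value of `col` up to the size of the most frequent value
balanced_boot <- function(data, col = "x") {
  # Row numbers split by the value of `col` (negative values are fine)
  groups <- split(seq_len(nrow(data)), data[[col]])
  size   <- max(lengths(groups))   # same as max(table(data[[col]]))
  idx <- unlist(lapply(groups, function(i) {
    # draw positions within the group, then map back to row numbers;
    # this stays correct when the group has only one row
    i[sample.int(length(i), size, replace = TRUE)]
  }), use.names = FALSE)
  data[idx, , drop = FALSE]
}

boot2 <- balanced_boot(df2)
table(boot2$x)   # every observed time point has the same count
```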

















