SQL: Random sampling X% of samples (unique IDs) from duplicated ID table
















I am using SQL Server and have the following table:



userID | time     | location
A10    | 20130801 | 1000
A10    | 20130802 | 1002
A10    | 20130806 | 1008
B21    | 20130803 | 1000
B21    | 20130801 | 1099
C11    | 20130802 | 1000
D33    | 20130802 | 1002
D33    | 20130806 | 1877
E01    | 20130801 | 1765
E01    | 20130801 | 1000
E01    | 20130802 | 1000


where userID is a string, time is a date (the sample values are in YYYYMMDD format), and location is a numeric location ID recorded for that userID at that time.



In this example, I have 5 unique userIDs (A10, B21, C11, D33, E01). I would like to write a query that randomly samples X percent of the unique userIDs (for example, if X = 80, it should randomly sample 4 of the 5 unique userIDs).



I've written:



SELECT time, location,
       COUNT(DISTINCT userID) AS n_uu
FROM (
    -- here, I construct the example table
) AS maintable
WHERE 0.8 >= CAST(CHECKSUM(NEWID(), userID) & 0x7fffffff AS float) / CAST(0x7fffffff AS int)
GROUP BY time, location


Ultimately, I want to count the number of userIDs per time and location among the randomly selected users (i.e., 80% of the unique userIDs). That is, I am trying to get the following table in this example (suppose that B21 is not sampled):



time     | location | n_uu
20130801 | 1000     | 2
20130801 | 1765     | 1
20130802 | 1002     | 2
20130802 | 1000     | 2
20130806 | 1008     | 1
20130806 | 1877     | 1


Yet this does not seem to randomly sample unique userIDs; rather, it randomly selects rows.



How can I fix it, or is there a faster way to do this with a different query? Any advice would be much appreciated.



=== Added ===



Select time,location,
count(DISTINCT userID) as n_uu
from(
(Select
--- here, I construct the example table
---
) as maintable
Select maintable.*
from maintable join
(Select top 80 percent userID
from (Select Distinct userID from maintable) newtable
order by NEWID()
) newtable
on maintable.userID = newtable.userID
)
group by time, location









      sql sql-server






asked 2 days ago by Kehoe (edited 2 days ago)




          1 Answer
          Hmmm. If I understand correctly, you want to sample within a hierarchy. So, get the user ids you want with a subquery and then join in the rest of the information:



          select t.*
          from t join
               (select top 80 percent userid
                from (select distinct userid from t) u
                order by newid()
               ) u
               on t.userid = u.userid;
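
          A minimal sketch of how this sampling might be combined with the per-time, per-location count from the question. It assumes the question's data is available under the name maintable as a real table, view, or temp table (in the question it is a derived table, which cannot be referenced twice in one query). The CTE name sampled is made up for illustration.

```sql
-- Assumption: maintable(userID, time, location) exists as a table, view, or temp table.
-- "sampled" is a hypothetical CTE name holding the randomly chosen userIDs.
WITH sampled AS (
    SELECT TOP 80 PERCENT userID
    FROM (SELECT DISTINCT userID FROM maintable) AS u   -- one row per user
    ORDER BY NEWID()                                    -- random order over users, not rows
)
SELECT m.[time], m.location,
       COUNT(DISTINCT m.userID) AS n_uu
FROM maintable AS m
JOIN sampled AS s
  ON s.userID = m.userID                                -- keep every row of a sampled user
GROUP BY m.[time], m.location;
```

          Because the random ordering is applied to the list of distinct userIDs, each user is kept or dropped as a whole, which is the behaviour the question asks for. With 5 distinct userIDs, TOP 80 PERCENT keeps 4; when the percentage gives a fractional count, SQL Server rounds it up to the next integer.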





          • Thanks, Gordon. I edited my question by adding updated code. Is it exactly what you suggested?

            – Kehoe
            2 days ago











          • @Kehoe . . . It looks like the right idea as far as the sampling goes.

            – Gordon Linoff
            2 days ago











          answered 2 days ago by Gordon Linoff











