SQL: Random sampling X% of samples (unique IDs) from duplicated ID table
I am using SQL and have obtained the following table:
userID | time     | location
A10    | 20130801 | 1000
A10    | 20130802 | 1002
A10    | 20130806 | 1008
B21    | 20130803 | 1000
B21    | 20130801 | 1099
C11    | 20130802 | 1000
D33    | 20130802 | 1002
D33    | 20130806 | 1877
E01    | 20130801 | 1765
E01    | 20130801 | 1000
E01    | 20130802 | 1000
where userID is a string, time is a date in YYYYMMDD format, and location is a numeric location ID recorded for that userID on that date.
In this example, there are 5 unique userIDs (A10, B21, C11, D33, E01). I would like to write a query that randomly samples X percent of the unique userIDs (for example, if X = 80, it should randomly sample 4 of the 5 unique userIDs).
I've written:
Select time, location,
       count(DISTINCT userID) as n_uu
from (
    --- here, I construct the example table
) as maintable
where 0.8 >= CAST(CHECKSUM(NEWID(), userID) & 0x7fffffff AS float) / CAST(0x7fffffff AS int)
group by time, location
The final goal is to count the distinct userIDs per (time, location) among the randomly selected users (i.e., 80% of the unique userIDs). In this example, supposing that B21 is the user not sampled, I am trying to get the following table:
time     | location | n_uu
20130801 | 1000     | 2
20130801 | 1765     | 1
20130802 | 1002     | 2
20130802 | 1000     | 2
20130806 | 1008     | 1
20130806 | 1877     | 1
However, this does not seem to randomly sample unique userIDs; rather, it randomly selects rows.
How can I fix this, or is there a faster way to do it with a different query? Any advice would be much appreciated.
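The row-level behaviour can be made visible by selecting the random value itself instead of filtering on it. This is only a minimal sketch: it assumes SQL Server, uses a subset of the rows above, and the column alias draw is an illustrative name. Because NEWID() is evaluated for each row, rows belonging to the same userID generally get different values, so the WHERE filter keeps or drops each row independently.

```sql
-- A subset of the question's data: two users with several rows each.
WITH maintable AS (
    SELECT *
    FROM (VALUES
        ('A10', 20130801, 1000),
        ('A10', 20130802, 1002),
        ('A10', 20130806, 1008),
        ('B21', 20130803, 1000),
        ('B21', 20130801, 1099)
    ) AS v (userID, [time], location)
)
SELECT userID, [time], location,
       -- Same expression as in the WHERE clause: a pseudo-random value in [0, 1].
       -- "draw" is an illustrative alias, not part of the original query.
       CAST(CHECKSUM(NEWID(), userID) & 0x7fffffff AS float)
           / CAST(0x7fffffff AS int) AS draw
FROM maintable
ORDER BY userID, [time];
```

If the three A10 rows show three different draw values, the filter is sampling rows rather than users.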
===Added =========================
with maintable as (
    Select
    --- here, I construct the example table
    ---
)
Select maintable.time, maintable.location,
       count(DISTINCT maintable.userID) as n_uu
from maintable join
     -- sample 80 percent of the distinct userIDs, then keep only their rows
     (Select top 80 percent userID
      from (Select Distinct userID from maintable) users
      order by NEWID()
     ) newtable
     on maintable.userID = newtable.userID
group by maintable.time, maintable.location
sql sql-server
asked 2 days ago by Kehoe (new contributor), edited 2 days ago
1 Answer
Hmmm. If I understand correctly, you want to sample within a hierarchy. So, get the user ids you want with a subquery and then join in the rest of the information:
select t.*
from t join
     (select top 80 percent userid
      from (select distinct userid from t) u
      order by newid()
     ) u
     on t.userid = u.userid;
answered 2 days ago by Gordon Linoff
Thanks, Gordon. I edited my question to add the updated code. Is it exactly what you suggested?
– Kehoe
2 days ago
@Kehoe . . . It looks like the right idea as far as the sampling goes.
– Gordon Linoff
2 days ago
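For reference, the subquery-and-join approach from the answer can be combined with the aggregation from the question into one statement. This is a hedged sketch rather than a definitive implementation: it assumes SQL Server, rebuilds the question's sample rows with a VALUES list, and the names @pct, sampled, and v are illustrative and do not come from the original posts. Note that TOP ... PERCENT rounds a fractional row count up, so the number of sampled users is only exactly X percent when that product is a whole number (here 80 percent of 5 users is 4).

```sql
-- Sampling percentage X; @pct is an illustrative variable name.
DECLARE @pct float = 80;

WITH maintable AS (
    -- Sample rows copied from the question.
    SELECT *
    FROM (VALUES
        ('A10', 20130801, 1000),
        ('A10', 20130802, 1002),
        ('A10', 20130806, 1008),
        ('B21', 20130803, 1000),
        ('B21', 20130801, 1099),
        ('C11', 20130802, 1000),
        ('D33', 20130802, 1002),
        ('D33', 20130806, 1877),
        ('E01', 20130801, 1765),
        ('E01', 20130801, 1000),
        ('E01', 20130802, 1000)
    ) AS v (userID, [time], location)
),
sampled AS (
    -- One row per user, shuffled by NEWID(); keep @pct percent of the users.
    SELECT TOP (@pct) PERCENT userID
    FROM (SELECT DISTINCT userID FROM maintable) AS u
    ORDER BY NEWID()
)
-- Keep every row of the sampled users, then count users per (time, location).
SELECT m.[time], m.location, COUNT(DISTINCT m.userID) AS n_uu
FROM maintable AS m
JOIN sampled AS s ON s.userID = m.userID
GROUP BY m.[time], m.location;
```

Each run draws a different set of four users, so the counts vary between executions. When B21 happens to be the user left out, the result should match the table in the question.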