Does select_for_update see rows added by another select_for_update transaction after it unblocks?2019 Community Moderator ElectionDjango: how to wrap a bulk update/insert operation in transaction?How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?MySQL: Transactions vs Locking TablesHow can I parallelize Writes to the same row in MySQL?select_for_update kind of functionality in django 1.3 to avoid race conditionMysql - Row level locking deadlocks in concurrent transactionsDjango Transactions, select_for_update, concurrent modification?Does the Laravel `increment()` lock the row?Django select_for_update with nowait=True not raising DatabaseError for conflicting transctionsDjango: Transaction and select_for_update()Django & MariaDB/MySQL: Does select_for_update lock rows from subqueries? Causing deadlocks?
cannot log in to the server after changing SSH port
I can't die. Who am I?
What do you call someone who likes to pick fights?
Rationale to prefer local variables over instance variables?
Called into a meeting and told we are being made redundant (laid off) and "not to share outside". Can I tell my partner?
Leveling the sagging side of the home
Why restrict private health insurance?
What does *dead* mean in *What do you mean, dead?*?
Does an unused member variable take up memory?
What is the "determinant" of two vectors?
Locked Away- What am I?
What would be the most expensive material to an intergalactic society?
Short scifi story where reproductive organs are converted to produce "materials", pregnant protagonist is "found fit" to be a mother
Converting from "matrix" data into "coordinate" data
What can I do if someone tampers with my SSH public key?
Will expression retain the same definition if particle is changed?
Why is there an extra space when I type "ls" on the Desktop?
What is the purpose of a disclaimer like "this is not legal advice"?
Giving a career talk in my old university, how prominently should I tell students my salary?
Difference between `nmap local-IP-address` and `nmap localhost`
Why does Central Limit Theorem break down in my simulation?
The (Easy) Road to Code
Help! My Character is too much for her story!
Traveling to heavily polluted city, what practical measures can I take to minimize impact?
Does select_for_update see rows added by another select_for_update transaction after it unblocks?
2019 Community Moderator ElectionDjango: how to wrap a bulk update/insert operation in transaction?How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?MySQL: Transactions vs Locking TablesHow can I parallelize Writes to the same row in MySQL?select_for_update kind of functionality in django 1.3 to avoid race conditionMysql - Row level locking deadlocks in concurrent transactionsDjango Transactions, select_for_update, concurrent modification?Does the Laravel `increment()` lock the row?Django select_for_update with nowait=True not raising DatabaseError for conflicting transctionsDjango: Transaction and select_for_update()Django & MariaDB/MySQL: Does select_for_update lock rows from subqueries? Causing deadlocks?
I want to create a model with ID equal to the current greatest ID for that model plus one (like auto-increment). I'm considering doing this with select_for_update
to ensure there is no race condition for the current greatest ID, like this:
with transaction.atomic():
greatest_id = MyModel.objects.select_for_update().order_by('id').last().id
MyModel.objects.create(id=greatest_id + 1)
But I'm wondering, if two processes try to run this simultaneously, once the second one unblocks, will it see the new greatest ID inserted by the first process, or will it still see the old greatest ID?
For example, say the current greatest ID is 10. Two processes go to create a new model. The first one locks ID 10. Then the second one blocks because 10 is locked. The first one inserts 11 and unlocks 10. Then, the second one unblocks, and now will it see the 11 inserted by the first as the greatest, or will it still see 10 because that's the row it blocked on?
In the select_for_update docs, it says:
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released.
So for my example, I'm thinking this means that the second process will rerun the query for the greatest ID once it unblocks and get 11. But I'm not certain I'm interpreting that right.
Note: I'm using MySQL for the db.
mysql django django-models django-queryset django-orm
add a comment |
I want to create a model with ID equal to the current greatest ID for that model plus one (like auto-increment). I'm considering doing this with select_for_update
to ensure there is no race condition for the current greatest ID, like this:
with transaction.atomic():
greatest_id = MyModel.objects.select_for_update().order_by('id').last().id
MyModel.objects.create(id=greatest_id + 1)
But I'm wondering, if two processes try to run this simultaneously, once the second one unblocks, will it see the new greatest ID inserted by the first process, or will it still see the old greatest ID?
For example, say the current greatest ID is 10. Two processes go to create a new model. The first one locks ID 10. Then the second one blocks because 10 is locked. The first one inserts 11 and unlocks 10. Then, the second one unblocks, and now will it see the 11 inserted by the first as the greatest, or will it still see 10 because that's the row it blocked on?
In the select_for_update docs, it says:
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released.
So for my example, I'm thinking this means that the second process will rerun the query for the greatest ID once it unblocks and get 11. But I'm not certain I'm interpreting that right.
Note: I'm using MySQL for the db.
mysql django django-models django-queryset django-orm
As an alternative, consider using optimistic concurrency. See my answer here for a bit more detail. As I said there: "This kind of approach can work really well if collisions are rare, and really badly if they are frequent."
– Kevin Christopher Henry
2 days ago
Yes, I don't expect collisions often, so I will actually plan to use this. Thank you so much for your help.
– Michael Harvey
yesterday
add a comment |
I want to create a model with ID equal to the current greatest ID for that model plus one (like auto-increment). I'm considering doing this with select_for_update
to ensure there is no race condition for the current greatest ID, like this:
with transaction.atomic():
greatest_id = MyModel.objects.select_for_update().order_by('id').last().id
MyModel.objects.create(id=greatest_id + 1)
But I'm wondering, if two processes try to run this simultaneously, once the second one unblocks, will it see the new greatest ID inserted by the first process, or will it still see the old greatest ID?
For example, say the current greatest ID is 10. Two processes go to create a new model. The first one locks ID 10. Then the second one blocks because 10 is locked. The first one inserts 11 and unlocks 10. Then, the second one unblocks, and now will it see the 11 inserted by the first as the greatest, or will it still see 10 because that's the row it blocked on?
In the select_for_update docs, it says:
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released.
So for my example, I'm thinking this means that the second process will rerun the query for the greatest ID once it unblocks and get 11. But I'm not certain I'm interpreting that right.
Note: I'm using MySQL for the db.
mysql django django-models django-queryset django-orm
I want to create a model with ID equal to the current greatest ID for that model plus one (like auto-increment). I'm considering doing this with select_for_update
to ensure there is no race condition for the current greatest ID, like this:
with transaction.atomic():
greatest_id = MyModel.objects.select_for_update().order_by('id').last().id
MyModel.objects.create(id=greatest_id + 1)
But I'm wondering, if two processes try to run this simultaneously, once the second one unblocks, will it see the new greatest ID inserted by the first process, or will it still see the old greatest ID?
For example, say the current greatest ID is 10. Two processes go to create a new model. The first one locks ID 10. Then the second one blocks because 10 is locked. The first one inserts 11 and unlocks 10. Then, the second one unblocks, and now will it see the 11 inserted by the first as the greatest, or will it still see 10 because that's the row it blocked on?
In the select_for_update docs, it says:
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released.
So for my example, I'm thinking this means that the second process will rerun the query for the greatest ID once it unblocks and get 11. But I'm not certain I'm interpreting that right.
Note: I'm using MySQL for the db.
mysql django django-models django-queryset django-orm
mysql django django-models django-queryset django-orm
edited Mar 7 at 2:43
Michael Harvey
asked Mar 6 at 22:52
Michael HarveyMichael Harvey
17928
17928
As an alternative, consider using optimistic concurrency. See my answer here for a bit more detail. As I said there: "This kind of approach can work really well if collisions are rare, and really badly if they are frequent."
– Kevin Christopher Henry
2 days ago
Yes, I don't expect collisions often, so I will actually plan to use this. Thank you so much for your help.
– Michael Harvey
yesterday
add a comment |
As an alternative, consider using optimistic concurrency. See my answer here for a bit more detail. As I said there: "This kind of approach can work really well if collisions are rare, and really badly if they are frequent."
– Kevin Christopher Henry
2 days ago
Yes, I don't expect collisions often, so I will actually plan to use this. Thank you so much for your help.
– Michael Harvey
yesterday
As an alternative, consider using optimistic concurrency. See my answer here for a bit more detail. As I said there: "This kind of approach can work really well if collisions are rare, and really badly if they are frequent."
– Kevin Christopher Henry
2 days ago
As an alternative, consider using optimistic concurrency. See my answer here for a bit more detail. As I said there: "This kind of approach can work really well if collisions are rare, and really badly if they are frequent."
– Kevin Christopher Henry
2 days ago
Yes, I don't expect collisions often, so I will actually plan to use this. Thank you so much for your help.
– Michael Harvey
yesterday
Yes, I don't expect collisions often, so I will actually plan to use this. Thank you so much for your help.
– Michael Harvey
yesterday
add a comment |
2 Answers
2
active
oldest
votes
No, I don't think this will work.
First, let me note that you should absolutely check the documentation for the database you're using, as there are many subtle differences between the databases that are not captured in the Django documentation.
Using the PostgreSQL documentation as a guide, the problem is that, at the default READ COMMITTED
isolation level, the blocked query will not be rerun. When the first transaction commits, the blocked transaction will be able to see changes to that row, but it will not be able to see that new rows have been added.
It is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database.
So 10
is what will be returned.
I don't think this is right because in my particular case, the SQL django generates queries for all the rows in MyModel, not just the one. So while it is true that the second query will see an inconsistent snapshot, it will still scan for all rows, not just the one, and indeed pick up the new one ~because~ the snapshot is inconsistent. What do you think?
– Michael Harvey
Mar 7 at 4:46
@MichaelHarvey: The issue is that the query is not rerun after the lock is released; the rows have already been selected. Look in the documentation I linked to for the example of "undesirable results in Read Committed mode". If you were right that the query is re-run, the results would be different than described.
– Kevin Christopher Henry
2 days ago
my response is too long to type here, so I added it to the bottom of my answer. Please let me know what you think if you are interested in continuing this discussion. Thank you for your input.
– Michael Harvey
2 days ago
I just retested this by getting a lock and doing an insert from a shell, and trying to do the select_for_update then insert from the django app. The app failed because it didn't read the new id, and tried to insert the old id. Your answer is correct.
– Michael Harvey
2 days ago
add a comment |
Edit: My understanding in this answer is wrong, just leaving it for documentation's sake in case I ever want to come back to it.
After some investigation, I believe this will work as intended.
The reason is that for this call:
MyModel.objects.select_for_update().order_by('id').last().id
The SQL Django generates and runs against the db is actually:
SELECT ... FROM MyModel ORDER BY id ASC FOR UPDATE;
(the call to last()
only happens after the queryset has already been evaluated.)
Meaning, the query scans over all rows both times it runs. Meaning the second time it runs, it will pick up the new row and return it accordingly.
I learned that this phenomenon is called a "phantom read", and is possible because the isolation level of my db is REPEATABLE-READ
.
@KevinChristopherHenry "The issue is that the query is not rerun after the lock is released; the rows have already been selected" Are you sure that's how it works? Why does READ COMMITTED imply the select doesn't run after the lock is released? I thought the isolation level defines which snapshot of data a query sees when it runs, not ~when~ the query is run. It seems to me that whether the select happens before or after the lock is released is orthogonal to the isolation level. And by definition, doesn't a blocked query not select the rows until after it is unblocked?
For what it's worth, I tried to test this by opening two separate connections to my db in a shell and issuing some queries. In the first, I began a transaction, and got a lock 'select * from MyModel order by id for update'. Then, in the second, I did the same, causing the select to block. Then back in the first, I inserted a new row, and commited the transaction. Then in the second, the query unblocked, and returned the new row. This makes me think my hypothesis is correct.
P.S. I finally actually read the "undesirable results" documentation that you read and I see your point - in that example, it looks like it ignores rows that weren't preselected, so that would point to the conclusion that my second query wouldn't pick up the new row. But I tested in a shell and it did. Now I'm not sure what to make of this.
You are probably not usingREPEATABLE READ
. That's the MySQL default, but Django sets its transactions toREAD COMMITTED
by default. It's possible to override this in settings, but you should certainly be cautious about that: "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read."
– Kevin Christopher Henry
2 days ago
I checked the db settings and verified it's REPEATABLE READ. I am not the original owner of this Db, so I assume someone else must have overrode it. I will look into whether changing it is a good option. Thanks for info.
– Michael Harvey
2 days ago
It doesn't matter what the database setting is, because this can be set by the client. And every time Django opens a connection it sets the isolation level to what is specified in the settings (or, by default,READ COMMITTED
). See here.
– Kevin Christopher Henry
2 days ago
Perhaps the difference is that my answer is based on the PostgreSQL documentation, and your test is with MySQL. Their documentation doesn't have enough detail for me to tell.
– Kevin Christopher Henry
2 days ago
Oh, in that case I am probably using READ COMMITTED then, because I don't see the isolation level explicitly set in the settings file.
– Michael Harvey
2 days ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55033473%2fdoes-select-for-update-see-rows-added-by-another-select-for-update-transaction-a%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
No, I don't think this will work.
First, let me note that you should absolutely check the documentation for the database you're using, as there are many subtle differences between the databases that are not captured in the Django documentation.
Using the PostgreSQL documentation as a guide, the problem is that, at the default READ COMMITTED
isolation level, the blocked query will not be rerun. When the first transaction commits, the blocked transaction will be able to see changes to that row, but it will not be able to see that new rows have been added.
It is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database.
So 10
is what will be returned.
I don't think this is right because in my particular case, the SQL django generates queries for all the rows in MyModel, not just the one. So while it is true that the second query will see an inconsistent snapshot, it will still scan for all rows, not just the one, and indeed pick up the new one ~because~ the snapshot is inconsistent. What do you think?
– Michael Harvey
Mar 7 at 4:46
@MichaelHarvey: The issue is that the query is not rerun after the lock is released; the rows have already been selected. Look in the documentation I linked to for the example of "undesirable results in Read Committed mode". If you were right that the query is re-run, the results would be different than described.
– Kevin Christopher Henry
2 days ago
my response is too long to type here, so I added it to the bottom of my answer. Please let me know what you think if you are interested in continuing this discussion. Thank you for your input.
– Michael Harvey
2 days ago
I just retested this by getting a lock and doing an insert from a shell, and trying to do the select_for_update then insert from the django app. The app failed because it didn't read the new id, and tried to insert the old id. Your answer is correct.
– Michael Harvey
2 days ago
add a comment |
No, I don't think this will work.
First, let me note that you should absolutely check the documentation for the database you're using, as there are many subtle differences between the databases that are not captured in the Django documentation.
Using the PostgreSQL documentation as a guide, the problem is that, at the default READ COMMITTED
isolation level, the blocked query will not be rerun. When the first transaction commits, the blocked transaction will be able to see changes to that row, but it will not be able to see that new rows have been added.
It is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database.
So 10
is what will be returned.
I don't think this is right because in my particular case, the SQL django generates queries for all the rows in MyModel, not just the one. So while it is true that the second query will see an inconsistent snapshot, it will still scan for all rows, not just the one, and indeed pick up the new one ~because~ the snapshot is inconsistent. What do you think?
– Michael Harvey
Mar 7 at 4:46
@MichaelHarvey: The issue is that the query is not rerun after the lock is released; the rows have already been selected. Look in the documentation I linked to for the example of "undesirable results in Read Committed mode". If you were right that the query is re-run, the results would be different than described.
– Kevin Christopher Henry
2 days ago
my response is too long to type here, so I added it to the bottom of my answer. Please let me know what you think if you are interested in continuing this discussion. Thank you for your input.
– Michael Harvey
2 days ago
I just retested this by getting a lock and doing an insert from a shell, and trying to do the select_for_update then insert from the django app. The app failed because it didn't read the new id, and tried to insert the old id. Your answer is correct.
– Michael Harvey
2 days ago
add a comment |
No, I don't think this will work.
First, let me note that you should absolutely check the documentation for the database you're using, as there are many subtle differences between the databases that are not captured in the Django documentation.
Using the PostgreSQL documentation as a guide, the problem is that, at the default READ COMMITTED
isolation level, the blocked query will not be rerun. When the first transaction commits, the blocked transaction will be able to see changes to that row, but it will not be able to see that new rows have been added.
It is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database.
So 10
is what will be returned.
No, I don't think this will work.
First, let me note that you should absolutely check the documentation for the database you're using, as there are many subtle differences between the databases that are not captured in the Django documentation.
Using the PostgreSQL documentation as a guide, the problem is that, at the default READ COMMITTED
isolation level, the blocked query will not be rerun. When the first transaction commits, the blocked transaction will be able to see changes to that row, but it will not be able to see that new rows have been added.
It is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database.
So 10
is what will be returned.
answered Mar 6 at 23:47
Kevin Christopher HenryKevin Christopher Henry
23.8k56763
23.8k56763
I don't think this is right because in my particular case, the SQL django generates queries for all the rows in MyModel, not just the one. So while it is true that the second query will see an inconsistent snapshot, it will still scan for all rows, not just the one, and indeed pick up the new one ~because~ the snapshot is inconsistent. What do you think?
– Michael Harvey
Mar 7 at 4:46
@MichaelHarvey: The issue is that the query is not rerun after the lock is released; the rows have already been selected. Look in the documentation I linked to for the example of "undesirable results in Read Committed mode". If you were right that the query is re-run, the results would be different than described.
– Kevin Christopher Henry
2 days ago
my response is too long to type here, so I added it to the bottom of my answer. Please let me know what you think if you are interested in continuing this discussion. Thank you for your input.
– Michael Harvey
2 days ago
I just retested this by getting a lock and doing an insert from a shell, and trying to do the select_for_update then insert from the django app. The app failed because it didn't read the new id, and tried to insert the old id. Your answer is correct.
– Michael Harvey
2 days ago
add a comment |
I don't think this is right because in my particular case, the SQL django generates queries for all the rows in MyModel, not just the one. So while it is true that the second query will see an inconsistent snapshot, it will still scan for all rows, not just the one, and indeed pick up the new one ~because~ the snapshot is inconsistent. What do you think?
– Michael Harvey
Mar 7 at 4:46
@MichaelHarvey: The issue is that the query is not rerun after the lock is released; the rows have already been selected. Look in the documentation I linked to for the example of "undesirable results in Read Committed mode". If you were right that the query is re-run, the results would be different than described.
– Kevin Christopher Henry
2 days ago
my response is too long to type here, so I added it to the bottom of my answer. Please let me know what you think if you are interested in continuing this discussion. Thank you for your input.
– Michael Harvey
2 days ago
I just retested this by getting a lock and doing an insert from a shell, and trying to do the select_for_update then insert from the django app. The app failed because it didn't read the new id, and tried to insert the old id. Your answer is correct.
– Michael Harvey
2 days ago
I don't think this is right because in my particular case, the SQL django generates queries for all the rows in MyModel, not just the one. So while it is true that the second query will see an inconsistent snapshot, it will still scan for all rows, not just the one, and indeed pick up the new one ~because~ the snapshot is inconsistent. What do you think?
– Michael Harvey
Mar 7 at 4:46
I don't think this is right because in my particular case, the SQL django generates queries for all the rows in MyModel, not just the one. So while it is true that the second query will see an inconsistent snapshot, it will still scan for all rows, not just the one, and indeed pick up the new one ~because~ the snapshot is inconsistent. What do you think?
– Michael Harvey
Mar 7 at 4:46
@MichaelHarvey: The issue is that the query is not rerun after the lock is released; the rows have already been selected. Look in the documentation I linked to for the example of "undesirable results in Read Committed mode". If you were right that the query is re-run, the results would be different than described.
– Kevin Christopher Henry
2 days ago
@MichaelHarvey: The issue is that the query is not rerun after the lock is released; the rows have already been selected. Look in the documentation I linked to for the example of "undesirable results in Read Committed mode". If you were right that the query is re-run, the results would be different than described.
– Kevin Christopher Henry
2 days ago
my response is too long to type here, so I added it to the bottom of my answer. Please let me know what you think if you are interested in continuing this discussion. Thank you for your input.
– Michael Harvey
2 days ago
my response is too long to type here, so I added it to the bottom of my answer. Please let me know what you think if you are interested in continuing this discussion. Thank you for your input.
– Michael Harvey
2 days ago
I just retested this by getting a lock and doing an insert from a shell, and trying to do the select_for_update then insert from the django app. The app failed because it didn't read the new id, and tried to insert the old id. Your answer is correct.
– Michael Harvey
2 days ago
I just retested this by getting a lock and doing an insert from a shell, and trying to do the select_for_update then insert from the django app. The app failed because it didn't read the new id, and tried to insert the old id. Your answer is correct.
– Michael Harvey
2 days ago
add a comment |
Edit: My understanding in this answer is wrong, just leaving it for documentation's sake in case I ever want to come back to it.
After some investigation, I believe this will work as intended.
The reason is that for this call:
MyModel.objects.select_for_update().order_by('id').last().id
The SQL Django generates and runs against the db is actually:
SELECT ... FROM MyModel ORDER BY id ASC FOR UPDATE;
(the call to last()
only happens after the queryset has already been evaluated.)
Meaning, the query scans over all rows both times it runs. Meaning the second time it runs, it will pick up the new row and return it accordingly.
I learned that this phenomenon is called a "phantom read", and is possible because the isolation level of my db is REPEATABLE-READ
.
@KevinChristopherHenry "The issue is that the query is not rerun after the lock is released; the rows have already been selected" Are you sure that's how it works? Why does READ COMMITTED imply the select doesn't run after the lock is released? I thought the isolation level defines which snapshot of data a query sees when it runs, not ~when~ the query is run. It seems to me that whether the select happens before or after the lock is released is orthogonal to the isolation level. And by definition, doesn't a blocked query not select the rows until after it is unblocked?
For what it's worth, I tried to test this by opening two separate connections to my db in a shell and issuing some queries. In the first, I began a transaction, and got a lock 'select * from MyModel order by id for update'. Then, in the second, I did the same, causing the select to block. Then back in the first, I inserted a new row, and commited the transaction. Then in the second, the query unblocked, and returned the new row. This makes me think my hypothesis is correct.
P.S. I finally actually read the "undesirable results" documentation that you read and I see your point - in that example, it looks like it ignores rows that weren't preselected, so that would point to the conclusion that my second query wouldn't pick up the new row. But I tested in a shell and it did. Now I'm not sure what to make of this.
You are probably not usingREPEATABLE READ
. That's the MySQL default, but Django sets its transactions toREAD COMMITTED
by default. It's possible to override this in settings, but you should certainly be cautious about that: "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read."
– Kevin Christopher Henry
2 days ago
I checked the db settings and verified it's REPEATABLE READ. I am not the original owner of this Db, so I assume someone else must have overrode it. I will look into whether changing it is a good option. Thanks for info.
– Michael Harvey
2 days ago
It doesn't matter what the database setting is, because this can be set by the client. And every time Django opens a connection it sets the isolation level to what is specified in the settings (or, by default,READ COMMITTED
). See here.
– Kevin Christopher Henry
2 days ago
Perhaps the difference is that my answer is based on the PostgreSQL documentation, and your test is with MySQL. Their documentation doesn't have enough detail for me to tell.
– Kevin Christopher Henry
2 days ago
Oh, in that case I am probably using READ COMMITTED then, because I don't see the isolation level explicitly set in the settings file.
– Michael Harvey
2 days ago
add a comment |
Edit: My understanding in this answer is wrong, just leaving it for documentation's sake in case I ever want to come back to it.
After some investigation, I believe this will work as intended.
The reason is that for this call:
MyModel.objects.select_for_update().order_by('id').last().id
The SQL Django generates and runs against the db is actually:
SELECT ... FROM MyModel ORDER BY id ASC FOR UPDATE;
(the call to last()
only happens after the queryset has already been evaluated.)
Meaning, the query scans over all rows both times it runs. Meaning the second time it runs, it will pick up the new row and return it accordingly.
I learned that this phenomenon is called a "phantom read", and is possible because the isolation level of my db is REPEATABLE-READ
.
@KevinChristopherHenry "The issue is that the query is not rerun after the lock is released; the rows have already been selected" Are you sure that's how it works? Why does READ COMMITTED imply the select doesn't run after the lock is released? I thought the isolation level defines which snapshot of data a query sees when it runs, not ~when~ the query is run. It seems to me that whether the select happens before or after the lock is released is orthogonal to the isolation level. And by definition, doesn't a blocked query not select the rows until after it is unblocked?
For what it's worth, I tried to test this by opening two separate connections to my db in a shell and issuing some queries. In the first, I began a transaction, and got a lock 'select * from MyModel order by id for update'. Then, in the second, I did the same, causing the select to block. Then back in the first, I inserted a new row, and commited the transaction. Then in the second, the query unblocked, and returned the new row. This makes me think my hypothesis is correct.
P.S. I finally actually read the "undesirable results" documentation that you read and I see your point - in that example, it looks like it ignores rows that weren't preselected, so that would point to the conclusion that my second query wouldn't pick up the new row. But I tested in a shell and it did. Now I'm not sure what to make of this.
You are probably not usingREPEATABLE READ
. That's the MySQL default, but Django sets its transactions toREAD COMMITTED
by default. It's possible to override this in settings, but you should certainly be cautious about that: "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read."
– Kevin Christopher Henry
2 days ago
I checked the db settings and verified it's REPEATABLE READ. I am not the original owner of this Db, so I assume someone else must have overrode it. I will look into whether changing it is a good option. Thanks for info.
– Michael Harvey
2 days ago
It doesn't matter what the database setting is, because this can be set by the client. And every time Django opens a connection it sets the isolation level to what is specified in the settings (or, by default,READ COMMITTED
). See here.
– Kevin Christopher Henry
2 days ago
Perhaps the difference is that my answer is based on the PostgreSQL documentation, and your test is with MySQL. Their documentation doesn't have enough detail for me to tell.
– Kevin Christopher Henry
2 days ago
Oh, in that case I am probably using READ COMMITTED then, because I don't see the isolation level explicitly set in the settings file.
– Michael Harvey
2 days ago
add a comment |
Edit: My understanding in this answer is wrong, just leaving it for documentation's sake in case I ever want to come back to it.
After some investigation, I believe this will work as intended.
The reason is that for this call:
MyModel.objects.select_for_update().order_by('id').last().id
The SQL Django generates and runs against the db is actually:
SELECT ... FROM MyModel ORDER BY id ASC FOR UPDATE;
(the call to last()
only happens after the queryset has already been evaluated.)
Meaning, the query scans over all rows both times it runs. Meaning the second time it runs, it will pick up the new row and return it accordingly.
I learned that this phenomenon is called a "phantom read", and is possible because the isolation level of my db is REPEATABLE-READ
.
@KevinChristopherHenry "The issue is that the query is not rerun after the lock is released; the rows have already been selected" Are you sure that's how it works? Why does READ COMMITTED imply the select doesn't run after the lock is released? I thought the isolation level defines which snapshot of data a query sees when it runs, not ~when~ the query is run. It seems to me that whether the select happens before or after the lock is released is orthogonal to the isolation level. And by definition, doesn't a blocked query not select the rows until after it is unblocked?
For what it's worth, I tried to test this by opening two separate connections to my db in a shell and issuing some queries. In the first, I began a transaction, and got a lock 'select * from MyModel order by id for update'. Then, in the second, I did the same, causing the select to block. Then back in the first, I inserted a new row, and commited the transaction. Then in the second, the query unblocked, and returned the new row. This makes me think my hypothesis is correct.
P.S. I finally actually read the "undesirable results" documentation that you read and I see your point - in that example, it looks like it ignores rows that weren't preselected, so that would point to the conclusion that my second query wouldn't pick up the new row. But I tested in a shell and it did. Now I'm not sure what to make of this.
Edit: My understanding in this answer is wrong, just leaving it for documentation's sake in case I ever want to come back to it.
After some investigation, I believe this will work as intended.
The reason is that for this call:
MyModel.objects.select_for_update().order_by('id').last().id
The SQL Django generates and runs against the db is actually:
SELECT ... FROM MyModel ORDER BY id ASC FOR UPDATE;
(the call to last()
only happens after the queryset has already been evaluated.)
Meaning, the query scans over all rows both times it runs. Meaning the second time it runs, it will pick up the new row and return it accordingly.
I learned that this phenomenon is called a "phantom read", and is possible because the isolation level of my db is REPEATABLE-READ
.
@KevinChristopherHenry "The issue is that the query is not rerun after the lock is released; the rows have already been selected" Are you sure that's how it works? Why does READ COMMITTED imply the select doesn't run after the lock is released? I thought the isolation level defines which snapshot of data a query sees when it runs, not ~when~ the query is run. It seems to me that whether the select happens before or after the lock is released is orthogonal to the isolation level. And by definition, doesn't a blocked query not select the rows until after it is unblocked?
For what it's worth, I tried to test this by opening two separate connections to my db in a shell and issuing some queries. In the first, I began a transaction, and got a lock 'select * from MyModel order by id for update'. Then, in the second, I did the same, causing the select to block. Then back in the first, I inserted a new row, and commited the transaction. Then in the second, the query unblocked, and returned the new row. This makes me think my hypothesis is correct.
P.S. I finally actually read the "undesirable results" documentation that you read and I see your point - in that example, it looks like it ignores rows that weren't preselected, so that would point to the conclusion that my second query wouldn't pick up the new row. But I tested in a shell and it did. Now I'm not sure what to make of this.
edited 2 days ago
answered Mar 7 at 4:39
Michael HarveyMichael Harvey
17928
17928
You are probably not usingREPEATABLE READ
. That's the MySQL default, but Django sets its transactions toREAD COMMITTED
by default. It's possible to override this in settings, but you should certainly be cautious about that: "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read."
– Kevin Christopher Henry
2 days ago
I checked the db settings and verified it's REPEATABLE READ. I am not the original owner of this Db, so I assume someone else must have overrode it. I will look into whether changing it is a good option. Thanks for info.
– Michael Harvey
2 days ago
It doesn't matter what the database setting is, because this can be set by the client. And every time Django opens a connection it sets the isolation level to what is specified in the settings (or, by default,READ COMMITTED
). See here.
– Kevin Christopher Henry
2 days ago
Perhaps the difference is that my answer is based on the PostgreSQL documentation, and your test is with MySQL. Their documentation doesn't have enough detail for me to tell.
– Kevin Christopher Henry
2 days ago
Oh, in that case I am probably using READ COMMITTED then, because I don't see the isolation level explicitly set in the settings file.
– Michael Harvey
2 days ago
add a comment |
You are probably not usingREPEATABLE READ
. That's the MySQL default, but Django sets its transactions toREAD COMMITTED
by default. It's possible to override this in settings, but you should certainly be cautious about that: "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read."
– Kevin Christopher Henry
2 days ago
I checked the db settings and verified it's REPEATABLE READ. I am not the original owner of this Db, so I assume someone else must have overrode it. I will look into whether changing it is a good option. Thanks for info.
– Michael Harvey
2 days ago
It doesn't matter what the database setting is, because this can be set by the client. And every time Django opens a connection it sets the isolation level to what is specified in the settings (or, by default,READ COMMITTED
). See here.
– Kevin Christopher Henry
2 days ago
Perhaps the difference is that my answer is based on the PostgreSQL documentation, and your test is with MySQL. Their documentation doesn't have enough detail for me to tell.
– Kevin Christopher Henry
2 days ago
Oh, in that case I am probably using READ COMMITTED then, because I don't see the isolation level explicitly set in the settings file.
– Michael Harvey
2 days ago
You are probably not using
REPEATABLE READ
. That's the MySQL default, but Django sets its transactions to READ COMMITTED
by default. It's possible to override this in settings, but you should certainly be cautious about that: "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read."– Kevin Christopher Henry
2 days ago
You are probably not using
REPEATABLE READ
. That's the MySQL default, but Django sets its transactions to READ COMMITTED
by default. It's possible to override this in settings, but you should certainly be cautious about that: "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read."– Kevin Christopher Henry
2 days ago
I checked the db settings and verified it's REPEATABLE READ. I am not the original owner of this Db, so I assume someone else must have overrode it. I will look into whether changing it is a good option. Thanks for info.
– Michael Harvey
2 days ago
I checked the db settings and verified it's REPEATABLE READ. I am not the original owner of this Db, so I assume someone else must have overrode it. I will look into whether changing it is a good option. Thanks for info.
– Michael Harvey
2 days ago
It doesn't matter what the database setting is, because this can be set by the client. And every time Django opens a connection it sets the isolation level to what is specified in the settings (or, by default,
READ COMMITTED
). See here.– Kevin Christopher Henry
2 days ago
It doesn't matter what the database setting is, because this can be set by the client. And every time Django opens a connection it sets the isolation level to what is specified in the settings (or, by default,
READ COMMITTED
). See here.– Kevin Christopher Henry
2 days ago
Perhaps the difference is that my answer is based on the PostgreSQL documentation, and your test is with MySQL. Their documentation doesn't have enough detail for me to tell.
– Kevin Christopher Henry
2 days ago
Perhaps the difference is that my answer is based on the PostgreSQL documentation, and your test is with MySQL. Their documentation doesn't have enough detail for me to tell.
– Kevin Christopher Henry
2 days ago
Oh, in that case I am probably using READ COMMITTED then, because I don't see the isolation level explicitly set in the settings file.
– Michael Harvey
2 days ago
Oh, in that case I am probably using READ COMMITTED then, because I don't see the isolation level explicitly set in the settings file.
– Michael Harvey
2 days ago
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55033473%2fdoes-select-for-update-see-rows-added-by-another-select-for-update-transaction-a%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
As an alternative, consider using optimistic concurrency. See my answer here for a bit more detail. As I said there: "This kind of approach can work really well if collisions are rare, and really badly if they are frequent."
– Kevin Christopher Henry
2 days ago
Yes, I don't expect collisions often, so I will actually plan to use this. Thank you so much for your help.
– Michael Harvey
yesterday