MySQL and an atomic 'check ... on none insert'

December 9, 2017

This post could have been titled:

“Why SELECT ... FOR UPDATE doesn’t work on non-existent rows like you might think it does”.

FastMail has been a long term user of the MySQL database server, and in particular the InnoDB storage engine. InnoDB provides many core database features (ACID, transactions, row-level locking, etc).

One thing that comes up from time to time is the desire to do the following sequence atomically:

  1. Check if a row with a particular id exists in a table
  2. If yes, fetch the row data
  3. If not, perform an expensive calculation (e.g. external web request) to get the data and insert it into the table

The important point is you want to do this atomically. That is: if one process checks for the row, but another process is already doing the “expensive calculation”, it blocks until that process completes and has inserted the row into the table.

So if you have a table:

CREATE TABLE foo (
    Id INT NOT NULL PRIMARY KEY,
    Data BLOB
); 

The primary key ensures Id is unique.

One thought would be that you could do a SELECT and then INSERT IGNORE if the row doesn’t exist. This doesn’t return an error if the INSERT would result in a duplicate key, but does mean that the “expensive calculation” might be done multiple times.

After looking at some MySQL documentation, you might think a transaction with a SELECT ... FOR UPDATE statement is what you want. This would check for the data row, but lock the row gap if it doesn’t exist. Excellent!

Unfortunately this doesn’t work.

Quoting directly from an answer by Heikki Tuuri (the original developer of InnoDB)

http://forums.mysql.com/read.php?22,10506,11347#msg-11347

READ COMMITTED and REPEATABLE READ only affect plain SELECTs, not
SELECT … FOR UPDATE.

In the 4.1.8 execution:

1.mysql> select * from T where C = 42 for update;

2.mysql> select * from T where C = 42 for update;

-- The above select doesn't block now.

1.mysql> insert into T set C = 42;

-- The above insert blocks.

2.mysql> insert into T set C = 42;

ERROR 1213: Deadlock found when trying to get lock; Try restarting
transaction 

the explanation is that the X lock on the ‘gap’ set in SELECT … FOR
UPDATE is purely ‘inhibitive’. It blocks inserts by other users to the
gap. But it does not give the holder of the lock a permission to insert.

Why the inhibitiveness: if we have three index records:

aab <first gap> aba <second gap> acc 

there are two gaps there. Suppose user 1 locks the first gap, and user 2
locks the second gap.

But if ‘aba’ is delete-marked, purge can remove it, and these two gaps
merge. Then BOTH user 1 and user 2 have an exclusive lock on the same
gap. This explains why a lock on gap does not give a user a permission
to insert. There must not be locks by OTHER users on the gap, only then
is the insert allowed.

Best regards,

Heikki
Oracle Corp./Innobase Oy

Although this post is over 12 years old, it appears to be still relevant to InnoDB today, as any quick testing will show.

In fact in some common scenarios, it’s even worse than you might expect. Say you have one table that has an auto increment primary key. You then use that created id to calculate and insert “expensive to calculate” data into another table with the same id. The first FOR UPDATE locks the gap for any id’s after it as well.

Say table T has no C’s > 41.

t1> select * from T where C = 42 for update;
t2> select * from T where C = 43 for update;
t1> mysql> insert into T set C = 42;
-- The above insert blocks.
t2> insert into T set C = 43;
ERROR 1213: Deadlock found when trying to get lock; Try restarting
transaction 

There’s some more commentary online about this issue:

The options appear to be:

Sometimes this is rather frustrating, but there don’t appear to be any other good options.

Why FastMail uses MySQL InnoDB

For the curious, when we originally started developing FastMail in 2000 we started with PostgreSQL. However at the time PostgreSQL had a limit of 8kb of data per-row. Because (again, at the time) we stored emails in the database, this was not viable and we switched to MySQL InnoDB. As these things are, it was only a short while later that we switched to IMAP and Cyrus for email storage.

Sometimes the state of a product is the sum of a series of seemingly inconsequential decisions at the time.