Insert or Update pattern for Sql Server

about 1 year ago

A very common problem that is surprisingly difficult to solve properly with SQL is the UPDATE or INSERT problem (sometimes called upsert). I would like to insert a row in to the table if the key does not exist and update a row if a key exists.

Oracle , DB2 and even Sqlite have SQL syntax that allows you to solve this problem in a single statement. This is not the case for SQL server. You have to do this manually.

So, for the table:


create table t (pk int primary key, hitCount int)

The following pattern is very commonly used:


if exists (select * from t where pk = @id)

begin

   update t set hitCount = hitCount+1

   where pk = @id

end 

else

begin 

   insert into t values (@id, 1) 

end

The above code works just fine from single threaded applications, however when called from multiple threads, you start getting primary key violations.

This happens because the batch is not executed atomically. Two or more threads could “think” that there is no data for the existing key and subsequently try to insert data.

To overcome this issue you have two options. The first is, handle primary key violations. The second option is to fix up the transactional integrity of the batch.

Lets explore the second option.

The first step is to put all the SQL in a begin tran commit tran block.


begin tran

if exists(select * from t where pk = @id)

begin

   update t set hitCount = hitCount+1

   where pk = @id

end

else

begin

   insert t (pk, hitCount)

   values (@id, 1)

end

commit tran

This does not solve the problem. Primary key violations keep on happening. The reason for this is that the default isolation level for transactions is “read committed”, meaning that you are allowed to read all committed transactions. However your reads are not repeatable.

So, after you read up about transaction isolations levels you may decide to fix up the code and put it in a serializable transaction:


set transaction isolation level serializable

begin tran

if exists(select * from t where pk = @id)

begin

   update t set hitCount = hitCount+1

   where pk = @id

end

else

begin

   insert t (pk, hitCount)

   values (@id, 1)

end

commit tran

After doing this you stop getting primary key violations, however … you create a much bigger problem. Now you may notice you start getting deadlocks.

Why is this happening? Well, the serializable transaction means the select statement will acquire a shared locks on the table. You will get a RangeS-S lock on the key when the data is missing (and a key / page lock if the data is there). The shared locks are compatible with other shared locks, meaning other threads could acquire the same lock on the same key/page. Later on in the transaction you need to upgrade the lock to an exclusive lock, so you can change the data. However, this can not be done if other transactions are holding shared locks. Which leads to deadlocks, For example:

  1. Key is missing
  1. Transaction 1 acquires RangeS-S on key 1
  1. Transaction 2 acquires RangeS-S on key 1
  1. Transaction 1 wants to upgrade the lock to a RangeX-X (starts waiting)
  1. Transaction 2 wants to upgrade the lock to a RangeX-X (starts waiting)
  1. SQL Server detects a deadlock and kills off one of the transactions

So, we need to clean up the transaction a little more. During the execution of the select statement we would like to block execution the same select statement on other threads. A handy hint we can use is the UPDLOCK hint, it means that we will acquire update locks instead of shared locks. This does not block other readers, but will block statements which need to also hold update locks (like update statements or queries using the UPDLOCK hint).

The result of this is that we have created a critical section in our transaction.

So here is the first good way to solve the problem:


begin tran

if exists (select * from t with (updlock,serializable) where pk = @id)

   begin

   update t set hitCount = hitCount+1

   where pk = @id

end

else

begin

   insert t (pk, hitCount)

   values (@id, 1)

end

commit tran

A second method we can use is:


begin tran

   update t with (serializable)

   set hitCount = hitCount + 1

   where pk = @id

   if @@rowcount = 0

   begin

      insert t (pk, hitCount)

      values (@id,1)

   end

commit tran

Keep in mind the serializable hint is critical, without it we will be very prone to deadlocks, for reasons similar to what was described above.

Both of these methods exhibited very similar performance characteristics during my testing. Keep in mind that in a production system they may perform differently.

Thank you Erland for catching out many bugs in the first draft of this article!

Comments

Mladen

9 months ago

this method can come in handy too

Sam

9 months ago

Mladen, Looks interesting, but DB level mutexes seem to go against everything my inside dba voice tells me.

I mean, how can I be sure 1 million percent that they will be released? Is it the correct level of granularity? meaning I would like to be inserting and updating stuff in two different areas in the same table at the same time. How would I ever port code that uses mutexes to another db?

Mladen

9 months ago

oh yes… those are all valid questions :)) like alwas the answer is: it depends. app locks in sql server are pretty well built and work excellent. i haven’t seen them fail yet. especially if you have set them up for the transaction context.

i don’t count app locks as a granularity level since they never place locks on the data itself. just on the portion of the execution code.

about porting… well i don’t care about that :))) i don’t deal with other db’s

Kent

9 months ago

Excellent article! So, just to make sure I understand things, the point of the SERIALIZABLE hints in your two methods is to make sure the update lock is held for the entire transaction, right?

Sam

8 months ago

Kent,

Thank you! The point of the serializable hint is to ensure the transactional consistency of the block.

take the following example:

begin tran 
select hitCount from t where id = 1 
—returns 1 
—in another query analyzer run “update t set hitCount = 5” 
select hitCount from t where id = 1 
—returns 5 
commit tran

so, the begin tran ensures that either everything happens or nothing, the serializable isolation level ensures that the results of all reads within a transaction are not affected by other transactions.

Cheers Sam

Patrick Greene

3 months ago

You just saved me a long evening trying to figure out how to do this… THANK YOU!

iwan

about 1 month ago

Sam, What are your thoughts on the fact that it needs to perform 2 index seeks? One to check if it exists, then the second one to update?

What do you think of this approach?

CREATE TABLE iwan_test ( pk int not null primary key, cnt int )

CREATE PROCEDURE sp_iwan_update_stats (@pk INT)
AS
BEGIN

    set nocount on
    begin

        UPDATE iwan_test SET cnt = cnt + 1 WHERE pk = @pk
        IF @@ROWCOUNT = 0
        BEGIN TRY
            INSERT INTO iwan_test VALUES (@pk, 1)
        END TRY
        BEGIN CATCH
            if ( @@ERROR = 2627 ) -- pk violation
            UPDATE iwan_test SET cnt = cnt + 1 WHERE pk = @pk
        END CATCH
    end

END

EXEC dbo.sp_iwan_update_stats 1
EXEC dbo.sp_iwan_update_stats 1

Sam

about 1 month ago

iwan,

(note: the technique you listed will only work on SQL 2005 and up)

That code is a little risky since there is no transaction defined you can get some strange results.

Say for example you start throwing in deletes randomly, you may get a situation where you will miss a count. The more scary thing is that it may be a little more prone to deadlocks cause you have less control over the locks you are consuming.

I would probably recommend using the second technique I listed cause its a little bit safer, and only seeks once for updates. But to be honest with you, there is little point in splitting hairs. Your way is probably save enough most of the time. I do however strongly recommend you test it for deadlocks, cause there may be a situation where your update statements can deadlock against each other.

Cheers Sam

Arjan

19 days ago

very nice article. Works like a charm!

thnx