Data Engineering

MSSQL / Azure Synapse MERGE statement slow performance: Rewrite with UPDATE FROM

Had a case merging a column for two large table around 10M rows each, initial MERGE query, taking 9 mins:

MERGE INTO #Table_Dest
USING #Table_Source
ON #Table_Dest.#Column_Ref1 = #Table_Source.#Column_Ref1
AND #Table_Dest.#Column_Ref2 = #Table_Source.#Column_Ref2
WHEN MATCHED
THEN UPDATE 
SET #Table_Dest.#Column_Target = #Table_Source.#Column_Target;

-- 9mins 

Rewrite with UPDATE FROM, runs 3 times faster:

UPDATE #Table_Dest
SET #Table_Dest.#Column_Target = #Table_Source.#Column_Target
FROM #Table_Dest
LEFT JOIN #Table_Source 
ON #Table_Dest.#Column_Ref1 = #Table_Source.#Column_Ref1
AND #Table_Dest.#Column_Ref2 = #Table_Source.#Column_Ref2;

-- 3 mins

Further rewrite a filtered temp view by WITH and then UPDATE FROM, cuts another half elapsed time:

WITH #TmpView AS (
SELECT
#Table_Source.#Column_Ref1,
#Table_Source.#Column_Ref2,
#Table_Source.#Column_Target
FROM #Table_Source
WHERE #Column_Target IS NOT NULL
)
UPDATE #Table_Dest
SET #Table_Dest.#Column_Target = #TmpView.#Column_Target
FROM #Table_Dest
JOIN #TmpView
ON #TmpView.#Column_Ref1 = #Table_Dest.#Column_Ref1 
AND #TmpView.#Column_Ref2 = #Table_Dest.#Column_Ref2;

-- 83s

Further test the above is equivalent to simply adding the filtering condition into UPDATE FROM: Only the WITH temp view would be useful if more complex subqueries needed for example RANK OVER.

UPDATE #Table_Dest
SET #Table_Dest.#Column_Target = #Table_Source.#Column_Target
FROM #Table_Dest
JOIN #Table_Source 
ON #Table_Dest.#Column_Ref1 = #Table_Source.#Column_Ref1
AND #Table_Dest.#Column_Ref2 = #Table_Source.#Column_Ref2
AND #Table_Source.#Column_Target IS NOT NULL;