If a subquery returns any rows at all, `EXISTS subquery` is TRUE, and `NOT EXISTS subquery` is FALSE. For example, a query such as `SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2)` returns rows from `t1` only if `t2` contains at least one row.

SQL EXISTS and NULL

If the subquery returns a row that contains only NULL, the EXISTS operator still evaluates to TRUE. This is because EXISTS only checks whether the subquery returns at least one row; it does not matter whether the values in that row are NULL. For instance, the subquery `SELECT NULL` returns one row, so `EXISTS (SELECT NULL)` evaluates to true.
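The behaviour described above can be checked with a small script. This is a minimal sketch that assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for MySQL; the tables `t1` and `t2` and the column `column1` are illustrative names.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; t1, t2 and column1 are illustrative names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER);
    CREATE TABLE t2 (column1 INTEGER);
    INSERT INTO t1 VALUES (1), (2);
""")

def exists_rows():
    # Rows of t1 that come back when the EXISTS condition holds.
    return conn.execute(
        "SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2) ORDER BY column1"
    ).fetchall()

empty_result = exists_rows()       # t2 is empty, so EXISTS is false
conn.execute("INSERT INTO t2 VALUES (NULL)")
null_row_result = exists_rows()    # t2 holds one all-NULL row, so EXISTS is true

# A subquery that yields a single NULL value still yields a row.
select_null = conn.execute("SELECT EXISTS (SELECT NULL)").fetchone()[0]

print(empty_result, null_row_result, select_null)
```

The only thing that changes between the two calls is whether `t2` has a row; the NULL content of that row plays no part.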
Traditionally, an `EXISTS` subquery starts with `SELECT *`, but it could begin with `SELECT 5` or `SELECT column1` or anything at all. MySQL ignores the `SELECT` list in such a subquery, so it makes no difference.

For the preceding example, if `t2` contains any rows, even rows with nothing but `NULL` values, the `EXISTS` condition is `TRUE`. This is actually an unlikely example because a `[NOT] EXISTS` subquery almost always contains correlations. Here are some more realistic examples:

- What kind of store is present in one or more cities?
- What kind of store is present in no cities?
- What kind of store is present in all cities?
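The three questions in the list can be sketched as correlated subqueries. The schema below (`stores`, `cities`, `cities_stores`) and its data are hypothetical, and SQLite is assumed in place of MySQL so that the example runs without a server.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_type TEXT);
    CREATE TABLE cities (city TEXT);
    CREATE TABLE cities_stores (city TEXT, store_type TEXT);
    INSERT INTO stores VALUES ('bakery'), ('bookshop'), ('florist');
    INSERT INTO cities VALUES ('Oslo'), ('Lima');
    INSERT INTO cities_stores VALUES
        ('Oslo', 'bakery'), ('Lima', 'bakery'), ('Oslo', 'bookshop');
""")

def store_types(sql):
    return [row[0] for row in conn.execute(sql)]

# Present in one or more cities: at least one matching cities_stores row exists.
in_some_city = store_types("""
    SELECT DISTINCT store_type FROM stores
    WHERE EXISTS (SELECT * FROM cities_stores
                  WHERE cities_stores.store_type = stores.store_type)
    ORDER BY store_type
""")

# Present in no cities: no matching cities_stores row exists.
in_no_city = store_types("""
    SELECT DISTINCT store_type FROM stores
    WHERE NOT EXISTS (SELECT * FROM cities_stores
                      WHERE cities_stores.store_type = stores.store_type)
    ORDER BY store_type
""")

# Present in all cities: there is no city that lacks this store type.
in_all_cities = store_types("""
    SELECT DISTINCT store_type FROM stores
    WHERE NOT EXISTS (
        SELECT * FROM cities
        WHERE NOT EXISTS (
            SELECT * FROM cities_stores
            WHERE cities_stores.city = cities.city
              AND cities_stores.store_type = stores.store_type))
    ORDER BY store_type
""")

print(in_some_city, in_no_city, in_all_cities)
```

With this sample data only the bakery appears in both cities, so it is the only store type the third query returns.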
The last example is a double-nested `NOT EXISTS` query. That is, it has a `NOT EXISTS` clause within a `NOT EXISTS` clause. Formally, it answers the question “does a city exist with a store that is not in `Stores`”? But it is easier to say that a nested `NOT EXISTS` answers the question “is `x` TRUE for all `y`?”

Which of these queries is faster: one written with `NOT EXISTS`, or one written with `NOT IN`?
The query execution plan says they both do the same thing. If that is the case, which is the recommended form?
This is based on the NorthWind database.
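For readers without the original queries at hand, the two forms being compared might look like the sketch below. It assumes a tiny SQLite stand-in for Northwind that has only the columns used here; the sample rows are invented, and both `ProductID` columns are declared `NOT NULL`.

```python
import sqlite3

# Assumption: a minimal SQLite stand-in for Northwind with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER NOT NULL PRIMARY KEY, ProductName TEXT);
    CREATE TABLE [Order Details] (OrderID INTEGER NOT NULL, ProductID INTEGER NOT NULL);
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Aniseed Syrup');
    INSERT INTO [Order Details] VALUES (10248, 1), (10249, 1), (10250, 3);
""")

# Products that were never ordered, written with a correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT ProductID, ProductName
    FROM Products p
    WHERE NOT EXISTS (SELECT 1 FROM [Order Details] od
                      WHERE p.ProductID = od.ProductID)
    ORDER BY ProductID
""").fetchall()

# The same question written with NOT IN.
not_in_rows = conn.execute("""
    SELECT ProductID, ProductName
    FROM Products p
    WHERE p.ProductID NOT IN (SELECT ProductID FROM [Order Details])
    ORDER BY ProductID
""").fetchall()

print(not_exists_rows, not_in_rows)
```

As long as neither column can hold NULL, the two forms return the same rows.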
[Edit]
Just found this helpful article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
I think I'll stick with NOT EXISTS.
ilitirit
10 Answers
I always default to `NOT EXISTS`.

The execution plans may be the same at the moment, but if either column is altered in the future to allow `NULL`s the `NOT IN` version will need to do more work (even if no `NULL`s are actually present in the data), and the semantics of `NOT IN` if `NULL`s are present are unlikely to be the ones you want anyway.

When neither `Products.ProductID` nor `[Order Details].ProductID` allows `NULL`s, the `NOT IN` will be treated identically to the corresponding `NOT EXISTS` query. The exact plan may vary but for my example data I get the following.
A reasonably common misconception seems to be that correlated sub queries are always 'bad' compared to joins. They certainly can be when they force a nested loops plan (sub query evaluated row by row) but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops but can use hash or merge (as in this example) joins too.
If `[Order Details].ProductID` is `NULL`-able, the query effectively gains an extra check that no row in `[Order Details]` has a `NULL` `ProductId`. The reason for this is that the correct semantics if `[Order Details]` contains any `NULL` `ProductId`s is to return no results. See the extra anti semi join and row count spool added to the plan to verify this.

If `Products.ProductID` is also changed to become `NULL`-able, the query changes again. The reason for that one is that a `NULL` `Products.ProductId` should not be returned in the results unless the `NOT IN` sub query returns no results at all (i.e. the `[Order Details]` table is empty), in which case it should. In the plan for my sample data this is implemented by adding another anti semi join.

The effect of this is shown in the blog post already linked by Buckley. In the example there the number of logical reads increases from around 400 to 500,000.
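The semantics described above can be spelled out by hand. The sketch below compares `NOT IN` on nullable columns with a hand-written expansion into three `NOT EXISTS` checks. It assumes SQLite rather than SQL Server, uses invented sample rows, and the expanded query is an illustration of the logic, not the optimizer's actual output.

```python
import sqlite3

NOT_IN = """
    SELECT ProductID FROM Products p
    WHERE p.ProductID NOT IN (SELECT ProductID FROM [Order Details])
"""

# Hand-written equivalent for nullable columns (a sketch, not optimizer output).
REWRITTEN = """
    SELECT ProductID FROM Products p
    WHERE NOT EXISTS (SELECT * FROM [Order Details] od
                      WHERE od.ProductID = p.ProductID)
      -- any NULL in the subquery column empties the whole result
      AND NOT EXISTS (SELECT * FROM [Order Details] WHERE ProductID IS NULL)
      -- a NULL outer value qualifies only when the subquery is empty
      AND NOT EXISTS (SELECT * FROM (SELECT * FROM [Order Details] LIMIT 1)
                      WHERE p.ProductID IS NULL)
"""

def compare(ordered_product_ids):
    """Return (NOT IN result, rewritten result) as sets of ProductID values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);
        CREATE TABLE [Order Details] (OrderID INTEGER, ProductID INTEGER);
        INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (NULL, 'Unnumbered');
    """)
    conn.executemany(
        "INSERT INTO [Order Details] VALUES (?, ?)",
        [(10248 + i, pid) for i, pid in enumerate(ordered_product_ids)],
    )
    run = lambda sql: {row[0] for row in conn.execute(sql)}
    return run(NOT_IN), run(REWRITTEN)

no_nulls = compare([1])          # only product 2 is unordered
with_null = compare([1, None])   # one NULL in [Order Details] empties the result
empty = compare([])              # empty subquery: every product, NULL included

print(no_nulls, with_null, empty)
```

In all three scenarios the two queries agree, which is why the nullable `NOT IN` has to carry the two extra checks.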
Additionally, the fact that a single `NULL` can reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but in fact there were no `NULL` rows in the data, the rest of the execution plan may be catastrophically worse if this is just part of a larger query, with inappropriate nested loops causing repeated execution of an expensive sub tree, for example.

This is not the only possible execution plan for a `NOT IN` on a `NULL`-able column, however. This article shows another one for a query against the `AdventureWorks2008` database.

For the `NOT IN` on a `NOT NULL` column or the `NOT EXISTS` against either a nullable or non nullable column it gives the following plan.

When the column changes to `NULL`-able, the `NOT IN` plan adds an extra inner join operator. This apparatus is explained here. It is all there to convert the previous single correlated index seek on `Sales.SalesOrderDetail.ProductID = <correlated_product_id>` to two seeks per outer row. The additional one is on `WHERE Sales.SalesOrderDetail.ProductID IS NULL`.

As this is under an anti semi join, if that one returns any rows the second seek will not occur. However, if `Sales.SalesOrderDetail` does not contain any `NULL` `ProductID`s it will double the number of seek operations required.

Martin Smith
Also be aware that NOT IN is not equivalent to NOT EXISTS when it comes to null.
This post explains it very well
When the subquery returns even one null, NOT IN will not match any rows.
The reason for this can be found by looking at the details of what the NOT IN operation actually means.
Let’s say, for illustration purposes, that there are 4 rows in a table called t, with a column called ID holding the values 1..4. A NOT IN test against the AVal values of those rows is equivalent to four != comparisons, one per row, joined with AND.

Let’s further say that AVal is NULL where ID = 4. Hence the != comparison for that row returns UNKNOWN. The logical truth table for AND states that UNKNOWN AND TRUE is UNKNOWN, and UNKNOWN AND FALSE is FALSE. There is no value that can be AND’d with UNKNOWN to produce the result TRUE.
Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or NULL, and no records will be returned.
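The truth-table argument can be checked directly. This is a small sketch that assumes SQLite, which reports TRUE as 1, FALSE as 0 and UNKNOWN as NULL (`None` in Python); the `AVal` values 10, 20 and 30 are invented, since the text only fixes the NULL at ID = 4.

```python
import sqlite3

# Assumption: SQLite semantics; the non-NULL AVal values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ID INTEGER, AVal INTEGER);
    INSERT INTO t VALUES (1, 10), (2, 20), (3, 30), (4, NULL);
""")

def evaluate(expression):
    # Returns 1 for TRUE, 0 for FALSE and None for UNKNOWN.
    return conn.execute(f"SELECT {expression}").fetchone()[0]

unknown_and_true = evaluate("(5 != NULL) AND (5 != 10)")   # UNKNOWN AND TRUE
unknown_and_false = evaluate("(5 != NULL) AND (5 != 5)")   # UNKNOWN AND FALSE

# 5 matches no AVal, but the NULL row keeps the result from becoming TRUE.
not_in_no_match = evaluate("5 NOT IN (SELECT AVal FROM t)")
# 10 does match an AVal, so the result is plainly FALSE.
not_in_match = evaluate("10 NOT IN (SELECT AVal FROM t)")
# Without the NULL row the same test is TRUE.
not_in_without_null = evaluate("5 NOT IN (SELECT AVal FROM t WHERE ID < 4)")

print(unknown_and_true, unknown_and_false,
      not_in_no_match, not_in_match, not_in_without_null)
```

A `WHERE` clause keeps only rows whose condition is TRUE, so both the FALSE and the UNKNOWN outcomes filter the row out.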
buckley
If the execution planner says they're the same, they're the same. Use whichever one will make your intention more obvious -- in this case, the second.
John Millikin
James Curran
I have a table which has about 120,000 records and need to select only those which do not exist (matched on a varchar column) in four other tables with approximately 1500, 4000, 40000 and 200 rows. All the tables involved have a unique index on the relevant `Varchar` column.

`NOT IN` took about 10 mins, `NOT EXISTS` took 4 secs.

I have a recursive query which might have had some untuned section that contributed to the 10 mins, but the other option taking 4 secs shows, at least to me, that `NOT EXISTS` is far better, or at least that `IN` and `EXISTS` are not exactly the same and that it is always worth a check before going ahead with code.

Yella Chalamala
In your specific example they are the same, because the optimizer has figured out that what you are trying to do is the same in both examples. But it is possible that in non-trivial examples the optimizer may not do this, and in that case there are reasons to prefer one to the other on occasion.

`NOT IN` should be preferred if you are testing multiple rows in your outer select. The subquery inside the `NOT IN` statement can be evaluated at the beginning of the execution, and the temporary table can be checked against each value in the outer select, rather than re-running the subselect every time as would be required with the `NOT EXISTS` statement.

If the subquery must be correlated with the outer select, then `NOT EXISTS` may be preferable, since the optimizer may discover a simplification that prevents the creation of any temporary tables to perform the same function.

Jeffrey L Whitledge
I was using a `NOT IN` query and found that it was giving wrong results (by wrong I mean no results), as there was a NULL in TABLE2.Col1.

Changing the query to use `NOT EXISTS` gave me the correct results.

Since then I have started using NOT EXISTS everywhere.

ravish.hacker
They are very similar but not really the same.
In terms of efficiency, I've found the `LEFT JOIN ... IS NULL` form more efficient (when a large number of rows are to be selected, that is).
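For reference, the `LEFT JOIN ... IS NULL` pattern mentioned above can be sketched as follows. The example assumes SQLite and a cut-down Northwind-style schema with invented rows. It shows only that the pattern returns the same rows as `NOT EXISTS`, and makes no claim about relative speed, which depends on the DBMS and the data.

```python
import sqlite3

# Assumption: SQLite with a cut-down Northwind-style schema and invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE [Order Details] (OrderID INTEGER, ProductID INTEGER);
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Aniseed Syrup');
    INSERT INTO [Order Details] VALUES (10248, 1), (10249, NULL);
""")

# Anti-join: keep the Products rows for which the LEFT JOIN found no partner.
left_join_rows = [row[0] for row in conn.execute("""
    SELECT p.ProductID
    FROM Products p
    LEFT JOIN [Order Details] od ON od.ProductID = p.ProductID
    WHERE od.ProductID IS NULL
    ORDER BY p.ProductID
""")]

# The same question written with NOT EXISTS.
not_exists_rows = [row[0] for row in conn.execute("""
    SELECT p.ProductID
    FROM Products p
    WHERE NOT EXISTS (SELECT 1 FROM [Order Details] od
                      WHERE od.ProductID = p.ProductID)
    ORDER BY p.ProductID
""")]

print(left_join_rows, not_exists_rows)
```

The order row with a NULL `ProductID` never satisfies the join condition, so it has no effect on either result.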
Onga Leo-Yoda Vellem
If the optimizer says they are the same then consider the human factor. I prefer to see NOT EXISTS :)
onedaywhen
It depends..
would not be relatively slow, as there isn't much to limit the size of what the query checks to see if the key is in. EXISTS would be preferable in this case.
But, depending on the DBMS's optimizer, this could be no different.
As an example of when EXISTS is better
Greg Ogle