Conditional joins
Posted by anotherdavewebster on August 18, 2010
From time to time I come across the situation where I have to outer join from one table to another but I could actually tell from the driving table, whether or not I could expect an entry on the lookup table. Assuming I’m going to be using a Nested Loop join, I actually waste an awful lot of effort probing the index on the lookup table when I already know that I won’t find anything. Allow me to demonstrate.
We start off by creating a driving table and a “lookup” table. It’s important that the sizes of the tables are such that the CBO prefers a nested loop join rather than a hash join else the whole exercise is rather pointless. You’ll also have to forgive the old school “analyze”, I’m just lazy.
CREATE TABLE driving_table AS SELECT 'meaningless text' text_column , DECODE(MOD(rownum, 10), 0, 'Y', 'N') should_i_join_yn , rownum an_id FROM DUAL CONNECT BY ROWNUM <= 50; CREATE TABLE lookup_table AS SELECT ROWNUM * 10 an_id , 'something to lookup: ' || TO_CHAR(ROWNUM) text_column FROM DUAL CONNECT BY ROWNUM <= 100000; alter table lookup_table add constraint lookup_table_pk primary key (an_id); analyze table driving_table compute statistics; analyze table lookup_table compute statistics;
Note the use of DUAL with a CONNECT BY to generate my data. See n-rows-please for a very short discussion.
The query I want to run is:
SELECT drv.text_column , lup.text_column lup_text_column FROM driving_table drv, lookup_table lup WHERE drv.an_id = lup.an_id(+);
But say I have data on the driving table which tells me if there’s any chance of me making the join. The rather crudely named “should_i_join_yn” column serves as an example. I can write my query as:
SELECT drv.text_column , lup.text_column lup_text_column FROM driving_table drv, lookup_table lup WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+);
So I ensure that my join column from the driving table resolves to NULL when I know I can’t join. Fortunately, Oracle is no fool and spots that NULL can’t ever join to my lookup table and doesn’t bother doing the lookup. Using Mr Kyte’s runstats (see Useful Links) I can pick out a few key metrics to demonstrate the point. First of all, the test:
DECLARE CURSOR c_standard IS SELECT drv.text_column , lup.text_column lup_text_column FROM driving_table drv, lookup_table lup WHERE drv.an_id = lup.an_id(+); CURSOR c_conditional IS SELECT drv.text_column , lup.text_column lup_text_column FROM driving_table drv, lookup_table lup WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+); BEGIN runstats_pkg.rs_start; FOR i IN 1 .. 10000 LOOP FOR rec IN c_standard LOOP NULL; END LOOP; END LOOP; runstats_pkg.rs_middle; FOR i IN 1 .. 10000 LOOP FOR rec IN c_conditional LOOP NULL; END LOOP; END LOOP; runstats_pkg.rs_stop; END;
The plans show the difference in the access predicates. The standard query is just a simple outer join:
PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (% -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50 | 2400 | 53 | 1 | NESTED LOOPS OUTER | | 50 | 2400 | 53 | 2 | TABLE ACCESS FULL | DRIVING_TABLE | 50 | 900 | 3 | 3 | TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE | 1 | 30 | 1 |* 4 | INDEX UNIQUE SCAN | LOOKUP_TABLE_PK | 1 | | 0 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("DRV"."AN_ID"="LUP"."AN_ID"(+))
and the conditional joining version shows the DECODE in the access predicate:
PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (% -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50 | 2450 | 53 | 1 | NESTED LOOPS OUTER | | 50 | 2450 | 53 | 2 | TABLE ACCESS FULL | DRIVING_TABLE | 50 | 950 | 3 | 3 | TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE | 1 | 30 | 1 |* 4 | INDEX UNIQUE SCAN | LOOKUP_TABLE_PK | 1 | | 0 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("LUP"."AN_ID"(+)=DECODE("DRV"."SHOULD_I_JOIN_YN",'Y',"DRV"."AN_ID",NULL))
Output from the test includes the statistics:
Run1 ran in 145 hsecs Run2 ran in 103 hsecs run 1 ran in 140.78% of the time
So it’s a bit quicker which we’d expect from the IO stats:
Name Run1 Run2 Diff STAT...index fetch by key 500,000 50,000 -450,000 STAT...session logical reads 600,027 140,033 -459,994
So, in summary, if we’re using Nested Loop joins and we have enough information on the driving table to tell us that no join is possible, I can avoid even looking.
This is particularly useful for some super and sub type model implementations where a driving table can have numerous lookups depending on the “type” of the driving record. We can conditionally join to just one of the sub type tables. I also have another example where I join to the same table twice, the second time conditionally. Any thoughts as to why that might give a MASSIVE performance benefit? I’ll write it up when I get some time.
ezuall said
This is one of those posts that I can just tell will be very useful at some point in future. Wish I had a problem to apply it to now, while the information is still fresh in my mind.
Good work, Martin Widlake suggested your blog so I hit subscribe straight away.
Conditional joins: Part 2 « Dave Webster's Blog said
[…] from a chronological perspective but I’ve written them the other way round simply because Part 1 is a little more general whereas Part 2, as useful as it is, solves a very specific […]