Dave Webster's Blog

Tinkering with Oracle

Conditional joins

Posted by anotherdavewebster on August 18, 2010

From time to time I come across the situation where I have to outer join from one table to another but I could actually tell from the driving table, whether or not I could expect an entry on the lookup table. Assuming I’m going to be using a Nested Loop join, I actually waste an awful lot of effort probing the index on the lookup table when I already know that I won’t find anything. Allow me to demonstrate.

We start off by creating a driving table and a “lookup” table.  It’s important that the sizes of the tables are such that the CBO prefers a nested loop join rather than a hash join else the whole exercise is rather pointless.  You’ll also have to forgive the old school “analyze”, I’m just lazy.

CREATE TABLE driving_table
AS
SELECT 'meaningless text' text_column
     , DECODE(MOD(rownum, 10), 0, 'Y', 'N') should_i_join_yn
     , rownum an_id
  FROM DUAL CONNECT BY ROWNUM <= 50;

CREATE TABLE lookup_table
AS
SELECT ROWNUM * 10 an_id
     , 'something to lookup: ' || TO_CHAR(ROWNUM) text_column
  FROM DUAL
CONNECT BY ROWNUM <= 100000;

alter table lookup_table add constraint lookup_table_pk primary key (an_id);

analyze table driving_table compute statistics;
analyze table lookup_table compute statistics;

Note the use of DUAL with a CONNECT BY to generate my data.  See n-rows-please for a very short discussion.

The query I want to run is:

SELECT drv.text_column
     , lup.text_column lup_text_column
  FROM driving_table drv, lookup_table lup
 WHERE drv.an_id = lup.an_id(+);

But say I have data on the driving table which tells me if there’s any chance of me making the join. The rather crudely named “should_i_join_yn” column serves as an example. I can write my query as:

SELECT drv.text_column
     , lup.text_column lup_text_column
  FROM driving_table drv, lookup_table lup
 WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+);

So I ensure that my join column from the driving table resolves to NULL when I know I can’t join. Fortunately, Oracle is no fool and spots that NULL can’t ever join to my lookup table and doesn’t bother doing the lookup. Using Mr Kyte’s runstats (see Useful Links) I can pick out a few key metrics to demonstrate the point. First of all, the test:

DECLARE
   CURSOR c_standard
   IS
      SELECT drv.text_column
           , lup.text_column lup_text_column
        FROM driving_table drv, lookup_table lup
       WHERE drv.an_id = lup.an_id(+);

   CURSOR c_conditional
   IS
      SELECT drv.text_column
           , lup.text_column lup_text_column
        FROM driving_table drv, lookup_table lup
       WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+);
BEGIN
   runstats_pkg.rs_start;

   FOR i IN 1 .. 10000 LOOP
      FOR rec IN c_standard LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_middle;

   FOR i IN 1 .. 10000 LOOP
      FOR rec IN c_conditional LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_stop;
END;

The plans show the difference in the access predicates. The standard query is just a simple outer join:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |    50 |  2400 |    53
|   1 |  NESTED LOOPS OUTER          |                 |    50 |  2400 |    53
|   2 |   TABLE ACCESS FULL          | DRIVING_TABLE   |    50 |   900 |     3
|   3 |   TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |     1
|*  4 |    INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |     0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("DRV"."AN_ID"="LUP"."AN_ID"(+))

and the conditional joining version shows the DECODE in the access predicate:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |    50 |  2450 |    53
|   1 |  NESTED LOOPS OUTER          |                 |    50 |  2450 |    53
|   2 |   TABLE ACCESS FULL          | DRIVING_TABLE   |    50 |   950 |     3
|   3 |   TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |     1
|*  4 |    INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |     0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("LUP"."AN_ID"(+)=DECODE("DRV"."SHOULD_I_JOIN_YN",'Y',"DRV"."AN_ID",NULL))

Output from the test includes the statistics:

Run1 ran in 145 hsecs
Run2 ran in 103 hsecs
run 1 ran in 140.78% of the time

So it’s a bit quicker which we’d expect from the IO stats:

Name                                  Run1        Run2        Diff
STAT...index fetch by key          500,000      50,000    -450,000
STAT...session logical reads       600,027     140,033    -459,994

So, in summary, if we’re using Nested Loop joins and we have enough information on the driving table to tell us that no join is possible, I can avoid even looking.

This is particularly useful for some super and sub type model implementations where a driving table can have numerous lookups depending on the “type” of the driving record. We can conditionally join to just one of the sub type tables. I also have another example where I join to the same table twice, the second time conditionally. Any thoughts as to why that might give a MASSIVE performance benefit? I’ll write it up when I get some time.

2 Responses to “Conditional joins”

  1. ezuall said

    This is one of those posts that I can just tell will be very useful at some point in future. Wish I had a problem to apply it to now, while the information is still fresh in my mind.

    Good work, Martin Widlake suggested your blog so I hit subscribe straight away.

  2. […] from a chronological perspective but I’ve written them the other way round simply because Part 1 is a little more general whereas Part 2, as useful as it is, solves a very specific […]

Leave a comment