Dave Webster's Blog

Tinkering with Oracle

Archive for August, 2010

Concatenating strings from multiple rows and an insight into XML in the DB

Posted by anotherdavewebster on August 30, 2010

This little trick came my way courtesy of a colleague of mine, Matt Cobley. Over the years, I’ve frequently needed to produce a comma delimited list of strings from multiple rows of a query. This is easy enough if you know how many rows you have but if not, I always end up writing a little piece of PL/SQL containing a cursor which builds a string which I then output to dbms_output. All rather long-winded and more than a little limiting. That was until Matt sent an email (on a subject I no longer recall) containing the little gem XMLAGG, which works something like this:

SELECT RTRIM( XMLAGG( XMLELEMENT( any_old_name, table_name || ',' ) ORDER BY last_analyzed DESC ).EXTRACT( '//text()' )
            , ','
            )
  FROM dba_tables
 WHERE ROWNUM < 20 AND owner = 'SYS'

Now I don’t suppose for a minute that it’s cheap but it really is very easy and it provides us with a nice clean:
“CLU$,TAB$,IND$,COL$,UNDO$,USER$,TSQ$,FILE$,VIEWTRCOL$,TYPE_MISC$,SUBCOLTYPE$,REFCON$,OPQTYPE$,NTAB$,LOB$,LIBRARY$,ICOL$,COLTYPE$,ATTRCOL$”.

I made a mental note at the time to have a dig around the XML stuff in Oracle as I’m sure there are lots of other treats in there. I’m sorry to say that I’ve not yet got started.

UPDATE: A couple of you have kindly pointed out that 11gR2 contains LISTAGG, an extension to the wonderful analytics functionality, which does the same job in a more elegant way.
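For completeness, the LISTAGG equivalent would look something like this (a sketch of the 11gR2 syntax rather than anything I’ve run against this data):

SELECT LISTAGG( table_name, ',' ) WITHIN GROUP ( ORDER BY last_analyzed DESC )
  FROM dba_tables
 WHERE ROWNUM < 20 AND owner = 'SYS'

No RTRIM or EXTRACT gymnastics required.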


Posted in tips and tricks | Leave a Comment »

“What’s a bind variable?”

Posted by anotherdavewebster on August 26, 2010

It would be reasonable to assume that everything that could ever be written about bind variables has already been written, certainly regarding the concept and why they are a good thing. A little while back, I was digging around on a system which was suffering from very high parse rates. I did the normal stuff and took a look in v$sql:

SELECT   ( SELECT username
            FROM dba_users
           WHERE user_id = parsing_user_id ) username
       , parsing_user_id
       , SUBSTR( sql_text, 1, 30 )
       , COUNT( * )
       , MAX( TO_DATE( first_load_time, 'YYYY-MM-DD/HH24:MI:SS' ) )
       , MAX( last_active_time )
    FROM v$sql
GROUP BY parsing_user_id, SUBSTR( sql_text, 1, 30 )
  HAVING COUNT( * ) > 5
ORDER BY 4 DESC

Sure enough, I found a couple of statements with many thousands of variations, differing only by a handful of source table names and many different literal values. A quick look at the code confirmed that we had a piece of dynamic SQL (to manage a few different source tables) which concatenated a literal value into the SQL string. Something along the lines of:

execute immediate 'insert into table_a select * from ' || l_tab_name || ' where id = ' || l_id;

“Great stuff”, I thought: I’d go and find the developer who wrote that piece of code and ask him to switch to using a bind variable for the id but obviously retain the dynamic nature of the source table name. So I wrote him a quick email asking him to use a bind variable and the response rather surprised me. When I tell people the story, I ask them what they think the most worrying answer would be. To a man, they all go for “What’s a bind variable?”. That would indeed be a worrying response, but not unprecedented. The response I actually got was “I have”.
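For what it’s worth, the change I was after keeps the table name dynamic (you can’t bind an identifier) but passes the id through the USING clause, something like:

execute immediate 'insert into table_a select * from ' || l_tab_name
                  || ' where id = :id'
   using l_id;

One shared cursor per source table rather than one per id value.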

Posted in Uncategorized | 2 Comments »

Deterministic functions sometimes are Deterministic, even on 10.2

Posted by anotherdavewebster on August 24, 2010

A few years back, some colleagues of mine were bemoaning the fact that functions created as DETERMINISTIC weren’t actually implemented as such in 10.2. Just to be clear, this is no longer the case with Oracle 11. DETERMINISTIC functions actually are deterministic:

CREATE OR REPLACE FUNCTION non_det_fn( i_parameter IN NUMBER )
   RETURN NUMBER
IS
BEGIN
   DBMS_OUTPUT.put_line( 'Called non_det_fn with i_parameter = ' || i_parameter );
   RETURN i_parameter;
END;

CREATE OR REPLACE FUNCTION det_fn( i_parameter IN NUMBER )
   RETURN NUMBER DETERMINISTIC
IS
BEGIN
   DBMS_OUTPUT.put_line( 'Called det_fn with i_parameter = ' || i_parameter );
   RETURN i_parameter;
END;

DECLARE
   l_result   PLS_INTEGER;
BEGIN
   dbms_output.put_line('Standard function');
   FOR i IN 1 .. 5 LOOP
      l_result    := non_det_fn( 1 );
   END LOOP;

   dbms_output.put_line('Deterministic function');
   FOR i IN 1 .. 5 LOOP
      l_result    := det_fn( 1 );
   END LOOP;
END;

You get pretty much what you would expect from dbms_output:

Standard function
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Deterministic function
Called det_fn with i_parameter = 1

On 10.2, Oracle doesn’t yet implement the deterministic functionality and we get:

Standard function
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 1
Deterministic function
Called det_fn with i_parameter = 1
Called det_fn with i_parameter = 1
Called det_fn with i_parameter = 1
Called det_fn with i_parameter = 1
Called det_fn with i_parameter = 1

Now this is not news of course but I did stumble on a rather surprising case where Oracle did implement the functionality in 10.2:

SELECT non_det_fn( mod_number )
     , det_fn( mod_number )
  FROM ( SELECT MOD( ROWNUM, 3 ) mod_number
           FROM DUAL
        CONNECT BY ROWNUM <= 10 );

dbms_output gives us:

Called non_det_fn with i_parameter = 1
Called det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 2
Called det_fn with i_parameter = 2
Called non_det_fn with i_parameter = 0
Called det_fn with i_parameter = 0
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 2
Called non_det_fn with i_parameter = 0
Called non_det_fn with i_parameter = 1
Called non_det_fn with i_parameter = 2
Called non_det_fn with i_parameter = 0
Called non_det_fn with i_parameter = 1

It’s not the same as the PL/SQL implementation in 11.1 but it’s certainly more than I was expecting and may well come in useful at some point. Of course, you can make all functions “deterministic” by calling them from Scalar Subqueries but that’s another matter…
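To expand on that last remark, pushing the call into a scalar subquery buys the same sort of caching for the non-deterministic function too. A sketch reusing the functions above:

SELECT ( SELECT non_det_fn( mod_number ) FROM DUAL ) cached_call
  FROM ( SELECT MOD( ROWNUM, 3 ) mod_number
           FROM DUAL
        CONNECT BY ROWNUM <= 10 );

With any luck, dbms_output should show non_det_fn being called far fewer than ten times, subject to the usual caveats of the scalar subquery cache.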

Posted in Uncategorized | Leave a Comment »

Why don’t more people know about the benefits of Scalar Subqueries?

Posted by anotherdavewebster on August 24, 2010

Every now and then, I get involved in interviewing Oracle developers when we’re specifically after “performance” skills. I ask a range of fairly expansive questions but I always ask if they know what Scalar Subqueries are and why they can deliver amazing performance benefits. Not many people seem to have heard of Scalar Subqueries and even fewer are aware of the caching benefits.

I originally planned to go through all the usual stuff to demonstrate how the caching works. I was then going to link you to a page I found a few years back which contains some detail on how effective the caching is. On closer inspection of the page, I realised there was no point in writing it up myself as Mr Chadderton covers it all rather nicely.

Have a read of this and welcome to the wonderful world of Scalar Subqueries!

I would just like to add that the article from Martin focusses on caching the results of functions (see here for another option) but I tend to use Scalar Subqueries when I’m looking up the same data over and over again:

CREATE TABLE fact_table
AS
SELECT 'some data' text_column
     , mod(rownum, 10) a_fk_id
  FROM DUAL CONNECT BY ROWNUM <= 5000;

CREATE TABLE lookup_table
AS
SELECT ROWNUM an_id
     , 'something to lookup: ' || TO_CHAR(ROWNUM) text_column
  FROM DUAL
CONNECT BY ROWNUM <= 100000;

alter table lookup_table add constraint lookup_table_pk primary key (an_id);

analyze table fact_table compute statistics;
analyze table lookup_table compute statistics;

DECLARE
   CURSOR c_standard
   IS
      SELECT fct.text_column
           , lup.text_column lup_text_column
        FROM fact_table fct, lookup_table lup
       WHERE fct.a_fk_id = lup.an_id(+);

   CURSOR c_ssq
   IS
      SELECT fct.text_column
           , ( SELECT lup.text_column
                FROM lookup_table lup
               WHERE fct.a_fk_id = lup.an_id ) lup_text_column
        FROM fact_table fct;
BEGIN
   runstats_pkg.rs_start;

   FOR i IN 1 .. 100 LOOP
      FOR rec IN c_standard LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_middle;

   FOR i IN 1 .. 100 LOOP
      FOR rec IN c_ssq LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_stop;
END;

So let’s just look at the plans quickly. The standard version is a Hash Join:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |  5000 |   200K|   102   (3)|
|*  1 |  HASH JOIN OUTER   |              |  5000 |   200K|   102   (3)|
|   2 |   TABLE ACCESS FULL| FACT_TABLE   |  5000 | 55000 |     5   (0)|
|   3 |   TABLE ACCESS FULL| LOOKUP_TABLE |   100K|  2929K|    95   (2)|
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("FCT"."A_FK_ID"="LUP"."AN_ID"(+))

But the Scalar Subquery is quite different:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |  5000 | 55000 |     5   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |     2   (0)|
|*  2 |   INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |     1   (0)|
|   3 |  TABLE ACCESS FULL          | FACT_TABLE      |  5000 | 55000 |     5   (0)|
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("LUP"."AN_ID"=:B1)

The runstats timing confirms the big win:

Run1 ran in 149 hsecs
Run2 ran in 48 hsecs
run 1 ran in 310.42% of the time

and a big lump of that comes from reducing the IO (just about my favourite tuning aim):

Name                                                      Run1        Run2        Diff
STAT...session logical reads                            53,727       8,535     -45,192

Posted in performance | Leave a Comment »

Conditional joins

Posted by anotherdavewebster on August 18, 2010

From time to time I come across the situation where I have to outer join from one table to another but could actually tell, from the driving table, whether or not to expect an entry on the lookup table. Assuming I’m going to be using a Nested Loop join, I actually waste an awful lot of effort probing the index on the lookup table when I already know that I won’t find anything. Allow me to demonstrate.

We start off by creating a driving table and a “lookup” table.  It’s important that the sizes of the tables are such that the CBO prefers a nested loop join rather than a hash join, else the whole exercise is rather pointless.  You’ll also have to forgive the old school “analyze”; I’m just lazy.

CREATE TABLE driving_table
AS
SELECT 'meaningless text' text_column
     , DECODE(MOD(rownum, 10), 0, 'Y', 'N') should_i_join_yn
     , rownum an_id
  FROM DUAL CONNECT BY ROWNUM <= 50;

CREATE TABLE lookup_table
AS
SELECT ROWNUM * 10 an_id
     , 'something to lookup: ' || TO_CHAR(ROWNUM) text_column
  FROM DUAL
CONNECT BY ROWNUM <= 100000;

alter table lookup_table add constraint lookup_table_pk primary key (an_id);

analyze table driving_table compute statistics;
analyze table lookup_table compute statistics;

Note the use of DUAL with a CONNECT BY to generate my data.  See n-rows-please for a very short discussion.

The query I want to run is:

SELECT drv.text_column
     , lup.text_column lup_text_column
  FROM driving_table drv, lookup_table lup
 WHERE drv.an_id = lup.an_id(+);

But say I have data on the driving table which tells me if there’s any chance of me making the join. The rather crudely named “should_i_join_yn” column serves as an example. I can write my query as:

SELECT drv.text_column
     , lup.text_column lup_text_column
  FROM driving_table drv, lookup_table lup
 WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+);

So I ensure that my join column from the driving table resolves to NULL when I know I can’t join. Fortunately, Oracle is no fool and spots that NULL can’t ever join to my lookup table and doesn’t bother doing the lookup. Using Mr Kyte’s runstats (see Useful Links) I can pick out a few key metrics to demonstrate the point. First of all, the test:

DECLARE
   CURSOR c_standard
   IS
      SELECT drv.text_column
           , lup.text_column lup_text_column
        FROM driving_table drv, lookup_table lup
       WHERE drv.an_id = lup.an_id(+);

   CURSOR c_conditional
   IS
      SELECT drv.text_column
           , lup.text_column lup_text_column
        FROM driving_table drv, lookup_table lup
       WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+);
BEGIN
   runstats_pkg.rs_start;

   FOR i IN 1 .. 10000 LOOP
      FOR rec IN c_standard LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_middle;

   FOR i IN 1 .. 10000 LOOP
      FOR rec IN c_conditional LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_stop;
END;

The plans show the difference in the access predicates. The standard query is just a simple outer join:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |    50 |  2400 |    53
|   1 |  NESTED LOOPS OUTER          |                 |    50 |  2400 |    53
|   2 |   TABLE ACCESS FULL          | DRIVING_TABLE   |    50 |   900 |     3
|   3 |   TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |     1
|*  4 |    INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |     0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("DRV"."AN_ID"="LUP"."AN_ID"(+))

and the conditional joining version shows the DECODE in the access predicate:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |    50 |  2450 |    53
|   1 |  NESTED LOOPS OUTER          |                 |    50 |  2450 |    53
|   2 |   TABLE ACCESS FULL          | DRIVING_TABLE   |    50 |   950 |     3
|   3 |   TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |     1
|*  4 |    INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |     0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("LUP"."AN_ID"(+)=DECODE("DRV"."SHOULD_I_JOIN_YN",'Y',"DRV"."AN_ID",NULL))

Output from the test includes the statistics:

Run1 ran in 145 hsecs
Run2 ran in 103 hsecs
run 1 ran in 140.78% of the time

So it’s a bit quicker, which we’d expect from the IO stats:

Name                                  Run1        Run2        Diff
STAT...index fetch by key          500,000      50,000    -450,000
STAT...session logical reads       600,027     140,033    -459,994

So, in summary, if we’re using Nested Loop joins and we have enough information on the driving table to tell us that no join is possible, we can avoid even looking.

This is particularly useful for some super and sub type model implementations where a driving table can have numerous lookups depending on the “type” of the driving record. We can conditionally join to just one of the sub type tables. I also have another example where I join to the same table twice, the second time conditionally. Any thoughts as to why that might give a MASSIVE performance benefit? I’ll write it up when I get some time.
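As a purely hypothetical sketch of the super/sub type case (all table and column names invented for illustration), each conditional outer join only fires for rows of the matching type:

SELECT sup.text_column
     , sub_a.text_column sub_a_text_column
     , sub_b.text_column sub_b_text_column
  FROM super_table sup, sub_type_a sub_a, sub_type_b sub_b
 WHERE DECODE( sup.sub_type, 'A', sup.an_id, NULL ) = sub_a.an_id(+)
   AND DECODE( sup.sub_type, 'B', sup.an_id, NULL ) = sub_b.an_id(+);

Rows of type ‘A’ probe only sub_type_a’s index, rows of type ‘B’ only sub_type_b’s.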

Posted in performance | 2 Comments »

n rows please

Posted by anotherdavewebster on August 9, 2010

Really nice and easy one to get us started. Say you want to generate ten thousand rows of data. The way I normally see this kind of thing done is by selecting values from dba_objects where rownum <= 10000. There’s nothing wrong with this approach of course but what if you need more rows than you have objects in the database? You can do some sort of Cartesian product thing but even suggesting that is making me feel a little unclean. I prefer:

SELECT NULL
  FROM DUAL
CONNECT BY ROWNUM <= 10000;

It’s simple, unlimited and even a little cheaper to boot.

The first time I actually needed this, I was trying to generate all of the seconds between two times:

SELECT to_date('01/08/2010 14:26:09','DD/MM/YYYY HH24:MI:SS') /* low */ + ( rownum / 86400 ) the_date
  FROM DUAL
CONNECT BY ROWNUM <= ( to_date('01/08/2010 14:28:29','DD/MM/YYYY HH24:MI:SS') /* high */
                     - to_date('01/08/2010 14:26:09','DD/MM/YYYY HH24:MI:SS') /* low  */ ) * 86400

First one down.  That wasn’t too hard, was it?

Posted in tips and tricks | Leave a Comment »

Starter for 10

Posted by anotherdavewebster on August 9, 2010

The pressure of what to open with!  I won’t lie to you, I have a list of topics I reckon would make interesting reading as I wanted to get some idea as to whether or not I should even start this blog. Should I start with the first topic I thought of or the one I consider to be the best?  I want the first one to be fairly interesting but I don’t want to start with the best I have and then fail to maintain that level.  Something in the middle then.  Something which doesn’t take too long to write.  But now the pressure is even more intense.  You’re all going to know that I gave this real thought.  I have to go and lie down….

Posted in Uncategorized | Leave a Comment »