Dave Webster's Blog

Tinkering with Oracle

Archive for the ‘performance’ Category

Do you really need another index?

Posted by anotherdavewebster on August 19, 2013

I was recently asked to take a look at a problem with a nightly batch job that deletes historical data from some transaction tables. The requirements are as follows:

  • I would like to retain three years of data in my transaction table.
  • Each night, I’d like to delete data older than three years which means, provided my nightly job doesn’t fail, I’ll be deleting one day of data.
  • One day of data is around two million rows and everyone would feel more comfortable if we deleted that data in chunks rather than all two million rows in a single transaction.
  • I don’t have partitioning and/or it’s not appropriate to partition the data.
  • I don’t have an index on transaction_date (the column used to determine the “age” of the data).

The general approach that had been taken was along the lines of:
DELETE FROM my_table WHERE transaction_date < trunc( add_months( sysdate, -36 ) ) AND rownum < 10000;

They would run this in a loop, committing after each delete until no rows were deleted.

I’m sure you can see the problem: with no index on transaction_date, we full scan the table around 2m/10k = 200 times. Even if we were happy with a single 2m row delete, we’d still be taking a very long time to find the data to be deleted, and poor old read consistency would be working its little socks off for the OLTP application reading and writing to this table.

The developers wanted to add an index on transaction_date. There’s a lot of sense in this: it would allow me to very efficiently find each batch of 10k rows I wanted to delete. The trouble is that I hate adding indexes, especially to big tables like this, and even more so when they are there for a single process. Do I really need that extra index?

Not surprisingly, I had another suggestion (else what would be the point of this blog item?). There are other indexes already present on the table: one on account_id and one on transaction_id. You probably have a pretty good idea what these columns are and it’s the second one, the Primary Key, that we’re interested in. As is fairly common, this is a number generated from a sequence at the time of insert. You may well see where I’m going here but, if not, then please read on.

Because transaction_id is sourced from a sequence, it will increase in a generally contiguous manner*. The order of the rows sorted by transaction_id will therefore be similar (although not exactly the same) to the order of the rows sorted by transaction_date. Although the transaction_id index can’t exactly identify the rows with the transaction_date I want, it can get me very close. So I rewrite the delete statement as follows:

DELETE
  FROM my_table
 WHERE transaction_id < ( SELECT min( transaction_id ) + 20000
                            FROM my_table )
   AND transaction_date < trunc( add_months( sysdate, -36 ) )
   AND rownum < 10000;

So I can use the PK index to read just the first 20k rows of the table. Obviously this 20k can be played with a little, depending on how contiguous the PK values are and also on how much the order of transaction_id can deviate from the order of transaction_date. As before, this statement should be run until no rows have been deleted.
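In PL/SQL, the driving loop is only a few lines. A minimal sketch, using the names from above:

DECLARE
   l_rows_deleted PLS_INTEGER;
BEGIN
   LOOP
      DELETE
        FROM my_table
       WHERE transaction_id < ( SELECT min( transaction_id ) + 20000
                                  FROM my_table )
         AND transaction_date < trunc( add_months( sysdate, -36 ) )
         AND rownum < 10000;

      l_rows_deleted := SQL%ROWCOUNT; -- capture the delete count before committing
      COMMIT;

      EXIT WHEN l_rows_deleted = 0;   -- stop once a pass deletes nothing
   END LOOP;
END;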

Obviously there are cases where we can’t use this approach and we may well need to code some additional logic to cope with large gaps in sequence numbers but, in the case I was trying to solve, we avoided creating another index.

* This sequence does not CYCLE. If it did, it would be time to go back to the drawing board. We’re also running on a single-instance DB, so we don’t have to contend with multiple RAC nodes holding multiple caches of sequence values.

Posted in Patterns, performance | Leave a Comment »

How best to get data from Oracle into our Java code?

Posted by anotherdavewebster on August 2, 2013

I’m constantly surprised by how little information I find on the internet about this, so I thought I’d give a short summary of what we consider best practice to be and what other options we’ve looked at. Let’s take the common case of wanting to return the results of a SQL query to the client Java code. There are just boatloads of this code, so surely it’s worth trying to make it efficient. In terms of performance, I’ve not found anything better than a PreparedStatement wrapping the query and executing over JDBC. It seems that Oracle have optimised this route as it’s probably the most common: they give us statement caching and fetch and prefetch sizes, which all help. I’m not really going to talk much about this approach. Instead, I’m going to look at using a PL/SQL API (stored procedures) to access the data, which provides a number of advantages over executing queries directly from Java:

  • Impact analysis for schema changes is improved as packages will not compile if objects are changed.
  • It allows a comprehensive DB Unit Test to be constructed for the SQL.
  • It provides a level of abstraction from the DB allowing some DB changes to be made “transparently” to the middle-tier.
  • It allows security to be more tightly controlled as access is only granted to packages and not underlying tables.
  • It allows me to make multiple procedure calls and queries which result in a single result set.
  • There’s no “danger” of Java coders writing SQL queries.  There you go, I’ve said it, it’s out there.  You were all thinking it anyway.

So I think it’s fair to say that there are some advantages of a PL/SQL API.  The question is, what’s the best way of passing the data back to Java?  I’ve tried a few things over the years:

  • Returning data in an object array as a return value from a function or OUT parameter of a procedure. It probably comes as no great surprise that the creation of these objects isn’t cheap: my tests showed a not insignificant overhead. It also requires the database to be full of TYPEs for each shape of result set (at least it did when I last played with it); a sketch of the shape follows this list.
  • Returning data as a list of OUT parameters.  This works well (to a point – see below) for a single row of data.
  • Returning a refcursor to the result set.
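For illustration, a minimal sketch of that first option, with hypothetical names (order_obj, order_tab and an orders table). The TYPEs have to live in the database, which is part of the objection:

CREATE TYPE order_obj AS OBJECT
( an_id       NUMBER
, text_column VARCHAR2(100)
);

CREATE TYPE order_tab AS TABLE OF order_obj;

CREATE OR REPLACE FUNCTION get_orders( p_account_id IN NUMBER )
   RETURN order_tab
IS
   l_orders order_tab;
BEGIN
   -- Build an object for every row and return the whole collection.
   -- It's the construction of these objects that isn't cheap.
   SELECT order_obj( ord.an_id, ord.text_column )
     BULK COLLECT INTO l_orders
     FROM orders ord
    WHERE ord.account_id = p_account_id;

   RETURN l_orders;
END;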

It’s the last two options that I’ve been playing with recently.

The refcursor option is good in many ways, mainly because the process remains close to the optimised “query over JDBC” approach.  The only real downside I’ve found is a really annoying one which I was amazed to find out hadn’t been addressed: there’s no prefetch for refcursors through JDBC.
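For reference, the shape in question is nothing exotic. A sketch, again with hypothetical names:

CREATE OR REPLACE PACKAGE order_api
AS
   PROCEDURE get_orders( p_account_id IN  NUMBER
                       , p_orders     OUT SYS_REFCURSOR );
END order_api;

CREATE OR REPLACE PACKAGE BODY order_api
AS
   PROCEDURE get_orders( p_account_id IN  NUMBER
                       , p_orders     OUT SYS_REFCURSOR )
   IS
   BEGIN
      -- The Java client fetches from this cursor over JDBC and,
      -- sadly, there's no prefetch on that fetch.
      OPEN p_orders FOR
         SELECT ord.an_id, ord.text_column
           FROM orders ord
          WHERE ord.account_id = p_account_id;
   END get_orders;
END order_api;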

We use the thin JDBC driver, so we fired up the thick driver to see if that helped. We made sure the prefetch value we were setting was actually being applied. We tried the latest version of the thin driver. Nothing. We couldn’t get it to work. Reading around on the internet, it seems that prefetching refcursors is possible through the PHP driver, and I found numerous pieces suggesting that prefetch over JDBC worked, but nothing concrete. In the end, we got word from the man that knows, the JDBC man at Oracle. He confirmed that it’s not supported.

This is a real nuisance. The lack of prefetch means at least one extra round trip: one for the procedure call and one for the fetch from the refcursor. It also reduces the benefit of returning multiple refcursors from a single call, which would have given me the edge over direct SQL: I could have executed and fetched multiple queries with a single round trip. I don’t like to compromise, I really don’t, but now direct SQL has an advantage over calling packages that return refcursors. Irritating doesn’t cover it.

Up until recently, our internal best practice said that for a single row of data, a procedure call with multiple OUT parameters was the way to go. It doesn’t have the horrible (did I mention that I didn’t like it?) downside of the extra round trip we get with refcursors. There is an issue though. We created a stored procedure with 70 OUT parameters. Everything looked fine until we ramped up the contention, and then it got really, really expensive. When I say expensive, I mean properly expensive, i.e. lots of CPU. The OS (Linux) shows very high system time, and a little digging reveals lots of slow mmap calls. I may blog the detail at some point but, to summarise: directly selecting into >= 24 OUT parameters triggers lots of mmap calls on the underlying OS. Some strange specifics:

  • The mmap calls only seem expensive under contention.  Running our tests single threaded still generates the calls but they are quick.
  • The size of the OUT parameters doesn’t have an impact, i.e. 23 large parameters are as fast as 23 small parameters, and 24 large parameters are as slow as 24 small parameters.
  • If you declare PL/SQL variables in the stored procedure, SELECT INTO those variables and then assign the values to the OUT parameters, the problem goes away! This obviously provides us with a workaround (sketched after this list) but who wants to publish that as a best practice? I’d be laughed out of the room.
  • All of this was on 11.2.0.3.
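Here’s a minimal sketch of that workaround, assuming a hypothetical wide_table. The only difference from the slow version is that we SELECT INTO local variables and then assign them to the OUT parameters:

CREATE OR REPLACE PROCEDURE get_wide_row
( p_id IN  NUMBER
, p_c1 OUT VARCHAR2
, p_c2 OUT VARCHAR2
-- ... imagine another twenty-plus OUT parameters here
)
IS
   l_c1 wide_table.c1%TYPE;
   l_c2 wide_table.c2%TYPE;
BEGIN
   -- Selecting into locals rather than straight into the OUT
   -- parameters is what made the mmap overhead disappear in our tests.
   SELECT wt.c1, wt.c2
     INTO l_c1, l_c2
     FROM wide_table wt
    WHERE wt.id = p_id;

   p_c1 := l_c1;
   p_c2 := l_c2;
END;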

I think that’s about the end of it for now.  In summary, we like the PL/SQL API.  We like refcursors as the method of passing data and we’ll like them even more when the JDBC guys give us the prefetch that is, in my opinion, a significant oversight.  We quite like scalar OUT parameters BUT we know we need to avoid >= 24 of them!

Posted in Patterns, performance | Leave a Comment »

Conditional joins: Part 2

Posted by anotherdavewebster on December 30, 2010

There’s nothing like the Christmas break to slow things down at work a little! This has given me a little time to finally write up Conditional Joins Part 2. Part 2 is actually “Part 1” from a chronological perspective, but I’ve written them the other way round simply because Part 1 is a little more general, whereas Part 2, as useful as it is, solves a very specific problem.

So the real-world problem is as follows. Think of a large set of orders which are placed over many months but “settled” at one point in time. My driving table contains the settled orders, but I now need to go back to the record of each order being placed to retrieve some data from that record. The placement table is partitioned by placement date (a_date) and has a local non-unique index on the ID (an_id). Both the placement date and the ID exist on the settled (driving) table. The vast majority of the settled orders were placed in the last couple of days, but the “tail” stretches back many months. I was asked to look at the problem, which basically came down to a choice between a hash join and a nested loop join from the settled table to the placement table. Neither approach really suited the problem due to the long tail. I explained the dilemma to the client, to which he replied “isn’t it a shame that Oracle can’t choose to hash join to the recent partitions and nested loop to the older ones”. Bingo. That’s exactly what we want, and we can do it using a conditional join.
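The detail is in the rest of the entry but, as a rough sketch of the shape (the table names and the two-day cut-off are assumptions for illustration, and the optimiser may need hints to pick the join methods we’re after), we join to the placement table twice, reusing the trick from Part 1 so that each copy only sees the rows it should handle:

SELECT stl.text_column
     , COALESCE( rcnt.text_column, old.text_column ) placement_text
  FROM settled stl
     , placement rcnt -- recent and dense: we want a hash join here
     , placement old  -- the long, sparse tail: nested loop, please
 WHERE rcnt.a_date(+) = CASE WHEN stl.a_date >= trunc( sysdate ) - 2 THEN stl.a_date END
   AND rcnt.an_id(+)  = CASE WHEN stl.a_date >= trunc( sysdate ) - 2 THEN stl.an_id  END
   AND old.a_date(+)  = CASE WHEN stl.a_date <  trunc( sysdate ) - 2 THEN stl.a_date END
   AND old.an_id(+)   = CASE WHEN stl.a_date <  trunc( sysdate ) - 2 THEN stl.an_id  END;

For any given settled row, one copy’s join keys resolve to NULL, so only one of the two joins ever looks for a matching placement row.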
Read the rest of this entry »

Posted in Patterns, performance | Leave a Comment »

Polling for new data: Part 2

Posted by anotherdavewebster on October 7, 2010

Apologies for the delay. The day job has been rather busy recently and I was fortunate enough to get a trip out to Open World again this year. Much more black and white for me this time: some real gems and some truly terrible presentations! My personal highlights include a few wonderful sessions from Craig Shallahamer (you have to see his physical demonstration of the workings of the buffer cache), some real index treats from Richard Foote and a really good session on Query Transformations from a fantastically enthusiastic Joze Senegacnik which shone a light on a few areas of the optimiser which I’d never really thought about (but should have).

So, back to Polling for new data. One point that has come to light since I started this blog is that I should have made clear from the outset that almost everything I do is about performance. Read the rest of this entry »

Posted in Patterns, performance | 2 Comments »

Why don’t more people know about the benefits of Scalar Subqueries?

Posted by anotherdavewebster on August 24, 2010

Every now and then, I get involved in interviewing Oracle developers when we’re specifically after “performance” skills. I ask a range of fairly expansive questions, but I always ask if they know what Scalar Subqueries are and why they can deliver amazing performance benefits. Not many people seem to have heard of Scalar Subqueries at all, and even fewer are aware of the caching benefits.

I originally planned to go through all the usual stuff to demonstrate how the caching works. I was then going to link you to a page I found a few years back which contains some detail on how effective the caching is. On closer inspection of the page, I realised there was no point in writing it up myself as Mr Chadderton covers it all rather nicely.

Have a read of this and welcome to the wonderful world of Scalar Subqueries!

I would just like to add that the article from Martin focusses on caching the results of functions (see here for another option), but I tend to use Scalar Subqueries when I’m looking up the same data over and over again:

CREATE TABLE fact_table
AS
SELECT 'some data' text_column
     , mod(rownum, 10) a_fk_id
  FROM DUAL CONNECT BY ROWNUM <= 5000;

CREATE TABLE lookup_table
AS
SELECT ROWNUM an_id
     , 'something to lookup: ' || TO_CHAR(ROWNUM) text_column
  FROM DUAL
CONNECT BY ROWNUM <= 100000;

alter table lookup_table add constraint lookup_table_pk primary key (an_id);

analyze table fact_table compute statistics;
analyze table lookup_table compute statistics;

DECLARE
   CURSOR c_standard
   IS
      SELECT fct.text_column
           , lup.text_column lup_text_column
        FROM fact_table fct, lookup_table lup
       WHERE fct.a_fk_id = lup.an_id(+);

   CURSOR c_ssq
   IS
      SELECT fct.text_column
           , ( SELECT lup.text_column
                FROM lookup_table lup
               WHERE fct.a_fk_id = lup.an_id ) lup_text_column
        FROM fact_table fct;
BEGIN
   runstats_pkg.rs_start;

   FOR i IN 1 .. 100 LOOP
      FOR rec IN c_standard LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_middle;

   FOR i IN 1 .. 100 LOOP
      FOR rec IN c_ssq LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_stop;
END;

So let’s just look at the plans quickly. The standard version is a Hash Join:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |  5000 |   200K|   102   (3)|
|*  1 |  HASH JOIN OUTER   |              |  5000 |   200K|   102   (3)|
|   2 |   TABLE ACCESS FULL| FACT_TABLE   |  5000 | 55000 |     5   (0)|
|   3 |   TABLE ACCESS FULL| LOOKUP_TABLE |   100K|  2929K|    95   (2)|
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("FCT"."A_FK_ID"="LUP"."AN_ID"(+))

But the Scalar Subquery is quite different:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |  5000 | 55000 |     5   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |     2   (0)|
|*  2 |   INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |     1   (0)|
|   3 |  TABLE ACCESS FULL          | FACT_TABLE      |  5000 | 55000 |     5   (0)|
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("LUP"."AN_ID"=:B1)

The runstats timing confirms the big win:

Run1 ran in 149 hsecs
Run2 ran in 48 hsecs
run 1 ran in 310.42% of the time

and a big lump of that comes from reducing the IO (just about my favourite tuning aim):

Name                                                      Run1        Run2        Diff
STAT...session logical reads                            53,727       8,535     -45,192

Posted in performance | Leave a Comment »

Conditional joins

Posted by anotherdavewebster on August 18, 2010

From time to time I come across the situation where I have to outer join from one table to another but could actually tell, from the driving table, whether or not I could expect an entry on the lookup table. Assuming I’m going to be using a Nested Loop join, I actually waste an awful lot of effort probing the index on the lookup table when I already know that I won’t find anything. Allow me to demonstrate.

We start off by creating a driving table and a “lookup” table. It’s important that the sizes of the tables are such that the CBO prefers a nested loop join to a hash join, else the whole exercise is rather pointless. You’ll also have to forgive the old-school “analyze”; I’m just lazy.

CREATE TABLE driving_table
AS
SELECT 'meaningless text' text_column
     , DECODE(MOD(rownum, 10), 0, 'Y', 'N') should_i_join_yn
     , rownum an_id
  FROM DUAL CONNECT BY ROWNUM <= 50;

CREATE TABLE lookup_table
AS
SELECT ROWNUM * 10 an_id
     , 'something to lookup: ' || TO_CHAR(ROWNUM) text_column
  FROM DUAL
CONNECT BY ROWNUM <= 100000;

alter table lookup_table add constraint lookup_table_pk primary key (an_id);

analyze table driving_table compute statistics;
analyze table lookup_table compute statistics;

Note the use of DUAL with a CONNECT BY to generate my data.  See n-rows-please for a very short discussion.

The query I want to run is:

SELECT drv.text_column
     , lup.text_column lup_text_column
  FROM driving_table drv, lookup_table lup
 WHERE drv.an_id = lup.an_id(+);

But say I have data on the driving table which tells me if there’s any chance of me making the join. The rather crudely named “should_i_join_yn” column serves as an example. I can write my query as:

SELECT drv.text_column
     , lup.text_column lup_text_column
  FROM driving_table drv, lookup_table lup
 WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+);

So I ensure that my join column from the driving table resolves to NULL when I know I can’t join. Fortunately, Oracle is no fool and spots that NULL can’t ever join to my lookup table and doesn’t bother doing the lookup. Using Mr Kyte’s runstats (see Useful Links) I can pick out a few key metrics to demonstrate the point. First of all, the test:

DECLARE
   CURSOR c_standard
   IS
      SELECT drv.text_column
           , lup.text_column lup_text_column
        FROM driving_table drv, lookup_table lup
       WHERE drv.an_id = lup.an_id(+);

   CURSOR c_conditional
   IS
      SELECT drv.text_column
           , lup.text_column lup_text_column
        FROM driving_table drv, lookup_table lup
       WHERE DECODE( drv.should_i_join_yn, 'Y', drv.an_id, NULL ) = lup.an_id(+);
BEGIN
   runstats_pkg.rs_start;

   FOR i IN 1 .. 10000 LOOP
      FOR rec IN c_standard LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_middle;

   FOR i IN 1 .. 10000 LOOP
      FOR rec IN c_conditional LOOP
         NULL;
      END LOOP;
   END LOOP;

   runstats_pkg.rs_stop;
END;

The plans show the difference in the access predicates. The standard query is just a simple outer join:

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |    50 |  2400 |   53 |
|   1 |  NESTED LOOPS OUTER          |                 |    50 |  2400 |   53 |
|   2 |   TABLE ACCESS FULL          | DRIVING_TABLE   |    50 |   900 |    3 |
|   3 |   TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |    1 |
|*  4 |    INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |    0 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("DRV"."AN_ID"="LUP"."AN_ID"(+))

and the conditional joining version shows the DECODE in the access predicate:

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |    50 |  2450 |   53 |
|   1 |  NESTED LOOPS OUTER          |                 |    50 |  2450 |   53 |
|   2 |   TABLE ACCESS FULL          | DRIVING_TABLE   |    50 |   950 |    3 |
|   3 |   TABLE ACCESS BY INDEX ROWID| LOOKUP_TABLE    |     1 |    30 |    1 |
|*  4 |    INDEX UNIQUE SCAN         | LOOKUP_TABLE_PK |     1 |       |    0 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("LUP"."AN_ID"(+)=DECODE("DRV"."SHOULD_I_JOIN_YN",'Y',"DRV"."AN_ID",NULL))

Output from the test includes the statistics:

Run1 ran in 145 hsecs
Run2 ran in 103 hsecs
run 1 ran in 140.78% of the time

So it’s a bit quicker, which we’d expect from the IO stats:

Name                                  Run1        Run2        Diff
STAT...index fetch by key          500,000      50,000    -450,000
STAT...session logical reads       600,027     140,033    -459,994

So, in summary, if we’re using Nested Loop joins and we have enough information on the driving table to tell us that no join is possible, we can avoid even looking.

This is particularly useful for some super- and sub-type model implementations, where a driving table can have numerous lookups depending on the “type” of the driving record. We can conditionally join to just one of the sub-type tables (sketched below). I also have another example where I join to the same table twice, the second time conditionally. Any thoughts as to why that might give a MASSIVE performance benefit? I’ll write it up when I get some time.
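A quick sketch of the sub-type case, with made-up table names. One DECODE per sub-type table means each driving row only ever probes one of them:

SELECT drv.text_column
     , COALESCE( suba.detail_column, subb.detail_column ) detail_column
  FROM driving_table drv
     , sub_type_a_table suba
     , sub_type_b_table subb
 WHERE DECODE( drv.sub_type, 'A', drv.an_id, NULL ) = suba.an_id(+)
   AND DECODE( drv.sub_type, 'B', drv.an_id, NULL ) = subb.an_id(+);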

Posted in performance | 2 Comments »