How best to get data from Oracle into our Java code?
Posted by anotherdavewebster on August 2, 2013
I’m constantly surprised by how little information I find on the internet about this so I thought I’d give a short summary of what we consider best practice to be and what other options we’ve looked at. Let’s take the common case of wanting to return the results of a SQL query to a the client Java code. There’s just boat loads of this code so surely it’s worth trying to make it efficient. In terms of performance, I’ve not found anything better than a PreparedStatement wrapping the query and executing over JDBC. It seems that Oracle have optimised this route as it’s probably the most common. They give us statement caching, fetch and prefetch size which all help. I’m not really going to talk much about this approach. Instead, I’m going to look at using a PL/SQL API (stored procedures) to the data which provide a number of advantages over executing queries directly from Java:
- Impact analysis for schema changes is improved as packages will not compile if objects are changed.
- It allows a comprehensive DB Unit Test to be constructed for the SQL.
- It provides a level of abstraction from the DB allowing some DB changes to be made “transparently” to the middle-tier.
- It allows security to be more tightly controlled as access is only granted to packages and not underlying tables.
- It allows me to make multiple procedure calls and queries which result in a single result set.
- There’s no “danger” of Java coders writing SQL queries. There you go, I’ve said it, it’s out there. You were all thinking it anyway.
So I think it’s fair to say that there are some advantages of a PL/SQL API. The question is, what’s the best way of passing the data back to Java? I’ve tried a few things over the years:
- Returning data in an object array as a return value from a function or OUT parameter of a procedure. It probably comes as no great surprise that the creation of these objects isn’t cheap. My tests showed a not insignificant overhead. It also requires the database to be full of TYPEs for each shape of result set (at least it did when I last played with it).
- Returning data as a list of OUT parameters. This works well (to a point – see below) for a single row of data.
- Returning a refcursor to the result set.
It’s the last two options that I’ve been playing with recently.
The refcursor option is good in many ways, mainly because the process remains close to the optimised “query over JDBC” approach. The only real downside I’ve found is a really annoying one which I was amazed to find out hadn’t been addressed: there’s no prefetch for refcursors through JDBC.
We use the thin JDBC client so we fired up the thick client to see if that helped. We made sure the prefetch value we were setting was actually being applied. We tried the latest version of the thin driver. Nothing. We couldn’t get it to work. Reading around on the internet it seems that prefetching refcursors is possible through the PHP driver and I found numerous pieces suggesting that prefetch over JDBC worked but nothing concrete. In the end, we got word from the man that knows, the JDBC man at Oracle. He confirmed that it’s not supported.
This is a real nuisance. The lack of prefetch means at least one extra round trip. One for the procedure call and one for the fetch from the refcursor. It also reduces the benefit of returning multiple refcursors from a single call. That would have given me the edge over direct SQL. I could have executed and fetched multiple queries with a single round trip. I don’t like to compromise, I really don’t but now direct SQL has an advantage over calling packages that return refcursors. Irritating doesn’t cover it.
Up until recently our internal best practice said that for a single row of data, a procedure call with multiple OUT parameters was our best practice. It doesn’t have the horrible (did I mention that I didn’t like it?) downside of the extra round trip we get with refcursors. There is an issue though. We created a stored procedure with 70 OUT parameters. Everything looked fine until we ramped up the contention and then it got really, really expensive. When I say expensive, I mean properly expensive i.e. lots of CPU. The OS (linux) shows very high system time and a little digging reveals lots of slow mmap calls. I may blog the detail at some point but to summarise: directly selecting into >= 24 OUT parameters triggers lots of mmap calls on the underlying OS. Some strange specifics:
- The mmap calls only seem expensive under contention. Running our tests single threaded still generates the calls but they are quick.
- The size of the OUT parameters doesn’t have an impact i.e. 23 large parameters are as fast as 23 small parameters. 24 large parameters are as slow as 24 small parameters.
- If you declare PL/SQL variables in the stored procedure, SELECT INTO those variables and then assign the values to the OUT parameters then the problem goes away! This obviously provides us with a work around but who wants to publish that as a best practice? I’d be laughed out of the room.
- All of this was on 184.108.40.206
I think that’s about the end of it for now. In summary, we like the PL/SQL API. We like refcursors as the method of passing data and we’ll like them even more when the JDBC guys give us the prefetch that is, in my opinion, a significant oversight. We quite like scalar OUT parameters BUT we know we need to avoid >= 24 of them!