Street-Light Performance Anti-Method


Connor McDonald writes that It’s just bad code or bad design … most of the time for Oracle performance problems.

Today, I dealt with a variation of this problem where a user told me that the time to load data into a database had worsen considerably since ARCHIVELOG mode was enabled.

When asked for evidence, they responded with a list of run times similar to the following:

Date Size of data to be loaded (GB) Run-time (hrs) User comment
2017-06-01 1,200 146 Bad
2017-06-07 200 5 Good
2017-06-08 200 13 Bad
2017-06-09 50 26 Bad
2017-06-11 200 8 OK
2017-06-12 200 5 Good
2017-06-13 200 16 Bad

From their comments, I deduced that the user considered that good performance for the data load into this application was 40 GB/hr with a minimum rate of 25 GB/hr. There were two (2) good data loads that took five (5) hours to load 200 GB (or 40 GB/hr). The one (1) data load that was considered as “OK” took eight (8) hours for 200 GB (or 25 GB/hr).

Even more importantly, the timings provided by the user did not support their thesis that run-times had worsened since ARCHIVELOG mode was enabled. What had worsened may have been the variance in run-times. The timings varied by a factor of three (3) for the same amount of data.

The performance problem had two (2) parts:

  1. The run-time was unpredictable. The very large variance meant that no-one could be sure when the data load would be complete.
  2. The run-time exceeded user expectations over half the number of runs (four (4) out of the seven (7) runs examined).

In Systems Performance: Enterprise and the Cloud, Brendan Gregg describes the Streetlight Anti-Method of performance analysis:

This method is actually the absence of a deliberate methodology. The user analyzes performance by choosing observability tools that are familar, found on the Internet, or just at random to see if anything obvious shows up. This approach is hit or miss and can overlook many types of issues.

Tuning performance may be attempted in a similar trial-and-error fashion, setting whatever tunable parameters are known and familar to different values to see if that helps.

The Streetlight Anti-Method was being used in this performance problem. The user was most insistent on returning the database to NOARCHIVELOG mode. Their reasoning was that this was the only change that could have made performance worse.

When I compared the AWR reports between a good run and a bad one, I found that the database time was nearly the same in both cases (only a 5% difference!). This database time was about three (3) hours in both runs for 200 GB. This meant that between two (2) and thirteen (13) hours of the run-time was outside of the database.

This was when Streetlight Anti-Method really came to the fore. This issue was a database problem because the Oracle RDBMS engine had the most extensive instrumentation in the whole application stack. The Oracle database was where the light shone the brighest. Thus, it was where people looked for performance problems.

The lack of effective instrumentation in other layers of the application stack means this performance problem cannot be solved.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s