Oracle Performance Tuning : Best Practices
Introduction and Background Information
Oracle has been designed to be a portable database. It is available on every platform from Windows to UNIX to mainframes. Oracle architecture is adequately generalized that you can get a good understanding of how it works on all platforms.
Oracle database consists of many logical storage structures such as data blocks, segments, schema objects etc. The physical storage space in the data files is logically allocated and de-allocated in the form of Oracle data blocks. The three types of physical database files are data files, redo logs, and control files.
Data files contain the actual data of the database that is represented in the form of tables or indexes.
Redo logs are used to record all changes made to the database. In order to function properly, every oracle database should have at least two redo logs.
The control file describes the physical structure of the database. It contains information such as the database name, date and the time of creation etc. Oracle process architecture is depicted below.
Oracle background processes are created at the time of initializing the database. Certain background processes are required for normal operation of the system and others are only used to perform maintenance and recovery operations.
Oracle uses two major types of memory structures for storing and retrieving data in the system. They are System Global Area (SGA) and Program Global Area (PGA).
SGA is a shared memory region used to hold data and internal control structures of the database. PGA (also called as Process Global Area) is the collection of non shared memory regions, each containing the data and control information for an individual server process. When a server process starts, the PGA memory gets allocated for that process.
This white paper reveals oracle performance tuning best practices that any administrator, database designer and performance tester must focus on.
Managing OS Resources
The performance issues in operating system normally involve memory management, process management and scheduling. After tuning the Oracle database instance, if you still need to enhance the performance then you may reduce system time. You should also make sure that there is enough CPU power, swap space and I/O bandwidth. For instance, the number of system calls increases in case of excessive buffer busy waits. You can decrease the total number of system calls by minimizing such buffer busy waits.
Some platforms offer operating system resource managers. These are actually designed to minimize the impact of peak load use patterns. Operating system resource managers are different from domains or similar facilities. Domains can provide thoroughly separated environments within one system. As long as the partitioned memory resource allocation is steady, oracle runs within domains as well as other less partitioning constructs. Administrators generally prioritize the allocation of resources within a global pool of resources.
Note: If you have more than one instance on a node and you want to distribute resources among them, each instance should be assigned to a dedicated resource manager group. To run multiple instances in the managed entity, you need to use instance caging to manage CPU resource distribution among the instances. While managing CPU resources, the Oracle Database Resource Manager expects a fixed amount of CPU resources for the instance. With instance caging, it assumes the available CPU resources to be equal to the value of the CPU_COUNT initialization parameter. Without instance caging, it assumes the available CPU resources to be equal to the total number of CPUs in the managed entity.
In order to address CPU issues, you need to determine whether enough CPU resources are available and also recognize when the system is consuming too many resources.
You can also capture various workload snapshots by using the Statspack, Automatic Workload Repository, or the UTLBSTAT utility. Workload is an important factor while evaluating the utilization of CPU. During peak hours, 90% CPU utilization with 10% waiting time could be acceptable. Even 25% utilization at a time of low workload is understandable. But there will be no scope for peak workload if your systems show high utilization at normal workloads.
For instance, the following figure depicts the workload over time and has peaks at exactly 10 AM and at around 2 PM.
This application has 100 users working 8 hours a day. Each user entering one transaction every 5 minutes translates into 9,600 transactions daily and in 8 hour period, the system must support 1,200 transactions an hour i.e. an average of 20 transactions per minute.
However, usage patterns are not constant and in this context, 20 transactions per minute can be understood as merely a minimum requirement. If the peak rate you need to achieve is 120 transactions per minute, then you should configure a system that can support this workload.
For this example, let us assume that at peak workload, Oracle may use 90% of the CPU resource. As per the calculation shown below, the database may use around 15% of the CPU resource for a period of average workload.
(20 tpm / 120 tpm) * 90% = 15% of the CPU resource.
[tpm = transactions per minute]
If the system requires around 40 to 50 percent of the CPU resource to achieve just 20 tpm then this could be an issue. However, if you can properly tune the system so that it achieves 20 tpm by using only 15% of the CPU, the system can easily support 120 transactions per minute by using 90% of the resources.
The query optimizer is built-in software that chooses an effective method in executing SQL statements.
SQL statements could be executed by a database in many ways such as index scans, full table scans, hash joins and nested loops. The optimizer can acknowledge many factors pertaining to the objects and the conditions mentioned in the query while examining the execution plan. This is an important step in SQL processing and can considerably affect the execution time.
The three main query optimizer operations are Query transformation, Estimation and Plan generation. The following figure illustrates the components of optimizer.
Each query portion is considered as a query block. The input to the query transformer is a parsed query that is defined by a set of query blocks.
The transformer decides whether it is beneficial to rewrite the existing statements into semantically equivalent statements that could be processed effectively.
The estimator evaluates the total cost of an execution plan. Estimator normally uses the available statistics in computing the measures. These statistics can enhance the degree of accuracy.
By trying out different access paths and join methods, the plan generator investigates a variety of plans for a query block. The generator actually selects the plan with the lowest cost.
The query transformer employs many transformation methods including View merging and Predicate pushing.
The view merging optimization is applicable to views that contain selections, joins and projections. In order to authorize the optimizer to use view merging, you should grant the MERGE ANY VIEW privilege to the user. You need to grant the MERGE VIEW privilege to the user on specific views to allow the optimizer to use view merging. In case of predicate pushing, the optimizer pushes relevant predicates into the view query block.
Note: The above privileges are used only under certain conditions, for instance, when a view is not merged because of security check failure.
Controlling the behavior of Optimizer
To increase the performance of SQL execution, you are supposed to use the following parameters. These parameters are helpful in controlling the behavior of optimizer.
This parameter is used in converting literal values of a statement to bind variables. This can promote cursor sharing and could have significant effect on execution plans. The optimizer produces the plan by using these bind variables.
It can control the costing of an index probe in combination with nested loops. The values range from 0 to 100. A value of 100 denotes that 100 percent of the blocks are likely to be found in the buffer cache so that optimizer can adjust the cost of an index probe accordingly.
This adjusts index probe costs. The values range from 1 to 10000. The default value is 100 and it indicates that all indexes are evaluated as an access path as per the normal costing model. A value of 10 denotes that an index access path cost is one-tenth the normal cost.
You can use this parameter to control the memory allocation for hash joins and sorts. We can minimize the optimizer cost by allocating more memory for sorts and hash joins.
This parameter allows the optimizer to cost a star transformation for star queries. Star transformation has the ability to integrate all the bitmap indexes on various fact table columns.
Configuring and Using Memory
Oracle normally recommends using automatic memory management. But you can also adjust the memory pools manually. The information is stored in memory caches and on disk. As Memory access is always faster than disk access (i.e. physical I/O), disk access can increase the CPU resources required. So, it would be better if the data requests of frequently accessed objects are handled by memory.
Our aim should be to minimize the physical I/O overhead as much as we can by making the data retrieval process as efficient as possible.
Automatic Shared Memory Management can simplify SGA configuration. To make good use of Automatic Shared Memory Management,
- You need to set SGA_TARGET initialization parameter to a non-zero value.
- You must set the STATISTICS_LEVEL parameter to TYPICAL.
- The SGA_TARGET parameter should be set to the amount of memory that you would like to allocate for SGA.
Note: You can manually resize memory pool by using the parameters such as SHARED_POOL_SIZE, DB_CACHE_SIZE and JAVA_POOL_SIZE.
Configuring and using the buffer cache
Oracle database can bypass the buffer cache for certain operations such as parallel reads and sorting. Buffer cache could be used effectively by properly tuning SQL statements for the application. While using parallel queries, you need to configure the database to use buffer cache instead of performing direct reads into the PGA. This configuration is more appropriate in case of servers with large amount of memory.
There are few methods to examine buffer cache activity such as V$DB_CACHE_ADVICE and Buffer Cache Hit Ratio. If you need to use V$DB_CACHE_ADVICE then the parameter DB_CACHE_ADVICE should be set to ON.
The Buffer Cache Hit Ratio estimates how frequently a requested block could be detected in the buffer cache without accessing the disk. You must use the Buffer Cache Hit Ratio to verify the physical I/O as anticipated by V$DB_CACHE_ADVICE.
Redo Log buffer configuration
Redo data is generated when server process makes changes to the data blocks in the buffer cache. While log writer (LGWR) attempts to write all the redo entries to a redo log file, the user processes can copy new entries over the previous entries in the memory.
Here are some recommendations for using the redo log buffer effectively:
- Properly batch the commit operations. Here the log writer can efficiently write all the redo log entries.
- Use NOLOGGING operations especially while loading large amount of data.
- The default size of the log buffer must be altered in case of applications that handle large amount of data. The log buffer is actually smaller than the size of SGA. An adequately sized log buffer can improve throughput on machines that perform frequent updates. For such systems, a first estimate is to the default value as shown below:
MAX(0.5M, (128K * number of cpus))
Note: In many cases, sizing the buffer more than 1M may not give performance benefits and it also uses extra memory.
Optimizing the Storage Performance
Reducing the need to provision more storage is a major challenge in IT management. Storage costs could be minimized by partitioning underneath large databases. Many enterprises are using two tiers of storage. A high end storage array is generally used for mission critical applications and a lower cost storage array is used for less demanding platforms. But the reality is that most databases contain a mixture of active data such as call records for the current billing cycle etc and less active data such as orders taken last year etc. For instance, active data can represent around
5 percent of information stored in the database and the other 95 percent could be less active data. This is depicted in the following figure:
Sizing the disk storage as per the workload
The random I/O operations are actuated by inserts, deletes and updates that in turn are activated by queries that scan bulk portions of table. The random I/O operations are calculated in IOPs (I/O operations per second). Sequential I/O operations are calculated in the number of megabytes of data that could be scanned (MB/s).
Storage arrays must be sized by the number of IOPs they can deliver. The total number of IOPs and MB/s required for existing Oracle databases could be determined from the workload repository reports.
Few guidelines for sizing the new systems:
- In case of OLTP (On-Line Transaction Processing) environments, assume that every transaction incurs about five random IOPs.
- For data warehousing environments, assume that a reasonably modern core may require around 200 MB/s in order to be kept completely busy. So a two processor quad core machine needs around 1.6 GB/s scan rate.
- Oracle Automatic Storage Management must be utilized in maximizing the I/O bandwidth. It offers efficient realization of storage array capabilities.
- It is recommended to use Direct and Asynchronous I/O to optimize I/O performance. Reads can be done asynchronously. In case of query intensive workloads, read ahead capability for sequential scans can improve the performance.
- You are supposed to use Direct NFS Client to enhance the performance of network attached storage.
Granting 24/7 access to enterprise applications needs protection from unplanned downtime. This usually requires architecture with redundant components. Oracle offers an exhaustive blueprint known as Maximum Availability Architecture, which includes all the necessary components to eliminate the complexity in datacenters. The following figure illustrates Oracle’s Maximum Availability Architecture:
A big advantage of Oracle’s Maximum Availability Architecture is that it consists of redundant components to improve the efficiency of production systems.
Oracle Real Application Clusters provide protection against server failures and offer more scalability for enterprise applications.
You can use built-in Data Guard technology for disaster recovery. The Active Data Guard allows reporting and backup operations to be offloaded to standby systems.
In an Oracle Data Guard environment, users can be switched between production and standby databases. Version differences of the database and the operating system are always supported. That is, standby database can be upgraded to the next version and users on the older versions can be switched over to the latest versions.
SQL Execution Efficiency
In the designing phase of any system development, you need to make sure that all the developers understand SQL execution efficiency.
As every database connection is an expensive operation, you are supposed to reduce the number of concurrent connections to the database.
As far as multi-tiered applications are concerned, you must ensure that the database connections are pooled.
Maintaining user connections is also important in reducing the parsing activity. Parsing is the process of analyzing SQL statements and developing an execution plan. This process involves syntax checking, loading shared structures and security checking.
Hard parses perform all the operations involved in a parse. They are considered as unscalable and resource intensive. Soft parses are not ideal. They still require security checking that may consume system resources.
As parsing should be reduced to a great extent, developers must design their applications that can parse SQL statements once and execute them many times. This could be achieved through cursors. Experienced developers should be aware of opening and re-executing cursors. Make sure that SQL statements are shared within the shared pool. To achieve this, you can use bind variables to represent the parts of the query.
Performance optimization is an iterative process. Removing the first bottleneck may not give desired improvements because another bottleneck might be revealed that may affect the performance. As this process is iterative, few investigations may have less impact on system performance. It actually takes time and experience to effectively pinpoint critical bottlenecks.
By eliminating resource conflicts, systems could be made scalable to the levels required by any enterprise. Hence, improved performance should be measured in terms of business goals rather than system statistics.