
APEX Connect 2017 – Day 1


This year again the APEX Connect conference spans three days with mixed topics around APEX, like JavaScript, CSS, SQL and much more.
After the welcome speech and the keynote about “Reconciling APEX and the Thick Database Paradigm” by Bryn Llewellyn, I decided to attend presentations on the following topics:
– Temporal Validity and Flashback Data Archive
– Universal Theme and Custom Themes in APEX 5.1
– Using REST Services in APEX applications the nice way
– Uncover the Truth (of APEX application)
– Browser Developer Tools for APEX developers
– Docker for Dummies

Thick Database Paradigm:
What Bryn Llewellyn highlighted is that the key to proper database development is to encapsulate your database through PL/SQL APIs.
When you work on DB development, make your environment such that it can easily be rebuilt from scratch with scripts, so you have no fear of breaking anything.
Your schema user should have as few rights as needed so you keep your environment safe.
If you build proper APIs, no matter what kind of client application uses your data (APEX, REST, Java web app, …), it will be able to interface with it.
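As a minimal sketch of what such an encapsulation could look like (the package, procedure and column names are purely illustrative assumptions):

-- Clients never touch the table directly, only this package.
CREATE OR REPLACE PACKAGE customer_api AS
  PROCEDURE set_delivery_address(p_id IN NUMBER, p_address IN VARCHAR2);
  FUNCTION  get_delivery_address(p_id IN NUMBER) RETURN VARCHAR2;
END customer_api;
/

Any client (APEX, REST, Java, …) then calls customer_api instead of issuing DML against the underlying table.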

Temporal Validity and Flashback Data Archive:
There is an increasing demand for data history and audit.
Data history means not only keeping track of past data but also managing different versions of the same data over time (e.g. a customer delivery address). This is managed by temporal validity.
Oracle 12c allows you to manage such time-dependent data automatically by using “ADD PERIOD FOR” on a table.
When retrieving the data, use “AS OF PERIOD FOR” in the select statement, as in the sketch below.
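A minimal sketch of how this fits together (table and column names are illustrative assumptions):

CREATE TABLE customers (
  id               NUMBER,
  delivery_address VARCHAR2(200),
  valid_from       TIMESTAMP,
  valid_to         TIMESTAMP
);
-- declare the validity period on the two timestamp columns
ALTER TABLE customers ADD PERIOD FOR valid_time (valid_from, valid_to);
-- retrieve only the rows that were valid at a given point in time
SELECT * FROM customers
  AS OF PERIOD FOR valid_time TO_TIMESTAMP('2017-05-01','YYYY-MM-DD');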
Details can be found on the Oracle website:
Implementing temporal validity
Audit can be managed using the well-known trigger approach, with all the issues it can generate, but also automatically by using a flashback data archive.
In this second case the audit data is written to a specified tablespace for which you define the data retention period. SYS_FBA tables get created automatically, and the information tracked is managed by setting the context level. This is a very powerful tool as it also takes DML changes into account.
Also very important for audit purposes: flashback data cannot be modified.
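A minimal sketch of such a setup (the tablespace and archive names are assumptions, and the tablespace clause assumes OMF is configured):

-- dedicated tablespace holding the history data
CREATE TABLESPACE fda_ts DATAFILE SIZE 100M;
-- flashback archive with a 5-year retention period
CREATE FLASHBACK ARCHIVE fda_5y TABLESPACE fda_ts RETENTION 5 YEAR;
-- start tracking changes on a table
ALTER TABLE customers FLASHBACK ARCHIVE fda_5y;
-- query the data as it was one day ago
SELECT * FROM customers AS OF TIMESTAMP SYSTIMESTAMP - INTERVAL '1' DAY;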
You can find further information in the following blog post:
Oracle 12c Flashback Data Archive

Universal Theme and Custom Themes in APEX 5.1:
After a brief overview of the history of themes and templates in APEX, we were shown how easy it is (at least it seems) to create and manage custom themes and templates.
Template options introduced in APEX 5 aim to reduce the number of templates for a specific “object” type to a minimum in order to ease maintenance.
Live template options have been introduced with APEX 5.1 to have a preview of the changes at run time and facilitate their usage.
Theme subscription allows you to distribute changes made to a master theme, which can now be defined at workspace level.
Theme styles allow you to have a dedicated CSS file on top of your application's standard CSS and to define user-based styles from the Theme Roller.
Note: Themes based on jQuery Mobile for mobile applications should no longer be used; use the responsive Universal Theme instead, as jQuery Mobile hasn't been updated for a long time and might have issues with the new jQuery core version that might be used in future versions of APEX.

Using REST Services in APEX applications the nice way:
The definition of REST is based on 3 pillars:
– Resources
– Methods (GET, PUT, POST, DELETE, PATCH)
– Representation (JSON, HTML, CSV, …)
The new REST Client Assistant packaged application in APEX 5.1 will be of great help for developers, as it generates the procedures required to parse the JSON data returned by a given REST data service URL, as well as the underlying SQL query to display the data in a report; a hedged sketch of that kind of query follows.
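As an illustration of the kind of SQL such generated code boils down to on a 12c database (the URL, JSON structure and column names are assumptions):

SELECT t.*
  FROM JSON_TABLE(
         apex_web_service.make_rest_request(
           p_url         => 'https://example.com/ords/hr/employees/',
           p_http_method => 'GET'),
         '$.items[*]'
         COLUMNS (
           empno NUMBER       PATH '$.empno',
           ename VARCHAR2(50) PATH '$.ename'
         )) t;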
When the amount of data becomes too large, REST data services can return it in a paginated fashion, which needs to be supported on the client side. At this point only classic reports support that feature in APEX 5.1. Filtering on the data queried from the service also needs to be managed. The REST Data Sample application shows how to implement the different kinds of interaction with REST data services based on Oracle standards.
There will be improvements in REST data service support in the upcoming version 5.2 of APEX, such as remote SQL execution.

Uncover the Truth (of APEX application):
When you have to modify an existing APEX application or take over a customer development, you need to understand the heart of the application, which can be a challenge. To do so you need to identify its structure and how the different elements are used and interact.
Various people are interested in this:
– DB developers
– APEX developers
– Cloud developers
– Project leaders
This is all about:
– Functionality (Page function, application logic, Interfaces)
– Complexity (APEX components, PL/SQL objects, JavaScript, CSS, DB objects)
– Transparency (who changed, when, Conditions, relations between pages)
There are already different tools in APEX that let you see different aspects of this data:
– Searches
– History
– Application utilities
– Reports
– SQL Workshop
So it can be cumbersome to walk through all of those.
We were presented a self-developed tool which can be seen as a kind of “dashboard” to analyze everything in one place, based on all sorts of charts reading the APEX metadata tables. I'm looking forward to seeing it released next summer.

Browser Developer Tools for APEX developers:
The IDE for APEX is the web browser, so it’s important to know about the developer tools provided in your web browser.
Each and every web browser has its own tools, some being better than others.
The most used browser also has the most complete tool set: Chrome (the Firefox Developer Edition is also worth looking at).
As there is a lot of information to be displayed in the developer tools, it's highly recommended to detach the window from the browser and display it on a secondary screen when possible.
The CSS panel shows all levels of the style sheet, so you can understand what is set where.
The properties of the web page can be modified on the fly to understand their impact.
JavaScript can be debugged by setting breakpoints.
The Application panel allows you to monitor cookies.
Device mode lets you emulate mobile devices and even throttle the network speed to get a better understanding of the end-user experience.
Remote debugging can even be used on attached devices.

Docker for Dummies
I had already heard a lot about Docker but never took the time to look into it, so I took the opportunity to have it demonstrated today.
What is Docker? A lightweight VM?
No, a container!
It allows you to share resources and to get rid of things like the hypervisor and a full guest OS (which are needed for VMs), which makes it light.
Main characteristics are:
– You can put almost anything into it
– It stays locked
– It’s efficient for transport
– It’s small, light weight
– It’s scalable
Actually it can be seen more as a software delivery platform.
The basic component is an image:
– It contains File System and parameters
– It has no state
– Layers are read only
– Layers are shared
– Updates require only updated files to be downloaded
A container is a running instance of an image. It adds a R/W layer on top of the image.
Images are not cloned.
Persistence of the data used in the container is managed by mapping a local folder into the container.
Although Docker is command-line based, there is a GUI available called Kitematic.
Resources:
Docker website
Kitematic website
You can find further details in the following blog post:
Overview and installation

 



APEX Connect 2017 – Day 2


After the keynote “Oracle Application Express – Ahead of its time. Leading the way forward” by Joel Kallman (Director of Software Development for APEX) and some overall presentations about development with APEX 5.1, I decided to attend presentations on the following topics, oriented towards tools that make a developer's life easier:
– APEX OOS Tools and helper
– Useful APEX plugins
– LESS is more
– JavaScript Debugging and Tuning
– Introduction to Oracle JET

Oracle Application Express – Ahead of its time. Leading the way forward:
In 2008 APEX was already matching the definition of cloud given by NIST (the National Institute of Standards and Technology), and it was even matching the definition of PaaS back in 2002: APEX was ahead of its time!
APEX allows you to increase productivity and consistency, reducing complexity thanks to its declarative framework. It is therefore to be seen as a low-code development environment: https://apex.oracle.com/lowcode/
What is impressive about APEX is that a packaged app written in 2008 is still able to run on apex.oracle.com today.
It's the most used development platform within Oracle for running their internal business.
There is now an online APEX curriculum available for free on Oracle academy: APEX on Oracle academy
Oracle database free services online will be launched, replacing apex.oracle.com.

Upcoming features in APEX 5.2:
The next version of APEX may contain features like:
– Blueprint wizard: allows you to integrate features from existing packaged apps into your own application declaratively
– Improved packaged apps
– Update of JET and JQuery versions
– Enhancements in Page designer:
-* Dialogs/pop-overs
-* Client side dialogs
-* Adaptive UI based on preference options declaratively enabled/disabled
-* Removal of legacy component view
– Enhancements on JET Charts:
-* New Gantt chart
-* New pyramid report
-* New Box Plot
-* New interactive report and websheet charts
-* Removal of 32k limit of data visualization
– Improved interactive grids
-* Flexible row height
-* Group by view
-* Pivot view
-* Region button position
-* Printing
-* Subscription
-* Computation
-* Required filtering
-* Complex filtering and highlighting
-* No stretch columns
-* Copy down facility
-* API documentation
-* Migration for interactive reports
– Improved REST services
-* Declarative support of REST services
-* REST services as data source for APEX components
-* Simple access to Cloud services
-* ORDS remote SQL data access
-* SQL execution on remote DB
All very nice and promising features :-)

APEX OOS Tools and helper:
Oracle Open Source Tools and other useful Open Source projects can be found in GitHub within different repositories.
Most famous OOS tools for APEX developers are:
– oracle-db-tools
– oraclejet
– db-sample-schemas
– node-oracledb
– docker-images
Link to oracle GitHub: https://github.com/oracle
Besides that, the Open Source community provides various other tools for APEX developers:
– OXAR (automated installation of a full development environment) https://github.com/OraOpenSource/OXAR
– docker images https://github.com/Dani3lSun/docker-db-apex-dev
– Logger https://github.com/OraOpenSource/Logger
– APEX Diff (comparison of application exports based on JSON using node.js and sqlcl) https://github.com/OraOpenSource/apex-diff
– APEX Client extension https://github.com/OraOpenSource/apex-frontend-boost
– PL/SQL libraries like Alexandria https://github.com/mortenbra/alexandria-plsql-utils
– APEX backup scripts https://github.com/OraOpenSource/apexbackup
There is a lot out there to make your life easier. Enjoy!
Thanks a lot to all contributors!

Useful APEX plugins:
APEX is a very nice development framework, but it sometimes needs more than what is provided by Oracle.
Fortunately APEX allows you to write extensions to fill in the gaps: plug-ins.
There are many to be found on apex.world.
Here is a non-exhaustive list of useful plug-ins that were presented at APEX Connect:
– select2 https://select2.github.io/
– Date range (based on JS moment library) http://apex.pretius.com/apex/f?p=105:DATERANGE:::NO:::
– Help text tooltip http://apex.pretius.com/apex/f?p=105:HELPTEXT:::NO:::
– Dropzone (for multiple file upload) https://github.com/Dani3lSun/apex-plugin-dropzone
– Excel to collections http://www.apex-plugin.com/oracle-apex-plugins/process-type-plugin/excel2collections_271.html
– Enhanced notification http://apex.pretius.com/apex/f?p=105:NOTIFICATIONS:::NO:::
– Nested reports http://apex.pretius.com/apex/f?p=105:NESTED_REPORTS:::NO:::
Thanks to all the developers who provide those plug-ins and make APEX even more enjoyable!

LESS is more:
What is LESS?
LESS is a CSS pre-processor which allows you to use variables, mixins and nested rules to facilitate the management of CSS rules.
It's already in use in APEX, so you can leverage that tool to adjust your application to your corporate identity guidelines.
The Theme Roller of APEX 5.x makes use of it, but the core.css of APEX is not modified that way.
I would suggest you visit the following website if you are interested in how LESS works:
http://lesscss.org/

JavaScript Debugging and Tuning:
APEX makes use of jQuery.
In some cases you might run into performance issues while loading or using your application pages.
Here are some tips and tricks, based on experience, to optimize the JavaScript in your application:
– Use the right selector to reduce searches in the DOM (from the most to the least selective: id, element, class, attribute, pseudo)
– Use loops the right way (arrays can help)
– Use variables to reduce DOM access
– Use the detach function to get an “offline” DOM and reduce the cost of parsing and accessing the DOM
– In some cases native JavaScript is faster than jQuery
There are tools to help you measure the performance of your page:
– jsperf.com
– ESBench.com
– Chrome Canary developer tools
When it comes to page-load performance, the size of your JavaScript library files is key. It can be reduced using tools like UglifyJS.
For debugging and logging purposes you can make use of the client-side console.log or, even better, the apex.debug wrapper around it. Unfortunately those logs are only visible in the client console at runtime. One option to centralize them is to write the logs into a DB table using AJAX, as sketched below. Also, stacktrace.js is of help, as it captures the user context, which can then be stored with the logs to better understand the issue.
Ultimately a REST service could also be an alternative to send the logs back to the DB.
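A minimal sketch of the server side of such a logging channel (the table and process names are hypothetical): an APEX Ajax Callback process, invoked from the client with apex.server.process('LOG_CLIENT', {x01: message}), could simply persist what it receives:

-- table holding the client-side log entries
CREATE TABLE client_log (
  logged_at TIMESTAMP DEFAULT SYSTIMESTAMP,
  app_user  VARCHAR2(255),
  message   VARCHAR2(4000)
);

-- PL/SQL source of the Ajax Callback process named LOG_CLIENT
BEGIN
  INSERT INTO client_log (app_user, message)
  VALUES (:APP_USER, apex_application.g_x01);
END;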

Introduction to Oracle JET:
Oracle JET: JavaScript Extension Toolkit
It supports multiple languages and follows W3C standards.
A JET module is always made of at least 2 parts:
– JavaScript file (view Models)
– HTML file (views)
When using JET modules you always have to take care of the required libraries/modules (dependencies).
APEX only makes use of the JET Charts for now.
I can only recommend visiting the Oracle web page on that subject:
http://www.oracle.com/webfolder/technetwork/jet/index.html/

 


opatchauto and “java.lang.UnsatisfiedLinkError”, Topology creation failed when applying 12.1.0.2.170418 PSU


Yesterday I had fun with OPatch again. We wanted to apply the latest PSU (12.1.0.2.170418) on top of a 12.1.0.2 GI without any patches applied:

xxxxx@xxxxx:/disk00/app/12.1.0/grid_2_4/diagnostics/ [+ASM2] opatch lsinv
Oracle Interim Patch Installer version 12.2.0.1.8
Copyright (c) 2017, Oracle Corporation. All rights reserved.

Oracle Home : /disk00/app/12.1.0/grid_2_4
Central Inventory : /disk00/app/oraInventory
from : /disk00/app/12.1.0/grid_2_4/oraInst.loc
OPatch version : 12.2.0.1.8
OUI version : 12.1.0.2.0
Log file location : /disk00/app/12.1.0/grid_2_4/cfgtoollogs/opatch/opatch2017-05-17_15-44-20PM_1.log

Lsinventory Output file location : /disk00/app/12.1.0/grid_2_4/cfgtoollogs/opatch/lsinv/lsinventory2017-05-17_15-44-20PM.txt

--------------------------------------------------------------------------------
Local Machine Information::
Hostname: dbprd08s.psi.ch
ARU platform id: 226
ARU platform description:: Linux x86-64

Installed Top-level Products (1):

Oracle Grid Infrastructure 12c 12.1.0.2.0
There are 1 products installed in this Oracle Home.


There are no Interim patches installed in this Oracle Home.


Patch level status of Cluster nodes :

Patching Level Nodes
-------------- -----
0 xxxxx,xxxxx

--------------------------------------------------------------------------------

OPatch succeeded.

As you can see from the above output, we applied the latest OPatch version for 12.1.0.2, which is this one:
Selection_001

When we kicked off opatchauto:

	
Customer Problem Description
---------------------------------------------------

Problem Summary
---------------------------------------------------
opatchauto - Topology creation failed.

Problem Description
---------------------------------------------------
[root@xxxxx patchset]$ opatchauto apply /disk00/app/oracle/patchset/25434003 -oh /disk00/app/12.1.0/grid_2_4 -ocmrf /home/cluster_admin/ocm.rsp

OPatchauto session is initiated at Wed May 17 15:30:20 2017

System initialization log file is /disk00/app/12.1.0/grid_2_4/cfgtoollogs/opatchautodb/systemconfig2017-05-17_03-30-22PM.log.

Exception in thread "main" java.lang.UnsatisfiedLinkError: oracle.cluster.impl.common.cnative.ClusterNative.CLSBInit(ILjava/lang/String;ZLoracle/cluster/impl/common/cnative/ClusterNativeResult;Ljava/lang/Object;)J
at oracle.cluster.impl.common.cnative.ClusterNative.CLSBInit(Native Method)
at oracle.cluster.util.SRVMContext.init(SRVMContext.java:130)
at oracle.cluster.util.SRVMContext.init(SRVMContext.java:83)
at oracle.ops.mgmt.has.HASContext.(HASContext.java:135)
at oracle.ops.mgmt.has.HASContext.getInstance(HASContext.java:292)
at oracle.ops.mgmt.has.ClusterUtil.(ClusterUtil.java:80)
at oracle.cluster.impl.common.ClusterImpl.(ClusterImpl.java:114)
at oracle.cluster.impl.common.CommonFactoryImpl.getCluster(CommonFactoryImpl.java:81)
at oracle.cluster.common.CommonFactory.getCluster(CommonFactory.java:79)
at com.oracle.glcm.patch.auto.db.product.driver.crs.CrsProductDriver.getSrvmCluster(CrsProductDriver.java:506)
at com.oracle.glcm.patch.auto.db.product.driver.crs.CrsProductDriver.buildSystemInstance(CrsProductDriver.java:187)
at com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator.createSystemInstance(TopologyCreator.java:204)
at com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator.process(TopologyCreator.java:142)
at com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator.main(TopologyCreator.java:99)

OPatchauto session completed at Wed May 17 15:30:23 2017
Time taken to complete the session 0 minute, 3 seconds

Topology creation failed. 

Oh, surprise. We had done exactly the same one day before on another cluster and all went fine. When you check what MOS suggests for this, you will not find much information (yes, this is the 64-bit version of OPatch). The log file is not of much help either:

cluster_admin@xxxxx:/home/cluster_admin/ [+ASM2] cat /disk00/app/12.1.0/grid_2_4/cfgtoollogs/opatchautodb/systemconfig2017-05-17_03-30-22PM.log
2017-05-17 15:30:22,800 INFO [1] com.oracle.glcm.patch.auto.db.product.inventory.ClusterInformationLoader - crsType: CRS
2017-05-17 15:30:22,960 INFO [1] com.oracle.glcm.patch.auto.db.product.inventory.ClusterInformationLoader - running: true
2017-05-17 15:30:22,979 INFO [1] com.oracle.glcm.patch.auto.db.product.inventory.ClusterInformationLoader - running: true
2017-05-17 15:30:23,199 INFO [1] com.oracle.glcm.patch.auto.db.product.driver.crs.CrsProductDriver - crsType: CRS
2017-05-17 15:30:23,350 INFO [1] com.oracle.glcm.patch.auto.db.product.driver.crs.CrsProductDriver - running: true 

Five lines, and all seems to be fine. The only obvious difference to what worked the day before is that we did not have any patch applied to this GI yet. Once we downgraded OPatch to 12.1.0.1.8, all went fine:

[root@xxxxxx grid_2_4]$ opatchauto apply /disk00/app/oracle/patchset/25434003 -oh /disk00/app/12.1.0/grid_2_4 -ocmrf /home/cluster_admin/ocm.rsp
OPatch Automation Tool
Copyright (c)2014, Oracle Corporation. All rights reserved.

OPatchauto Version : 12.1.0.1.8
OUI Version : 12.1.0.2.0
Running from : /disk00/app/12.1.0/grid_2_4

opatchauto log file: /disk00/app/12.1.0/grid_2_4/cfgtoollogs/opatchauto/25434003/opatch_gi_2017-05-17_16-34-00_deploy.log

Parameter Validation: Successful

Configuration Validation: Successful 
...

I could not verify this in my local lab, so I am not sure what exactly caused this, but at least the workaround did the trick.

Hope this helps in case you run into the same issue.

 


Workaround for bug 19566227/20563128 doing Cross Platform Migration (MOS Note 1389592.1)


In a project I have to move an Oracle 11.2.0.4 DB (around 7TB in size) from SPARC Solaris (called Source in the remainder of this blog) to Linux x86-64 (called Target in the remainder of this blog), i.e. a platform migration from big endian to little endian. A good method to do that is described in My Oracle Support Note 1389592.1:

11G – Reduce Transportable Tablespace Downtime using Cross Platform Incremental Backup

Basically I copy the DB files using dbms_file_transfer.get_file, which also does the conversion from big to little endian. Afterwards, incremental backups are applied to the transferred files, and in a final step the files are attached to the target DB using the metadata import of transportable tablespaces.

Following the steps mentioned in Note 1389592.1, I ran into two bugs:

1.) Bug 19574653 – ORA-7445 [sxorchk] and call stack has function krhcvt_filhdr_v10 (Doc ID 19574653.8)
2.) Bug 19566227 (base bug 20563128) described in MOS Note 2015883.1: Cross Platform Migration Using XTT Method With RMAN

The first bug, 19574653, could be resolved easily because I was allowed to install the associated patch on the target system. The second bug was more difficult, because it requires patch 20563128 to be installed on both the source and the target system. As no downtime was possible, I couldn't install that patch on the source system.

Bug 19566227 actually shows the following behavior:

During transfer ($ORACLE_HOME/perl/bin/perl xttdriver.pl -G) the following error happens for a couple of files:


$ more getfile_sourcedir2_dwh_bub_data_p06_01.dbf_34.sql
BEGIN
DBMS_FILE_TRANSFER.GET_FILE(
source_directory_object => 'SOURCEDIR2',
source_file_name => 'DWH_BUB_DATA_P06_01.dbf',
source_database => 'TTSLINK',
destination_directory_object => 'DESTDIR2',
destination_file_name => 'DWH_BUB_DATA_P06_01.dbf');
END;
/
quit
 
$ sqlplus / as sysdba
SQL> @getfile_sourcedir2_dwh_bub_data_p06_01.dbf_34.sql
BEGIN
*
ERROR at line 1:
ORA-19504: failed to create file "/DWHT/u01/oradata/DWH_BUB_DATA_P06_01.dbf"
ORA-17502: ksfdcre:1 Failed to create file
/DWHT/u01/oradata/DWH_BUB_DATA_P06_01.dbf
ORA-17501: logical block size 141558272 is invalid
ORA-06512: at "SYS.DBMS_FILE_TRANSFER", line 37
ORA-06512: at "SYS.DBMS_FILE_TRANSFER", line 132
ORA-06512: at line 2

The nasty thing is that the process of transferring files stops, and you have to manually adjust the files xttnewdatafiles.txt and getfile.sql to be able to continue from the point where it failed. That's difficult in a 7TB DB with 875 datafiles.

To get the failed file transferred, a workaround is described in MOS Note 2015883.1. E.g. assume DWH_BUB_DATA_P06_01.dbf couldn't be transferred:

Source:


SQL> select file_id from dba_data_files where file_name like '%DWH_BUB_DATA_P06_01.dbf';
 
FILE_ID
----------
357
 
$ rman target=/
 
RMAN> backup as copy datafile 357 format '/tmp/copy_df357.bck';
RMAN> quit
 
$ scp /tmp/copy_df357.bck <target-system>:/tmp/copy_df357.bck

Target:


$ rman target=/
 
RMAN> Convert from platform 'Solaris[tm] OE (64-bit)' datafile '/tmp/copy_df357.bck' format '/DWHT/u01/oradata/DWH_BUB_DATA_P06_01.dbf';
RMAN> quit

What I wanted to achieve was that the transfer of datafiles runs through without error, that the failed transfers are recorded in a table, and that I can use the above workaround at the end of the whole transfer phase.

Here’s what I did:

On the target system I created a table system.transfer_errors to catch the files affected by bug 19566227/20563128:


sqlplus / as sysdba
 
drop table system.transfer_errors purge;
create table system.transfer_errors (sourcedir varchar2(128), file_name VARCHAR2(513));

The workaround is to adjust the file xttdriver.pl:

Original:


my $sqlQuery =
"BEGIN
DBMS_FILE_TRANSFER.GET_FILE(
source_directory_object => '".$2."',
source_file_name => '".$3."',
source_database => '".$props{'srclink'}."',
destination_directory_object => '".$4."',
destination_file_name => '".$5."');
END;
/
";

Modified:


my $sqlQuery =
"BEGIN
DBMS_FILE_TRANSFER.GET_FILE(
source_directory_object => '".$2."',
source_file_name => '".$3."',
source_database => '".$props{'srclink'}."',
destination_directory_object => '".$4."',
destination_file_name => '".$5."');
EXCEPTION
when others then
begin
insert into system.transfer_errors values ('".$2."','".$3."');
commit;
end;
END;
/
";

I.e. I wrote an exception handler to ignore any error and record the failed transfers in my table system.transfer_errors.

REMARK: The workaround could be made more sophisticated by writing an exception handler for ORA-19504 only and writing the full error stack plus the destination directories into the table system.transfer_errors:


create table system.transfer_errors (error_code number, error_message varchar2(200), sourcedir varchar2(128), source_file_name VARCHAR2(513), targetdir varchar2(128), target_file_name VARCHAR2(513));

And adjust the code in xttdriver.pl:


my $sqlQuery =
"DECLARE
bug20563128 EXCEPTION;
PRAGMA EXCEPTION_INIT(bug20563128, -19504);
BEGIN
DBMS_FILE_TRANSFER.GET_FILE(
source_directory_object => '".$2."',
source_file_name => '".$3."',
source_database => '".$props{'srclink'}."',
destination_directory_object => '".$4."',
destination_file_name => '".$5."');
EXCEPTION
when bug20563128 then
begin
insert into system.transfer_errors values (SQLCODE,substr(SQLERRM,1,200),'".$2."','".$3."','".$4."','".$5."');
commit;
end;
END;
/
";

With the above code in xttdriver.pl the transfer runs through, and I just checked the table system.transfer_errors at the end to see which files I had to transfer manually.
I hope that this workaround helps others to ease such a cross-platform migration.

 


You do use the Oracle Trace File Analyzer, don’t you?


When you do not know what the Oracle Trace File Analyzer (TFA) is, or you have heard about it but don't know what it is for, then you should probably read this, especially when you are working in clustered Oracle environments. You know, Oracle loves to create plenty of trace files in various places (yes, this got much better with the Automatic Diagnostics Repository (ADR)). Now imagine you have a cluster and something goes wrong: where do you start? We created a very nice picture for our Grid Infrastructure workshop which gives an idea of all the components and where it can go wrong. Let's start with the components.

This is approximately what you get when you install a three-node Oracle RAC infrastructure (I know there are even more components when you go for a Flex Cluster, but this is not important for the scope of this post):

cluster_overview

These are quite a few components, and all of them need to work together properly for the cluster to be healthy and do what is expected. And when you have so many components, many things can go wrong:

cluster_overview_issues

Depending on how good your monitoring is, you might know quite quickly where to start looking for the issue, or you can rely on your experience to start troubleshooting. A good starting point is always the alert logs of the cluster, ASM and the database. They are usually located under $ORACLE_BASE, and there are many, many other trace files and directories. For a 12.2 database it looks like this:

oracle@oelrac1:/u01/app/oracle/diag/rdbms/db1/DB1_1/ [DB1_1] ls -la
drwxr-x---.  2 oracle asmadmin    20 Mar 21 14:55 alert
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 cdump
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 hm
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 incident
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 incpkg
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 ir
drwxr-x---.  2 oracle asmadmin  4096 Mar 21 15:00 lck
drwxr-x---.  7 oracle asmadmin    60 Mar 21 14:55 log
drwxr-x---.  2 oracle asmadmin  4096 Mar 21 15:00 metadata
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 metadata_dgif
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 metadata_pv
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 stage
drwxr-x---.  2 oracle asmadmin     6 Mar 21 14:55 sweep
drwxr-x---.  2 oracle asmadmin 36864 May 23 08:26 trace

For a 12.2 Grid Infrastructure it looks like this:

oracle@oelrac1:/u01/app/oracle/diag/crs/oelrac1/crs/ [DB1_1] ls -la
drwxrwxr-x.  2 oracle oinstall    20 Mar 21 12:59 alert
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 cdump
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 incident
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 incpkg
drwxrwxr-x.  2 oracle oinstall  4096 Mar 21 12:59 lck
drwxrwxr-x.  4 oracle oinstall    29 Mar 21 12:59 log
drwxrwxr-x.  2 oracle oinstall  4096 Mar 21 12:59 metadata
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 metadata_dgif
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 metadata_pv
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 stage
drwxrwxr-x.  2 oracle oinstall     6 Mar 21 12:59 sweep
drwxrwxr-x.  2 oracle oinstall 20480 May 23 08:28 trace

But there are even more when you go one level up:

oracle@oelrac1:/u01/app/oracle/diag/ [DB1_1] ls -la
drwxrwxr-x.  3 oracle oinstall   22 Mar 21 13:02 afdboot
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 apx
drwxrwxr-x.  5 oracle oinstall   51 Mar 21 13:04 asm
drwxrwxr-x.  4 oracle oinstall   40 Mar 21 13:04 asmtool
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 bdsql
drwxrwxr-x.  4 oracle oinstall   40 Mar 21 13:05 clients
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 diagtool
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 dps
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 em
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 gsm
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 ios
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 lsnrctl
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 netcman
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 ofm
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 plsql
drwxrwxr-x.  2 oracle oinstall    6 Mar 21 12:57 plsqlapp
drwxrwxr-x.  3 oracle oinstall   20 Mar 21 13:08 tnslsnr

It can take quite some time to locate the trace file which contains the information you need for troubleshooting or for uploading to Oracle Support. And this is where the Oracle Trace File Analyzer is a great help. There is a great support note to get started and to download the latest bundle: TFA Collector – TFA with Database Support Tools Bundle (Doc ID 1513912.1).
mos_tfa

As the note explains, TFA is a bundle of tools. What you get when you download the bundle is this:

tfa_overview

So, in words, TFA consists of a collector, an analyzer, the tfactl command-line utility and a bunch of tools. Probably you already know one or more of these tools, and maybe you have already installed and used some of them. Stop doing that immediately: use TFA, which brings them all. To come back to the components picture from above: once you have installed TFA, the picture looks like this:

tfa_overview_2

You use tfactl (the initiator) to talk to a local TFA daemon that can talk to the TFA daemons on all other nodes in the cluster. When you install an Oracle 12.2 Grid Infrastructure, tfactl is already there:

oracle@oelrac1:/var/tmp/ [+ASM1] which tfactl
/u01/app/12.2.0.1/grid/bin/tfactl

The issue with that is that you do not have all of the support tools available, e.g. you won't have SQLT, DA/RDA, Procwatcher or OSWatcher. To get those, you'll need to download the complete bundle from the support note referenced above, then remove the current TFA installation and re-install it:

[root@oelrac1 tmp]$ /u01/app/12.2.0.1/grid/bin/tfactl uninstall
[root@oelrac1 tmp]$ ./installTFALite 
Enter a location for installing TFA (/tfa will be appended if not supplied) [/var/tmp/tfa]:
/u01/app/12.2.0.1/grid/tfa
Enter a Java Home that contains Java 1.5 or later : /u01/app/12.2.0.1/grid/jdk/

Once this is done, you should see all the tools:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl toolstatus
.--------------------------------------.
|        External Support Tools        |
+---------+--------------+-------------+
| Host    | Tool         | Status      |
+---------+--------------+-------------+
| oelrac1 | oswbb        | RUNNING     |
| oelrac1 | darda        | DEPLOYED    |
| oelrac1 | prw          | NOT RUNNING |
| oelrac1 | orachk       | DEPLOYED    |
| oelrac1 | vi           | DEPLOYED    |
| oelrac1 | changes      | DEPLOYED    |
| oelrac1 | ps           | DEPLOYED    |
| oelrac1 | param        | DEPLOYED    |
| oelrac1 | events       | DEPLOYED    |
| oelrac1 | alertsummary | DEPLOYED    |
| oelrac1 | ls           | DEPLOYED    |
| oelrac1 | dbperf       | DEPLOYED    |
| oelrac1 | sqlt         | DEPLOYED    |
| oelrac1 | summary      | DEPLOYED    |
| oelrac1 | pstack       | DEPLOYED    |
| oelrac1 | tail         | DEPLOYED    |
| oelrac1 | oratop       | DEPLOYED    |
| oelrac1 | dbglevel     | DEPLOYED    |
| oelrac1 | exachk       | DEPLOYED    |
| oelrac1 | grep         | DEPLOYED    |
| oelrac1 | srdc         | DEPLOYED    |
| oelrac1 | history      | DEPLOYED    |
'---------+--------------+-------------'

I will not go into all the tools but will highlight some of them. Let's start with oratop, for when you want to know which statements are currently executing in your instance:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl 

tfactl> oratop -database DB1

This brings up something very similar to the “top” command which comes with the operating system, but it displays information about what is going on in the database:
oratop

Quite handy when you only have access via ssh. Another great tool is OSWatcher, which gathers operating system statistics in the background and is able to create graphs:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl run oswbb

Starting OSW Analyzer V7.3.3
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c)  2014 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...


Parsing file oelrac1_iostat_17.05.23.1500.dat ...
Parsing file oelrac1_iostat_17.05.23.1600.dat ...
Parsing file oelrac1_iostat_17.05.24.0800.dat ...
Parsing file oelrac1_iostat_17.05.24.0900.dat ...
Parsing file oelrac1_iostat_17.05.24.1000.dat ...
Parsing file oelrac1_iostat_17.05.24.1100.dat ...


Parsing file oelrac1_vmstat_17.05.23.1500.dat ...
Parsing file oelrac1_vmstat_17.05.23.1600.dat ...
Parsing file oelrac1_vmstat_17.05.24.0800.dat ...
Parsing file oelrac1_vmstat_17.05.24.0900.dat ...
Parsing file oelrac1_vmstat_17.05.24.1000.dat ...
Parsing file oelrac1_vmstat_17.05.24.1100.dat ...


Parsing file oelrac1_netstat_17.05.23.1500.dat ...
Parsing file oelrac1_netstat_17.05.23.1600.dat ...
Parsing file oelrac1_netstat_17.05.24.0800.dat ...
Parsing file oelrac1_netstat_17.05.24.0900.dat ...
Parsing file oelrac1_netstat_17.05.24.1000.dat ...
Parsing file oelrac1_netstat_17.05.24.1100.dat ...


Parsing file oelrac1_top_17.05.23.1500.dat ...
Parsing file oelrac1_top_17.05.23.1600.dat ...
Parsing file oelrac1_top_17.05.24.0800.dat ...
Parsing file oelrac1_top_17.05.24.0900.dat ...
Parsing file oelrac1_top_17.05.24.1000.dat ...
Parsing file oelrac1_top_17.05.24.1100.dat ...


Parsing file oelrac1_ps_17.05.23.1500.dat ...
Parsing file oelrac1_ps_17.05.23.1600.dat ...
Parsing file oelrac1_ps_17.05.24.0800.dat ...
Parsing file oelrac1_ps_17.05.24.0900.dat ...
Parsing file oelrac1_ps_17.05.24.1000.dat ...
Parsing file oelrac1_ps_17.05.24.1100.dat ...


Parsing Completed.


Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter T to Alter Graph Time Scale Only (Does not change analysis dataset)
Enter D to Return to Default Graph Time Scale
Enter R to Remove Currently Displayed Graphs

Enter A to Analyze Data
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)

Enter P to Generate A Profile
Enter X to Export Parsed Data to File
Enter Q to Quit Program

Please Select an Option:4

When you, for example, choose “4 Display Memory Graphs”, this will bring up nice graphs about the memory consumption:
oswatcher_memory

Very handy as well. But what do you do when you have a serious issue and need to collect all the trace/log files? Quite easy as well; let's do a short demo and force an ORA-07445:

oracle@oelrac1:/home/oracle/ [+ASM1] kill -l | grep SIGSEGV
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
oracle@oelrac1:/home/oracle/ [+ASM1] ps -ef | grep dbw  | grep DB1
oracle   14821     1  0 08:20 ?        00:00:00 ora_dbw0_DB1_1
oracle@oelrac1:/home/oracle/ [+ASM1] kill -11 14821

This will cause the database instance to restart, so please don't do this on systems other than your lab systems. What we can do now is ask TFA to collect, from all nodes, all the files which are required for troubleshooting the issue:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl diagcollect -srdc ora7445
Enter the time of the ORA-07445 [YYYY-MM-DD HH24:MI:SS,=ALL] :            
Enter the Database Name [=ALL] : DB1

1. May/24/2017 09:39:52 : [db1] ORA-07445: exception encountered: core dump [semtimedop()+10] [SIGSEGV] [ADDR:0xD43100006015] [PC:0x7F746C15DFCA] [unknown code] []

Please choose the event : 1-1 [1] 1
…
Logs are being collected to: /u01/app/oracle/tfa/repository/srdc_ora7445_collection_Wed_May_24_09_43_07_CEST_2017_node_local
/u01/app/oracle/tfa/repository/srdc_ora7445_collection_Wed_May_24_09_43_07_CEST_2017_node_local/oelrac1.tfa_srdc_ora7445_Wed_May_24_09_43_07_CEST_2017.zip

Get the zip file and either start analyzing it yourself or upload it to Oracle Support. Usually they should not ask you for additional log or trace files anymore :) This saves a lot of time.

If you prefer a menu-driven interface to all the tools, you can have that as well:

oracle@oelrac1:/home/oracle/ [+ASM1] tfactl run darda

This will bring up a menu similar to a Linux text-based installation program:
darda1

Choose what you want to do, e.g. “1” for “Oracle Database: Collectors”:
darda2

Go to “2 Optimize Performance”:
darda3

You have a “1 Slow Running Database”, so:
darda4

… and you have the possibility to run an AWR report from here. Of course, you should monitor that TFA is running, so that you get notified when things like this happen:

oracle@oelrac1:/home/oracle/ [+ASM1] sudo $ORACLE_HOME/bin/tfactl status

.----------------------------------------------------------------------------------------------.
| Host    | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
+---------+---------------+------+------+------------+----------------------+------------------+
| oelrac1 | RUNNING       | 2532 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
| oelrac2 | NOT RUNNING   | -    |      |            |                      |                  |
'---------+---------------+------+------+------------+----------------------+------------------'

There is much more you can do with TFA (check the documentation), and TFA is useful for single instances as well. Hope this helps…

 


Getting Solaris 8 x86 up and running in Virtual Box


As this project is going to start soon and I learned that there is an x86 version of Solaris 8, I thought today that it would be great to get a test setup up and running. First issue: where to get the Solaris 8 distribution from? This turned out to be quite easy: there is a support note which links to all the releases: Where to download Oracle Solaris ISO images and Update Releases (Doc ID 1277964.1).

solaris8_1

The only settings I changed in VirtualBox from the defaults are these (audio off, USB off, network adapter):

solaris8_2
solaris8_3
solaris8_4

I booted the machine from the “sol-8-u7-install-ia.iso” and went through the installer. Here are some screenshots:

solaris8_5
solaris8_6
solaris8_7
solaris8_8
solaris8_9
solaris8_10
solaris8_11
solaris8_12
solaris8_14

Up and running … The next challenge will be to get openssh up and running.

 


Oracle 12cR2: exchange partition deferred invalidation


In a previous post I introduced the new 12cR2 feature where some DDL operations can use the same rolling invalidation as what is done with dbms_stats. On tables, DDL deferred invalidation is available only for operations on partitions. Here is how it works for partition exchange.

Here is my session environment:

SQL> whenever sqlerror exit failure
SQL> alter session set nls_date_format='hh24:mi:ss';
Session altered.
SQL> alter session set session_cached_cursors=0;
Session altered.
SQL> alter session set optimizer_dynamic_sampling=0;
Session altered.
SQL> alter system set "_optimizer_invalidation_period"=5;
System SET altered.
SQL> show user
USER is "DEMO"

I create a partitioned table with one local index

SQL> create table DEMO (n number, p number) partition by list(p) (partition P1 values(1), partition P2 values(2));
Table DEMO created.
SQL> create index DEMO on DEMO(n) local;
Index DEMO created.

I create the table with same structure for exchange

SQL> create table DEMOX for exchange with table DEMO;
Table DEMOX created.
SQL> create index DEMOX on DEMOX(n);
Index DEMOX created.

The CREATE TABLE … FOR EXCHANGE statement does not create the indexes, but for rolling invalidation we need them. Without the same indexes, immediate invalidation occurs.

In order to observe invalidation, I run queries on the partitioned table, involving or not the partition I'll exchange. I also run a query on the table used for exchange.

SQL> SELECT * FROM DEMO partition (P1);
no rows selected
SQL> SELECT * FROM DEMO partition (P2);
no rows selected
SQL> SELECT * FROM DEMO;
no rows selected
SQL> SELECT * FROM DEMOX;
no rows selected

Here are the cursors and some execution plans:

SQL> select sql_id,sql_text,child_number,invalidations,loads,parse_calls,executions,first_load_time,last_load_time,last_active_time,is_rolling_invalid from v$sql where sql_text like 'S%DEMO%' order by sql_text;
 
SQL_ID SQL_TEXT CHILD_NUMBER INVALIDATIONS LOADS PARSE_CALLS EXECUTIONS FIRST_LOAD_TIME LAST_LOAD_TIME LAST_ACTIVE_TIME IS_ROLLING_INVALID
------ -------- ------------ ------------- ----- ----------- ---------- --------------- -------------- ---------------- ------------------
dd3ajp6k49u1d SELECT * FROM DEMO 0 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N
1ft329rx910sa SELECT * FROM DEMO partition (P1) 0 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N
9pp3h276waqvm SELECT * FROM DEMO partition (P2) 0 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N
by2m6mh16tpsz SELECT * FROM DEMOX 0 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N
 
SQL> select * from table(dbms_xplan.display_cursor('1ft329rx910sa',0,'basic +partition'));
 
PLAN_TABLE_OUTPUT
-----------------
EXPLAINED SQL STATEMENT:
------------------------
SELECT * FROM DEMO partition (P1)
 
Plan hash value: 3520634703
 
------------------------------------------------------
| Id | Operation | Name | Pstart| Pstop |
------------------------------------------------------
| 0 | SELECT STATEMENT | | | |
| 1 | PARTITION LIST SINGLE| | 1 | 1 |
| 2 | TABLE ACCESS FULL | DEMO | 1 | 1 |
------------------------------------------------------
 
SQL> select * from table(dbms_xplan.display_cursor('dd3ajp6k49u1d',0,'basic +partition'));
 
PLAN_TABLE_OUTPUT
-----------------
EXPLAINED SQL STATEMENT:
------------------------
SELECT * FROM DEMO
 
Plan hash value: 1180220521
 
---------------------------------------------------
| Id | Operation | Name | Pstart| Pstop |
---------------------------------------------------
| 0 | SELECT STATEMENT | | | |
| 1 | PARTITION LIST ALL| | 1 | 2 |
| 2 | TABLE ACCESS FULL| DEMO | 1 | 2 |
---------------------------------------------------

I exchange the partition P1 with the table DEMOX. I include indexes and add the DEFERRED INVALIDATION clause


SQL> alter table DEMO exchange partition P1 with table DEMOX including indexes without validation deferred invalidation;
Table DEMO altered.

If I did the same without the DEFERRED INVALIDATION clause, or without including indexes, or with different indexes, then I would see all cursors invalidated. Here only the select on the DEMOX table is invalidated:


SQL> select sql_text,child_number,invalidations,loads,parse_calls,executions,first_load_time,last_load_time,last_active_time,is_rolling_invalid from v$sql where sql_text like 'S%DEMO%' order by sql_text;
 
SQL_TEXT CHILD_NUMBER INVALIDATIONS LOADS PARSE_CALLS EXECUTIONS FIRST_LOAD_TIME LAST_LOAD_TIME LAST_ACTIVE_TIME IS_ROLLING_INVALID
-------- ------------ ------------- ----- ----------- ---------- --------------- -------------- ---------------- ------------------
SELECT * FROM DEMO 0 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N
SELECT * FROM DEMO partition (P1) 0 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N
SELECT * FROM DEMO partition (P2) 0 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N
SELECT * FROM DEMOX 0 1 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:12 N

I expected to see the IS_ROLLING_INVALID flag changed to ‘Y’ as we can observe with other operations. I have opened an SR for that.

Rolling invalidation sets a timestamp at next execution:


SQL> SELECT * FROM DEMO partition (P1);
no rows selected
SQL> SELECT * FROM DEMO partition (P2);
no rows selected
SQL> SELECT * FROM DEMO;
no rows selected
SQL> SELECT * FROM DEMOX;
no rows selected
 
 
SQL> select sql_text,child_number,invalidations,loads,parse_calls,executions,first_load_time,last_load_time,last_active_time,is_rolling_invalid from v$sql where sql_text like 'S%DEMO%' order by sql_text;
 
SQL_TEXT CHILD_NUMBER INVALIDATIONS LOADS PARSE_CALLS EXECUTIONS FIRST_LOAD_TIME LAST_LOAD_TIME LAST_ACTIVE_TIME IS_ROLLING_INVALID
-------- ------------ ------------- ----- ----------- ---------- --------------- -------------- ---------------- ------------------
SELECT * FROM DEMO 0 0 1 2 2 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:14 N
SELECT * FROM DEMO partition (P1) 0 0 1 2 2 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:14 N
SELECT * FROM DEMO partition (P2) 0 0 1 2 2 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:14 N
SELECT * FROM DEMOX 0 1 2 1 1 2017-05-26/10:06:12 2017-05-26/10:06:14 10:06:14 N

I expected to see IS_ROLLING_INVALID going from ‘Y’ to ‘X’ here when the random time is set for invalidation.

By default, the random time is set within a 5-hour window, but I changed “_optimizer_invalidation_period” to 5 seconds instead and waited for this time window to pass, to be sure that invalidation occurs. Then I ran my queries again.


SQL> host sleep 5
 
SQL> SELECT * FROM DEMO partition (P1);
no rows selected
SQL> SELECT * FROM DEMO partition (P2);
no rows selected
SQL> SELECT * FROM DEMO;
no rows selected
SQL> SELECT * FROM DEMOX;
no rows selected
 

Here are the new child cursors created for the ones that were marked for rolling invalidation. The IS_ROLLING_INVALID column did not display anything special, but it seems that it works as expected:


SQL> select sql_text,child_number,invalidations,loads,parse_calls,executions,first_load_time,last_load_time,last_active_time,is_rolling_invalid from v$sql where sql_text like 'S%DEMO%' order by sql_text;
 
SQL_TEXT CHILD_NUMBER INVALIDATIONS LOADS PARSE_CALLS EXECUTIONS FIRST_LOAD_TIME LAST_LOAD_TIME LAST_ACTIVE_TIME IS_ROLLING_INVALID
-------- ------------ ------------- ----- ----------- ---------- --------------- -------------- ---------------- ------------------
SELECT * FROM DEMO 0 0 1 2 2 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:14 N
SELECT * FROM DEMO 1 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:19 10:06:19 N
SELECT * FROM DEMO partition (P1) 1 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:19 10:06:19 N
SELECT * FROM DEMO partition (P1) 0 0 1 2 2 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:14 N
SELECT * FROM DEMO partition (P2) 1 0 1 1 1 2017-05-26/10:06:12 2017-05-26/10:06:19 10:06:19 N
SELECT * FROM DEMO partition (P2) 0 0 1 2 2 2017-05-26/10:06:12 2017-05-26/10:06:12 10:06:14 N
SELECT * FROM DEMOX 0 1 2 2 2 2017-05-26/10:06:12 2017-05-26/10:06:14 10:06:19 N

Here is the confirmation that those 3 cursors were not shared because they have passed the rolling invalidation window:


SQL> select sql_id,child_number,reason from v$sql_shared_cursor join v$sql using(sql_id, child_number) where sql_text like 'S%DEMO%';
 
SQL_ID CHILD_NUMBER REASON
------ ------------ ------
1ft329rx910sa 0 033Rolling Invalidate Window Exceeded(3)2x414957859751495785979
1ft329rx910sa 1
by2m6mh16tpsz 0
dd3ajp6k49u1d 0 033Rolling Invalidate Window Exceeded(3)2x414957859771495785979
dd3ajp6k49u1d 1
9pp3h276waqvm 0 033Rolling Invalidate Window Exceeded(3)2x414957859781495785979
9pp3h276waqvm 1

So what?

The first observation is that deferred invalidation works with partition exchange, despite the fact that the V$SQL.IS_ROLLING_INVALID flag is not updated. I was surprised to see that rolling invalidation occurs even for the cursors accessing the partition which was exchanged. However, rolling invalidation occurs only if the indexes are the same. If we do not exchange the indexes, then all cursors are invalidated immediately. This means that the parsed cursor is probably compatible to run after the exchange, as the indexes are guaranteed to have the same structure, type, compression, …
This is a very nice feature when partition exchange is used to load new data into a fact table: you load into a staging table and then exchange it with the latest partition. The new values are exposed immediately, and this new feature avoids a hard-parse peak. A sketch of the pattern follows.
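As a minimal sketch of that ETL pattern (the table, partition and source names are hypothetical):

-- staging table with the same structure and indexes as the fact table
-- (e.g. created with CREATE TABLE STG_SALES FOR EXCHANGE WITH TABLE FACT_SALES)
INSERT /*+ APPEND */ INTO STG_SALES SELECT * FROM EXT_SALES_FEED;
COMMIT;
-- swap the loaded data in; deferred invalidation avoids the hard parse peak
ALTER TABLE FACT_SALES EXCHANGE PARTITION P_20170526 WITH TABLE STG_SALES
  INCLUDING INDEXES WITHOUT VALIDATION DEFERRED INVALIDATION;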

 


Summer, autumn and winter: a lot of conferences ahead


It is hot in Europe, it is summer: enjoy. But technology moves fast, so you have the chance to already prepare for the next conferences. The IT Tage 2017 will take place from the 11th to the 14th of December this year, and we are happy to be there again.
Selection_001
This event covers a wide range of topics and we will be there again covering and talking about:

In addition you will have the chance to join Jan from EnterpriseDB speaking about the differences between Oracle and PostgreSQL. If you want to hear more about that, you might want to join the PostgreSQL Conference Europe this year.

Selection_002

Maybe Jan and I will get the chance to talk there as well; we submitted an interesting session, stay tuned.

Chances are high that you may find us at the UKOUG, too:

Selection_004

… and we’ll be at the #DOAG2017 for sure this year again.
Selection_003

Hope to see you soon… Cheers, discussions ahead …

 



ODA X6 installation: re-image


The Oracle Database Appliance is shipped with a bare-metal installation which may not be the latest version. You may want to have it virtualized, or get the latest version to avoid a further upgrade, or install an earlier version to be in the same configuration as another ODA already in production. The easiest approach in all those cases is to start with a re-image as soon as the ODA is plugged in. This post is not documentation, just a quick cheat sheet.

I don't want to spend hours in the data center, so the first step, once the ODA is racked, cabled and plugged in, is to get it accessible from the management network. Then all tasks can be done from a laptop, accessing the ILOM interface through a browser (Java required, preferably 32-bit), before the public network is set up.

NET MGMT

Here is the back of the server where you find the management network port.
ODA-reimage-000
This was an X5-2, but it is very similar to the X6-2. You can look at the 3D view of the X6-2 to get a better view.
There are also VGA and USB ports to plug in a monitor and keyboard, just for the time needed to set up the management network.

You can also use the serial port, which is just next to it, but I usually don't have a serial adapter for my laptop.

First checks

You can plug in a keyboard and monitor and log on to the server: the root password is welcome1 (no need to change it for the moment, as we will re-image the machine).

In an ODA X6-2 HA you have two nodes, numbered 0 and 1 and called oak1 and oak2 … be careful. You may wonder which server is node 0 and which one is node 1, because the servers are identical. The nodes are identified from the storage ports they are connected to: node 0 is the one connected with the blue mini-SAS cables and the red ones are for node 1.

  • Blue cable – Node 0 – oak1
  • Red cable – Node 1 – oak2

Here I'm on node 0 and check the hardware version:

[root@oak1 ~]# /opt/oracle/oak/bin/oakcli show env_hw
BM ODA X6-2

This means Bare Metal X6-2 HA; a virtualized one would show VM-ODA_BASE ODA X6-2, and an ODA X6-2S would show BM ODA_Lite X6-2 Small.

From the storage topology, we confirm that this node has been recognized as node 0:


[root@oak1 ~]# /opt/oracle/oak/bin/oakcli validate -c StorageTopology
It may take a while. Please wait...
INFO : ODA Topology Verification
INFO : Running on Node0
INFO : Check hardware type
SUCCESS : Type of hardware found : X5-2
INFO : Check for Environment (Bare Metal or Virtual Machine)

And finally, check the version, because you may be lucky and get the version you want, in which case you don't need to re-image (I switched to node 1 here):

ODA-reimage-010

Of course, you don't need to re-image when you want a higher version. You can upgrade it, but re-imaging is simpler.

BMC Network

The management network interface can get an address from DHCP. But who runs DHCP on the management network? There are two ways to assign a static IP to the management interface.

You may use the IPMI Tool commands:
ipmitool -I open sunoem cli
cd SP
cd Network
ls
set pendingipaddress=192.168.15.101
set pendingipnetmask=255.255.255.0
set pendingipgateway=192.168.15.1
set commitpending=true

Here is an example:
oda-ipmitool

Or you can go to the BIOS. It goes fast, but filming in slow motion shows that the key is F2:
ODA-reimage-001
Once you are in the BIOS interface, go to Advanced, then choose ‘BMC Network configuration’ to configure IPMI and enter the following BMC LAN information:

IPv4 IP Assignment: [Static]
IPv4 address: 192.168.15.102
IPv4 Subnet Mask: 255.255.255.0
IPv4 Default Gateway: 192.168.15.1

While you are in the BIOS, you can also choose to boot from the CD-ROM first, because I sometimes have problems setting that from the ILOM web interface.

Once you have validated that the ILOM IP address can be reached from your office, you can remove the keyboard and monitor and leave the data center: the hardware part is done. Now the software can be deployed from the ILOM web interface. The password to connect to the ILOM from the web browser is ‘changeme’, and you should change it.

Software

The latest ISO image for a Bare Metal or Virtualized ODA can be found in MOS Note 888888.1, where you can also find the older versions.

From the ILOM you get to the console of node 0:

ODA-reimage-002

From there, you can attach the ISO image: ILOM Remote Control / Devices / CD-ROM Images

and then reboot on the CD-ROM:

Reboot (Host Control / Next Boot Device / CD-ROM )

ODA-reimage-003

Do that for both nodes (you can run them at the same time), prepare the information for the deployment, and download the ‘End-User RDBMS Clone files’ for the database version you want.

You can download the Oracle Appliance Manager Configurator and take your time to setup and verify the configuration.

configure firstnet

Here is part of the information to prepare. First, you configure the network so you can scp the software (Grid Infrastructure and Database):

[root@oak1 ~]# /opt/oracle/oak/bin/oakcli configure firstnet
Configure the network for node(s) (local,global) [global]:
The network configuration for both nodes:
hostname: Hostname lookup failure
Domain Name: pachot.net
DNS Servers: Primary DNS Server: 8.8.8.8
Node Name Host Name
0 myserver1
1 myserver2
Choose the network interface to configure (net1,net2) [net1]:
Enter the IP address for net1 on Node 0: 192.168.16.101
Enter the IP address for net1 on Node 1: 192.168.16.102
Netmask for net1: 255.255.255.0
Gateway address for net1 [192.168.16.1]:

Note that this is a funny example: I hope you don’t use 192.168.16/24 as your public network, because this range is used for the private interconnect, where the IP addresses 192.168.16.24 and 192.168.16.25 are hardcoded. But thanks to that, the firstnet configuration can be run from one node only.

Deploy

Now that you have access through the public network, you can copy (scp) the Oracle Home clones and the configuration file to /tmp, unpack the .zip files (for i in *.zip ; do /opt/oracle/oak/bin/oakcli unpack -package $i ; done) and run the deployment (ssh -X and then /opt/oracle/oak/bin/oakcli deploy), loading the configuration from your file or entering all the information there. Fingers crossed, this should run to the end without any problem. If not, My Oracle Support notes may help. The nice thing with ODA is that most configurations are similar, so there is a good chance that a problem has already been encountered and documented.
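Put together, the copy/unpack/deploy sequence looks like this (a sketch only: the node 0 IP address comes from the firstnet step above, the file names are examples, and -conf points to the configuration file saved from the Configurator):

# from your desktop: copy the RDBMS clone files and the configuration file to node 0
scp p*RDBMS*.zip myconfig.param root@192.168.16.101:/tmp/
# on node 0: unpack all packages and run the deployment (the GUI needs X11 forwarding)
ssh -X root@192.168.16.101
cd /tmp
for i in *.zip ; do /opt/oracle/oak/bin/oakcli unpack -package /tmp/$i ; done
/opt/oracle/oak/bin/oakcli deploy -conf /tmp/myconfig.param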

 

The article ODA X6 installation: re-image appeared first on the dbi services blog.

Recommended DB Parameters on the Oracle Database Appliance (ODA)


When creating a DB on the ODA using


# oakcli create database -db <db-name>

a template is used to set a couple of DB parameters, such as:


_datafile_write_errors_crash_instance=FALSE
_db_writer_coalesce_area_size=16777216
_disable_interface_checking=TRUE
_ENABLE_NUMA_SUPPORT=FALSE
_FILE_SIZE_INCREASE_INCREMENT=2143289344
_gc_policy_time=0
_gc_undo_affinity=FALSE
db_block_checking='FULL'
db_block_checksum='FULL'
db_lost_write_protect='TYPICAL'
filesystemio_options='setall'
parallel_adaptive_multi_user=FALSE
parallel_execution_message_size=16384
parallel_threads_per_cpu=2
use_large_pages='ONLY'

In recent projects I saw a couple of DBs running on ODA, which did not have (all) those parameters set, because the DBs were migrated from a non-ODA-platform and the customer took over the previous settings.

The questions are: are the above parameters mandatory on ODA, and where are they defined?

Actually Oracle writes in the documentation

http://docs.oracle.com/cd/E83239_01/doc.121/e83201/database-templates-oracle-database-appliance.htm#CMTAR269

“Oracle strongly recommends that you use the Oracle Database Appliance templates, because they implement best practices and are configured specifically for Oracle Database Appliance.”

So it’s not mandatory, but it’s “strongly recommended” by Oracle to set those parameters on ODA.

The parameters are actually defined in the XML template files:

/opt/oracle/oak/onecmd/templates/OAK_oltp.dbt
/opt/oracle/oak/onecmd/templates/OAK_dss.dbt

E.g. on a virtualized X5-2 HA with 12.1.2.8.0 installed:


# grep "initParam name" /opt/oracle/oak/onecmd/templates/OAK_oltp.dbt
<initParam name="AUDIT_SYS_OPERATIONS" value="TRUE"/>
<initParam name="AUDIT_TRAIL" value="DB"/>
<initParam name="GLOBAL_NAMES" value="TRUE"/>
<initParam name="OS_AUTHENT_PREFIX" value=""/>
<initParam name="SQL92_SECURITY" value="TRUE"/>
<initParam name="PARALLEL_ADAPTIVE_MULTI_USER" value="FALSE"/>
<initParam name="PARALLEL_EXECUTION_MESSAGE_SIZE" value="16384"/>
<initParam name="PARALLEL_THREADS_PER_CPU" value="2"/>
<initParam name="_disable_interface_checking" value="TRUE"/>
<initParam name="_gc_undo_affinity" value="FALSE"/>
<initParam name="_gc_policy_time" value="0"/>
<initParam name="SESSION_CACHED_CURSORS" value="100"/>
<initParam name="OPEN_CURSORS" value="1000"/>
<initParam name="CURSOR_SHARING" value="EXACT"/>
<initParam name="_ENABLE_NUMA_SUPPORT" value="FALSE"/>
<initParam name="DB_LOST_WRITE_PROTECT" value="TYPICAL"/>
<initParam name="DB_BLOCK_CHECKSUM" value="FULL"/>
<initParam name="DB_BLOCK_CHECKING" value="FULL"/>
<initParam name="FAST_START_MTTR_TARGET" value="300"/>
<initParam name="UNDO_RETENTION" value="900"/>
<initParam name="_FILE_SIZE_INCREASE_INCREMENT" value="2143289344"/>
<initParam name="FILESYSTEMIO_OPTIONS" value="setall"/>
<initParam name="use_large_pages" value="only"/>
<initParam name="DB_FILES" value="1024"/>
<initParam name="processes" value="4800"/>
<initParam name="pga_aggregate_target" value="49152" unit="MB"/>
<initParam name="sga_target" value="98304" unit="MB"/>
<initParam name="db_create_file_dest" value="+DATA"/>
<initParam name="log_buffer" value="64000000" />
<initParam name="cpu_count" value="48"/>
<initParam name="pga_aggregate_limit" value="49152" unit="MB"/>
<initParam name="_datafile_write_errors_crash_instance" value="false"/>
<initParam name="_fix_control" value="18960760:on"/>
<initParam name="db_block_size" value="8" unit="KB"/>
<initParam name="compatible" value="11.2.0.x.0"/>
<initParam name="undo_tablespace" value="UNDOTBS1"/>
<initParam name="control_files" value="("{ORACLE_BASE}/oradata/{DB_UNIQUE_NAME}/control01.ctl")"/>
<initParam name="audit_file_dest" value="{ORACLE_BASE}/admin/{DB_UNIQUE_NAME}/adump"/>
<initParam name="audit_trail" value="db"/>
<initParam name="diagnostic_dest" value="{ORACLE_BASE}"/>
<initParam name="remote_login_passwordfile" value="EXCLUSIVE"/>
<initParam name="dispatchers" value="(PROTOCOL=TCP) (SERVICE={SID}XDB)"/>
<initParam name="db_recovery_file_dest" value="+RECO"/>
<initParam name="db_recovery_file_dest_size" value="1843200" unit="MB"/>
<initParam name="db_create_online_log_dest_1" value="+REDO" />
<initParam name="_db_writer_coalesce_area_size" value="16777216"/>

Oracle does not take those parameters blindly when creating a DB with oakcli, but adjusts some of them, as described for example in this blog post:

https://blog.dbi-services.com/oda-32gb-template-but-got-a-database-with-16gb-sga/

I.e. the SGA_TARGET and PGA_AGGREGATE_TARGET parameters are adjusted based on the chosen database class, and the COMPATIBLE parameter is set to the current release (e.g. 12.1.0.2.0).

So if you’re not able to create the DB on ODA using

# oakcli create database -db <db-name>

then I recommend checking the XML file OAK_oltp.dbt (or, in case of a Decision Support System/Warehouse DB, the file OAK_dss.dbt) and setting the parameters in your database accordingly. Alternatively (and probably even better), you may create a dummy DB using oakcli, check Oracle’s settings, use them in your migrated DB, and drop the dummy DB afterwards.
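For example, applying some of the template’s parameters to a migrated database could look like this (a sketch only: take the values from the template files of your ODA version and validate them for your databases):

-- values taken from the OAK_oltp.dbt shown above
alter system set "_disable_interface_checking"=TRUE scope=spfile;
alter system set "_datafile_write_errors_crash_instance"=FALSE scope=spfile;
alter system set db_block_checking='FULL' scope=spfile;
alter system set db_block_checksum='FULL' scope=spfile;
alter system set db_lost_write_protect='TYPICAL' scope=spfile;
alter system set filesystemio_options='setall' scope=spfile;
alter system set use_large_pages='ONLY' scope=spfile;
-- these are static parameters: restart the instance to activate them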

Here are the parameters of a 12c DB created on a virtualized ODA X6-2 HA 12.1.2.11.0 with oakcli create database, using the smallest setting with 1 core and the DSS template:


*._datafile_write_errors_crash_instance=false
*._db_writer_coalesce_area_size=16777216
*._disable_interface_checking=TRUE
*._ENABLE_NUMA_SUPPORT=FALSE
*._FILE_SIZE_INCREASE_INCREMENT=2143289344
*._fix_control='18960760:on'
*._gc_policy_time=0
*._gc_undo_affinity=FALSE
*.audit_file_dest='/u01/app/oracle/admin/C12TEST/adump'
*.audit_sys_operations=TRUE
*.audit_trail='db'
*.cluster_database=true
*.compatible='12.1.0.2.0'
*.control_files='/u01/app/oracle/oradata/datastore/C12TEST/C12TEST/controlfile/o1_mf_dpw4ljnv_.ctl'
*.cpu_count=2
*.cursor_sharing='EXACT'
*.db_block_checking='FULL'
*.db_block_checksum='FULL'
*.db_block_size=8192
*.db_create_file_dest='/u02/app/oracle/oradata/datastore/.ACFS/snaps/C12TEST'
*.db_create_online_log_dest_1='/u01/app/oracle/oradata/datastore/C12TEST'
*.db_domain=''
*.db_files=1024
*.db_lost_write_protect='TYPICAL'
*.db_name='C12TEST'
*.db_recovery_file_dest='/u01/app/oracle/fast_recovery_area/datastore/C12TEST'
*.db_recovery_file_dest_size=476160m
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=C12TESTXDB)'
*.fast_start_mttr_target=300
*.filesystemio_options='setall'
*.global_names=TRUE
*.inmemory_size=0m
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=16000000
*.nls_language='AMERICAN'
*.nls_territory='AMERICA'
*.open_cursors=1000
*.os_authent_prefix=''
*.parallel_adaptive_multi_user=TRUE
*.parallel_degree_policy='MANUAL'
*.parallel_execution_message_size=16384
*.parallel_force_local=FALSE
*.parallel_max_servers=80
*.parallel_min_servers=8
*.parallel_threads_per_cpu=2
*.pga_aggregate_limit=2048m
*.pga_aggregate_target=2048m
*.processes=200
*.remote_login_passwordfile='exclusive'
*.session_cached_cursors=100
*.sga_target=1024m
*.sql92_security=TRUE
*.undo_retention=900
*.use_large_pages='ONLY'

 

The article Recommended DB Parameters on the Oracle Database Appliance (ODA) appeared first on the dbi services blog.

Documentum – Change password – 7 – DB – Schema Owner


In this series, I have completed the passwords I wanted to talk about on the Content Server. Therefore, in this blog, I will talk about the only database account that is relevant for Documentum: the Database Schema Owner. Since there are a few steps to be done on the Content Server, I’m just doing everything from there… In this blog, I will assume there is one Global Registry (GR_DOCBASE) and one normal Repository (DocBase1). Each docbase has a different Database Schema Owner of course, but both schemas are in the same database and therefore the same SID will be used.

 

In High Availability setups, you will have to execute the steps below for all Content Servers. Of course, when it comes to changing the password inside the DB, this needs to be done only once since the Database Schema Owner is shared between the different Content Servers of the HA setup.

 

In this blog, I’m using a CS 7.2. Please note that in CS 7.2, there is a property inside the dfc.properties of the Content Server ($DOCUMENTUM_SHARED/config/dfc.properties) that defines the crypto repository (dfc.crypto.repository). The repository that is used for this property is the one that Documentum will use for encryption/decryption of passwords and therefore I will use this one below to encrypt the password. By default, the Repository used for this property is the last one created… I tend to use the Global Registry instead, but it’s really up to you.

 

As said before, I’m considering two different repositories and therefore two different accounts and two different passwords. So, let’s start with encrypting these two passwords:

[dmadmin@content_server_01 ~]$ read -s -p "Please enter the NEW GR_DOCBASE Schema Owner's password: " new_gr_pw; echo
Please enter the NEW GR_DOCBASE Schema Owner's password:
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ read -s -p "Please enter the NEW DocBase1 Schema Owner's password: " new_doc1_pw; echo
Please enter the NEW DocBase1 Schema Owner's password:
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ iapi `cat $DOCUMENTUM_SHARED/config/dfc.properties | grep crypto | tail -1 | sed 's/.*=//'` -Udmadmin -Pxxx << EOF
> encrypttext,c,${new_gr_pw}
> encrypttext,c,${new_doc1_pw}
> EOF


    EMC Documentum iapi - Interactive API interface
    (c) Copyright EMC Corp., 1992 - 2015
    All rights reserved.
    Client Library Release 7.2.0150.0154


Connecting to Server using docbase GR_DOCBASE
[DM_SESSION_I_SESSION_START]info:  "Session 010f12345605ae7b started for user dmadmin."


Connected to Documentum Server running Release 7.2.0160.0297  Linux64.Oracle
Session id is s0
API> ...
DM_ENCR_TEXT_V2=AAAAEH7UNwFub2ubf92h+21/rc8HEc3rd1C82hc52c8bz2cFl1cQ721zex2nxWDEegwqgdotwncZVVqgZlDLmfflWK6+f8AGf0dSRzi5rr3h3::GR_DOCBASE
API> ...
DM_ENCR_TEXT_V2=AAAAEGBQ6Zy7FxQ10idQdFj+Gn20nFlif02ieMx+AGBHLz+vQfmGu2GAiv8KeIN2PhPOf1oiF9u2fP98zEFhhuBAmxY+d5AoBCGNf61ZRavpa::GR_DOCBASE
API> Bye
[dmadmin@content_server_01 ~]$

 

If you have more repositories, you will have to encrypt those passwords too, if you want to change them of course. Once the new passwords have been encrypted, we can change them in the database. To avoid any issues and error messages, let’s first stop Documentum (the docbases at the very least) and then print the database connection information:

[dmadmin@content_server_01 ~]$ service documentum stop
  ** JMS stopped
  ** DocBase1 stopped
  ** GR_DOCBASE stopped
  ** Docbroker stopped
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora
<sid> =
    (DESCRIPTION =
        (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL = TCP)(HOST = <database_hostname>)(PORT = <database_port>))
        )
        (CONNECT_DATA =
            (SERVICE_NAME = <service_name>)
        )
    )
[dmadmin@content_server_01 ~]$

 

Once you know what the SID is, you can log in to the database to change the passwords, so I will do that for both repositories. This could also be scripted: retrieve the list of docbases, create new passwords for them, encrypt them all automatically, and then connect to each database to change the passwords (see the sketch after the SQL*Plus sessions below). Here, however, I will use manual steps:

[dmadmin@content_server_01 ~]$ sqlplus GR_DOCBASE@<sid>

SQL*Plus: Release 12.1.0.2.0 Production on Sat Jul 22 15:05:08 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password:
    -->> Enter here the OLD GR_DOCBASE Schema Owner's password
Last Successful login time: Sat Jul 22 2017 15:04:18 +00:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics, Real Application Testing
and Unified Auditing options

SQL> PASSWORD
Changing password for GR_DOCBASE
Old password:
    -->> Enter here the OLD GR_DOCBASE Schema Owner's password
New password:
    -->> Enter here the NEW GR_DOCBASE Schema Owner's password
Retype new password:
    -->> Re-enter here the NEW GR_DOCBASE Schema Owner's password
Password changed

SQL> quit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics, Real Application Testing
and Unified Auditing options
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ sqlplus DocBase1@<sid>

SQL*Plus: Release 12.1.0.2.0 Production on Sat Jul 22 15:08:20 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter password:
    -->> Enter here the OLD DocBase1 Schema Owner's password
Last Successful login time: Sat Jul 22 2017 15:07:10 +00:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics, Real Application Testing
and Unified Auditing options

SQL> PASSWORD
Changing password for DocBase1
Old password:
    -->> Enter here the OLD DocBase1 Schema Owner's password
New password:
    -->> Enter here the NEW DocBase1 Schema Owner's password
Retype new password:
    -->> Re-enter here the NEW DocBase1 Schema Owner's password
Password changed

SQL> quit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics, Real Application Testing
and Unified Auditing options
[dmadmin@content_server_01 ~]$
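As mentioned above, the password change itself can also be scripted instead of using the interactive PASSWORD command. A minimal sketch for the Global Registry (assuming a variable ${old_gr_pw} holds the current password and no password verification function rejects the new one):

[dmadmin@content_server_01 ~]$ sqlplus -s GR_DOCBASE/"${old_gr_pw}"@<sid> << EOF
alter user GR_DOCBASE identified by "${new_gr_pw}" replace "${old_gr_pw}";
EOF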

 

At this point, the passwords have been changed in the database and we encrypted them properly. The next step is therefore to update the password files for each repository with the encrypted password so that the repositories can start again:

[dmadmin@content_server_01 ~]$ cd $DOCUMENTUM/dba/config
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ for i in `ls -d *`; do echo "  ** dbpasswd.txt for ${i} **"; cat ./${i}/dbpasswd.txt; echo; done
  ** dbpasswd.txt for GR_DOCBASE **
DM_ENCR_TEXT_V2=AAAAEH7UNwFgncubfd1C82hc5l1cwqgdotwQ7212c8bz2cFZVVqgZub2zex8bz2cFWK92h+21EelDLmffl2/rc82c8bz2cFf0dSRazi5rr3h3::GR_DOCBASE

  ** dbpasswd.txt for DocBase1 **
DM_ENCR_TEXT_V2=AAAAQ10idQdFj+Gn2EGBPZy7e0niF9uQfAGBHLz+vv8KQ62fP98zE+02iFhhuBAmxY+FFxeMxIN2Phl1od5AoBCGNf61ZRifmGu2GAiOfavpa::GR_DOCBASE

[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ for i in `ls -d *`; do cp ./${i}/dbpasswd.txt ./${i}/dbpasswd.txt_bck_$(date +"%Y%m%d-%H%M%S"); done
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ echo "DM_ENCR_TEXT_V2=AAAAEH7UNwFub2ubf92h+21/rc8HEc3rd1C82hc52c8bz2cFl1cQ721zex2nxWDEegwqgdotwncZVVqgZlDLmfflWK6+f8AGf0dSRzi5rr3h3::GR_DOCBASE" > ./GR_DOCBASE/dbpasswd.txt
[dmadmin@content_server_01 ~]$ echo "DM_ENCR_TEXT_V2=AAAAEGBQ6Zy7FxQ10idQdFj+Gn20nFlif02ieMx+AGBHLz+vQfmGu2GAiv8KeIN2PhPOf1oiF9u2fP98zEFhhuBAmxY+d5AoBCGNf61ZRavpa::GR_DOCBASE" > ./DocBase1/dbpasswd.txt
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ for i in `ls -d *`; do echo "  ** dbpasswd.txt for ${i} **"; cat ./${i}/dbpasswd.txt; echo; done
  ** dbpasswd.txt for GR_DOCBASE **
DM_ENCR_TEXT_V2=AAAAEH7UNwFub2ubf92h+21/rc8HEc3rd1C82hc52c8bz2cFl1cQ721zex2nxWDEegwqgdotwncZVVqgZlDLmfflWK6+f8AGf0dSRzi5rr3h3::GR_DOCBASE

  ** dbpasswd.txt for DocBase1 **
DM_ENCR_TEXT_V2=AAAAEGBQ6Zy7FxQ10idQdFj+Gn20nFlif02ieMx+AGBHLz+vQfmGu2GAiv8KeIN2PhPOf1oiF9u2fP98zEFhhuBAmxY+d5AoBCGNf61ZRavpa::GR_DOCBASE

[dmadmin@content_server_01 ~]$

 

Once the dbpasswd.txt files have been updated with the new encrypted password that has been generated at the beginning of this blog, then we can restart Documentum and verify that the docbases are up&running. If they are, then the password has been changed properly!

[dmadmin@content_server_01 ~]$ service documentum start
  ** Docbroker started
  ** GR_DOCBASE started
  ** DocBase1 started
  ** JMS started
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ ps -ef | grep "documentum.*docbase_name"
...
[dmadmin@content_server_01 ~]$
[dmadmin@content_server_01 ~]$ grep -C3 "DM_DOCBROKER_I_PROJECTING" $DOCUMENTUM/dba/log/GR_DOCBASE.log
2017-07-22T15:28:40.657360      9690[9690]      0000000000000000        [DM_SERVER_I_START]info:  "Sending Initial Docbroker check-point "

2017-07-22T15:28:40.671878      9690[9690]      0000000000000000        [DM_MQ_I_DAEMON_START]info:  "Message queue daemon (pid : 9870, session 010f123456000456) is started sucessfully."
2017-07-22T15:28:40.913699      9869[9869]      010f123456000003        [DM_DOCBROKER_I_PROJECTING]info:  "Sending information to Docbroker located on host (content_server_01) with port (1490).  Information: (Config(GR_DOCBASE), Proximity(1), Status(Open), Dormancy Status(Active))."
Tue Jul 22 15:29:38 2017 [INFORMATION] [AGENTEXEC 10309] Detected during program initialization: Version: 7.2.0160.0297  Linux64
Tue Jul 22 15:29:44 2017 [INFORMATION] [AGENTEXEC 10309] Detected during program initialization: Agent Exec connected to server GR_DOCBASE:  [DM_SESSION_I_SESSION_START]info:  "Session 010f123456056d00 started for user dmadmin."

[dmadmin@content_server_01 ~]$

 

When the docbase has been registered with the Docbroker, you can be sure that it was able to contact and log in to the database, and therefore that the new password is now used properly. To be sure that everything in Documentum is working properly, however, I would still check the complete log file…
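For instance, a quick scan of the repository log for anything suspicious (adapt the patterns to your environment):

[dmadmin@content_server_01 ~]$ grep -iE "error|fatal|warning" $DOCUMENTUM/dba/log/GR_DOCBASE.log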

 

 

The article Documentum – Change password – 7 – DB – Schema Owner appeared first on the dbi services blog.

Postgres vs. Oracle access paths – intro


This is the start of a series on PostgreSQL execution plans, access path, join methods, hints and execution statistics. The approach will compare Postgres and Oracle. It is not a comparison to see which one is better, but rather to see what is similar and where the approaches diverge. I have a long experience of reading Oracle execution plans and no experience at all on Postgres. This is my way to learn and share what I learn. You will probably be interested if you are in the same situation: an Oracle DBA wanting to learn about Postgres. But you may also be an experienced Postgres DBA who wants to see a different point of view from a different ‘culture’.

I’ll probably use the Oracle terms more often as I’m more familiar with them: blocks for pages, optimizer for query planner, rows for tuples, tables for relations…

Please don’t hesitate to comment on the blog posts or through Twitter (@FranckPachot) if you find some mistakes in my Postgres interpretation. I tend to verify any assumption in the same way I do it with Oracle: the documented behavior and the test result should match. My tests should be fully reproducible (using Postgres 9.6.2 here with all defaults). But as I said above, I don’t have the same experience with Postgres as I have with Oracle when interpreting execution statistics.

Postgres

I’m using the latest version here: Postgres 9.6.2, the one I have installed.
I’ve installed pg_hint_plan to be able to control the execution plan with hints. This is mandatory when doing some research. In order to understand an optimizer (query planner) choice, we need to see the estimated cost for different possibilities. Most of my tests will be done with: EXPLAIN (ANALYZE,VERBOSE,COSTS,BUFFERS)

fpa=# explain (analyze,verbose,costs,buffers) select 1;
 
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
Output: 1
Planning time: 0.060 ms
Execution time: 0.036 ms
(4 rows)

I may go further with Unix tools (like strace, to see the system calls)

Oracle

I’m using Oracle 12.2 here and the tests are done by running the statement after setting ALTER SESSION SET STATISTICS_LEVEL=ALL and displaying the execution plan with DBMS_XPLAN:
select * from dbms_xplan.display_cursor(format=>'+cost allstats last -plan_hash +projection');
Note that if you are in lower Oracle versions, you need to call dbms_xplan through the table() function:
select * from table(dbms_xplan.display_cursor(format=>'+cost allstats last -plan_hash +projection'));
Example:

SQL> set arraysize 5000 linesize 150 trimspool on pagesize 1000 feedback off termout off
SQL> alter session set statistics_level=all;
SQL> select 1 from dual;
SQL> set termout on
SQL> select * from dbms_xplan.display_cursor(format=>'+cost allstats last -plan_hash +projection');
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 520mkxqpf15q8, child number 0
-------------------------------------
select 1 from dual
--------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 |
| 1 | FAST DUAL | | 1 | 1 | 2 (0)| 1 |00:00:00.01 |
--------------------------------------------------------------------------------------

I’ll probably never compare the execution time, as this depends on the system and makes no sense on artificial small examples. But I’ll try to compare all other statistics: estimated cost, the actual number of pages/blocks read, etc.

Table of content

I’ll update (or rather insert /*+ append */) the links to the series posts as soon as they are published.

  1. Postgres vs. Oracle access paths I – Seq Scan
  2. Postgres vs. Oracle access paths II – Index Only Scan
  3. Postgres vs. Oracle access paths III – Partial Index
 

The article Postgres vs. Oracle access paths – intro appeared first on the dbi services blog.

Postgres vs. Oracle access paths I – Seq Scan


Here is the first test I’ve done for my Postgres vs. Oracle access paths series, and the first query does a sequential scan. It illustrates the first constant you find in the documentation for the query planner:
seq_page_cost (floating point)
Sets the planner’s estimate of the cost of a disk page fetch that is part of a series of sequential fetches. The default is 1.0.

Table creation

I start by creating a very simple table with 10000 rows and 3 columns. The first column(n) is indexed:

create table demo1 as select generate_series n , 1 a , lpad('x',1000,'x') x from generate_series(1,10000);
SELECT 10000
create unique index demo1_n on demo1(n);
CREATE INDEX
 
analyze verbose demo1;
INFO: analyzing "public.demo1"
INFO: "demo1": scanned 1429 of 1429 pages, containing 10000 live rows and 0 dead rows; 10000 rows in sample, 10000 estimated total rows
ANALYZE
select relkind,relname,reltuples,relpages from pg_class where relname='demo1';
relkind | relname | reltuples | relpages
---------+---------+-----------+----------
r | demo1 | 10000 | 1429
 
select relkind,relname,reltuples,relpages from pg_class where relname='demo1_n';
relkind | relname | reltuples | relpages
---------+---------+-----------+----------
i | demo1_n | 10000 | 30

I checked the table and index statistics that will be used by the optimizer: 10000 rows, all indexed, 1429 table blocks and 30 index blocks. Note that blocks are called pages, but that’s the same idea: the minimal size read and written to disk. They are also called buffers as they are read into a buffer and cached in the buffer cache.

Here is how I create a similar table in Oracle:

create table demo1 as select rownum n , 1 a , lpad('x',1000,'x') x from xmltable('1 to 10000');
Table created.
create unique index demo1_n on demo1(n);
Index created.
exec dbms_stats.gather_table_stats(user,'demo1');
PL/SQL procedure successfully completed.
 
select table_name,num_rows,blocks from user_tables where table_name='DEMO1';
 
TABLE_NAME NUM_ROWS BLOCKS
---------- ---------- ----------
DEMO1 10000 1461
 
select index_name,num_rows,leaf_blocks,blevel from user_indexes where table_name='DEMO1';
 
INDEX_NAME NUM_ROWS LEAF_BLOCKS BLEVEL
---------- ---------- ----------- ----------
DEMO1_N 10000 20 1

The same rows are stored in 1461 table blocks and the index entries in 20 blocks. Both use 8k blocks, but with a different storage layout and different defaults. This is about 7 rows per table block, for rows that are approximately 1k large, and about 500 index entries per index block to store the number for column N plus the pointer to the table row (a few bytes, called TID in Postgres and ROWID in Oracle). I’ll not get into the details of the numbers here.

My goal is to detail the execution plans and the execution statistics.

Postgres Seq Scan

I start with a very simple query on my table: SELECT SUM(N) from DEMO1;


explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=4.616..4.616 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.011..3.614 rows=10000 loops=1)
Output: n, a, x
Buffers: shared hit=1429
Planning time: 0.468 ms
Execution time: 4.661 ms

This query does a sequential scan (Seq Scan), which is the equivalent of Oracle Full Table Scan: read all rows from the table. You might tell me that it would be cheaper to scan the index because the index I’ve created holds all required columns. We will see that in the next post. Here, after having created the table as I did above, the query planner prefers to scan the table.

Here are the maths: my table has 1429 pages and each page access during a sequential scan has cost=1 as defined by:

show seq_page_cost;
seq_page_cost
---------------
1

Here, I see a cost estimated from 0 to 1529 for the Seq Scan operation.
The first number, 0.00 is the initialization cost estimating the work done before returning any rows. A Seq Scan has nothing to do before, and reading the first block can already return rows.
The second number is the cost to return all rows. We have seen that the scan itself costs 1429 but the rows (tuples) must be read and processed. This is evaluated using the following constant:

show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01

For 10000 rows, the cost to process them is 0.01*10000=100 which is an additional cost over the Seq Scan 1429 to get it to 1529. This explains cost=0.00..1529.00

Then there is a SUM operation applied to 10000 rows and there is a single parameter for the CPU cost of operators and functions:

show cpu_operator_cost;
cpu_operator_cost
-------------------
0.0025

[Screenshot: Capturepgoraseqscan001]
The sum (Aggregate) operation adds 0.0025*10000=25 to the cost and then the cost is 1554. You can see this cost in the minimal cost for the query, the first number in cost=1554.00..1554.01, which is the cost before retrieving any rows. This makes sense because before retrieving the first row we need to read (Seq Scan) and process (Aggregate) all rows, which is exactly what the cost of 1554 is.

Then there is an additional cost when we retrieve all rows. It is only one row here because it is a sum without group by, and this adds the default cpu_tuple_cost=0.01 to the initial cost: 1554.01

In summary, the total cost of the query is cost=1554.00..1554.01 and we have seen that it depends on:
– number of pages in the table
– number of rows from the result of the scan (we have no where clause here)
– number of rows summed and retrieved
– the planner parameters seq_page_cost, cpu_tuple_cost, and cpu_operator_cost
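These numbers can be verified directly against the statistics in pg_class (a quick check of the arithmetic above):

select relpages * current_setting('seq_page_cost')::numeric
     + reltuples::numeric * current_setting('cpu_tuple_cost')::numeric as seq_scan_cost,
       relpages * current_setting('seq_page_cost')::numeric
     + reltuples::numeric * ( current_setting('cpu_tuple_cost')::numeric
                            + current_setting('cpu_operator_cost')::numeric ) as aggregate_cost
from pg_class where relname='demo1';
-- expected: 1529 and 1554, matching cost=0.00..1529.00 and cost=1554.00..1554.01 above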

Oracle Full Table Scan

When I run the same query on Oracle, the optimizer chooses an index fast full scan rather than a table full scan because all rows and columns are in the index that I’ve created:

  • all rows, because SUM(N) does not need the rows where N is null (and those are not stored in the index)
  • all columns because I need nothing else than the values for N

We will see that in the next post. For the moment, in order to compare with Postgres, I forced a full table scan with the FULL() hint.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID bhsjquhh6y08q, child number 0
-------------------------------------
select /*+ full(demo1) */ sum(n) from demo1
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 397 (100)| 1 |00:00:00.01 | 1449 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 1449 |
| 2 | TABLE ACCESS FULL| DEMO1 | 1 | 10000 | 397 (0)| 10000 |00:00:00.01 | 1449 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22] 2 - (rowset=256) "N"[NUMBER,22]

We have seen that Postgres cost=1 is for sequential scans (similar to what we call multiblock reads in Oracle) and random reads (single block reads) have by default cost=4 according to random_page_cost.

Oracle cost unit is based on single block reads and this is why the cost here (397) is lower than the number of blocks (1461). Different units. Postgres counts cost=1 for reads and counts a higher cost when a seek is involved. Oracle counts cost=1 for single block reads (including seek) and lower cost for larger I/O size.
[Screenshot: Capturepgoraseqscan002]
With the default system statistics, the latency is estimated at 10 milliseconds and the transfer rate at 4KB/ms, so the single block read time is estimated at 12 milliseconds (10 + 8192/4096).
Again with the default system statistics, where the optimizer estimates 8 blocks per multiblock read, the multiblock read time is estimated at 26 milliseconds (10 + 8*8192/4096), which is on average 26/8=3.25 milliseconds per block. This means that the ratio of multiblock to single block read cost is very similar for Oracle (3.25/12=0.27083333) and Postgres (seq_page_cost/random_page_cost=1/4=0.25) with default parameters.

Our table is stored in 1461 blocks and the full table scan involves reading all of them plus some segment header blocks: 1461*0.27083333≈396.

There is also the costing of CPU (the equivalent of cpu_tuple_cost), which is included here, but I’ll not go into the details, which are more complex than in Postgres and depend on your processor frequency. The goal of those posts is Postgres. For Oracle, all this is explained in Jonathan Lewis’s and Chris Antognini’s books.

But basically, the idea is the same: Postgres Seq Scan and Oracle Full table Scan read the contiguous table blocks sequentially and the cost mainly depends on the size of the table (number of blocks) and the estimated time for sequential I/O (where bandwidth counts more than latency).

Buffers

In my tests, I’ve not only explained the query, but I executed it to get execution statistics. This is done with EXPLAIN ANALYZE in Postgres and DBMS_XPLAN.DISPLAY_CURSOR in Oracle. The statistics include the number of blocks read at each plan operation, with the BUFFERS option in Postgres and with STATISTICS_LEVEL=ALL in Oracle.


explain (analyze,buffers) select sum(n) from demo1 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=3.622..3.622 rows=1 loops=1)
Buffers: shared hit=1429
-> Seq Scan on demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.008..1.724 rows=10000 loops=1)
Buffers: shared hit=1429
Planning time: 0.468 ms
Execution time: 4.661 ms

‘Buffers’ displays the number of blocks that have been read by the Seq Scan and is exactly the number of pages in my table. ‘shared hit’ means that they come from the buffer cache.

Let’s run the same when the cache is empty:

explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=13.837..13.837 rows=1 loops=1)
Output: sum(n)
Buffers: shared read=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.042..12.506 rows=10000 loops=1)
Output: n, a, x
Buffers: shared read=1429
Planning time: 3.754 ms
Execution time: 13.906 ms

The buffers are now ‘shared read’ instead of ‘shared hit’. In Postgres, the number of logical reads, as we know them in Oracle, is the sum of hits and reads. In Oracle, all blocks are counted as logical reads, which includes the smaller set of physical reads.

IO calls

Here is more about the reads when the block is not in the buffer cache. On Linux, we can trace the system calls to see how those sequential I/Os are implemented.

I get the ‘relfilenode':

postgres=# select relname,relnamespace,reltype,relowner,relfilenode,relpages,reltuples from pg_class where relname='demo1';
relname | relnamespace | reltype | relowner | relfilenode | relpages | reltuples
---------+--------------+---------+----------+-------------+----------+-----------
demo1 | 2200 | 42429 | 10 | 42427 | 1429 | 10000

I get the pid of my session process:

select pg_backend_pid();
-[ RECORD 1 ]--+------
pg_backend_pid | 30732

I can trace system calls:

strace -p 30732

And look at the trace concerning my file (identified with its ‘relfilenode’):

30732 open("base/12924/42427", O_RDWR) = 33
30732 lseek(33, 0, SEEK_END) = 11706368
30732 open("base/12924/42427_vm", O_RDWR) = 43
30732 lseek(33, 0, SEEK_END) = 11706368
30732 lseek(33, 0, SEEK_END) = 11706368
30732 lseek(33, 0, SEEK_SET) = 0
30732 read(33, "\4004\220\3 \4 \360\233\30\10\340\227\30\10"..., 8192) = 8192
30732 read(33, "\4004\220\3 \4 \360\233\30\10\340\227\30\10"..., 8192) = 8192
30732 read(33, "\4004\220\3 \4 \360\233\30\10\340\227\30\10"..., 8192) = 8192
... 1429 read(33) in total

We see two open() calls with the relfilenode of my table in the file name: one for the table and one for the visibility map
The file descriptor for the table file is 33 and I’ve grepped only the related calls.
The lseek(33,0,SEEK_END) goes to the end of the file (11706368 bytes, which is 11706368/8192=1429 pages).
The lseek(33,0,SEEK_SET) goes to the beginning of the file.
Subsequent read() calls read the whole file, reading page per page (8192 bytes), in sequential order.

This is how sequential reads are implemented in Postgres: one lseek() and sequential read() calls. The I/O size is always the same (8k here). The benefit of a sequential scan is not larger I/O calls but simply the absence of seek() in between. The optimization is left to the underlying layers: filesystem and read-ahead.

This is very different from Oracle. Not going into the details, here are the kind of system calls you see during the full table scan:

open("/u01/oradata/CDB1A/PDB/users01.dbf", O_RDWR|O_DSYNC) = 9
fcntl(9, F_SETFD, FD_CLOEXEC) = 0
fcntl(9, F_DUPFD, 256) = 258
...
pread(258, "\6\242\2\5\3\276\25%\2\4\24\270\1\313!\1x\25%"..., 1032192, 10502144) = 1032192
pread(258, "\6\242\202\5\3\300\25%\2\4\16\247\1\313!\1x\25%"..., 1032192, 11550720) = 1032192
pread(258, "\6\242\2\6\3\302\25%\2\4x\226\1\313!\1x\25%"..., 417792, 12599296) = 417792

Those are also sequential reads of contiguous blocks, but done with a larger I/O size (126 blocks here). So in addition to the absence of seek() calls, it is optimized to do fewer I/O calls, not relying on the underlying optimization at OS level.

Oracle can also trace the system calls with wait events, which gives more information about the database calls:

WAIT #140315986764280: nam='db file scattered read' ela= 584 file#=12 block#=1282 blocks=126 obj#=74187 tim=91786554974
WAIT #140315986764280: nam='db file scattered read' ela= 485 file#=12 block#=1410 blocks=126 obj#=74187 tim=91786555877
WAIT #140315986764280: nam='db file scattered read' ela= 181 file#=12 block#=1538 blocks=51 obj#=74187 tim=91786556380

The name ‘scattered’ is misleading. ‘db file scattered read’ are actually multiblock reads: read more than one block in one I/O call. Oracle does not rely on the Operating System read-ahead and this is why we can (and should) use direct I/O and Async I/O if the database buffer cache is correctly sized.

Output and Projection

I’ve run the EXPLAIN with the VERBOSE option which shows the ‘Output’ for each operation, and I’ve done the equivalent in Oracle by adding the ‘+projection’ format in DBMS_XPLAN.

In the Oracle execution plan, we see the columns remaining in the result of each operation, after the projection:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22] 2 - (rowset=256) "N"[NUMBER,22]

The operation 2, the Full Table Scan, reads all rows with all columns, but selects only the one we need: N

In the Postgres equivalent, it seems that the Output mentions the columns available before the projection because we see all table columns here:

explain verbose select sum(n) from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8)
Output: sum(n)
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4)
Output: n, a, x

I prefer to see the columns after the projection and I use it a lot in Oracle to know which columns are needed from the table. A great optimization can be done when we have a covering index where all selected columns are present so that we don’t have to go to the table. But we will see that in the next post about Index Only Scan.

 

The article Postgres vs. Oracle access paths I – Seq Scan appeared first on the dbi services blog.

Oracle Cloud: script to stop all PaaS services


With metered cloud services, keeping all your instances running may become expensive. The goal is to start them only when you need them. Here is a script that stops all instances you have on the Oracle Cloud Service PaaS. You can schedule it for example to stop them at the end of the business day, or when they are not active for a long time. The scripts use the REST API called with curl, JSON output parsed with jq, HTTP status explained with links.

In the first part of the script, I set the variables. Set them to your user:password, identity domain, cloud service url, ssh key:

u="MyEmail@Domain.net:MyPassword"
i=a521642
r=https://dbcs.emea.oraclecloud.com
k="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxAEm1WHYbJa50t61YhM53u4sljjSFGK458fgdljjkNqfihcRxSf2ENw6iaYhiBTPogG9IDaEqW+SbwpororD2/bep16/hHybGswD34jU7bf9kaaKi5gOgASChid4e322zrnwOtlzfHiiquhiUDgLwpQxCYVV5zU1RQ2NS3F3a45bepqkn/GuPY5x/KSn576+7HBCYVbao/PTwZAeTVbo6Xb4ZQQrOIsLJxxDCQsr0/g7ZS8/OJHt8kotydu13n3rANB2y312XtTsW9mAwpfXuCuqDM5+dIjUdmtflkYtqsfrqSpLevVfVt1L7hqo+JGo7OBO0doVs6lQSCblZhYHh Me@MyLaptop"

Here is the script. It starts by downloading the CA certificate bundle if it is not already there, then queries all services that are not stopped and stops them. Finally, the last line displays the status of all services.


[ -f cacert.pem ] || curl --remote-name --time-cond cacert.pem https://curl.haxx.se/ca/cacert.pem
 
for s in $( curl -s --request GET --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" $r/paas/service/dbcs/api/v1.1/instances/$i | jq -r '.services[]|select(.status!="Stopped")|.service_name' )
do
# call the 'Stop service' REST API and get the http status
httpstatus=$(curl --include --request POST --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" --header "Content-Type:application/json" --data '{"lifecycleState":"Stop"}' $r/paas/service/dbcs/api/v1.1/instances/$i/$s | awk '{print >"/dev/stderr"} /^HTTP/{print $2}')
# look for http status in documentation
links -dump -width 300 https://docs.oracle.com/en/cloud/paas/java-cloud/jsrmr/Status%20Codes.html | grep -B 1 -A 1 " $httpstatus "
done
 
sleep 1
curl -s --request GET --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" $r/paas/service/dbcs/api/v1.1/instances/$i | jq .

The script requires:

  • curl to call the REST API
  • jq to format and extract the returned JSON
  • links to get the HTTP status description from the documentation

The Cloud is all about automation and the REST API makes it very easy to do from command line or script.
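For example, to stop everything at the end of the business day, a crontab entry can run it (the script path and schedule are examples):

00 19 * * 1-5 /home/oracle/stop_all_paas.sh >> /tmp/stop_all_paas.log 2>&1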

 

The article Oracle Cloud: script to stop all PaaS services appeared first on the dbi services blog.

Postgres vs. Oracle access paths II – IndexOnlyScan


In the previous post I explained a sequential scan that happened by accident: my query needed only one column, which was indexed, and I expected it to read the index rather than the table. And I had to hint the Oracle example to get the same, because the Oracle optimizer chooses the index scan over the table scan in that case. Here is where I learned a big difference between Postgres and Oracle. They both use MVCC to query without locking, but Postgres MVCC is for table rows (tuples) only, whereas Oracle MVCC is for all blocks – tables and indexes.

So this second post is about Index Only Scan and the second constant you find in the documentation for the query planner:
random_page_cost (floating point)
Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0.


I am here in the situation after the previous post: created table and index, have run a query which did a sequential scan on the table:

explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=17.430..17.430 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.031..13.011 rows=10000 loops=1)
Output: n, a, x
Buffers: shared hit=1429
Planning time: 1.791 ms
Execution time: 17.505 ms

Index Only Scan

I want to understand why the query planner did not choose an access to the index only. This is where hints are useful: force a plan that is not chosen by the optimizer in order to check if this plan is possible, and then check its cost:

/*+ IndexOnlyScan(demo1) */
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1727.29..1727.30 rows=1 width=8) (actual time=5.424..5.425 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429 read=29
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..1702.29 rows=10000 width=4) (actual time=0.177..4.613 rows=10000 loops=1)
Output: n
Heap Fetches: 10000
Buffers: shared hit=1429 read=29
Planning time: 0.390 ms
Execution time: 5.448 ms

From there you see that an Index Only Scan is possible but more expensive. The estimated cost is higher than the Seq Scan (cost=0.29..1702.29 instead of cost=0.00..1529.00). And the execution statistics shows that I’ve read the 1429 table pages in addition to the 29 pages of the index.

From the hit/read statistics we can note that the create table has left all the table pages in the buffer cache, but this is not the case for the create index. But that’s another story. My concern is why an index only access goes on to read all the table blocks in addition to the index ones, which brings the cost 1727.30-1554.01=173.29 higher than the sequential scan.

The clue is in this line showing that all my rows were fetched from the heap pages (the table): Heap Fetches: 10000

Tuple visibility

In ACID databases, a modification must not be visible by others until the transaction completion (commit). There are two ways to achieve that. The first way is to read the latest version of data: lock in share mode what you read, so that no concurrent update can happen. The other solution is to query a previous version of data (MVCC – Multi Version Concurrency Control) where uncommitted changes are not visible. Both Oracle and Postgres use MVCC which is great because you can have transactions and queries on the same database. But they do the versioning at a different level.

Oracle MVCC is physical, at block level. Then everything is versioned: tables as well as index, with their transaction information (ITL) which, with the help of the transaction table, give all information about visibility: committed or not, and with the commit SCN. With this architecture, a modified block can be written to disk even with uncommitted changes and there is no need to re-visit it later once the transaction is committed.

Postgres MVCC is logical at row (‘tuple’) level: new version is a new row, and committed changes set the visibility of the row. The table row is versioned but not the index entry. If you access by index, you still need to go to the table to see if the row is visible to you. This is why I had heap fetches here and the table blocks were read.

This explains that the cost of Index Only Scan is high here. In addition to about 30 index blocks to read, I’ve read about 1429 table blocks. But that can be worse. For each index entry, and I have 10000 of them, we need to go to the table row, which is exactly what the 10000 heap fetches are. But I’m lucky because I have a very good clustering factor: I have created the table with increasing values for the column N (generated by generate_series). With a bad clustering factor (physical storage of rows in the table not correlated with the order of index) you would see up to 10000 additional shared hits. Thankfully, the query planner estimates this and has switched to table scan which is cheaper in this case.

Vacuum and Visibility Map

Always going to the table rows to see if they are committed would always be more expensive than a table scan. The Postgres vacuum process maintains a Visibility Map as a bitmap of pages that have been vacuumed and have no more tuples to vacuum. This means that all rows in those pages are visible to all transactions. When there is an update on the page, the flag is unset, and remains unset until the modification is committed and the vacuum runs on it. This visibility flag is used by the Index Only Scan to know if it is needed to get to the page.
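The visibility map itself can be inspected with the pg_visibility extension (available since Postgres 9.6); here is a quick check of how many pages of our table are flagged all-visible:

create extension pg_visibility;
select count(*) filter (where all_visible) as all_visible_pages, count(*) as pages
from pg_visibility_map('demo1');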

Let’s run the vacuum and try again the same query:

vacuum demo1;
VACUUM
 
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.450 ms
Execution time: 2.213 ms

Here, without any hint, the query planner has chosen the Index Only Scan which is now less expensive than a Seq Scan: cost=0.29..270.29

Cost of Index Only Scan

There is an initial cost of 0.29, calculated from cpu_operator_cost, which defaults to 0.0025. This means that about 0.29/0.0025=116 operations were charged here. This cost is minimal and I don’t go into details.
[Screenshot: CaptureIndexScanpgora]
Then, to get the rows, we have to:

  • read 30 blocks from the index; those are costed as random reads (with random_page_cost=4), and then the cost for all rows is 4*30=120
  • process the index entries (with cpu_index_tuple_cost=0.005) and then the cost for all 10000 rows is 0.005*10000=50
  • process the result rows (with cpu_tuple_cost=0.01) and then the cost for all 10000 rows is 0.01*10000=100

This brings the cost to the total of 270.29
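The arithmetic can be written out with the cost components above (30 index pages, 10000 rows):

select 0.29               -- initial cost, charged before the first row
     + 4     * 30         -- random_page_cost * index pages
     + 0.005 * 10000      -- cpu_index_tuple_cost * rows
     + 0.01  * 10000      -- cpu_tuple_cost * rows
  as index_only_scan_cost;
-- returns 270.29, matching the plan above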

For the Aggregate operation above, the SUM(N) costing is exactly the same as in the previous post on Seq Scan: cost=25 (cpu_operator_cost=0.0025 for 10000 rows), charged as initial cost because the sum is known only once all rows are processed, plus an additional 0.01 for the result row.

Oracle

In the previous post I used the FULL() hint to compare Oracle Full Table Scan to Postgres Seq Scan, but by default, Oracle chose an index only access because the index covers all the rows and columns we need.

All columns that we need:

In the previous post we have seen the column projection (from the +projection format of dbms_xplan):

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22] 2 - (rowset=256) "N"[NUMBER,22]

I need only the column N from the table DEMO1, and this column is in the index DEMO1_N

All rows that we need:

In Oracle an index does not have an entry for every row but only for rows where at least one of the indexed columns is not null. Here because we have no where clause predicate on N, and because we have not declared the column N as NOT NULL, the access by index may not return all rows. However, the SUM() function does not need to know about the null values, because they don’t change the sum and then the optimizer can safely choose to do an index only access.

Here is the query without hints:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6z194712fvcfu, child number 0
-------------------------------------
select /*+ */ sum(n) from demo1
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 1 |00:00:00.01 | 26 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 26 |
| 2 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 26 |
--------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22] 2 - "N"[NUMBER,22]

This plan looks very similar to the Postgres one after the vacuum: 26 buffers, which is approximately the number of blocks in my index here. However, Oracle does not have the ‘vacuum’ requirement, because MVCC applies to the index as well and Oracle does not need to go to the table to undo uncommitted changes. But there is something else here. If you remember the previous post, the Oracle cost=1 is equivalent to the cost of a random read (single block), and reading one block through a larger I/O (multiblock read) costs, with default statistics, only about 0.27 of that. Here, 7/26=0.2692, which proves that the cost is based on multiblock reads. Oracle can read indexes with INDEX FAST FULL SCAN in the same way it reads tables with FULL TABLE SCAN: with larger I/O. We don’t need any ordering of rows here, because we just do the sum, so we don’t need to follow the chain of leaf blocks scattered within the index segment. We just read all blocks as they come, with fast I/O.

Index Fast Full Scan is possible in Oracle because MVCC is at block level for indexes as well as tables. You can just read the blocks as of the point in time of the query, without being concerned by concurrent operations that update the index entries or split the blocks. Postgres Index Only Scan is more limited because MVCC is on tables only: it must scan the index in the order of the leaves, and must read the visibility map and maybe the table pages.

In Oracle, an index can be used to vertically partition a table, as redundant storage of a few columns, in order to avoid full table scans on large rows, allowing queries to avoid reading the table completely when the index covers all required rows and columns. We will see more about the ‘all rows’ requirement in the next post.

 

The article Postgres vs. Oracle access paths II – IndexOnlyScan appeared first on the dbi services blog.


Postgres vs. Oracle access paths III – Partial Index


In the previous post I said that an Index Only Access needs to find all rows in the index. Here is a case where, with similar data, Postgres can find all rows but Oracle needs additional considerations.

In the previous post I’ve executed:
select sum(n) from demo1
The execution plan was:

Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30

Basically, this reads all values of the column N and then aggregates them to the sum.
If I remove the SUM() I have only the part that reads all values from N:

explain (analyze,verbose,costs,buffers) select n from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.284 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.440 ms
Execution time: 1.972 ms

Oracle

This sounds logical. Now let’s run the same query, a simple ‘select n from demo1′, in Oracle:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 397 (100)| 10000 |00:00:00.01 | 1451 |
| 1 | TABLE ACCESS FULL| DEMO1 | 1 | 10000 | 397 (0)| 10000 |00:00:00.01 | 1451 |
--------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Here the access path is different: a full table scan instead of an index only access (Index Fast Full Scan). It is not a cost decision. If we try to force an index access, with INDEX_FFS() or INDEX() hints, the query will still do a full table scan. The reason is that an index only access is possible only if all columns and all rows are present in the index. But Oracle does not always index all rows: the Oracle index has no entry for the rows where all the indexed columns are null.

Where n is not null

If I run the same query with the purpose of showing only non-null values, with a ‘where n is not null’ predicate, then an index only access is possible:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2gbjpw5u0v9cw, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N" IS NOT NULL)

Constraints

An alternative, if we know that we will never have null values here, is to tell the optimizer that there are no null values in the column N:
In Oracle:
alter table demo1 modify n not null;
This is the equivalent of the PostgreSQL
alter table demo1 alter column n set not null;
Then, in addition to enforcing the constraint, this declaration informs the optimizer that there are no null values and that all rows can be found in the index:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Additional columns

Even if the column can have some null values, it is easy to index the null values in Oracle, simply by adding a non-null column or expression. And if you don’t need an additional column, you can even add a constant, as in the following index definition:

create unique index demo1_n on demo1(n,0);

This works because all index entries now have at least one non-null value. But looking at the buffers, you can see that this additional byte (the 0 is stored in 1 byte) has a small overhead (31 blocks read here instead of 28):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 8 (100)| 10000 |00:00:00.01 | 31 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 8 (0)| 10000 |00:00:00.01 | 31 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Oracle Partial Indexes

In Oracle, all indexes that include a nullable column are partial indexes: not all rows are indexed, and an index access is possible only if the WHERE clause, or a constraint, guarantees that we don’t need the non-indexed rows. Combined with expressions, this can be a way to implement partial indexes: the expression returns null for the rows that should not be indexed. Oracle even provides computed columns (aka virtual columns) so that the expression does not have to be coded in the WHERE clause of the query.

As an example with expressions, the following index has entries only for the values lower than or equal to 10:
create index demo1_n_top10 on demo1(case when n<=10 then n end)

However, to use it, we must mention the expression explicitly:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 863drbjwayrt7, child number 0
-------------------------------------
select /*+ */ (case when n<=10 then n end) from demo1 where (case when
n<=10 then n end)<=5
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 4 |00:00:00.01 | 2 |
|* 1 | INDEX RANGE SCAN| DEMO1_N_TOP10 | 1 | 5 | 1 (0)| 4 |00:00:00.01 | 2 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEMO1"."SYS_NC00004$"<=5)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "DEMO1"."SYS_NC00004$"[NUMBER,22]

We can see that internally, a virtual column (“SYS_NC00004$”) has been created for the indexed expression, and it is used for the predicate and for the projection, which uses the same expression. There is another possibility with the ‘partial index’ feature introduced in 12c, but it does not have the flexibility of a predicate: it is based on partitioning, where only some partitions are indexed.
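
Here is a sketch of the explicit variant with a declared virtual column (the column name N_TOP10 and the index name are hypothetical, and this is for a table that does not already have the function-based index above):

-- the virtual column holds the expression in the table definition
alter table demo1 add (n_top10 as (case when n<=10 then n end));
-- indexing the virtual column is equivalent to the function-based index
create index demo1_n_top10_v on demo1(n_top10);
-- the query references the column instead of repeating the expression
select n_top10 from demo1 where n_top10<=5;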

Postgres Partial Indexes

Postgres does not need those workarounds. An index indexes all rows, including null entries, and partial indexes can be defined with a where clause:
create index demo1_n_top10 on demo1(n) where n<=10

No need to change the query. As long as the result can come from the partial index, we can use the column without an expression on it:

explain (analyze,verbose,costs,buffers) select n from demo1 where n<=5 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n_top10 on public.demo1 (cost=0.14..4.21 rows=4 width=4) (actual time=0.114..0.114 rows=5 loops=1)
Output: n
Index Cond: (demo1.n <= 5)
Heap Fetches: 0
Buffers: shared hit=2
Planning time: 0.557 ms
Execution time: 0.129 ms

Here the smaller partial index (demo1_n_top10) has been chosen by the query planner.

As you can see, I did not use exactly the same condition. The query planner understood that n<=5 (in the WHERE clause) is a subset of n<=10 (in the index definition). However, if the predicate is too different, it cannot use the index:

fpa=# explain (analyze,verbose,costs,buffers) select n from demo1 where 2*n<=10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..320.29 rows=3333 width=4) (actual time=0.020..1.086 rows=5 loops=1)
Output: n
Filter: ((2 * demo1.n) <= 10)
Rows Removed by Filter: 9995
Heap Fetches: 0
Buffers: shared hit=30

Here, instead of “Index Cond” we have a simple “Filter”. The Index Only Scan has read all the rows, and they were filtered afterward (“Rows Removed by Filter”).

Index condition

With the VERBOSE option of EXPLAIN we see the condition used by the index access:
Index Cond: (demo1.n <= 5)
‘Index Cond’ is not a simple filter removing rows after an operation: it is the condition used for fast access to the index entries in the sorted index structure. We have the equivalent in Oracle with the ‘+predicate’ format of dbms_xplan:

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"<=5)
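
For reference, these Predicate and Projection sections are controlled by the dbms_xplan format options; here is a sketch (the exact options used for the plans in this series are an assumption):

select * from table(dbms_xplan.display_cursor(null,null,'allstats last +predicate +projection'));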

Before going further on index access for WHERE clause predicates, the next post will show the major characteristic of indexes (besides the fact that they store a redundant subset of columns and rows): they are maintained sorted and may return the resulting rows in order.

 

The article Postgres vs. Oracle access paths III – Partial Index first appeared on the dbi services blog.

Exadata Capacity on Demand and Elastic Rack


Since X4 we can do Capacity on Demand on Exadata: disable some CPU cores to lower the cost of Oracle Database licenses. Depending on the model and the configuration, there are different minimums, and here is a recap table about those.

Here is the summary of the Capacity on Demand minimum, maximum and increment. Those numbers come from the configuration file of OEDA, the Oracle Exadata Deployment Assistant (es.properties), and you can see that it already has an option for Exadata X7-2.

Exadata model | sockets | cores per socket | cores per server | threads per core | CoD minimum        | CoD maximum | CoD increment
X2-2          | 2       | 6                | 12               | 2                |                    |             |
X3-2          | 2       | 8                | 16               | 2                |                    |             |
X4-2          | 2       | 12               | 24               | 2                | 12 (not for 1/8th) | 24          | 2
X5-2          | 2       | 18               | 36               | 2                | 14                 | 36          | 2
X6-2          | 2       | 22               | 44               | 2                | 14 (8 for 1/8th)   | 44          | 2
X7-2          | 2       | 24               | 48               | 2                | 14 (8 for 1/8th)   | 48          | 2
X2-8          | 8       | 8                | 12               | 2                |                    |             |
X3-8          | 8       | 10               | 12               | 2                |                    |             |
X4-8          | 8       | 15               | 32               | 2                | 48                 | 120         | 8
X5-8          | 8       | 18               | 32               | 2                | 56                 | 144         | 8
X6-8          | 8       | 18               | 32               | 2                | 56                 |             |
X7-8          | 8       | 24               | 32               | 2                |                    |             |
SL6           | 2       | 32               | 64               | 8                | 14 (8 for 1/8th)   | 64          | 2
T7-2          | 2       | 32               | 62               | 8                |                    |             |

 

Special minimums for 1/8th of Rack

The smallest configuration (1/8th of Rack) is a bit special. First, because it is physically identical to the 1/4th one, with just some processors and disks disabled. But also, for this entry-level configuration, the required minimum is lower in X6: 8 cores per node.

Here is the Oracle Exadata Deployment Assistant for X6-2 1/8th of Rack:

[screenshot: CaptureOEDAx68002]

When 1/8th of Rack is selected, we are allowed to enable a minimum of 8 cores per node, as mentioned in the table above:

[screenshot: CaptureOEDAx68006]

Elastic Rack

The Elastic Rack configuration allows configuring any combination of database nodes and storage cells:

[screenshot: CaptureOEDAx68004]

With the Elastic Rack configuration, the next screen does not only display the configuration: you can customize it.
Here I define the same configuration as a 1/8th Rack:

[screenshot: CaptureOEDAx68005]

However, because it is not a 1/8th Rack configuration, the minimum is 14 cores per node and not 8:

[screenshot: CaptureOEDAx68001]

So be careful: the Elastic configuration gives more flexibility, but the CoD minimums are different from those of the equivalent standard configuration.

/opt/oracle.SupportTools/resourcecontrol

As I’m talking about elastic configuration, here is how the cores are enabled. The configuration assistant calls /opt/oracle.SupportTools/resourcecontrol, which displays or updates the BIOS configuration. You may wonder why you can do that here and not on your own servers: because here Oracle can trace what happened. You will find the log in /var/log/oracleexa/systemconfig.log, and here is an example where the Elastic Rack has been deployed with 16 cores per database node Capacity on Demand:

Fri Aug 04 16:12:18 CEST 2017
Executing command: /opt/oracle.SupportTools/resourcecontrol -show
[INFO] Validated hardware and OS. Proceed.
[SHOW] Number of physical cores active per socket: 22
[SHOW] Total number of cores active: 44
 
Mon Aug 07 11:24:31 CEST 2017
Executing command: /opt/oracle.SupportTools/resourcecontrol -core 16 -force
[INFO] Validated hardware and OS. Proceed.
[INFO] Enabling 8 cores on each socket.
[INFO] Import all bios settings
[INFO] All bios settings have been imported with success
[ACTION] Reboot server for settings to take effect
[SHOW] Number of physical cores active per socket: 8
[SHOW] Total number of cores active: 16
 
Mon Aug 07 11:31:24 CEST 2017
Executing command: /opt/oracle.SupportTools/resourcecontrol -show
[INFO] Validated hardware and OS. Proceed.
[SHOW] Number of physical cores active per socket: 8
[SHOW] Total number of cores active: 16

This does not stay on your server. There is a rule that you can do Capacity on Demand only if you have configured Platinum support, or use Oracle Configuration Manager, or Enterprise Manager. All of those may store a history of the CPU count, which means that it is auditable.

 

The article Exadata Capacity on Demand and Elastic Rack first appeared on the dbi services blog.

Postgres vs. Oracle access paths IV – Order By and Index


I realize that I’m talking about indexes in Oracle and Postgres, but have not yet mentioned the best website you can find about indexes, with concepts and examples for all RDBMS: http://use-the-index-luke.com. You will probably learn a lot about SQL design there. Now let’s continue on execution plans with indexes.

As we have seen two posts ago, an index can be used even with 100% selectivity (all rows), when we don’t filter any rows. Oracle has the INDEX FAST FULL SCAN, which is the fastest, reading blocks sequentially as they come. But this doesn’t follow the B*Tree leaf chain and does not return the rows in the order of the index. However, there is also the possibility to read the leaf blocks in the index order, with an INDEX FULL SCAN and random reads instead of multiblock reads.
It is similar to the Index Only Scan of Postgres, except that there is no need to go to the table to filter out uncommitted changes: Oracle reads the transaction table to get the visibility information, and goes to the undo records if needed.

The previous post had a query with a ‘where n is not null’ predicate, to be sure that all rows have entries in the Oracle index, and we will continue on this by adding an ORDER BY.

For this post, I’ve increased the size of the column N in the Oracle table, by adding 1/3 to each number. I did this for this post only, and for the Oracle table only. The index on N is now 45 blocks instead of 20. The reason is to show what happens when the cost of the ‘order by’ is high. I didn’t change the Postgres table because there is only one way to scan the index there, and the result is always sorted.

Oracle Index Fast Full Scan vs. Index Full Scan


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dbck3rgnqbakg, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null order by n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 46 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 46 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

The Index Full Scan, the random-read version of the index read, is chosen here by the Oracle optimizer because we want the result ordered on the column N and the index can provide this without an additional sort.

We can force the optimizer to do multiblock reads, with the INDEX_FFS() hint:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID anqfbf5caat2a, child number 0
-------------------------------------
select /*+ index_ffs(demo1) */ n from demo1 where n is not null order
by n
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 82 (100)| 10000 |00:00:00.01 | 51 | | | |
| 1 | SORT ORDER BY | | 1 | 10000 | 82 (2)| 10000 |00:00:00.01 | 51 | 478K| 448K| 424K (0)|
| 2 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 14 (0)| 10000 |00:00:00.01 | 51 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "N"[NUMBER,22] 2 - "N"[NUMBER,22]

The estimated cost is higher: the index read is cheaper (cost=14 instead of 46) but then the sort operation brings it to 82. We can see additional columns in the execution plan here because the sorting operation needs a workarea in memory (estimated at 478K, 424K actually used during the execution). Note that the multiblock read has a few blocks of overhead (reads 51 blocks instead of 48) because it has to read the segment header to identify the extents to scan.

Postgres Index Only Scan

In PostgreSQL there’s only one way to scan indexes: random reads by following the chain of leaf blocks. This returns the rows in the order of the index and does not require an additional sort:


explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null order by n ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.125..1.277 rows=10000 loops=1)
Output: n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.532 ms
Execution time: 1.852 ms

In the previous posts, we have seen a cost of 0.29..270.29 for the Index Only Scan. Here we have an additional cost of 25 for cpu_operator_cost because I’ve added the ‘where n is not null’ predicate. As the default constant is 0.0025, this is the query planner estimating the evaluation of this condition for 10000 rows (10000 × 0.0025 = 25).

First Rows

The Postgres cost always shows two values. The first one is the startup cost: the cost just before being able to return the first row. Some operations have a very small startup cost; others contain blocking operations that must finish before sending their first result rows. Here, as we have no sort operation, the first row retrieved from the index can be returned immediately and the startup cost is small: 0.29.
In Oracle, you can see the initial cost by optimizing the plan to retrieve the first row, with the FIRST_ROWS() hint:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0fjk9vv4g1q1w, child number 0
-------------------------------------
select /*+ first_rows(1) */ n from demo1 where n is not null order by
n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 2 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

The actual number of blocks read (48) is the same as before because I finally fetched all rows, but the cost is small because it was estimated for fetching the first row only. Of course, we can also tell Postgres or Oracle that we want only the first rows. This is for the next post.

Character strings

The previous example is an easy one because the column N is a number, and both Oracle and Postgres store numbers in a binary format that follows the same order as the numbers. But that’s different with character strings. If you are not in America, there is very little chance that the order you want to see follows the ASCII order. Here I’ve run a similar query but using the column X instead of N, which is a text (VARCHAR2 in Oracle):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fsqk4fg1t47v5, child number 0
-------------------------------------
select /*+ */ x from demo1 where x is not null order by x
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2493 (100)| 10000 |00:00:00.27 | 1644 | 18 | | | |
| 1 | SORT ORDER BY | | 1 | 10000 | 2493 (1)| 10000 |00:00:00.27 | 1644 | 18 | 32M| 2058K| 29M (0)|
|* 2 | INDEX FAST FULL SCAN| DEMO1_X | 1 | 10000 | 389 (0)| 10000 |00:00:00.01 | 1644 | 18 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("X" IS NOT NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) NLSSORT("X",'nls_sort=''FRENCH''')[2000], "X"[VARCHAR2,1000] 2 - "X"[VARCHAR2,1000]

I have created an index on X, and as you can see it can be used to get all X values, but with an Index Fast Full Scan, the multiblock index-only access, which is fast but does not return rows in the order of the index. A sort operation is then applied. I can force an Index Full Scan with the INDEX() hint, but the sort will still have to be done.

The reason can be seen in the Column Projection Information. My Oracle client application is running on a laptop where the OS is in French, and Oracle picks up those settings so that results match what the end-user expects. This is National Language Support: an Oracle database can be accessed by users all around the world, and they will see ordered lists, date formats, decimal separators, … according to their country and language.

ORDER BY … COLLATE …

My database has been created on a system which is in English. In Postgres we can get results sorted in French with the COLLATE option of ORDER BY:


explain (analyze,verbose,costs,buffers) select x from demo1 where x is not null order by x collate "fr_FR" ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=5594.17..5619.17 rows=10000 width=1036) (actual time=36.163..37.254 rows=10000 loops=1)
Output: x, ((x)::text)
Sort Key: demo1.x COLLATE "fr_FR"
Sort Method: quicksort Memory: 1166kB
Buffers: shared hit=59
-> Index Only Scan using demo1_x on public.demo1 (cost=0.29..383.29 rows=10000 width=1036) (actual time=0.156..1.559 rows=10000 loops=1)
Output: x, x
Index Cond: (demo1.x IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=52
Planning time: 0.792 ms
Execution time: 38.264 ms

Same idea here as in Oracle: there is an additional sort operation, a blocking operation that must complete before being able to return the first row.

The detail of the cost is the following:

  • The index on the column X has 52 blocks, which is estimated at cost=208 (random_page_cost=4)
  • We have 10000 index entries to process, estimated at cost=50 (cpu_index_tuple_cost=0.005)
  • We have 10000 result rows to process, estimated at cost=100 (cpu_tuple_cost=0.01)
  • We have evaluated 10000 ‘is not null’ conditions, estimated at cost=25 (cpu_operator_cost=0.0025)

In Oracle we can use the same COLLATE syntax, but the name of the language is different: it is consistent across platforms rather than using the OS one:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 82az4syppyndf, child number 0
-------------------------------------
select /*+ */ x from demo1 where x is not null order by x collate "French"
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2493 (100)| 10000 |00:00:00.28 | 1644 | | | |
| 1 | SORT ORDER BY | | 1 | 10000 | 2493 (1)| 10000 |00:00:00.28 | 1644 | 32M| 2058K| 29M (0)|
|* 2 | INDEX FAST FULL SCAN| DEMO1_X | 1 | 10000 | 389 (0)| 10000 |00:00:00.01 | 1644 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("X" IS NOT NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) NLSSORT("X" COLLATE "French",'nls_sort=''FRENCH''')[2000], "X"[VARCHAR2,1000] 2 - "X"[VARCHAR2,1000]

In Oracle, we do not need to use the COLLATE option. The language can be set for the session (NLS_LANGUAGE=’French’) or from the client environment (NLS_LANG=’=French_.’). Oracle can share cursors across sessions (to avoid wasting resources compiling and optimizing the same statements used by different sessions) but will not share execution plans among different NLS environments because, as we have seen, the plan can be different. Postgres does not have to manage that because each PREPARE statement does a full compilation and optimization: there is no cursor sharing in Postgres.
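
Here is a sketch of this session-level alternative to the COLLATE clause:

-- set the language once for the session; subsequent ORDER BY then
-- use the French collation without any change to the query
alter session set nls_language='FRENCH';
select x from demo1 where x is not null order by x;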

Indexing for different languages

We have seen in the Oracle execution plan’s Column Projection Information that an NLSSORT operation is applied to the column to get a value that follows the collation order of the language. We have seen in the previous post that we can index a function of a column. Then we have the possibility to create an index for different languages. The following index will be used to avoid a sort for French users:

create index demo1_x_fr on demo1(nlssort(x,'NLS_SORT=French'));

Since 12cR2 we can create the same index with the COLLATE syntax:

create index demo1_x_fr on demo1(x collate "French");

Both syntaxes create the same index, which can be used by queries with ORDER BY … COLLATE, or by sessions that set NLS_LANGUAGE:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 82az4syppyndf, child number 0
-------------------------------------
select /*+ */ x from demo1 where x is not null order by x collate "French"
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4770 (100)| 10000 |00:00:00.02 | 4772 |
|* 1 | TABLE ACCESS BY INDEX ROWID| DEMO1 | 1 | 10000 | 4770 (1)| 10000 |00:00:00.02 | 4772 |
| 2 | INDEX FULL SCAN | DEMO1_X_FR | 1 | 10000 | 3341 (1)| 10000 |00:00:00.01 | 3341 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("X" IS NOT NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "X"[VARCHAR2,1000] 2 - "DEMO1".ROWID[ROWID,10], "DEMO1"."SYS_NC00004$"[RAW,2000]

There’s no sort operation here as the INDEX FULL SCAN returns the rows in order.

PostgreSQL has the same syntax:

create index demo1_x_fr on demo1(x collate "fr_FR");

and then the query can use this index and bypass the sort operation:

explain (analyze,verbose,costs,buffers) select x from demo1 where x is not null order by x collate "fr_FR" ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_x_fr on public.demo1 (cost=0.29..383.29 rows=10000 width=1036) (actual time=0.190..1.654 rows=10000 loops=1)
Output: x, x
Index Cond: (demo1.x IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=32 read=20
Planning time: 1.049 ms
Execution time: 2.304 ms

Avoiding a sort operation can really improve the performance of queries in two ways: it saves the resources required by the sort (which has to spill to disk when the workarea does not fit in memory) and it avoids a blocking operation, so the first rows can be returned quickly.

We have seen how indexes can be used to access a subset of columns from a smaller structure, and how they can be used to access a sorted version of the rows. Future posts will show how index access is used to quickly filter a subset of rows. But for the moment I’ll continue on this blocking operation: we have seen a lot of Postgres costs, and they have two values (startup cost and total cost). More on the startup cost in the next post.

 

The article Postgres vs. Oracle access paths IV – Order By and Index first appeared on the dbi services blog.

Postgres vs. Oracle access paths V – FIRST ROWS and MIN/MAX


We have seen in the previous post how an index can help to avoid a sort operation. This avoids a blocking operation: the startup cost is minimal and the first rows can be returned immediately. This is often desired when displaying rows on the user’s screen. Here is more about the Postgres startup cost, Oracle first_rows costing, and fetching the first rows only.

Here is the execution plan we had in Oracle to get the values of N sorted. The cost for Oracle is the cost to read the index leaves: estimated at 46 random reads:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dbck3rgnqbakg, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null order by n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 46 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 46 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

In PostgreSQL, we have two costs (cost=0.29..295.29):

explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null order by n ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.194..2.026 rows=10000 loops=1)
Output: n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 1.190 ms
Execution time: 2.966 ms

I explained where the total cost (295.29) comes from:

  • The index on the column N has 30 blocks, which is estimated at cost=120 (random_page_cost=4)
  • We have 10000 index entries to process, estimated at cost=50 (cpu_index_tuple_cost=0.005)
  • We have 10000 result rows to process, estimated at cost=100 (cpu_tuple_cost=0.01)
  • We have evaluated 10000 ‘is not null’ conditions, estimated at cost=25 (cpu_operator_cost=0.0025)

But the Postgres EXPLAIN also shows the startup cost (0.29), which is the cost before returning the first row (only a few cpu_operator_cost here).

From that, I can guess that fetching 1 row will have the following cost:

  • The startup cost of 0.29
  • Read the first index page, cost=4 (random_page_cost=4)
  • 1 index entry to process at cpu_index_tuple_cost=0.005
  • 1 result row to process, estimated at cpu_tuple_cost=0.01
  • 1 ‘is not null’ conditions, estimated at cpu_operator_cost=0.0025

This should be approximately cost=4.3075 for one row: roughly the cost to read one index page. We will see later that the query planner does not count this first index page.

Oracle First Rows

In Oracle, we have only the total cost in the execution plan, but we can estimate the cost to retrieve 1 row with the FIRST_ROWS(1) hint:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0fjk9vv4g1q1w, child number 0
-------------------------------------
select /*+ first_rows(1) */ n from demo1 where n is not null order by
n
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 10000 |00:00:00.01 | 48 |
| 1 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 2 (0)| 10000 |00:00:00.01 | 48 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

The cost here is small, estimated at 2 random reads (1 B*Tree branch and 1 leaf), which is sufficient to get the first row. Of course, I’ve estimated it for 1 row but I finally retrieved all rows (A-Rows=10000), reading all blocks (Buffers=48). However, my execution plan is optimized for fetching one row.

Fetch first rows

I can run the previous query and finally fetch only one row, but I can also explicitly limit the result to one row. If you use older versions of Oracle, you may have used the ‘rownum’ way of limiting rows, which implicitly adds the first_rows hint. Here I’m using the FETCH FIRST syntax, and I need to explicitly add the FIRST_ROWS() hint to get the plan optimized for that.
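
For reference, here is a sketch of that older ‘rownum’ pattern:

-- rownum on top of an ordered subquery: the rownum predicate
-- implicitly optimizes the plan for first rows
select * from (select n from demo1 where n is not null order by n) where rownum<=1;

And here is the FETCH FIRST query with the FIRST_ROWS() hint: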

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9bcm542sk64az, child number 0
-------------------------------------
select /*+ first_rows(1) */ n from demo1 where n is not null order by n fetch first 1 row only
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 3 |
|* 1 | VIEW | | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
|* 2 | WINDOW NOSORT STOPKEY| | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
| 3 | INDEX FULL SCAN | DEMO1_N | 1 | 10000 | 2 (0)| 2 |00:00:00.01 | 3 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "N")<=1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "from$_subquery$_002"."N"[NUMBER,22], "from$_subquery$_002"."rowlimit_$$_rownumber"[NUMBER,22] 2 - (#keys=1) "N"[NUMBER,22], "DEMO1".ROWID[ROWID,10], ROW_NUMBER() OVER ( ORDER BY "N")[22] 3 - "DEMO1".ROWID[ROWID,10], "N"[NUMBER,22]

The cost is the same, estimated at 2 random reads, but we see how Oracle implements FETCH FIRST: with window functions. And only one row has been fetched (A-Rows), reading 3 blocks (Buffers). Note that, because the index is sorted, the window function is a NOSORT operation.

Postgres

I can run the same query on PostgreSQL and get the execution plan:

explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null order by n fetch first 1 row only;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..0.31 rows=1 width=4) (actual time=0.124..0.124 rows=1 loops=1)
Output: n
Buffers: shared hit=3
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.124..0.124 rows=1 loops=1)
Output: n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=3
Planning time: 0.576 ms
Execution time: 0.143 ms

Here, the total cost of the query is lower than the total cost of the Index Only Scan, because we know we will not read all index entries. The total cost of the query (0.31) is then based on the startup cost (0.29) of the index access. I suppose there is 0.01 for the cpu_tuple_cost, but I expected to see the cost to get the first page, because we cannot get a row without reading the whole page. My guess is that Postgres divides the total cost (295) by the number of rows (10000) and uses that as a per-row estimation. This makes sense for a lot of rows but underestimates the cost to get the first page.

In order to validate my guess, I force a Seq Scan to have a higher cost and fetch first 5 rows:

explain (analyze,verbose,costs,buffers) select n from demo1 where n is not null fetch first 5 row only ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.76 rows=5 width=4) (actual time=0.026..0.029 rows=5 loops=1)
Output: n
Buffers: shared hit=1
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.022..0.024 rows=5 loops=1)
Output: n
Filter: (demo1.n IS NOT NULL)
Buffers: shared hit=1
Planning time: 1.958 ms
Execution time: 0.057 ms

My guess is: ( 1529.00 / 10000 ) * 5 = 0.7645 which is exactly the cost estimated for the Limit operation. This approximation does not take the page granularity into account.

MIN/MAX

The “order by n fetch first 1 row only” finally reads only one index entry, the first one, and returns the indexed value. We can get the same value with a “select min(N)”, and Oracle has a special operation for that:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 29bsqfg69nudp, child number 0
-------------------------------------
select /*+ */ min(n) from demo1
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 2 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 2 |
| 2 | INDEX FULL SCAN (MIN/MAX)| DEMO1_N | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) MIN("N")[22] 2 - "N"[NUMBER,22]

This goes through the index branches (blevel=1 here in this small index, so the root is the first and only branch) to the first leaf, in order to get the value of the first entry. This has read 2 blocks here. The same can be done to get the last index entry, in case we “select max(N)”.
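
The symmetric case, as a sketch:

-- MAX(N) goes down the right-most branch to the last leaf entry
select max(n) from demo1;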

Postgres does not show a special operation for it, but a plan which is very similar to the one we have seen above when fetching the first row: an Index Only Scan with a Limit:


explain (analyze,verbose,costs,buffers) select min(n) from demo1 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.31..0.32 rows=1 width=4) (actual time=0.123..0.124 rows=1 loops=1)
Output: $0
Buffers: shared hit=3
InitPlan 1 (returns $0)
-> Limit (cost=0.29..0.31 rows=1 width=4) (actual time=0.121..0.121 rows=1 loops=1)
Output: demo1.n
Buffers: shared hit=3
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..295.29 rows=10000 width=4) (actual time=0.119..0.119 rows=1 loops=1)
Output: demo1.n
Index Cond: (demo1.n IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=3
Planning time: 0.415 ms
Execution time: 0.140 ms

If we look at the ‘Index Only Scan’ we see exactly what I had at the top of this post with “select n from demo1 where n is not null order by n”.

Above it, there’s the Limit clause which is exactly the same as the one with the “fetch 1 row only” because the query planner understands that getting the MIN(N) is the same as getting the first value from the ordered index on N.

This is processed as a non-correlated subquery (query block), also called an InitPlan. The result of it ($0) is used by the Result node, with an additional cost of 0.01 for the cpu_tuple_cost of this additional step. I don’t really know the reason for this additional step here, but anyway, the cost is minimal. Basically, both Oracle and Postgres take advantage of the index structure to get the minimum – or the first value – from the sorted index entries.

In this series, I’m running very simple queries in order to show how it works. In this post, we reached the minimum: one column and one row. The next post will finally select one additional column, which is not in the index.

 

The article Postgres vs. Oracle access paths V – FIRST ROWS and MIN/MAX first appeared on the dbi services blog.

Postgres vs. Oracle access paths VI – Index Scan


In the previous post my queries were still reading the indexed column only, from a table which had no modifications since the last vacuum, and thus didn’t need to read the table pages: it was an Index Only Scan. However, we often need more columns than the ones that are in the index. Here is the Index Scan access path.

I’m continuing on the table that I’ve created in the first post of the series. I’ve run VACUUM (the lazy one, not the full one) and did not do any modification after that, as we have seen that Index Only Access is efficient only when there are no modifications.

create table demo1 as select generate_series n , 1 a , lpad('x',1000,'x') x from generate_series(1,10000);
SELECT 10000
create unique index demo1_n on demo1(n);
CREATE INDEX
vacuum demo1;
VACUUM

I have 10000 rows, a unique column N with decimal numbers, indexed and another column A which is not indexed.

Index Only Scan

I’ll now query one row, the one with N=1000.

explain (analyze,verbose,costs,buffers) select n from demo1 where n=1000 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..4.30 rows=1 width=4) (actual time=0.123..0.124 rows=1 loops=1)
Output: n
Index Cond: (demo1.n = 1000)
Heap Fetches: 0
Buffers: shared hit=3
Planning time: 0.625 ms
Execution time: 0.137 ms

It seems that the query planner estimates that it will read one block:

  • The startup cost of 0.29 as we have seen before
  • Read one index page, cost=4 (random_page_cost=4)
  • 1 result row to process, estimated at cpu_tuple_cost=0.01

As the index is a B*Tree with 30 pages, I expect to read at least one branch in addition to the leaf block. The execution has actually read 3 blocks (Buffers: shared hit=3). Here it seems that Postgres decides to ignore the branches and count only the leaf blocks.

In Oracle, the estimation is cost=1 and the execution has read 2 blocks:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gusay436hpzck, child number 0
-------------------------------------
select /*+ */ n from demo1 where n=1000
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 1 |00:00:00.01 | 2 |
|* 1 | INDEX UNIQUE SCAN| DEMO1_N | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 2 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Both Oracle and Postgres read only the index here. This is the fastest access to one indexed column: no need to read the table because the column is in the index. The use case is quite limited though: just testing the existence of a value, as in the following sketch.
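
-- a pure existence test, answered from the index alone (Postgres syntax)
select exists (select 1 from demo1 where n=1000);

I will now select another column than the one used in the WHERE clause.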

Select another column

I filter on N but now query the column A which is not in the index. The Index Only Scan changes to an Index Scan:

explain (analyze,verbose,costs,buffers) select a from demo1 where n=1000 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Index Scan using demo1_n on public.demo1 (cost=0.29..8.30 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
Output: a
Index Cond: (demo1.n = 1000)
Buffers: shared hit=3
Planning time: 0.639 ms
Execution time: 0.030 ms

The cost is the same except that there is one additional page to read, which pushes it to cost=8.30:

  • The startup cost of 0.29 as we have seen before
  • Read one index page, and one table page: cost=8 (random_page_cost=4)
  • 1 result row to process, estimated at cpu_tuple_cost=0.01

In Oracle it is not a different operation. We still have the INDEX UNIQUE SCAN, but on top of it there is an additional operation to read the table: TABLE ACCESS BY INDEX ROWID. The index entry returns the ROWID (the physical address of the table block, equivalent to the Postgres TID). And then we have the detail of the cost, and the execution buffer reads: one more block.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8q4tcxgk1n1vn, child number 0
-------------------------------------
select /*+ */ a from demo1 where n=1000
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 (100)| 1 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| DEMO1 | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 3 |
|* 2 | INDEX UNIQUE SCAN | DEMO1_N | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 2 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22] 2 - "DEMO1".ROWID[ROWID,10]

The important thing here is in the predicate information, where we see the part of the WHERE clause which is not a filter applied after the scan, but which is used for optimal access through the index. It is displayed as access() in the Oracle execution plan:

access("N"=1000)

In the PostgreSQL execution plan, the same information is displayed as ‘Index Cond’:

Index Cond: (demo1.n = 1000)

Postgres Range Scan

That was retrieving only one row with an equality predicate on a unique index column. The index scan helps to get directly to the value because of the B*Tree structure. As the index is sorted, an inequality predicate can also use the index to find the rows in a range of values.

The Postgres plan looks the same, with Index Scan:

explain (analyze,verbose,costs,buffers) select a from demo1 where n<=1000 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo1_n on public.demo1 (cost=0.29..175.78 rows=1000 width=4) (actual time=0.029..0.780 rows=1000 loops=1)
Output: a
Index Cond: (demo1.n <= 1000)
Buffers: shared hit=147
Planning time: 1.019 ms
Execution time: 0.884 ms

Same plan but of course we have more index blocks to scan, and more rows to fetch from the table, which is why the cost is higher.

In order to understand the cost, I’ve changed the query planner constants one by one (a sketch of such a per-session experiment follows the list). Here is what I got:

  • (cost=0.29..33.78 rows=1000 width=4) when seq_page_cost=0 instead of 1, which means that it estimates (175.78-33.78)/1=142 sequential reads
  • (cost=0.29..159.78 rows=1000 width=4) when random_page_cost=0 instead of 4, which means that it estimates (175.78-159.78)/4=4 random reads
  • (cost=0.29..165.78 rows=1000 width=4) when cpu_tuple_cost=0 instead of 0.01, which means that it estimates (175.78-165.78)/0.01=1000 rows
  • (cost=0.29..170.78 rows=1000 width=4) when cpu_index_tuple_cost=0 instead of 0.005, which means that it estimates (175.78-170.78)/0.005=1000 index entries
  • (cost=0.00..173.00 rows=1000 width=4) when cpu_operator_cost=0 instead of 0.0025, which means that it estimates (175.78-173.00)/0.0025=1112 cpu operations (116 for initial cost + 996 to get all rows)
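
Here is a sketch of one step of this experiment, using session-level settings (the parameter names are the standard planner constants):

-- zero out one constant, re-explain, then restore the default
set seq_page_cost = 0;
explain (costs) select a from demo1 where n<=1000 ;
reset seq_page_cost;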

I understand the 4 random reads from the index pages. However, I expected random reads, and not sequential reads, to fetch the rows from the table. But this is a case where the clustering factor is very good: the rows have been inserted in the same order as the indexed column, which means that those reads from the table probably read consecutive pages.

In order to validate this guess, I’ve traced the system calls on Linux:

25734 open("base/12924/42427", O_RDWR) = 42
25734 lseek(42, 0, SEEK_END) = 11706368
25734 open("base/12924/42433", O_RDWR) = 43
25734 lseek(43, 0, SEEK_END) = 245760

The file descriptor 42 is my table (demo1) and the file descriptor 43 is the index (demo1_n). The file name is in the open() call and it includes the file id (relfilenode):

select relname,relfilenode from pg_class where relname='demo1';
-[ RECORD 1 ]--+------
relname | demo1
relfilenode | 42427
 
select relname,relfilenode from pg_class where relname='demo1_n';
-[ RECORD 1 ]--+--------
relname | demo1_n
relfilenode | 42433

Then we see some random reads from the index (branches and first leaf):

25734 lseek(43, 0, SEEK_SET) = 0
25734 read(43, "100036037360374 b152"..., 8192) = 8192
25734 lseek(43, 24576, SEEK_SET) = 24576
25734 read(43, "121000836360374 35023720330237 "..., 8192) = 8192
25734 lseek(43, 8192, SEEK_SET) = 8192
25734 read(43, "13245t360374 211 340237 "..., 8192) = 8192

Then we see 53 reads from the table:

25734 lseek(42, 0, SEEK_SET) = 0
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
...

Only one lseek. The other reads are all single-block (8k) I/O calls without a seek, which means that they are sequential. When relying on filesystem prefetching, this may avoid the latency of each I/O call.

Then the next leaf block from the index is read, and then 52 reads from the table (no lseek):

25734 read(43, "13245t360374 211 340237 "..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
...

And again, one index block and 38 contiguous table blocks:

25734 lseek(43, 32768, SEEK_SET) = 32768
25734 read(43, "13245t360374 211 340237 "..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
25734 read(42, "40042203 4 36023330103402273010"..., 8192) = 8192
...

Here is the summary of the cost of 175.78:

  • The startup cost of 0.29 as we have seen before
  • Estimates 4 random reads (reading 1000 rows from a 30-page index which contains 10000 rows): cost=16 (random_page_cost=4)
  • Estimates 142 sequential reads: cost=142 (seq_page_cost=1)
  • 1000 index entries to process, estimated at cost=5 (cpu_index_tuple_cost=0.005)
  • 1000 result rows to process, estimated at cost=10 (cpu_tuple_cost=0.01)
  • about 1000 operators or functions, estimated at cpu_operator_cost=0.0025

The very interesting thing here is that the query planner is totally aware of the clustering factor and uses sequential read estimation.
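
This physical ordering is also visible in the statistics collected by ANALYZE; here is a sketch (a correlation close to 1 means the table is stored in the order of the column):

select attname, correlation from pg_stats where tablename='demo1' and attname='n';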

Oracle Range Scan

Here is the same query on the similar table on Oracle:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID a3gqx19xs9wxq, child number 0
-------------------------------------
select /*+ */ a from demo1 where n<=1000
----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 147 (100)| 1000 |00:00:00.01 | 148 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO1 | 1 | 1000 | 147 (0)| 1000 |00:00:00.01 | 148 |
|* 2 | INDEX RANGE SCAN | DEMO1_N | 1 | 1000 | 4 (0)| 1000 |00:00:00.01 | 4 |
----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"<=1000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"[NUMBER,22] 2 - "DEMO1".ROWID[ROWID,10]

The strace output shows calls to pread:

open("/u01/oradata/CDB1A/PDB/users01.dbf", O_RDWR|O_DSYNC) = 7
fcntl(7, F_SETFD, FD_CLOEXEC) = 0
fcntl(7, F_DUPFD, 256) = 258
fcntl(258, F_SETFD, FD_CLOEXEC) = 0
close(7) = 0
pread(258, "62422313G275"142532'!1?275""..., 8192, 2252800 ) = 8192
pread(258, "62422413C275"14x2432'!1?275""..., 8192, 2260992 ) = 8192
pread(258, "6242313v3362274"24b+1&!1354274""..., 8192, 24731648 ) = 8192
pread(258, "6242314v3362274"24e*1&!1354274""..., 8192, 24739840 ) = 8192
pread(258, "6242315v3362274"24d51&!1354274""..., 8192, 24748032 ) = 8192
pread(258, "6242316v3362274"24g41&!1354274""..., 8192, 24756224 ) = 8192
pread(258, "6242317v3362274"24f71&!1354274""..., 8192, 24764416 ) = 8192
pread(258, "6242320v3362274"24y71&!1354274""..., 8192, 24772608 ) = 8192

pread is similar to lseek()+read() here and, as far as I know, Linux detects when there is no need to seek, which allows prefetching as well. Oracle also has its own prefetching, but I’ll not go into the details here (read Timur Akhmadeev on the Pythian blog about this).

With Oracle, there is no need to run strace because all these system calls are instrumented as ‘wait events’. Here is a trace:

PARSE #140375247563104:c=2000,e=1872,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=187737470,tim=53267437268
EXEC #140375247563104:c=0,e=147,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=187737470,tim=53267437481
WAIT #140375247563104: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=74022 tim=53267437532
WAIT #140375247563104: nam='db file sequential read' ela= 8 file#=12 block#=275 blocks=1 obj#=74023 tim=53267437679
WAIT #140375247563104: nam='db file sequential read' ela= 5 file#=12 block#=276 blocks=1 obj#=74023 tim=53267437785
WAIT #140375247563104: nam='db file sequential read' ela= 5 file#=12 block#=3019 blocks=1 obj#=74022 tim=53267437902
FETCH #140375247563104:c=0,e=368,p=3,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=187737470,tim=53267437977
WAIT #140375247563104: nam='PGA memory operation' ela= 14 p1=0 p2=0 p3=0 obj#=74022 tim=53267438017
WAIT #140375247563104: nam='SQL*Net message from client' ela= 280 driver id=1413697536 #bytes=1 p3=0 obj#=74022 tim=53267438385
WAIT #140375247563104: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=74022 tim=53267438419
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3020 blocks=1 obj#=74022 tim=53267438443
WAIT #140375247563104: nam='PGA memory operation' ela= 7 p1=1114112 p2=2 p3=0 obj#=74022 tim=53267438475
WAIT #140375247563104: nam='db file sequential read' ela= 5 file#=12 block#=3021 blocks=1 obj#=74022 tim=53267438504
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3022 blocks=1 obj#=74022 tim=53267438532
WAIT #140375247563104: nam='db file sequential read' ela= 2 file#=12 block#=3023 blocks=1 obj#=74022 tim=53267438552
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3024 blocks=1 obj#=74022 tim=53267438576
WAIT #140375247563104: nam='db file sequential read' ela= 4 file#=12 block#=3025 blocks=1 obj#=74022 tim=53267438603
WAIT #140375247563104: nam='db file sequential read' ela= 26 file#=12 block#=3026 blocks=1 obj#=74022 tim=53267438647
WAIT #140375247563104: nam='db file sequential read' ela= 4 file#=12 block#=3027 blocks=1 obj#=74022 tim=53267438680
WAIT #140375247563104: nam='db file sequential read' ela= 2 file#=12 block#=3028 blocks=1 obj#=74022 tim=53267438699
WAIT #140375247563104: nam='db file sequential read' ela= 4 file#=12 block#=3029 blocks=1 obj#=74022 tim=53267438781
WAIT #140375247563104: nam='db file sequential read' ela= 3 file#=12 block#=3030 blocks=1 obj#=74022 tim=53267438807
WAIT #140375247563104: nam='db file sequential read' ela= 28 file#=12 block#=3031 blocks=1 obj#=74022 tim=53267438878
...
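
For reference, a trace like this one can be enabled with DBMS_MONITOR (a sketch; setting the 10046 event is another common way):

-- trace the current session, including wait events:
exec dbms_monitor.session_trace_enable(waits=>true, binds=>false);
-- run the statement, then stop tracing:
exec dbms_monitor.session_trace_disable;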

The name ‘sequential read’ does not mean the same as the Postgres ‘sequential read’: it only means single-block reads that are done one after the other, and they are accounted as random reads. However, looking at the block#, they actually read contiguous blocks here.

In the end, because I have an index with a good clustering factor, and because I’m using the defaults on Linux (no direct reads, no asynchronous I/O), the execution is very similar to the Postgres one: read the few index blocks and follow the pointers to the 140 blocks of the table.

The cost estimation looks similar in magnitude between Postgres and Oracle, but it is not expressed in the same unit: Postgres estimates the cost in equivalents of sequential reads, while Oracle estimates it in equivalents of single-block random reads. In addition to that, Postgres, with its default planner parameters, gives more weight than Oracle to CPU usage.
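
For reference, the Postgres planner constants mentioned in this post can be listed from pg_settings:

select name, setting from pg_settings
where name in ('seq_page_cost','random_page_cost',
               'cpu_tuple_cost','cpu_index_tuple_cost','cpu_operator_cost');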

This is the good case of index access, where we have a good clustering/correlation between the physical order of the table and the logical order of the index. The random reads end up behaving as sequential reads because there is no seek() between them. You can imagine that in the next post I’ll try the same with a very bad clustering factor.

The article Postgres vs. Oracle access paths VI – Index Scan appeared first on the dbi services blog.
