1. First, a Look at the Oracle Official Documentation
Reference:
http://download.oracle.com/docs/cd/E11882_01/rac.112/e16794/intro.htm#CWADD91998
Oracle Clusterware Software Concepts and Requirements
Oracle Clusterware uses voting disk files to provide fencing and cluster node membership determination. OCR provides cluster configuration information. You can place the Oracle Clusterware files on either Oracle ASM or on shared common disk storage. If you configure Oracle Clusterware on storage that does not provide file redundancy, then Oracle recommends that you configure multiple locations for OCR and voting disks. The voting disks and OCR are described as follows:
Oracle Clusterware uses voting disk files to determine which nodes are members of a cluster. You can configure voting disks on Oracle ASM, or you can configure voting disks on shared storage.
If you configure voting disks on Oracle ASM, then you do not need to manually configure the voting disks. Depending on the redundancy of your disk group, an appropriate number of voting disks are created.
If you do not configure voting disks on Oracle ASM, then for high availability, Oracle recommends that you have a minimum of three voting disks on physically separate storage. This avoids having a single point of failure. If you configure a single voting disk, then you must use external mirroring to provide redundancy.
You should have at least three voting disks, unless you have a storage device, such as a disk array that provides external redundancy. Oracle recommends that you do not use more than five voting disks. The maximum number of voting disks that is supported is 15.
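On a live cluster the voting disks can be listed with "crsctl query css votedisk". As a minimal sketch (the output below is invented; real GUIDs and device paths will differ), counting the ONLINE entries verifies the recommended minimum of three:

```shell
# Hypothetical "crsctl query css votedisk" output for an ASM-backed cluster;
# the GUIDs and device names are invented for illustration.
votedisk_output='##  STATE    File Universal Id                File Name Disk group
 1. ONLINE   6e5ed1e0b4a34f8abf1c2d3e4f5a6b7c (/dev/asm-disk1) [CRS]
 2. ONLINE   7f6fe2f1c5b45f9bcf2d3e4f5a6b7c8d (/dev/asm-disk2) [CRS]
 3. ONLINE   8a7af3a2d6c56fabdf3e4f5a6b7c8d9e (/dev/asm-disk3) [CRS]'

# Count the voting disks that are ONLINE; without external redundancy,
# high availability calls for at least three.
online=$(printf '%s\n' "$votedisk_output" | grep -c ' ONLINE ')
echo "online voting disks: $online"
```

On an ASM-backed cluster the voting disk count follows the disk group redundancy: external redundancy gives one, normal redundancy three, and high redundancy five.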
Oracle Clusterware uses the Oracle Cluster Registry (OCR) to store and manage information about the components that Oracle Clusterware controls, such as Oracle RAC databases, listeners, virtual IP addresses (VIPs), services, and any applications. OCR stores configuration information in a series of key-value pairs in a tree structure. To ensure cluster high availability, Oracle recommends that you define multiple OCR locations. In addition:
o You can have up to five OCR locations
o Each OCR location must reside on shared storage that is accessible by all of the nodes in the cluster
o You can replace a failed OCR location online if it is not the only OCR location
o You must update OCR through supported utilities such as Oracle Enterprise Manager, the Server Control Utility (SRVCTL), the OCR configuration utility (OCRCONFIG), or the Database Configuration Assistant (DBCA)
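The health and location of OCR can be inspected with the ocrcheck utility. The sketch below parses sample ocrcheck output to extract the configured OCR location; all of the values, including the +CRS disk group, are hypothetical:

```shell
# Hypothetical "ocrcheck" output; the sizes and the +CRS disk group
# are illustrative only.
ocrcheck_output='Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2404
         Available space (kbytes) :     259716
         Device/File Name         :      +CRS
                                    Device/File integrity check succeeded'

# Extract the configured OCR location (the value after the colon).
ocr_location=$(printf '%s\n' "$ocrcheck_output" | awk -F': *' '/Device\/File Name/ {print $2}')
echo "OCR location: $ocr_location"
```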
See Also:
Chapter 2, "Administering Oracle Clusterware" for more information about voting disks and OCR
Oracle Clusterware Network Configuration Concepts
Oracle Clusterware enables a dynamic Grid Infrastructure through the self-management of the network requirements for the cluster. Oracle Clusterware 11g release 2 (11.2) supports the use of dynamic host configuration protocol (DHCP) for all private interconnect addresses, as well as for most of the VIP addresses. DHCP provides dynamic configuration of the host's IP address, but it does not provide an optimal method of producing names that are useful to external clients.
When you are using Oracle RAC, all of the clients must be able to reach the database. This means that the VIP addresses must be resolved by the clients. This problem is solved by the addition of the Oracle Grid Naming Service (GNS) to the cluster. GNS is linked to the corporate domain name service (DNS) so that clients can easily connect to the cluster and the databases running there. Activating GNS in a cluster requires a DHCP service on the public network.
Implementing GNS
To implement GNS, you must collaborate with your network administrator to obtain an IP address on the public network for the GNS VIP. DNS uses the GNS VIP to forward requests for access to the cluster to GNS. The network administrator must delegate a subdomain in the network to the cluster. The subdomain forwards all requests for addresses in the subdomain to the GNS VIP.
GNS and the GNS VIP run on one node in the cluster. The GNS daemon listens on the GNS VIP using port 53 for DNS requests. Oracle Clusterware manages the GNS and the GNS VIP to ensure that they are always available. If the server on which GNS is running fails, then Oracle Clusterware fails GNS over, along with the GNS VIP, to another node in the cluster.
With DHCP on the network, Oracle Clusterware obtains an IP address from the server along with other network information, such as what gateway to use, what DNS servers to use, what domain to use, and what NTP server to use. Oracle Clusterware initially obtains the necessary IP addresses during cluster configuration and it updates the Oracle Clusterware resources with the correct information obtained from the DHCP server.
Single Client Access Name (SCAN)
Oracle RAC 11g release 2 (11.2) introduces the Single Client Access Name (SCAN). The SCAN is a single name that resolves to three IP addresses in the public network. When using GNS and DHCP, Oracle Clusterware configures the VIP addresses for the SCAN name that is provided during cluster configuration.
The node VIP and the three SCAN VIPs are obtained from the DHCP server when using GNS. If a new server joins the cluster, then Oracle Clusterware dynamically obtains the required VIP address from the DHCP server, updates the cluster resource, and makes the server accessible through GNS.
Example 1-1 shows the DNS entries that delegate a domain to the cluster.
# Delegate to gns on mycluster
mycluster.example.com NS myclustergns.example.com
#Let the world know to go to the GNS vip
myclustergns.example.com. 10.9.8.7
See Also:
Oracle Grid Infrastructure Installation Guide for details about establishing resolution through DNS
Configuring Addresses Manually
Alternatively, you can choose manual address configuration, in which you configure the following:
· One public host name for each node.
· One VIP address for each node.
You must assign a VIP address to each node in the cluster. Each VIP address must be on the same subnet as the public IP address for the node and should be an address that is assigned a name in the DNS. Each VIP address must also be unused and unpingable from within the network before you install Oracle Clusterware.
· Up to three SCAN addresses for the entire cluster.
Note:
The SCAN must resolve to at least one address on the public network. For high availability and scalability, Oracle recommends that you configure the SCAN to resolve to three addresses.
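Whether a SCAN meets this recommendation can be checked with an ordinary DNS lookup. The sketch below counts the addresses in a hypothetical nslookup answer; the SCAN name and addresses are made up:

```shell
# Hypothetical "nslookup mycluster-scan.example.com" answer section;
# the SCAN name and the three addresses are invented for illustration.
scan_lookup='Name:   mycluster-scan.example.com
Address: 192.168.1.101
Name:   mycluster-scan.example.com
Address: 192.168.1.102
Name:   mycluster-scan.example.com
Address: 192.168.1.103'

# A SCAN configured for high availability should resolve to three addresses.
addr_count=$(printf '%s\n' "$scan_lookup" | grep -c '^Address:')
echo "SCAN resolves to $addr_count addresses"
```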
See Also:
Your platform-specific Oracle Grid Infrastructure Installation Guide installation documentation for information about system requirements and configuring network addresses
Overview of Oracle Clusterware Platform-Specific Software Components
When Oracle Clusterware is operational, several platform-specific processes or services run on each node in the cluster. This section describes these various processes and services.
The Oracle Clusterware Stack
Oracle Clusterware consists of two separate stacks: an upper stack anchored by the Cluster Ready Services (CRS) daemon (crsd) and a lower stack anchored by the Oracle High Availability Services daemon (ohasd). These two stacks have several processes that facilitate cluster operations. The following sections describe these stacks in more detail:
· The Cluster Ready Services Stack
· The Oracle High Availability Services Stack
The Cluster Ready Services Stack
The list in this section describes the processes that comprise CRS. The list includes components that are processes on Linux and UNIX operating systems, or services on Windows.
· Cluster Ready Services (CRS): The primary program for managing high availability operations in a cluster. The CRS daemon (crsd) manages cluster resources based on the configuration information that is stored in OCR for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes. When you have Oracle RAC installed, the crsd process monitors the Oracle database instance, listener, and so on, and automatically restarts these components when a failure occurs.
· Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party clusterware, then CSS processes interface with your clusterware to manage node membership information.
The cssdagent process monitors the cluster and provides I/O fencing. This service formerly was provided by Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure may result in Oracle Clusterware restarting the node.
· Oracle ASM: Provides disk management for Oracle Clusterware and Oracle Database.
· Cluster Time Synchronization Service (CTSS): Provides time management in a cluster for Oracle Clusterware.
· Event Management (EVM): A background process that publishes events that Oracle Clusterware creates.
· Oracle Notification Service (ONS): A publish and subscribe service for communicating Fast Application Notification (FAN) events.
· Oracle Agent (oraagent): Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g release 1 (11.1).
· Oracle Root Agent (orarootagent): A specialized oraagent process that helps crsd manage resources owned by root, such as the network, and the Grid virtual IP address.
The Cluster Synchronization Service (CSS), Event Management (EVM), and Oracle Notification Services (ONS) components communicate with other cluster component layers on other nodes in the same cluster database environment. These components are also the main communication links between Oracle Database, applications, and the Oracle Clusterware high availability components. In addition, these background processes monitor and manage database operations.
The Oracle High Availability Services Stack
This section describes the processes that comprise the Oracle High Availability Services stack. The list includes components that are processes on Linux and UNIX operating systems, or services on Windows.
· Cluster Logger Service (ologgerd): Receives information from all the nodes in the cluster and persists it in a CHM repository-based database. This service runs on only two nodes in a cluster.
· System Monitor Service (osysmond): The monitoring and operating system metric collection service that sends the data to the cluster logger service. This service runs on every node in a cluster.
· Grid Plug and Play (GPNPD): Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.
· Grid Interprocess Communication (GIPC): A support daemon that enables Redundant Interconnect Usage.
· Multicast Domain Name Service (mDNS): Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
· Oracle Grid Naming Service (GNS): Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.
2. Examining the OHASD Resources
Oracle High Availability Services Daemon (OHASD) :This process anchors the lower part of the Oracle Clusterware stack, which consists of processes that facilitate cluster operations.
When you start CRS in 11gR2, you are told that ohasd has started. So which resources does OHASD actually include? We can check with the following command:
[grid@racnode1 ~]$ crsctl stat res -init -t
---------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
---------------------------------------------------------
Cluster Resources
---------------------------------------------------------
ora.asm
1 ONLINE ONLINE racnode1 Started
ora.crsd
1 ONLINE ONLINE racnode1
ora.cssd
1 ONLINE ONLINE racnode1
ora.cssdmonitor
1 ONLINE ONLINE racnode1
ora.ctssd
1 ONLINE ONLINE racnode1 OBSERVER
ora.diskmon
1 ONLINE ONLINE racnode1
ora.drivers.acfs
1 ONLINE UNKNOWN racnode1
ora.evmd
1 ONLINE ONLINE racnode1
ora.gipcd
1 ONLINE ONLINE racnode1
ora.gpnpd
1 ONLINE ONLINE racnode1
ora.mdnsd
1 ONLINE ONLINE racnode1
For the 10g platform, these RAC resources are already explained in my blog post "RAC 的一些概念性和原理性的知識" (conceptual and theoretical knowledge of RAC):
http://blog.csdn.net/tianlesoftware/archive/2010/02/27/5331067.aspx
Let's look at each of these processes in turn:
(1) ora.asm: the resource for the ASM instance. In 10g, OCR and the voting disks were placed on other shared storage; in 11gR2 they are placed in ASM by default. Clusterware must read this information when it starts, so the ASM instance has to be brought up first during cluster startup.
(2) ora.crsd, ora.cssd, and ora.evmd:
These are the three most important processes in Clusterware.
In 10g, at the end of the clusterware installation you are prompted to run the root.sh script on each node. This script appends these three processes to the end of /etc/inittab as startup entries, so Clusterware starts automatically every time the system boots. If EVMD or CRSD terminates abnormally, the system automatically restarts the process; if CSSD terminates abnormally, the system reboots immediately.
In 11gR2, only ohasd is written to /etc/inittab:
[grid@racnode1 init.d]$ cat /etc/inittab
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
So the commands commonly used in 10g, such as /etc/init.d/init.crs, are gone; only /etc/init.d/init.ohasd remains.
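The structure of that inittab entry is easy to see by splitting its colon-separated fields. The sketch below hard-codes the line quoted above for illustration; on a real node you would read it with "grep init.ohasd /etc/inittab":

```shell
# The 11gR2 inittab entry quoted above, parsed field by field.
inittab_line='h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null'

# inittab fields are id:runlevels:action:process, separated by colons.
action=$(printf '%s\n' "$inittab_line" | cut -d: -f3)
process=$(printf '%s\n' "$inittab_line" | cut -d: -f4)
echo "action:  $action"
echo "process: $process"
```

The "respawn" action is what makes init restart ohasd whenever it dies, which is how the lower stack survives process failures.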
OCSSD: the most critical process in Clusterware; if it terminates abnormally, the node reboots. It provides the CSS (Cluster Synchronization Service), which monitors the cluster state in real time through several heartbeat mechanisms and provides fundamental cluster services such as split-brain protection.
CRSD: the main process implementing high availability (HA); the service it provides is called CRS (Cluster Ready Service). Every component that requires high availability is registered in OCR as a CRS resource during installation and configuration, and the crsd process uses the contents of OCR to decide which processes to monitor, how to monitor them, and what to do when a problem occurs. In other words, crsd tracks the state of each CRS resource and is responsible for starting, stopping, monitoring, and failing over these resources. By default, CRS automatically tries to restart a resource five times; if it still fails, it gives up.
CRS resources include GSD (Global Service Daemon), ONS (Oracle Notification Service), VIP, Database, Instance, and Service.
EVMD: publishes the events generated by CRS. These events can be delivered to clients in two ways: ONS and callout scripts.
The roles of these three processes are described in more detail in the same post, "RAC 的一些概念性和原理性的知識":
http://blog.csdn.net/tianlesoftware/archive/2010/02/27/5331067.aspx
(3)Grid Plug and Play (GPNPD):
Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.
(4)Grid Interprocess Communication (GIPC):
A support daemon that enables Redundant Interconnect Usage.
(5) Multicast Domain Name Service (mDNS), the ora.mdnsd resource:
Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
(6)Cluster Time Synchronization Service (CTSS):
Provides time management in a cluster for Oracle Clusterware. In the query output above, the state of CTSS is OBSERVER, that is, it is only observing.
In 11gR2, time synchronization for RAC can be provided in either of two ways: NTP or CTSS. If the installer finds that NTP is inactive, the Cluster Time Synchronization Service is installed in active mode and synchronizes the time across all nodes. If NTP is found to be configured, the Cluster Time Synchronization Service starts in observer mode, and Oracle Clusterware does not perform active time synchronization within the cluster.
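On a running node the CTSS mode can be confirmed with "crsctl check ctss". A minimal sketch, assuming a CRS-4700-style message (the exact wording may vary by release):

```shell
# Hypothetical "crsctl check ctss" message; the exact wording can differ
# between releases.
ctss_output='CRS-4700: The Cluster Time Synchronization Service is in Observer mode.'

# Classify the mode from the message text.
case "$ctss_output" in
  *Observer*) ctss_mode=observer ;;
  *Active*)   ctss_mode=active ;;
  *)          ctss_mode=unknown ;;
esac
echo "CTSS mode: $ctss_mode"
```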
(7)Automatic Storage Management Cluster File System (Oracle ACFS):
Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is a multi-platform, scalable file system, and storage management technology that extends Oracle Automatic Storage Management (Oracle ASM) functionality to support customer files maintained outside of Oracle Database. Oracle ACFS supports many database and application files, including executables, database trace files, database alert logs, application reports, BFILEs, and configuration files. Other supported files are video, audio, text, images, engineering drawings, and other general-purpose application file data.
An Oracle ACFS file system is a layer on Oracle ASM and is configured with Oracle ASM storage, as shown in Figure 5-1. Oracle ACFS leverages Oracle ASM functionality that enables:
· Oracle ACFS dynamic file system resizing
· Maximized performance through direct access to Oracle ASM disk group storage
· Balanced distribution of Oracle ACFS across Oracle ASM disk group storage for increased I/O parallelism
· Data reliability through Oracle ASM mirroring protection mechanisms
For more details, see:
http://download.oracle.com/docs/cd/E11882_01/server.112/e16102/asmfilesystem.htm#OSTMG31000
3. Examining the CRS Resources
In 11.2, CRSD resources are reclassified into two categories: Local Resources and Cluster Resources. The OHASD resources shown earlier belong to the cluster resources.
[grid@racnode1 ~]$ crsctl stat res -t
---------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
---------------------------------------------------------
Local Resources
---------------------------------------------------------
ora.CRS.dg
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.DATA.dg
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.FRA.dg
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.LISTENER.lsnr
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.asm
ONLINE ONLINE racnode1 Started
ONLINE ONLINE racnode2 Started
ora.eons
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.gsd
OFFLINE OFFLINE racnode1
OFFLINE OFFLINE racnode2
ora.net1.network
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.ons
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.registry.acfs
ONLINE UNKNOWN racnode1
ONLINE ONLINE racnode2
---------------------------------------------------------
Cluster Resources
---------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE racnode2
ora.oc4j
1 OFFLINE OFFLINE
ora.racdb.db
1 ONLINE ONLINE racnode1 Open
2 ONLINE ONLINE racnode2 Open
ora.racnode1.vip
1 ONLINE ONLINE racnode1
ora.racnode2.vip
1 ONLINE ONLINE racnode2
ora.scan1.vip
1 ONLINE ONLINE racnode2
[grid@racnode1 ~]$
From the query output above you can see that in 11gR2 the network, disk groups, eons, and ASM are also managed as resources.
Note also that the gsd and oc4j resources are OFFLINE. The explanation is as follows:
ora.gsd is OFFLINE by default if there is no Oracle9i database in the cluster.
ora.oc4j is OFFLINE in 11.2.0.1 because Database Workload Management (DBWLM) is unavailable. Both can be ignored in 11gR2 RAC.
You can also view the resources with the following command:
[root@racnode1 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.CRS.dg ora....up.type ONLINE ONLINE racnode1
ora.DATA.dg ora....up.type ONLINE ONLINE racnode1
ora.FRA.dg ora....up.type ONLINE ONLINE racnode1
ora....ER.lsnr ora....er.type ONLINE ONLINE racnode1
ora....N1.lsnr ora....er.type ONLINE ONLINE racnode2
ora.asm ora.asm.type ONLINE ONLINE racnode1
ora.eons ora.eons.type ONLINE ONLINE racnode1
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE racnode1
ora.oc4j ora.oc4j.type OFFLINE OFFLINE
ora.ons ora.ons.type ONLINE ONLINE racnode1
ora.racdb.db ora....se.type ONLINE ONLINE racnode1
ora....SM1.asm application ONLINE ONLINE racnode1
ora....E1.lsnr application ONLINE ONLINE racnode1
ora....de1.gsd application OFFLINE OFFLINE
ora....de1.ons application ONLINE ONLINE racnode1
ora....de1.vip ora....t1.type ONLINE ONLINE racnode1
ora....SM2.asm application ONLINE ONLINE racnode2
ora....E2.lsnr application ONLINE ONLINE racnode2
ora....de2.gsd application OFFLINE OFFLINE
ora....de2.ons application ONLINE ONLINE racnode2
ora....de2.vip ora....t1.type ONLINE ONLINE racnode2
ora....ry.acfs ora....fs.type ONLINE ONLINE racnode2
ora.scan1.vip ora....ip.type ONLINE ONLINE racnode1
ora.scan2.vip ora....ip.type ONLINE ONLINE racnode2
[root@racnode1 ~]#
4. Examining Dependencies Between Resources
For example, a disk group resource depends on ASM, and a VIP depends on the network. These dependencies can be seen in each resource's detailed attributes:
[root@racnode1 ~]# crsctl stat res ora.DATA.dg -p
NAME=ora.DATA.dg
TYPE=ora.diskgroup.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=never
CHECK_INTERVAL=300
CHECK_TIMEOUT=600
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=CRS resource type definition for ASM disk group resource
ENABLED=1
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=hard(ora.asm) pullup(ora.asm)
START_TIMEOUT=900
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:ora.asm)
STOP_TIMEOUT=180
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_OPI=false
USR_ORA_STOP_MODE=
VERSION=11.2.0.1.0
[grid@racnode1 ~]$ crsctl stat res ora.racnode1.vip -p
NAME=ora.racnode1.vip
TYPE=ora.cluster_vip_net1.type
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=1
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=vip)
DEGREE=1
DESCRIPTION=Oracle VIP resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=racnode1
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=favored
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=0
SCRIPT_TIMEOUT=60
SERVER_POOLS=*
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(ora.net1.network)
STOP_TIMEOUT=0
UPTIME_THRESHOLD=1h
USR_ORA_ENV=
USR_ORA_VIP=racnode1-vip
VERSION=11.2.0.1.0
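These attributes lend themselves to scripting: the dependency lines can be pulled out of "crsctl stat res <name> -p" output with awk. A minimal sketch, reusing a trimmed copy of the ora.racnode1.vip profile shown above:

```shell
# A trimmed copy of the ora.racnode1.vip profile shown above.
profile='NAME=ora.racnode1.vip
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
STOP_DEPENDENCIES=hard(ora.net1.network)'

# Pull out the start and stop dependencies (everything after the "=").
start_dep=$(printf '%s\n' "$profile" | awk -F= '/^START_DEPENDENCIES/ {print $2}')
stop_dep=$(printf '%s\n' "$profile" | awk -F= '/^STOP_DEPENDENCIES/ {print $2}')
echo "start: $start_dep"
echo "stop:  $stop_dep"
```

The hard(ora.net1.network) clause means the VIP cannot start unless the network resource is up, and pullup(ora.net1.network) makes Clusterware start the network automatically when the VIP is requested.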