Wednesday, November 3, 2010

ASM mount fails with ORA-15032 + ORA-15063

Development DBAs were using a single-node Dell box for 11gR1 ASM and RDBMS, and later decided to move it to a two-node RAC. They deinstalled the existing software but left the database files in place so that the same databases could be reused after the RAC setup. After CRS, ASM and RDBMS were installed, they had 2 new disks to be added for the RAC nodes.

Using DBCA, an ASM instance was created and a diskgroup called DG1 was created with the 2 new disks, after which the alert log of one of the ASM instances started throwing the error below. My help was sought to see if anything like this had been faced in production.


ERROR: diskgroup DG1 was not mounted
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DG1"
ORA-15038: disk '' size mismatch with diskgroup [1048576] [4096] [512]
ERROR: ALTER DISKGROUP ALL MOUNT
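
To confirm, the mount can be retried by hand from the failing node; a minimal check, assuming you are connected to the node-1 ASM instance as SYSASM:

SQL> -- Retry the mount manually; on node 1 this raises the same
SQL> -- ORA-15032/ORA-15063 seen in the alert log above
SQL> ALTER DISKGROUP DG1 MOUNT;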

First, check the disks on both nodes:

Node - 1

SQL> select MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,NAME,PATH,TOTAL_MB,FREE_MB from v$asm_disk;

MOUNT_S HEADER_STA MODE_ST NAME PATH TOTAL_MB FREE_MB
------- ---------- ------- ------------------ -------------------------------------------------- ---------- ----------
CLOSED MEMBER ONLINE /dev/raw/raw3 0 0
CLOSED FOREIGN ONLINE /dev/raw/raw5 0 0
CLOSED MEMBER ONLINE /dev/raw/raw1 0 0
CLOSED FOREIGN ONLINE /dev/raw/raw2 0 0
IGNORED MEMBER ONLINE /dev/raw/raw4 0 0


Node - 2

SQL> select MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,NAME,PATH,TOTAL_MB,FREE_MB from v$asm_disk;

MOUNT_S HEADER_STA MODE_ST NAME PATH TOTAL_MB FREE_MB
------- ---------- ------- ------------------ -------------------------------------------------- ---------- ----------
CLOSED FOREIGN ONLINE /dev/raw/raw2 0 0
CLOSED FOREIGN ONLINE /dev/raw/raw5 0 0
CACHED MEMBER ONLINE DG1_0001 /dev/raw/raw3 547419 168410
CACHED MEMBER ONLINE DG1_0000 /dev/raw/raw4 547419 168257

Inference 1:

From the above details, /dev/raw/raw1 is the old disk that was mounted in the old standalone ASM, and it has not been made visible on the new node. This is also consistent with /dev/raw/raw4 showing up as IGNORED on node 1: ASM discovered two disks (raw1 and raw4) whose headers claim the same disk name, so it ignored one of them.

Note: I could also see all the files from the standalone database present in DG1 on the second node, which takes us closer to our issue.
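
On the node that fails to mount, such leftover disks can be spotted by looking for MEMBER headers that no mounted diskgroup is using; a minimal sketch against v$asm_disk:

SQL> -- List disks whose header claims diskgroup membership while the
SQL> -- disk is not cached in any mounted group; on node 1, where DG1
SQL> -- is not mounted, this returns raw1, raw3 and raw4: three MEMBER
SQL> -- headers for a diskgroup that only has two disks
SQL> SELECT path, mount_status, header_status
  2    FROM v$asm_disk
  3   WHERE header_status = 'MEMBER'
  4     AND mount_status <> 'CACHED';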

From this, I used kfed to check what the disk headers had to say.

(Content shortened for readability)

[oracle@wv1devdb03b dev]$ kfed read /dev/raw/raw1

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG1_0000 ; 0x028: length=8
kfdhdb.grpname: DG1 ; 0x048: length=3
kfdhdb.fgname: DG1_0000 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0

[oracle@wv1devdb03b dev]$ kfed read /dev/raw/raw4

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG1_0000 ; 0x028: length=8
kfdhdb.grpname: DG1 ; 0x048: length=3
kfdhdb.fgname: DG1_0000 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0


[oracle@wv1devdb03b dev]$ kfed read /dev/raw/raw3

kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG1_0001 ; 0x028: length=8
kfdhdb.grpname: DG1 ; 0x048: length=3
kfdhdb.fgname: DG1_0001 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0

Now we know the issue: the disk '/dev/raw/raw1' had earlier been mounted as diskgroup DG1 in the standalone setup (and of course was never cleaned up). This could have been avoided if the new disks had been put into a diskgroup with a new name, viz. DG2. So the end issue is that the two nodes see mismatching sets of disks for DG1, and we have two disks carrying the same disk name "DG1_0000".

So what could be done to resolve this (a dev setup gives me more liberty :-) )? In our case, the steps below:

1. Commented out the /dev/raw/raw1 binding and restarted node 1 (an alternative from the ASM side is sketched after this list).
2. ASM now saw the same disks on both nodes and came up fine.
3. We returned the old disk raw1 to the storage team.
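
As an alternative to step 1, the stale disk can be hidden from the ASM side by narrowing the discovery string; a sketch, assuming ASM runs on an spfile and that /dev/raw/raw2 through raw5 are the only devices it should ever discover:

SQL> -- Assumption: raw2..raw5 are the only disks ASM should see.
SQL> -- Exclude the stale raw1 from discovery, then restart the
SQL> -- node-1 ASM instance so the diskgroup can mount cleanly.
SQL> ALTER SYSTEM SET asm_diskstring =
  2    '/dev/raw/raw2', '/dev/raw/raw3',
  3    '/dev/raw/raw4', '/dev/raw/raw5'
  4    SCOPE = SPFILE;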

What could be done to avoid this:

1. Make the raw1 disk visible on both nodes, so that both ASM instances discover the same set of disks.
2. Or simply create the new diskgroup with a different name, as sketched below; very simple, I guess.
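
For the second option, a minimal sketch of creating the new diskgroup under a fresh name; external redundancy is used here because the kfed output above shows grptyp KFDGTP_EXTERNAL:

SQL> -- With a fresh group name, the auto-generated disk names
SQL> -- (DG2_0000, DG2_0001) can never collide with headers left
SQL> -- over from the old standalone DG1
SQL> CREATE DISKGROUP DG2 EXTERNAL REDUNDANCY
  2    DISK '/dev/raw/raw3', '/dev/raw/raw4';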
