How to (not) Generate Block Corruption on ASM

As  Leighton L. Nelson point me out in G+, there must other way to corrupt blocks directly in ASM making this task easier for testing purposes. My scenario to make this happen is the following:

* OEL5.7 (Red Hat Enterprise Linux Server release 5.7)
* 2 RAC nodes 11g (
* VirtualBox (4.1.10)
* VirtualBox shared storage disk /dev/sdh (1GB)

We will need a separate DiskGroup with only one ASM disk to accommodate the tablespace and the table with random data to corrupt.

[root@rac1 oracle]# /etc/init.d/oracleasm createdisk TEST_DSK /dev/sdh1
Marking disk "TEST" as an ASM disk:                        [  OK  ]

[oracle@rac1 bin]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576      1019      924                0             924              0             N  TEST_DG/

SQL> create tablespace ts_small datafile '+TEST_DG/RAC/DATAFILE/tdg.df' size 100k autoextend on;
Tablespace created.

SQL> alter user user_dbc default tablespace ts_small;
User altered.

Now using user_dbc create a small table that fits easily on 100k tablespace:

SQL> create table t_small_corrupt tablespace ts_small as select 'rownum:' || rownum R1 from dual connect by level <= 1200;

Table created.

First thing now is to discover datafile inside ASM, we can use X$KFFXP that contains the mapping between files, extents and AUs. With this table we can track the  position of datafile inside ASM (please note that external redundancy applies in this case):

SQL> column DSK_PATH format a25;
SQL> select distinct GROUP_KFFXP,DISK_KFFXP,NUMBER_KFFXP, (select path from v$asm_disk where group_number=GROUP_KFFXP and disk_number=DISK_KFFXP) DSK_PATH from x$kffxp where number_kffxp=(select file_number from v$asm_alias where name='tdg.df') and GROUP_KFFXP=(select group_number from v$asm_alias where name='tdg.df');

----------- ---------- ------------ -------------------------
          3          0          256 /dev/oracleasm/disks/TEST_DSK

GROUP_KFFXP – ASM disk group number
DISK_KFFXP – Disk number where extent is allocated
NUMBER_KFFXP – ASM file number
DSK_PATH – Device path for Disk

Next, we need to get the extent map for the datafile tdg.df (group 3, file number 256):

SQL> select xnum_kffxp,lxn_kffxp,pxn_kffxp,au_kffxp from x$kffxp where group_kffxp=3 and number_kffxp=256;

---------- ---------- ---------- ----------
         0          0          0        102

XNUM_KFFP – File extent number
LXN_KFFXP – 0 means that is a primary extent and not a mirrored extent (1 or 2, depends of redundancy)
PXN_KFFXP – Progressive file extent number that in this case (no mirrored extents) is always the same as XNUM_KFFXP
AU_KFFXP – Is the relative position of AU from the beginning of the disk

Time to use dd to access directy to disk using information we got from extent map of datafile. Use bs=1024k as AU for ASM is 1MB and skip parameter = (some) AU_KFFXP.

[root@rac1 oracle]# dd if=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 skip=102|strings|more
        r       d       V       H       :       ,
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.081027 seconds, 12.9 MB/s

So for sure, we hit the data now. No chance to miss!! 🙂

[root@rac1 oracle]# dd if=/dev/zero of=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 skip=102
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.011318 seconds, 92.6 MB/s

Ok, all done, let’s see if Oracle is aware of some block corruption:

SQL>  select * from v$database_block_corruption;

no rows selected

To unsure we are not missing anything let’s run Oracle Health Monitor manually:

Basic Run Information
 Run Name                     : my_test_run
 Run Id                       : 1521
 Check Name                   : DB Structure Integrity Check
 Mode                         : MANUAL
 Status                       : COMPLETED
 Start Time                   : 2012-03-22 17:55:19.578660 +00:00
 End Time                     : 2012-03-22 17:55:19.593574 +00:00
 Error Encountered            : 0
 Source Incident Id           : 0
 Number of Incidents Created  : 0

Conclusion here is simple: no data corruption. My next ideia was to use strace to find anything that can give some hints:

mmap(NULL, 1060864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1537f06000
ioctl(0, MGSL_IOCSTXIDLE or MTIOCGET or SNDCTL_MIDI_MPUCMD, 0x7fffc53e6c60) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(0, 106954752, SEEK_CUR)           = 0
read(0, "��������������������������������"..., 1048576) = 1048576
write(1, "��������������������������������"..., 1048576) = 1048576
munmap(0x7f1537f06000, 1060864)         = 0
close(0)                                = 0
close(1)                                = 0
write(2, "1+0 records inn1+0 records outn", 311+0 records in
1+0 records out
) = 31
write(2, "1048576 bytes (1.0 MB) copied", 291048576 bytes (1.0 MB) copied) = 29
write(2, ", 0.00314435 seconds, 333 MB/sn", 31, 0.00314435 seconds, 333 MB/s

I’m no strace expert, but besides ENOTTY error (Inappropriate I/O control operation) it SEEMS that dd can seek and write 0’s on the disk with success.

Conclusions so far:

0) I can backup datafile with RMAN with no errors
1) I can “full table scan” data (even after reset buffer cache and shared pool)
2) v$database_block_corruption returns no rows
3) Health Monitor reports nothing
4) Nothing on alert.log (on both instances)
5) Can read table data using dd (dd if=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 skip=102|strings|more)

6) ??

7) I was unable to corrupt anything!!

This is probably a VirtualBox issue or trick, i’m now short on time to make a few more tests (using VMWare for example), but i excepted that this method works on a non-virtualized environment. Next time i will blog about things that work 🙂

PS: Thanks to Martin Berger for time spent discussing this issue.


3 thoughts on “How to (not) Generate Block Corruption on ASM

  1. Hi,
    I actually used that for my backup tests, very useful, thanks 😉 As for dd, here:
    dd if=/dev/zero of=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 skip=102
    you should use seek instead of skip, when generating corruption:
    dd if=/dev/zero of=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 seek=102
    (skip refers to input file, seek – to output)

    My guess is that this asm diskgroup didn’t mount after restart, since you’ve overwritten it’s frist block. Did it?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s