EMC VNX2 and Hot Sparing: What You Need to Know

Hot Sparing with VNX2

EMC releases new products often, but the release of VNX2 completely changed the way hot spares function. VNX2 comes in a variety of models and runs the MCx series code (05.33), which EMC describes as “multi-core optimization” and which allows for faster speeds.

VNX2 EOSL Dates

Note that some of the VNX2 line has already reached its EOSL dates, while other models reach theirs in 2023.

VNX VG10 N/A
VNX VG2 (WITH VNX OE 7.X) N/A
VNX VG50 N/A
VNX VG8 (WITH VNX OE 7.X) N/A
VNX5100 12/31/2020
VNX5200 01/31/2023
VNX5300 12/31/2020
VNX5400 01/31/2023
VNX5500 12/31/2019
VNX5600 01/31/2023
VNX5700 12/31/2019
VNX5800 01/31/2023
VNX7500 12/31/2019
VNX7600 01/31/2023
VNX8000 01/31/2023
VNXE N/A
VNXE1600 01/31/2023
VNXE3100 03/31/2018
VNXE3150 N/A

What is a Hot Spare? 

With MCx, we have also seen changes to how Hot Sparing works in a VNX Array. When a drive fails, Permanent Sparing is now the method used. 

Benefits of a hot spare:

  • Reduces the mean time to recovery (MTTR) for the RAID redundancy group
  • Reduces probability of a second disk failure
  • Reduces probability of data loss that can occur in any singly redundant RAID.

Permanent Sparing

When a drive fails in a RAID Group (RG), whether a traditional RG or a pool's internal private RG, the RG rebuilds to a suitable spare drive located in the VNX. At this point, the used spare drive becomes a permanent part of the RG. 

When the failed drive is replaced, the replacement becomes a spare for eligible drives in the rest of the VNX array. This removes the previous behavior, in which the hot spare was only temporary and the data eventually went back to the original drive location (known as B_E_D, for Bus_Enclosure_Disk) once the failed drive was replaced.

Replacing VNX2

VNX2 Hot Sparing Rules

When the VNX2 chooses a hot spare, the MCx code works through a list of priorities to decide which drive to select as the spare for the failed drive. Typically, it goes through four steps:

  1. Drive Type: If a SAS drive fails, the first rule looks for an unused SAS drive. If none is available, the next best choice is an NL-SAS drive or an SSD.
  2. Bus: Next, it looks on the same bus as the failed drive for a drive of the type it has selected.
  3. Size: Find a drive with the same size or larger.
  4. Enclosure: Finally, it will try to find a drive in the same Disk Array Enclosure.
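
The four steps above can be sketched as a ranking over the unassigned drives. This is only an illustration of the priority order described here, not EMC's actual MCx logic: the `Drive` fields and the `pick_spare` function are hypothetical names, and treating size as a hard cutoff while ranking the other rules in order is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Drive:
    serial: str
    drive_type: str  # e.g. "SAS", "NL-SAS", "SSD"
    bus: int
    enclosure: int
    size_gb: int


def pick_spare(failed: Drive, unbound: List[Drive]) -> Optional[Drive]:
    """Pick a spare using the priority order: type, bus, size, enclosure."""
    # Assumption: a drive smaller than the failed one can never be used.
    candidates = [d for d in unbound if d.size_gb >= failed.size_gb]
    if not candidates:
        return None

    def rank(d: Drive):
        # Tuples compare left to right, so earlier rules dominate later ones.
        # False sorts before True, so a match on a rule ranks first.
        return (
            d.drive_type != failed.drive_type,  # 1. same drive type
            d.bus != failed.bus,                # 2. same bus
            d.size_gb - failed.size_gb,         # 3. closest size that fits
            d.enclosure != failed.enclosure,    # 4. same enclosure
        )

    return min(candidates, key=rank)


failed = Drive("F", "SAS", bus=0, enclosure=1, size_gb=600)
pool = [
    Drive("A", "NL-SAS", bus=0, enclosure=1, size_gb=2000),
    Drive("B", "SAS", bus=1, enclosure=0, size_gb=600),
    Drive("C", "SAS", bus=0, enclosure=2, size_gb=900),
    Drive("D", "SAS", bus=0, enclosure=1, size_gb=300),
]
chosen = pick_spare(failed, pool)
```

In this made-up pool, drive D is too small, drive A is the wrong type, and drive C beats drive B because it sits on the same bus as the failed drive, even though B is an exact size match.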

For example, suppose you have 20 hard disk drives and the usage is 2 per week. If you want to hold spares for 6 months, you will need to order about 50 HDDs (2 per week over roughly 26 weeks). Alternatively, you can stock 10% of the total of each type of disk drive. These rules apply to other parts of the VNX2 system as well, but you will need to look at each system configuration.
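
The two stocking rules of thumb above reduce to simple arithmetic. The helper names below are hypothetical, and the sketch assumes a constant replacement rate, which real drive populations may not follow.

```python
import math


def spares_by_usage(drives_per_week: float, weeks: int) -> int:
    """Spares needed to cover a period at a steady replacement rate."""
    return math.ceil(drives_per_week * weeks)


def spares_by_percentage(installed: int, percent: int = 10) -> int:
    """Spares needed when stocking a fixed percentage of installed drives."""
    return math.ceil(installed * percent / 100)


six_months = spares_by_usage(2, 26)     # 2 drives per week for ~26 weeks
ten_percent = spares_by_percentage(20)  # 10% of 20 installed drives
```

Straight multiplication gives 52 drives for six months at 2 per week, in line with the rough figure quoted above, while the 10% rule applied to 20 installed drives gives 2. The large gap between the two results is a reminder to pick the rule that matches your observed failure rate.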

VNX2 No Longer Utilizes Hot-Spare Designations

Spares are now drawn from unassigned drives, and sparing is considered permanent. There is no equalization upon replacement: the unassigned drive permanently becomes the replacement. In other words, the user no longer selects specific drives as “hot spares”. 

The MCx code will consider any and every unconfigured drive in the array to be an available spare. The user will no longer configure drives in the array to be designated as a “spare” for a failed drive. As long as there is a drive available, sparing will occur. 

We found this discussion thread on EMC’s site to be particularly helpful: https://community.emc.com/thread/184504

Issues with VNX2 Hot Sparing

One of the potential issues with EMC VNX2 hot sparing is that a drive can spare out to a lower-performance drive. The MCx code does not take into account the form factor or RPM speed of the drive. Therefore, if a lower-RPM drive is the only unassigned drive available, a 15K RPM drive can and will spare out to a 7.2K RPM drive. This puts your performance at risk and places more management responsibility on the administrator than before. 

In essence, administrators are expected to monitor frequently whether a drive has failed over to a slower drive or drives. With earlier-generation VNX devices, you knew the destination to which your drives would fail over. With VNX2, you may not be so sure where or when they are failing over, so it is up to the administrator to check.
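
One way to approach that monitoring is to look for RAID groups whose members no longer share a single RPM rating. The sketch below assumes you already have an inventory mapping each RAID group to the RPM of its member drives. That mapping and the function name are hypothetical and are not taken from the output of any EMC tool.

```python
from typing import Dict, List


def find_mixed_rpm_groups(groups: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return the RAID groups whose member drives have differing RPM ratings."""
    mixed = {}
    for name, rpms in groups.items():
        distinct = sorted(set(rpms))
        if len(distinct) > 1:
            # More than one RPM value suggests a spare-out to a different drive class.
            mixed[name] = distinct
    return mixed


inventory = {
    "RG0": [15000, 15000, 15000, 7200],  # one member replaced by a slower drive
    "RG1": [10000, 10000, 10000],
}
flagged = find_mixed_rpm_groups(inventory)
```

Here only `RG0` is flagged, because one of its 15K RPM members now sits alongside a 7.2K RPM drive.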

How RAID Group Configuration Has Changed

In the past, the Bus, Enclosure & Disk (B.E.D.) was used to identify RAID group elements. However, in these VNX2 devices, serial numbers will be used to determine which drives are associated with a particular RAID group. This new methodology allows for a new feature called “drive mobility”. This means you may physically move a drive to another slot, and it will be recognized as part of a specific RAID group and continue processing, no matter where this drive is placed within the array. This has been done before and works well in certain instances, for example, when trying to balance or rebalance drives. However, this fails miserably when the process is not well planned or followed.
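
The idea behind drive mobility can be illustrated with a small model in which RAID group membership is keyed by serial number rather than by slot. This is a conceptual sketch only: the `ArrayModel` class and its methods are hypothetical and say nothing about how the array stores this information internally.

```python
from typing import Dict, Optional, Tuple

Slot = Tuple[int, int, int]  # (bus, enclosure, disk)


class ArrayModel:
    """Toy model: membership follows the drive's serial number, not its slot."""

    def __init__(self) -> None:
        self.group_by_serial: Dict[str, str] = {}
        self.slot_by_serial: Dict[str, Slot] = {}

    def bind(self, serial: str, slot: Slot, group: str) -> None:
        self.group_by_serial[serial] = group
        self.slot_by_serial[serial] = slot

    def move(self, serial: str, new_slot: Slot) -> None:
        # Refuse the move if another drive already occupies the target slot.
        if new_slot in self.slot_by_serial.values():
            raise ValueError(f"slot {new_slot} is occupied")
        self.slot_by_serial[serial] = new_slot

    def group_of(self, serial: str) -> Optional[str]:
        return self.group_by_serial.get(serial)


array = ArrayModel()
array.bind("SN123", (0, 0, 4), "RG0")
array.move("SN123", (1, 2, 7))  # physically relocate the drive
group_after_move = array.group_of("SN123")
```

Because the lookup never consults the slot, the drive is still reported as a member of `RG0` after the move. Under the older B.E.D. scheme, the slot itself identified the RAID group element.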

Considering an Upgrade to VNX2?

It is difficult to predict how these changes will play out in the years to come. From an engineering standpoint, the first thing that stands out is that these changes are extremely complicated, not only with regard to OS and code implementation, but especially if you are considering an upgrade.

If you are considering an upgrade to the EMC VNX2 model, you need to take into account that you might experience performance issues. You should also take into account that you will need to plan your system with a certified engineer to ensure your data is not compromised. 

At Reliant, we offer high-quality engineers available to help 24/7. If you would like more information on the VNX2 and hot sparing, contact us today.