Overview

Here at FlexRAID, we are big proponents of virtualization. In fact, our whole infrastructure is virtualized. Everything from our development environments to our production environments lives on a virtualization platform of some kind.

This also means that we have various storage deployments under these virtualization platforms.
In this article, we will focus specifically on storage deployment under VMware ESXi and explore some of the options available on that platform.

The biggest tug of war in computing is the one between transparency and abstraction. Virtualization is all about abstraction, whereas high-performance computing is all about direct resource access (low latency, or full transparency). The cost of abstraction is both performance and functionality. Abstraction can add latency and rob performance. It also tries to arrive at a common functional denominator, which means it strips out the features that are not common across the abstracted devices.

Luckily, most virtualization platforms now include the option of passing through certain hardware for direct access within virtual machines.
End users can now virtualize devices as needed and opt out wherever it makes sense to do so.

Virtualization options for storage deployments

For this topic, we are going to limit ourselves to storage virtualization and the opt-out options under VMware ESXi.
The illustration below shows the four storage options available under ESXi: direct access (IOMMU/Vt-d), Physical RDM, Virtual RDM, and VMDK.
01. ESXip1 - Transparency vs Abstraction

Without getting too deep into virtualization specifics, here are some key aspects of each option:

  • The most transparent option is IOMMU/Vt-d (VMDirectPath), which lets you pass through an entire storage controller to a virtual machine. The storage controller can be a hardware RAID card or one that simply exposes the disks (a controller without RAID). With this, there is zero abstraction of the disk devices on the passed-through controller card. You can access the card and the storage it hosts just as you would on a physical machine.
  • Next is physical RDM. With a physical RDM, the disks are minimally abstracted: ESXi passes all SCSI commands to the device except for one (the REPORT LUNs command), which is used to distinguish the device from others. Otherwise, all physical characteristics of the underlying hardware are exposed. If you think about it, you will note that the VM gets a virtual controller entirely unrelated to the physical controller hosting the disk, and that a disk may sit on a different port on the virtual controller than it does on the physical one. A translation needs to happen at some level for all of this to work. Outside of that translation, though, everything else is forwarded verbatim to the disk.
  • The next level is virtual RDM. In this mode, ESXi only sends READ and WRITE commands to the mapped device. All other commands are virtualized, just as they are for VMDK file-based disks. A virtual RDM behaves the same as a VMDK file-based disk except that it is backed by raw block storage.
  • Finally, we have the VMDK file-based disk, which is a fully virtualized storage device. Being just a file, such a device is very portable. You can copy, duplicate, or move it anywhere you want without much restriction. For instance, you can move it from an iSCSI datastore to an NFS datastore to a local datastore with ease, which is something you cannot do with a disk directly backed by raw physical storage. The file system on which the VMDK file resides provides all the needed abstraction from the actual blocks that back the file. (A quick way to observe these differences from inside a guest is sketched just below this list.)
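
As an illustration, here is a minimal sketch of how the backing type shows through to a Linux guest. The device name /dev/sdb is a placeholder and the exact identity strings vary by ESXi version, but the general pattern holds: fully virtualized disks report a generic VMware identity, while a physical RDM exposes the real drive.

    # Inside a Linux guest: inspect what each virtual disk actually exposes.
    # VMDK and virtual RDM disks typically report a generic "VMware Virtual disk"
    # identity, while a physical RDM passes through the real vendor/model strings.
    lsblk -o NAME,SIZE,VENDOR,MODEL

    # SMART queries only succeed where low-level commands reach the physical
    # device (physical RDM or a passed-through controller); they fail against
    # VMDK file-based disks. /dev/sdb is a placeholder device name.
    smartctl -i /dev/sdb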

As stated, we have only brushed on some aspects of each of the above options. There are additional restrictions, advantages, and requirements for each option, which we are skipping for the sake of getting to the recommendation we are about to make.

Software RAID: choosing physical RDM

When it comes to large software-RAID-based storage deployments under ESXi, we recommend going with Physical RDMs.

  • Why not VMDK files?
    1. Most of the features of VMDK files become less important when dealing with large storage. Things such as thin provisioning, ease of VMDK file expansion, etc., simply lose their relevance. You will find that data management, maintenance, and migration for a large storage deployment follow the same pattern whether you are using fully virtualized disks or physical disks, once the storage is sufficiently large and spans multiple disks.
    2. More importantly, many low-level functions, such as SMART for disk health monitoring, secure disk erase, TRIM, and the ability to put your disks into a sleep state to save energy, are not available on VMDK file-based disks. Such features can be critical to effectively maintaining your large storage array.
    3. Finally, there is an IOPS cost to the abstraction that VMDK provides. Writing to a file on a VMDK-based disk means writing to a file on a volume on a virtual disk that is itself a file residing on a volume on a physical disk. There are two file system overheads at play in addition to multiple layers of I/O routing.
    4. For such cases, it just makes sense to skip the abstraction, along with its restrictions, when it provides no real practical value.

  • What about Virtual RDM?
    Virtual RDMs have the same advantages and restrictions as VMDK file-based disks, except:
    – that a virtual RDM must be a whole disk or LUN (not as flexible as VMDK files)
    – and that virtual RDMs have better IOPS than VMDK files as read/write I/O operations inside the virtual machine (VM) are passed through to the physical disk

    Ultimately, though, the restrictions on low-level disk access are an important factor against going with virtual RDMs. You want to be able to monitor the health of your disks. You want to be able to let idle disks go into a sleep state and conserve energy. And, if you are using SSD storage, you want TRIM support to achieve the best sustained performance.

  • Then why not IOMMU/Vt-d (VMDirectPath)?
    Wouldn’t the ultimate choice be a complete pass-through of the storage controller?
    In many cases, yes: passing through the disk I/O controller is the best approach. For cases such as hardware RAID, or where an application needs to communicate directly with the storage controller, it makes sense to use IOMMU/Vt-d (VMDirectPath).

    Nevertheless, there are a number of restrictions that come with IOMMU/Vt-d.

    1. Hardware support: you need a fully supported platform for IOMMU/Vt-d to work. The CPU, motherboard, I/O card, and virtualization platform must all support it and be compatible with one another.
    2. Stability: even if supported, the configuration is not guaranteed to be stable.
    3. Inflexibility: passing through a storage controller is an all-or-nothing deal. What if you have an 8-port controller and only want to pass through 6 disks to a VM, with the other 2 disks reserved for another VM?
    4. Most virtualization platforms also impose additional restrictions on virtual machines (VMs) that have pass-through devices due to technical limitations.

For software RAID deployments, there is never a need to talk to the storage controller directly outside of standard disk commands. The only thing one cares about is direct access to the disk devices. As such, physical RDMs stand as the least restrictive and least problematic option – none of the drawbacks of IOMMU/Vt-d and none of the restrictions of VMDKs/virtual RDMs. In most cases, it is best to leave the disk controller under the management of the virtualization platform and simply gain direct access to the physical disk devices. The only thing virtualized with a Physical RDM is the device address, which should not be an issue in any software RAID environment.

Setting up Physical RDMs

In this section, we will cover how to set up Physical RDMs in ESXi.
There are two ways to create RDMs in ESXi (whether virtual or physical):
– the non-supported way using vmkfstools
– the supported way (as described below)

Please be warned that the unsupported way should never be used on critical data. There are scenarios under which you will lose data when using RDMs created through the unsupported approach. The technical details behind this are not widely discussed, and we are not going to cover them either (for internal reasons).

1. Enable the use of local storage as RDM in ESXi. By default, ESXi restricts the use of local storage as RDM, as RDM was primarily designed to carve out LUNs on SAN (remote) storage.
Counterintuitively, unchecking the checkbox for the RdmFilter.HbaIsShared property enables the use of local storage as RDM on supported disk controllers.
02. ESXip1 - Advanced Settings for RDM
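
For reference, the same change can be made from the ESXi shell. This is a minimal sketch assuming the esxcli command set of ESXi 5.x and later; verify the option path and value type on your build before relying on it.

    # Inspect the option first; the listing shows its current value and type.
    esxcli system settings advanced list -o /RdmFilter/HbaIsShared

    # Disable the filter that hides local (shared-HBA) storage from RDM creation;
    # this mirrors unchecking RdmFilter.HbaIsShared in the vSphere client.
    # (-i assumes an integer/boolean-typed option; adjust if your build differs.)
    esxcli system settings advanced set -o /RdmFilter/HbaIsShared -i 0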

2. Check whether your controller is supported by ensuring that it does not present your disks as “Local ATA” devices.
04. ESXip1 - LSI2008
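
You can also run a quick check from the ESXi shell. The sketch below assumes the standard esxcli storage namespace is available:

    # List attached disks and how ESXi identifies them. Disks reported as
    # "Local ATA Disk" sit behind a controller that is not supported for the
    # standard RDM workflow described in the steps that follow.
    esxcli storage core device list | grep -i "Display Name"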

3. Below is an unsupported controller showing its devices as “Local ATA Disk” devices. You can create RDMs on such disks using the unsupported method. However, be warned again that a few scenarios could prove problematic with such setups.
That said, RDMs mapped through the unsupported process will work fine under FlexRAID products (RAID-F and tRAID); you just have to watch out when using other solutions.
05. ESXip1 - Local ATA
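
For completeness, this is roughly what the unsupported vmkfstools route looks like. Heed the warning above; the device identifier and datastore path below are placeholders, not values from our setup.

    # Unsupported approach: manually create a pass-through (physical) RDM mapping
    # file with vmkfstools. The -z flag maps the raw device in physical
    # compatibility mode.
    vmkfstools -z /vmfs/devices/disks/<device_identifier> \
        /vmfs/volumes/<datastore>/rdm/disk1-rdm.vmdk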

4. To create an RDM using the VMware supported process, edit the configuration of your VM and choose to add a new hard disk.
The image below shows several physical RDMs already added to the VM. We will be adding one more RDM to this particular VM.
06. ESXip1 - Add Disk

5. Select to add a hard disk.
07. ESXip1 - Add Hard Disk

6. Select the option to add Raw Device Mappings (RDMs).
08. ESXip1 - Add RDM

7. When using a supported disk controller, available disks will be shown for you to select from.
09. ESXip1 - Select LUN
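
The device identifiers offered by the wizard can be cross-checked from the ESXi shell:

    # The candidate disks correspond to the raw device nodes under
    # /vmfs/devices/disks (naa.*, t10.*, and similar identifiers).
    ls -l /vmfs/devices/disks/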

8. By default, the RDM mapping file is stored inside the VM’s folder. We are choosing to store ours on a specific datastore instead.
10. ESXip1 - Select Datastore

9. As recommended for large software RAID storage deployments, choose “Physical” compatibility mode to create a Physical RDM.
11. ESXip1 - Physical vs Virtual

10. Select the virtual device node to which the disk should be added.
12. ESXip1 - Device Mode

11. Finally, you are presented with a summary of your selections.
13. ESXip1 - Summary

12. Once you click Finish, the RDM is created and added to the VM. If you ever remove the RDM from the VM without deleting its mapping file, you can add the RDM to another VM by choosing to add an existing disk and selecting the RDM mapping file.
14. ESXip1 - Result
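
As a sketch (file names and the datastore path are placeholders), the resulting mapping file pair on the datastore looks something like this; it is the descriptor .vmdk that you point the “add existing disk” wizard at:

    # Listing the datastore folder that holds the RDM mapping files.
    ls -lh /vmfs/volumes/<datastore>/rdm/
    #   disk1.vmdk        <- descriptor file to re-attach via "add existing disk"
    #   disk1-rdmp.vmdk   <- physical RDM mapping pointer; its listed size matches
    #                        the raw device rather than space consumed on VMFS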

13. Here is a view of a virtual machine with a VMDK file for the OS disk and Physical RDMs for the data disks.
FlexRAID Transparent RAID is used to RAID and pool the data disks into unified storage.
15. ESXip1 - Disks Inside VM

14. A key benefit of having a Physical RDM is the ability to pull SMART data off the disks. In FlexRAID, you will need to set the “Device Type Mapping” in the SMART advanced settings to “sat,auto”. This is a controller setting: even though the storage controller is virtualized and we only have direct access to the physical disks, we can still send certain controller commands.
16. ESXip1 - SMART Advanced Settings
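
Inside the guest, the equivalent check with smartmontools looks something like the sketch below (assuming a Linux guest with smartmontools installed; /dev/sdb is a placeholder):

    # Query SMART data through the SCSI-to-ATA Translation layer, matching the
    # "sat,auto" device type mapping configured in FlexRAID above.
    smartctl -a -d sat,auto /dev/sdb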

15. Here is SMART data pulled off a Physical RDM-backed disk.
17. ESXip1 - SMART