Recently I had the “opportunity” to play with boot from iSCSI SAN on Windows Server 2012 R2. This wasn’t the original design, but sometimes you have to roll with the punches…and there were a lot of punches on this project. This will not be a “How To,” as most of the steps are outlined in the places I reference below. Instead, I’ll share some lessons learned. Normally I’d include more screenshots, but the pace of this project was too fast for that.
Before we get to the meat of our experience, let me tell you my conclusion: I would not advise anyone to attempt boot from iSCSI SAN with Windows Server at this time. FCoE or Fibre Channel may be a different experience, and I was able to get VMware to boot from iSCSI in minutes. But with Windows Server 2012 R2 and iSCSI, it feels half-baked and poorly documented.
We worked with some very good technical support resources at Dell, on both their PowerEdge server and Compellent Copilot teams, and even they couldn’t find consistent, definitive answers to many questions. We were told it wasn’t supported, it was supported but only with different cards, it was fully supported, etc. When I mentioned this to one of the engineers, he said he was finding the same conflicting information in his research. In the end we got it working, but I fear it will end up causing more problems down the line.
- SAN: Dell SC4020 10G iSCSI
- Servers: Dell PowerEdge R630
- Diskless (This was the first punch: per Dell sales, if you order a server diskless it is not possible to add disks later. We’ve heard this from several sources, but I swear the chassis had a backplane for disks and cables labeled SAS1 and SAS2. A mid-scoping switch from VMware to Hyper-V led to this mistake.)
- NDC/LOM: Broadcom 5720 Quad Port 1G
- PCI Slot 2: QLogic QLE8442 Dual Port 10G (Despite the name, it uses Broadcom drivers)
- PCI Slot 3: Intel X520 Dual Port 10G
- Switches: Dell Network N4032 x 2 (one for each iSCSI network)
There is quite a bit of content on this. The best for Compellent right now has to be Andrew Parrish’s post on LinkedIn. He does a great job running down the requirements for a resilient, redundant boot from SAN architecture using much of the same equipment described above. (Sounds like he too had boot from iSCSI forced upon him.)
One thing he points out, which I can confirm is still true as of March 2016, is that the Intel X520 can only be configured with a single target. This is important in architectures where you have four paths for full redundancy. It’s also odd, because the Intel configuration UI refers to this as the “First target,” which implies the ability to have a “Second target.”
Important Notes on Installing
- The Windows installer can only handle seeing a single path during the install. I believe this led to some of our woes: while we only ever configured a single path at the HBA/NIC level, I think having other adapters online during the install still caused problems.
- Drivers are an important piece but tough to find good info on:
- In the end, we were able to install via the Intel X520s and a Server 2012 R2 ISO without streaming any drivers in.
- I was able to find Broadcom 5720 drivers and successfully stream them into a Windows ISO. As mentioned in this Microsoft KB, you cannot load the drivers after the install process starts.
- I assume the same driver requirement applies to the QLE8442, since it is actually a Broadcom NetXtreme chipset, but I was never able to stream a driver into an ISO that would load. Truthfully, I didn’t spend much time on this, because while I was working on it Dell PowerEdge support figured out a way for me to install via the Intel X520.
- Make sure the IQN you configure is valid. Ideally use a standard vendor’s IQN prefix, but you can make up your own if you follow the syntax (iqn.YYYY-MM.com.vendor:initiator):
- YYYY is year
- MM is month
- .com.vendor is whatever you want
- The initiator portion identifies the specific device, so the server name usually works.
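Since a malformed IQN can quietly break the target login, it’s worth sanity-checking the name before typing it into a BIOS screen. Here’s a minimal sketch in Python (the regex and function name are my own, not from any vendor tool; the syntax follows the iSCSI naming rules in RFC 3720):

```python
import re

# Rough IQN shape: iqn.YYYY-MM.reversed-domain[:identifier]
# e.g. iqn.2016-03.com.example:server01
IQN_PATTERN = re.compile(
    r"^iqn\.\d{4}-(0[1-9]|1[0-2])"   # iqn. plus year-month
    r"\.[a-z0-9]+(\.[a-z0-9-]+)+"    # reversed domain, e.g. .com.vendor
    r"(:\S+)?$"                      # optional :initiator identifier
)

def is_valid_iqn(name: str) -> bool:
    """Rough syntax check for an iSCSI Qualified Name (IQN)."""
    return bool(IQN_PATTERN.match(name.lower()))
```

This is only a syntax check; it won’t tell you whether the SAN will accept the name, but it catches the easy mistakes like a bad month or a missing domain.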
So, what actually worked for us? An engineer on Dell PowerEdge’s deployment side did some internal testing, and I worked with him to get the install running. We are not entirely sure what made the difference, but here is what worked:
- Use BIOS mode not UEFI (more about this below)
- Use the Intel X520, with only one connected
- I didn’t have to “Load Drivers” during the install; the driver may be included in Server 2012 R2
- When configuring the X520 iSCSI IP, do not set a gateway IP
- In the BIOS disable the onboard NICs (Broadcom 5720 in our case). I asked why he did this and he said he didn’t really know.
- We also booted from the ISO via iDRAC virtual media
We’d previously tried both BIOS and UEFI mode, and had also tried booting the ISO from iDRAC, so we aren’t sure which of the above made the difference. We got Windows installed and moved on.
BIOS vs UEFI
Again, as far as I can tell both are supported. After reading this PDF from Dell about “Enabling iSCSI boot under UEFI boot mode on Dell 13th gen servers” I really wanted to use UEFI since Dell is sort of taking ownership of the process and abstracting the particulars of specific NICs/HBAs. But once we got things installed via BIOS mode I didn’t get a chance to try UEFI.
I thought (hoped) our troubles were past once we got Windows installed, but there were still a few lessons to be learned.
Because the Intel X520 only supports a single target per port, I moved the booting to the QLE8442. This seemed to work well, and failing over to the second target did work, though occasionally I would get a “boot device not found” error.
What ended up working for me:
- Use different IPs for the Windows iSCSI NICs than you used in the BIOS. I tried using the same IPs, as suggested in some how-tos, but I ended up with IP address conflicts.
- Make sure you add the Compellent MPIO settings per their documentation before you configure multiple paths
- Do not use the QLogic Control Center to configure additional IPs for the HBAs. The iSCSI Boot Table (IBT) will automatically pass up the configuration of whatever NIC was used to actively boot, and the Windows iSCSI Initiator will handle the other paths.
- Configure all four paths in the Windows iSCSI Initiator. You’ll end up with five paths: one for the IBT connection and four from within Windows. But if your server is forced to boot from an alternative path, you’ll still want all the paths built.
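To make the path arithmetic concrete, here’s a small Python sketch of the topology using made-up addresses (two iSCSI fault domains with two target ports each; none of these IPs are from the actual build):

```python
# Hypothetical addressing: two iSCSI fault domains, one subnet each,
# with one server NIC port and two SAN target ports per domain.
fault_domains = {
    "iSCSI-A": {"initiator": "10.10.1.11",
                "targets": ["10.10.1.21", "10.10.1.22"]},
    "iSCSI-B": {"initiator": "10.10.2.11",
                "targets": ["10.10.2.21", "10.10.2.22"]},
}

# Each initiator port only reaches targets in its own fault domain,
# so full redundancy is 2 ports x 2 targets = 4 paths.
paths = [(fd["initiator"], tgt)
         for fd in fault_domains.values()
         for tgt in fd["targets"]]

# The IBT contributes one extra session on whichever path the server
# actually booted from, which is why Windows shows five connections.
total_sessions = len(paths) + 1
```

The point of the sketch is just the count: four software-configured paths plus the one the firmware booted on.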
I tested disconnecting NICs, and it caused roughly 45–60 seconds of disruption. I played with the MPIO settings to try to reduce this, but the results were inconsistent.
My confidence a bit higher, I tried to install Windows updates. Fortunately, I had the forethought to take a Replay (snapshot) of the boot volume on the SAN in advance.
After a round or two of updates and reboots, I was greeted by a black screen with a blinking cursor, which is usually a sign of MBR corruption. Unfortunately, because I was booting from iSCSI and had moved to the QLE card with the funky driver, I was unable to use the installer ISO to repair the MBR.
I ended up having to revert to the known-good Replay and iterate through updates to identify those that had issues in the boot-from-iSCSI world. The difficulty was compounded by the fact that the failed boot only occurred on the second reboot after applying an update. So I ended up with this cadence:
- Take a replay
- Apply update and reboot
- Reboot again
- Assuming it booted, take another replay and repeat the process
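That cadence is essentially a loop. Here’s a Python sketch with hypothetical helper callables standing in for the SAN Replay, the update install, and the second-reboot check (none of these correspond to a real API; they’re placeholders for manual steps):

```python
def vet_updates(updates, take_replay, apply_update, reboot_ok):
    """Apply updates one at a time, keeping a Replay before each.

    take_replay():   snapshot the boot volume (hypothetical SAN call)
    apply_update(u): install update u and perform the first reboot
    reboot_ok():     reboot a second time and report whether it came up
    Returns the list of updates that survived both reboots.
    """
    good = []
    for u in updates:
        take_replay()        # known-good point to revert to
        apply_update(u)      # install + first reboot
        if reboot_ok():      # the failure only showed on reboot #2
            good.append(u)
        # on failure: revert to the Replay and move to the next update
    return good
```

With stub helpers that flag one bad update, the function returns only the updates that survived both reboots, which is exactly the bookkeeping I was doing by hand.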
This worked, and at least on the two systems I was working with, it appears the following Microsoft updates were the culprits:
After over 16 years in the IT infrastructure world, this has to be one of the most frustrating challenges I have faced. As I stated at the beginning, I do not recommend Windows Server boot from iSCSI at this time. It feels like such a corner case that no one (Microsoft, Dell, Intel, QLogic/Broadcom) has spent the time to work out the details. I’m sure others have made this work with fewer issues than we faced, but I consider the ability to get good support as important as any technical requirement, and it just isn’t there.