Lessons Learned: Windows 2012 R2 Boot from iSCSI

Recently I had the “opportunity” to play with boot from iSCSI SAN with Windows 2012 R2. This wasn’t the original design, but sometimes you have to roll with the punches…and there were a lot of punches on this project. This will not be a “How To,” as most of the steps are outlined in places I reference below. Instead, this will provide some lessons learned. Also, normally I’d include more screenshots, but the pace was too fast on this project for that.

Before we get to the meat of our experience, let me tell you my conclusion. I would not advise anyone to attempt boot from iSCSI SAN with Windows Server at this time. FCoE or Fibre Channel may be a different experience, and I was able to get VMware to boot from iSCSI in minutes. But with Windows 2012 R2 and iSCSI it feels half baked and poorly documented.

We worked with some very good technical support resources at Dell, both on their PowerEdge server and Compellent Copilot teams, and even they couldn’t find consistent, definitive answers to many questions.  We were told it wasn’t supported, it was supported but only with different cards, it was fully supported, etc. When I mentioned this to one of the engineers he said he was finding the same conflicting information in his research.  In the end we got it working, but I fear it will end up causing more problems down the line.

Environment

  • SAN: Dell SC4020 10G iSCSI
  • Servers: Dell PowerEdge R630
    • Diskless  (This was the first punch: per Dell sales, if you order a server diskless it is not possible to add disks later.  We’ve heard this from several sources, but I swear the chassis had a backplane for disks and cables running from connectors labeled SAS1 and SAS2.  A switch from VMware to Hyper-V in the middle of scoping led to this mistake.)
    • NICs:
      • NDC/LOM: Broadcom 5720 Quad Port 1G
      • PCI Slot 2: Qlogic QLE8442 Dual Port 10G  (Despite name uses Broadcom drivers)
      • PCI Slot 3: Intel X520 Dual Port 10G
  • Switches: Dell Network N4032 x 2 (one for each iSCSI network)

Installing

There is quite a bit of content on this.  The best for Compellent right now has to be Andrew Parrish’s post on LinkedIn.  He does a great job running down the requirements for a resilient, redundant boot from SAN architecture using much of the same equipment as described above.  (Sounds like he too had boot from iSCSI forced upon him.)

One thing he points out that I can confirm is still true as of March 2016 is that the Intel X520 can only be configured with a single target.  This is important in architectures where you have four paths for full redundancy.  It’s also odd because the Intel configuration UI refers to this as the “First target,” which implies the ability to have a “Second target.”

Important Notes on Installing

  1. The Windows installer can only handle seeing a single path during the install. I believe this led to some of our woes. While we only ever configured a single path at the HBA/NIC level, I think other adapters being online may have contributed.
  2. Drivers are an important piece but tough to find good info on:
    1. In the end we were able to install via the Intel X520s and a server 2012R2 ISO without streaming any drivers in
    2. I was able to find some Broadcom 5720 drivers and successfully stream them into a Windows ISO. As mentioned in this Microsoft KB you cannot load the drivers after the install process starts.
    3. I assume the same driver requirement applies to the QLE8442 since it is actually a Broadcom NetXtreme chipset. But I was never able to stream a driver into an ISO that would load.  Truthfully, I didn’t spend as much time on this, because while I was working on it Dell PowerEdge support figured out a way for me to install via the Intel X520.
  3. Make sure the IQN you configure is valid. Ideally use a standard vendor’s IQN prefix, but you can make up your own if you follow the syntax
    iqn.YYYY-MM.com.vendor:initiatorName

    1. YYYY is year
    2. MM is month
    3. .com.vendor is nominally the reversed domain name of whoever assigns the name, but in practice it is whatever you want
    4. initiatorName is used to identify the specific device, so usually the server name works.
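
A quick sanity check on the IQN before typing it into the NIC's boot configuration can save a round of troubleshooting. The function below is only a sketch of the syntax described above: Test-IqnFormat is a name I made up, the pattern assumes lowercase names, and it checks the shape of the string, not whether the SAN will accept it.

```powershell
# Hypothetical helper (not part of any vendor tool): checks a string against
# the iqn.YYYY-MM.reversed.domain:initiatorName shape described above.
function Test-IqnFormat {
    param([Parameter(Mandatory)][string]$Iqn)

    # "iqn." + four-digit year + "-" + month 01-12 + "." + at least two
    # dot-separated domain labels + ":" + a non-empty name without spaces.
    # Assumption: lowercase only, hence the case-sensitive -cmatch.
    $pattern = '^iqn\.\d{4}-(0[1-9]|1[0-2])\.[a-z0-9][a-z0-9-]*(\.[a-z0-9][a-z0-9-]*)+:\S+$'
    return $Iqn -cmatch $pattern
}

Test-IqnFormat 'iqn.2016-03.com.example:server01'   # well-formed
Test-IqnFormat 'iqn.2016-13.com.example:server01'   # month 13 is rejected
```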

So, what actually worked for us?  An engineer at Dell PowerEdge on their deployment side did some internal testing and I worked with him to get the install running.  We are not entirely sure what made the difference, but here is what worked for us:

  1. Use BIOS mode not UEFI (more about this below)
  2. Use the Intel X520, with only one connected
    1. I didn’t have to “Load Drivers” when doing the install. The driver may be included in Server 2012 R2
  3. When configuring the X520 iSCSI IP do not set a gateway IP
  4. In the BIOS disable the onboard NICs (Broadcom 5720 in our case).  I asked why he did this and he said he didn’t really know.
  5. We also booted from the ISO via iDRAC virtual media

We’d previously tried both BIOS and UEFI mode and had also tried booting the ISO from iDRAC, so we aren’t sure which of the above made the difference.  We got Windows installed and moved on.

BIOS vs UEFI

Again, as far as I can tell both are supported.  After reading this PDF from Dell about “Enabling iSCSI boot under UEFI boot mode on Dell 13th gen servers” I really wanted to use UEFI since Dell is sort of taking ownership of the process and abstracting the particulars of specific NICs/HBAs.  But once we got things installed via BIOS mode I didn’t get a chance to try UEFI.

Post Install

I thought (hoped) our troubles were past once we got Windows installed, but there were still a few lessons to be learned.

Configuring Multipathing

Because the Intel X520 only supports a single target per port, I moved the booting to the QLE8442. This seemed to work well and failing over to the second target did work, though occasionally I would get a “boot device not found” error.

What ended up working for me:

  • Use different IPs for the Windows iSCSI NICs than you used in the BIOS. I tried using the same IP as suggested in some how-tos, but I ended up with IP address conflicts.
  • Make sure you add the Compellent MPIO settings per their documentation before you configure multiple paths
  • Do not use the QLogic Control Center to configure additional IPs for the HBAs.  The iSCSI Boot Table (IBT) will automatically pass up the configuration of whatever NIC was used to actively boot. The Windows iSCSI Initiator will handle the other paths.
  • Configure all four paths in the Windows iSCSI Initiator. You’ll end up with 5 paths: one for the IBT connection and 4 from within Windows, but if your server is forced to boot from an alternative path you’ll still want to have all the paths built.
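
I built the paths through the iSCSI Initiator GUI, but the same thing can be sketched with the iSCSI cmdlets that ship with Server 2012 R2 (New-IscsiTargetPortal and Connect-IscsiTarget). Everything specific below is an assumption for illustration: the IP addresses, the target IQNs and the Connect-AllIscsiPaths function name are placeholders, and the IQN is carried per path because the array may present a different IQN on each port.

```powershell
# Placeholder values: two initiator NICs, one per iSCSI network, each
# reaching two target portals on the SAN (2 x 2 = 4 paths).
$paths = @(
    @{ Initiator = '10.10.1.21'; Target = '10.10.1.10'; Iqn = 'iqn.2002-03.com.example:port-a1' },
    @{ Initiator = '10.10.1.21'; Target = '10.10.1.11'; Iqn = 'iqn.2002-03.com.example:port-a2' },
    @{ Initiator = '10.10.2.21'; Target = '10.10.2.10'; Iqn = 'iqn.2002-03.com.example:port-b1' },
    @{ Initiator = '10.10.2.21'; Target = '10.10.2.11'; Iqn = 'iqn.2002-03.com.example:port-b2' }
)

# Hypothetical wrapper around the built-in iSCSI cmdlets.
function Connect-AllIscsiPaths {
    param([Parameter(Mandatory)][hashtable[]]$Paths)

    foreach ($p in $Paths) {
        # Register the portal through the NIC that should reach it
        New-IscsiTargetPortal -TargetPortalAddress $p.Target `
            -InitiatorPortalAddress $p.Initiator | Out-Null

        # Log in over this specific NIC/portal pair, MPIO-aware and persistent
        Connect-IscsiTarget -NodeAddress $p.Iqn `
            -TargetPortalAddress $p.Target `
            -InitiatorPortalAddress $p.Initiator `
            -IsMultipathEnabled $true -IsPersistent $true | Out-Null
    }
}

# Run on the server itself once the MPIO settings are in place:
# Connect-AllIscsiPaths -Paths $paths
```

Keeping the paths in one table makes it easy to compare against what the SAN shows when a path goes missing.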

I tested disconnecting NICs and it seemed to cause about 45-60 seconds of disruption. I played with the MPIO settings some to try to reduce this, but the results were inconsistent.

Windows Updates

My confidence a bit higher, I tried to install Windows updates.  Fortunately, I had the forethought to take a Replay (snapshot) of the boot volume on the SAN in advance.

After a round or two of updates and reboots I was greeted by a black screen with a blinking cursor.  This is usually a sign of MBR corruption. Unfortunately because I’m booting from iSCSI, and had moved to the QLE card with the funky driver, I was unable to use the installer ISO to repair the MBR.

I ended up having to revert to the known good replay and iterate through updates to identify those that seemed to have issues in the boot from iSCSI world.  The difficulty of this was compounded by the fact that the failed boot only occurred on a second reboot after applying an update. So I ended up with a cadence of:

  1. Take a replay
  2. Apply update and reboot
  3. Reboot again
  4. Assuming it booted, take another replay and repeat the process

This worked and at least on the two systems I was working with it appears the following Microsoft Updates were the culprits:

  • KB2962409
  • KB2965065
  • KB2975719
  • KB3000850
  • KB31115224
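
To see whether any of these are already on a server before you reboot it, the installed hotfix list can be compared against the list above. This is a minimal sketch: Find-SuspectUpdate is a made-up helper name, the KB numbers are copied as written above, and Get-HotFix only reports updates that Windows records as hotfixes.

```powershell
# KB numbers copied from the list above
$suspectUpdates = 'KB2962409', 'KB2965065', 'KB2975719', 'KB3000850', 'KB31115224'

# Hypothetical helper: returns the suspect KBs that appear in the installed list.
function Find-SuspectUpdate {
    param(
        [string[]]$InstalledIds,   # for example (Get-HotFix).HotFixID
        [string[]]$SuspectIds
    )
    $InstalledIds | Where-Object { $SuspectIds -contains $_ }
}

# On a live server (Get-HotFix is Windows-only):
# Find-SuspectUpdate -InstalledIds (Get-HotFix).HotFixID -SuspectIds $suspectUpdates
```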

Summary

After over 16 years in the IT infrastructure world this has to be one of the most frustrating challenges I have faced.  As I stated at the beginning, I do not recommend Windows Server boot from iSCSI at this time. It feels like such a corner case that no one (Microsoft, Dell, Intel, Qlogic/Broadcom) has spent the time to work out the details.  I’m sure others have made this work with fewer issues than we faced for whatever reason, but I consider the ability to get good support as important as any technical requirement and it just isn’t there.

 

 

 

Powershell Script to Locate Windows Services Running as Domain users

Ever had to change a domain administrator’s password and had the sinking feeling that some bozo had set up a Windows service to run as that user?  If you only have a couple of servers it isn’t that big a deal to check each manually, but if you have a lot it can be a problem.  I’ve seen a lot of admins just use the scream test to figure out what broke, but sometimes it isn’t obvious until the server is rebooted.  We run into this situation frequently as we take over new clients.

Recently we had to make a change for a customer with 50+ Windows servers and I knew the account had been used for services. I just didn’t know where.  So I built the below PowerShell script.  I definitely owe a few people props as I used a number of different websites to figure out the WMI piece. Unfortunately, it has been too long for me to remember who.  But the next best thing is to put this script out there for others to use.  So I have posted the script and a readme file to GitHub (a new experience for me, but way better than how I published my scripts previously).

https://github.com/Deadtired78/Find-DomainUserServices.ps1/
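
For a feel of the WMI piece without opening the repository, here is a stripped-down sketch of the idea. It is not the published script: the function names are made up for this example, and it simply treats any service whose StartName is not one of the built-in identities as running under a user account.

```powershell
# Hypothetical helper: $true when a service's StartName looks like a user
# account rather than a built-in identity.
function Test-IsUserAccount {
    param([string]$StartName)

    if ([string]::IsNullOrEmpty($StartName)) { return $false }
    # Built-in identities that a password change does not affect
    $builtIn = '^(LocalSystem|NT AUTHORITY\\.+|NT SERVICE\\.+)$'
    return -not ($StartName -match $builtIn)
}

# Hypothetical wrapper: query Win32_Service on each server over WMI and keep
# only the services running under a user account.
function Get-UserAccountService {
    param([Parameter(Mandatory)][string[]]$ComputerName)

    foreach ($computer in $ComputerName) {
        Get-WmiObject -Class Win32_Service -ComputerName $computer |
            Where-Object { Test-IsUserAccount $_.StartName } |
            Select-Object @{ n = 'Computer'; e = { $computer } }, Name, StartName
    }
}

# Example: Get-UserAccountService -ComputerName 'server01', 'server02'
```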

Hope this helps out.  This is provided “as-is” and while I’ve used it in several environments I can’t guarantee it will work everywhere.  Feel free to leave respectful comments.

Remote Desktop Services “Drain mode” PowerShell script

If you’ve ever had to put a large number of 2008+ Windows Terminal/Remote Desktop servers in “drain mode” using the GUI admin tool, you know it can be slow and tedious.  Faced with doing this on a Saturday night for about 30 servers, I decided to make life a little quicker and easier by building a PowerShell script.  While my long term goal is to figure out how to drain an entire farm, for now I’m pretty satisfied to be able to do it from PowerShell pretty quickly.

There are two scripts.  Drain-RDserver.ps1 and Undrain-RDserver.ps1. Both require a single parameter “-RDserver” followed by the server name.  Drain changes the “User logon mode” to “Allow reconnections, but prevent new logons.” This could be changed to the “Allow reconnections, but prevent new logons until the server is restarted” by modifying the script.   Undrain puts the server back in “Allow all connections” mode.

Big thanks to SourceDaddy’s article which got me started on the right path. (http://sourcedaddy.com/windows-7/preparing-server-maintenance.html)

Here is the code:


###Drain-RDServer
# Input computer name
param (
[string]$RDServer = $(throw "-RDserver is required")
)
$RDSH = Get-WmiObject -Class "Win32_TerminalServiceSetting" -Namespace "root\CIMV2\terminalservices" -ComputerName $RDServer -Authentication PacketPrivacy -Impersonation Impersonate
$RDSH.SessionBrokerDrainMode=1
$RDSH.put() > $null
Write-Host "$RDServer is set to:"
switch ($RDSH.SessionBrokerDrainMode)
{
0 {"Allow all connections."}
1 {"Allow incoming reconnections but prohibit new connections."}
2 {"Allow incoming reconnections but until reboot prohibit new connections."}
default {"The user logon state cannot be determined."}
}


###Undrain-RDServer
# Input computer name
param (
[string]$RDServer = $(throw "-RDserver is required")
)
$RDSH = Get-WmiObject -Class "Win32_TerminalServiceSetting" -Namespace "root\CIMV2\terminalservices" -ComputerName $RDServer -Authentication PacketPrivacy -Impersonation Impersonate
$RDSH.SessionBrokerDrainMode=0
$RDSH.put() > $null
Write-Host "$RDServer is set to:"
switch ($RDSH.SessionBrokerDrainMode)
{
0 {"Allow all connections."}
1 {"Allow incoming reconnections but prohibit new connections."}
2 {"Allow incoming reconnections but until reboot prohibit new connections."}
default {"The user logon state cannot be determined."}
}

And here is a link to the scripts in a zip file.

Setting Up Dell DR4000 OST Storage with Backup Exec 2012

After beating my head against the wall for a while trying to figure out the settings for connecting Symantec Backup Exec 2012 to an OST container on a Dell DR4000 I finally figured it out.

The instructions seemed simple enough but they left off a few key details. I’m going to assume some familiarity with both products so if I gloss over anything here assume it is in the manual somewhere.

The process is:

  1. Upgrade your DR4000 firmware to a version that supports OST.
  2. Stop Backup Exec services and install the Dell OST Plug-in
  3. Restart Backup Exec (or even better reboot the server). You might want to install the latest LiveUpdates as well but keep in mind this may mean updating Remote Agents.
  4. Create an OST container in the DR4000
    1. Login to the DR4000 management portal
    2. Under Storage->Containers  click the Create link.
    3. Give the container a Name and change the Connection Type to OST.
    4. You will then have the ability to set a capacity. Click Create a New Container.
  5. Add the OST Storage in Backup Exec
    1. Open the BE 2012 management console and navigate to the Storage tab.
    2. Click Configure Storage.
    3. Choose Network Storage and click Next.
    4. Choose OpenStorage and click Next.
    5. Give the OST device a name and description. The name does not need to match the OST Container name.
    6. Select the DELL provider and click Next.
    7. Provide connection information:
      1. Server name should be the resolvable DNS name or IP address of the DR4000 appliance.
      2. Logon account (this is the thing both BE and Dell’s documentation missed) click Add/Edit
        1. Create a new Logon Credential with the username ‘backup_user’ and the password of ‘St0r@ge!’ without quotes.
        2. Save this logon information.
      3. Select the new logon  and click Next.
    8. You should see the name of the OST Container you created on the DR4000.  Select it and click Next.
    9. Set the number of concurrent operations.  Absent any recommendations from Dell or Symantec, I’m going to try 8 and will update if that doesn’t work. Click Next.
    10. You will be prompted to restart BE services.  This will kill any running jobs.  For me I was prompted repeatedly, so I just restarted the whole server.

PowerCLI Script to automatically setup vCenter Alarm Email Notification.

After manually setting up email Alarm notifications for numerous vSphere deployments I decided it was time to harness PowerCLI and automate the process.

I found numerous examples of people who had already done this before me, so I figured it would be pretty easy.  It looks like the guys at VMPros.nl built a script that several others adapted. VirtuallyMikeBrown had created a script for vSphere 4.1, but it wasn’t updated for vSphere 5 and only had a single email address.  Justin at Justin’s IT Blog had created a script that allowed three email addresses, but it couldn’t handle 4 or 2 or just 1 without editing the script or using dummy addresses.   And both of these scripts used a variable for each vCenter Alarm which meant if you added or removed an alarm you had to make multiple edits throughout the script.

So I decided I wanted my script to do the following:

  • Use arrays for both the alarms and email addresses to make it easier to add and remove entries.
  • Use a foreach loop to work through the alarms.
  • Have different classifications or priorities for alarms so that some would notify repeatedly and with different frequencies. For example, you might not care if a VM is maxing out its CPU, but you definitely want to know if a host is.

These additional requirements took “easy” and turned it into a frustrating half-day, but I feel pretty good about the result.

Some notes about the script:

  • The $user, $pass and $vCenterServer variables need to be edited to be correct for your environment.
  • This currently sets up email alerts for almost every single default alarm in vCenter (based on 5.0).  This is undoubtedly overkill and you’ll likely want to remove some from the lists.
  • In the current setup this script takes a while to run through all the alarms.  It took about 10 minutes for me.  But that’s still a lot faster than doing it manually.
  • If you add or remove an alarm from the list make sure the following syntax is maintained:
    • Alarm names should be in double-quotes (“)
    • Alarms should be separated by commas (,)
    • Back-ticks (`) are used to break up the single line of alarm names into multiple lines for readability.   All alarms should have a back-tick at the end of the line except the last in the array.
  • If you want to add or remove email addresses make sure they too are in double-quotes, separated by commas with no spaces.
  • If you want to change the frequency of the repeating alerts just change the second number in the  “-ActionRepeatMinutes (60 * 24)” section.
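
To show the overall shape without the full alarm lists, here is a cut-down sketch of the arrays-plus-foreach approach. It is not the published script: the alarm names and addresses are placeholders, Set-AlarmEmail is a name made up for this example, and it assumes PowerCLI is loaded and Connect-VIServer has already been run. It only adds one extra trigger and leaves everything else at PowerCLI's defaults.

```powershell
# Placeholder lists: edit to match your environment
$emailAddresses     = "admin1@example.com","admin2@example.com"
$highPriorityAlarms = "Host connection and power state", "Host cpu usage"
$lowPriorityAlarms  = "Virtual machine cpu usage"

# Hypothetical wrapper around the PowerCLI alarm cmdlets.
function Set-AlarmEmail {
    param(
        [string[]]$AlarmNames,
        [string[]]$To,
        [int]$RepeatMinutes = (60 * 24)
    )
    foreach ($name in $AlarmNames) {
        $alarm = Get-AlarmDefinition -Name $name

        # How often repeating actions on this alarm fire again
        $alarm | Set-AlarmDefinition -ActionRepeatMinutes $RepeatMinutes | Out-Null

        # One email action covering every address in the array
        $action = $alarm | New-AlarmAction -Email -To $To

        # Also notify on the green-to-yellow transition
        $action | New-AlarmActionTrigger -StartStatus 'Green' -EndStatus 'Yellow' | Out-Null
    }
}

# Example: hosts repeat every 4 hours, VMs once a day
# Set-AlarmEmail -AlarmNames $highPriorityAlarms -To $emailAddresses -RepeatMinutes (60 * 4)
# Set-AlarmEmail -AlarmNames $lowPriorityAlarms  -To $emailAddresses
```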

This script is provided with no guarantees or warranties.  Use at your own risk.  Comments and feedback are welcome.

*********Updated 10/29/2012*******************

VMware decided to rename some alarms in 5.1 so I now have two versions of the script.

vSphere 5.0 Script

vSphere 5.1 Script

Update 3/18/2015

Today I learned two things to be aware of:

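  1. If the password contains characters that PowerShell treats specially inside a double-quoted string, those characters need to be escaped by prefixing them with the back tick or grave accent character (`).  So the password P@$$word would need to be entered as “P@`$`$word”
    1. Inside double quotes the characters that matter are the dollar sign ($), the back tick (`) and the double quote itself. The longer list $ (  ) * + . [  ] ? \ / ^ { } | is the set of regular expression special characters, which only matter if the value is used as a pattern. Wrapping the password in single quotes avoids the escaping altogether.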
  2. Since the vCenter Server Appliance uses Sendmail the delimiter between multiple email addresses has been changed from a semicolon to a comma. Because the PowerCLI command is the one formatting the entry with the semi-colon I’m not sure yet how to fix this.  (Thanks to AdminAfterWork for pointing this out)
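
The escaping from the first point is easy to check in a console before putting a password into the script. The variable names here are just for illustration.

```powershell
# Back tick escaped dollar signs inside double quotes...
$escaped = "P@`$`$word"

# ...give the same text as single quotes, which never expand variables
$literal = 'P@$$word'

# Compare the two and print the escaped version to see what is really stored
$escaped -eq $literal
Write-Host $escaped
```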

Office 365 PowerShell Script to Set PasswordNeverExpires for All Members of Group

I had a need to set all members of a group so that their Office 365 (aka Microsoft Online) passwords never expire.  It didn’t take too long but I thought it was worth sharing.  I also added output to show the setting was changed.  This could certainly be prettier but it is what it is.

Pre-Reqs

  • This requires you to have the Office 365 PowerShell cmdlets installed, which also require the Online Services Sign-in Agent to work. See this article for instructions.
  • You need admin credentials to your Office 365 account.
  • The script references the ObjectID of the Office 365 Group whose members you wish to change.  To get this you need to connect to Office 365 and use the Get-MsolGroup command. Below is a code snippet showing how.
import-module MSOnline
Connect-MsolService
Get-MsolGroup

Output will look something like below.

Script

The below would need to be saved as a .ps1 file.  The Object ID (shown in red #’s below) would need to be changed to match that of the desired group

import-module MSOnline
Connect-MsolService
### Get All the Members of the Group
$agents=Get-MsolGroupMember -GroupObjectId  ########-####-####-####-############
### Set PasswordNeverExpires to true for all members of the group.
Foreach ($agent in $agents ) {
Set-MsolUser -ObjectID $agent.ObjectID -PasswordNeverExpires $true
$postChangeAgent = Get-MsolUser -ObjectID $agent.ObjectID
Write-Host "User: " $postChangeAgent.UserPrincipalName "PasswordNeverExpires:" $postChangeAgent.PasswordNeverExpires
}

Note: The line beginning with “Write-Host” wraps. The end of that line in your script is the $postChangeAgent.PasswordNeverExpires

Kudos and References

Thanks to JoshT_MSFT @ the Office365 Technical Blog for the following article which pointed me in the correct direction.

http://community.office365.com/en-us/b/office_365_technical_blog/archive/2011/11/01/how-to-disable-password-policy-settings-in-bpos-and-office-365-with-powershell-grid-user-post.aspx

Exchange 2010 Missing Server Configuration in EMC

Just worked on a test (luckily) Exchange 2010 server with a customer.  When they opened the Exchange Management Console, the Server Configuration was missing and they couldn’t change the properties of any of the mailboxes. When they opened the mailbox properties they saw these little lock symbols all over the place.

When they ran the command “Get-Mailbox” in the Exchange Management Shell, they only saw a single mailbox.

So we tried all sorts of things. Then they mentioned they had installed Outlook on the server and set it up to access a mailbox. It just happened to be the one mailbox that Get-Mailbox was returning.  I tried deleting the mail profile, uninstalling Outlook, logging off and back in, no dice.  Then I found out that Windows caches credentials and you have to clear those out using the below procedure:

  1. Open a command prompt using “Run as Administrator”
  2. Run the command “control keymgr.dll”
  3. Click “Back up vault” and follow the prompts to back everything up.
  4. Find and remove all credentials that have to do with Exchange or the user setup for Outlook.
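
If you would rather see what is stored before clicking through the vault, cmdkey /list prints the cached entries and the interesting ones can be filtered out. This is a sketch only: Select-CredentialTarget is a made-up helper, the pattern is whatever identifies your Exchange server or Outlook user, and it assumes cmdkey prints each entry on a line starting with “Target:”.

```powershell
# Hypothetical helper: pulls the target names out of "cmdkey /list" output
# and keeps the ones matching a pattern.
function Select-CredentialTarget {
    param(
        [string[]]$CmdkeyOutput,   # for example the result of: cmdkey /list
        [string]$Pattern
    )
    foreach ($line in $CmdkeyOutput) {
        if ($line -match '^\s*Target:\s*(.+)$') {
            # Save the captured name before the next -match overwrites $Matches
            $target = $Matches[1].Trim()
            if ($target -match $Pattern) { $target }
        }
    }
}

# Review the list first, then remove the entries as in step 4:
# Select-CredentialTarget -CmdkeyOutput (cmdkey /list) -Pattern 'exch01'
```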

After that everything returned to normal.

So if you want Outlook on an Exchange server, use OWA.

Set Owner with PowerShell: “The security identifier is not allowed to be the owner of this object”

I’ve written several PowerShell scripts to help customers adjust permissions on their directory structures when migrating from other file servers (Linux/Samba, Novell OES/Netware, etc).  Part of these scripts includes assigning ownership to the user.  While this tends to take a long time, quotas and file reporting are worthless if the administrator that copied everything is assigned as the owner.

Recently I tried to adapt one of these scripts for a customer, but when I ran it, it failed with the error: “The security identifier is not allowed to be the owner of this object”

A quick internet search found lots of people saying basically this can’t be done with PowerShell.  What!!! I know for a fact these scripts had worked before.  What’s the deal?   After a lot of testing and beating my head against the wall I figured out I was trying to do something different.  Previously I had run my scripts against the UNC path (e.g. \\servername\share\directory), but this time I was trying to run it on a local directory using the drive path (E:\Share\Directory).

Could it be that simple? Yes.  I ran the command again using the UNC path and the script worked as it did before.

Here is an example script to set the owner of a directory or file to test the above:

function pathPrompt {
    # Ask for the target path; use a UNC path as explained above
    $tempPath = Read-Host 'Please enter the path of the directory (e.g. "\\file\vol1\users\example")'
    return $tempPath
}

# Account that should become the owner
$username = "exampleuser"
$domain = "domain"
$ID = New-Object System.Security.Principal.NTAccount($domain, $username)

$path = pathPrompt

Write-Host $path

# Read the current ACL, change the owner, and write it back
$acl = Get-Acl $path
$acl.SetOwner($ID)
Set-Acl -Path $path -AclObject $acl

Save the above to a file with a .PS1 extension, change the $username and $domain variables, and run it (make sure you set-executionpolicy to unrestricted and run PowerShell as administrator). It will prompt for a path. It will then write the path to the PowerShell console and set the owner if it can.
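
If a script already has drive paths baked in, one way to feed it UNC paths is to rewrite them on the fly. The helper below is a sketch built on an untested assumption: that going through the administrative share (E$) behaves the same as the regular share paths I used. ConvertTo-UncPath is a name made up for this example.

```powershell
# Hypothetical helper: turns "E:\Share\Directory" into "\\server\E$\Share\Directory".
function ConvertTo-UncPath {
    param(
        [Parameter(Mandatory)][string]$LocalPath,
        [string]$ComputerName = $env:COMPUTERNAME
    )
    # Capture the drive letter and the rest of the path
    if ($LocalPath -match '^([A-Za-z]):\\(.*)$') {
        return '\\{0}\{1}$\{2}' -f $ComputerName, $Matches[1], $Matches[2]
    }
    # Already a UNC path, or not a drive path at all: leave it alone
    return $LocalPath
}

ConvertTo-UncPath -LocalPath 'E:\Share\Directory' -ComputerName 'fs01'
```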

Below is an example of running it against a local path and a UNC path.

Disabling RDP Network Level Authentication (NLA) remotely via the registry

So I logged into a server that was set up by another administrator, using RDP, to configure some software.  For whatever reason it was requesting a reboot, so I let it reboot before starting my work.  After the server came back up I attempted to connect and got a “The connection cannot continue because the identity of the remote computer cannot be verified” error.

From experience I knew this means that Network Level Authentication (NLA) is enabled.  NLA is a nice security feature if you have an internal Certificate Authority and time to configure auto-enrollment, but most smaller organizations opt for the “less secure” option.  Since I had no console level access I’d have to wait for an onsite technician to change it to allow for “less secure” connectivity.

But I can remote into another server on the same local network and connect to the registry.  A quick google search failed to identify the key/value to change so I did some digging and testing and found it.

To disable NLA remotely:

  1. Open regedit on another computer on the same network.
  2. Under the File menu click “Connect Network Registry…”
  3. Enter the name of the server you are locked out of and click OK.  If this fails to connect you may be out of luck.
  4. Scroll down in the left pane to find the newly added server. Navigate to this Key:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp
  5. Find the value “SecurityLayer” and change the data to 0  (that is a zero).
  6. Voila, I was able to remote in without issue.
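
The same change can be scripted from the other server instead of clicking through regedit. This is a sketch using the .NET remote registry classes; Set-RemoteRdpSecurityLayer is a name made up for this example, and like the regedit route it assumes the Remote Registry service is reachable on the target and that you have admin rights there.

```powershell
# Hypothetical helper: writes the SecurityLayer value from the steps above
# into the registry of a remote server.
function Set-RemoteRdpSecurityLayer {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [int]$Value = 0
    )
    $subKey = 'SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp'

    # Connect to HKEY_LOCAL_MACHINE on the remote machine
    $hklm = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey(
        [Microsoft.Win32.RegistryHive]::LocalMachine, $ComputerName)
    try {
        $key = $hklm.OpenSubKey($subKey, $true)   # $true opens the key writable
        $key.SetValue('SecurityLayer', $Value, [Microsoft.Win32.RegistryValueKind]::DWord)
        $key.Close()
    }
    finally {
        $hklm.Close()
    }
}

# Example: Set-RemoteRdpSecurityLayer -ComputerName 'server01'
```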

vSphere Alarm emails to multiple addresses

This will be a quick post.  I was trying to find/remember the syntax to add multiple email addresses to a single “Send a notification email” action and I couldn’t find it documented anywhere. So I tested and confirmed that all that is required is using a “;” between the email addresses as shown below.

Update 3/18/2015:

I’ve just learned that because the vCenter Server Appliance uses Sendmail the delimiter for multiple addresses needs to be a comma instead of a semi-colon.

http://adminafterwork.com/2013/11/14/linux-based-vcenter-server-appliance-vcsa-bug-vcenter-email-notifications-sent-using-multiple-email-addresses-action-sending-notification-emails/
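
If you build the address string in a script, keeping the addresses in an array and joining them with the right delimiter covers both cases. The addresses and variable names below are placeholders.

```powershell
$addresses = 'admin1@example.com', 'admin2@example.com'

# Windows-based vCenter: semicolon between addresses
$windowsVcenter = $addresses -join ';'

# vCenter Server Appliance (Sendmail): comma between addresses
$applianceVcenter = $addresses -join ','
```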
