Monday, March 10, 2014

Adventures in Virtualizing SBS 2008

So... with Microsoft killing off the full-blown SBS line, many SMB clients and their consultants are trying to find the right solution going forward. Some are going the Server 2012 Essentials route with Hosted Exchange; others are looking for ways to safely extend the life of their existing SBS 2008 or 2011 installations. With virtualization, one can move such an install from aging hardware to new, more powerful hardware and extend the life of an in-house Exchange system. This also leaves many options open when the SBS software reaches end of life. On to the story of one such adventure.

The client has a 4-plus-year-old SBS 2008 box that is randomly shutting down Exchange due to drive space issues. The box is too old to replace the drives and doesn't have the ability to add more. So yours truly suggests that we virtualize it on a new piece of hardware with lots of drive space and memory. The client purchases a brand new Dell PowerEdge T620 with 32 GB of RAM and tons of drive space. I build the server off-site, install Server 2012 on the mirrored OS drives, install the Hyper-V role, and create my new virtual machine using a VHDX file created with the latest version of Disk2VHD. The old server only had 4 drives in a RAID 5 array, so my VHDX file is for one disk with 2 partitions. This complicates expanding the OS partition. The steps that I took after verifying that the virtual machine would boot and run properly are as follows.
  1. Shut down the virtual machine.
  2. Used Edit Disk in Hyper-V Manager to expand the VHDX to 1 TB.
  3. Mounted the VHDX and its original copy.
  4. Formatted the 2nd partition.
  5. Using Disk Management, extended the OS partition to 200 GB.
  6. Used Ghost to copy the second partition from the original VHDX file.
  7. Detached the virtual disks.
  8. Powered up the virtual machine.
  9. Works like a charm.
So now we have a virtual clone that has twice the disk space of the original. I schedule a weekend migration with the client. After hours on a Friday, I set up the new server on a gigabit switch with the old one. I block port 25 on the firewall to stop mail flow and turn off terminal services on their terminal server to block external access. I make a backup of the data volume to an external disk connected to the Hyper-V server. At this point the new virtual machine is running in Directory Services Repair Mode so I will be able to easily restore the volume backup of the data partition.
When the backup is finished, I set the external disk to Offline so that I can attach it to the virtual machine. I restore the volume and reboot the virtual machine. This is when the fun really starts.

The new virtual machine is almost totally non-responsive. I can open Task Manager, but as soon as I try "Show processes from all users" it becomes non-responsive. If I log out and back in, I can do the same thing again. I can't open services.msc or the Event Viewer. I switch Task Manager to the Services view and see MSExchange SA and Transport stuck starting, and MSExchange Information Store and DNS stuck stopping. Trying to stop or start any of them results in Access Denied. Trying to shut down the machine results in extended periods of "Stopping Services". So after several hard shutdowns I have managed to corrupt the machine. Time to start the whole process over again.

Umpteen hours later, with an entirely new image, the same results. At this point I notify the client that the migration will not be done this weekend. I hit the sack with a serious headache.
The next morning I decide to take another look, thinking I have nothing to lose (but my sanity).
This time I try running msconfig to set the boot mode to Directory Services Repair Mode. I log into the local machine and find the same services stuck starting and stopping. I then use msconfig to disable the Exchange services and reboot in normal mode. I am finally able to read the event logs. One error catches my eye: Event ID 7023, "The DNS Server service terminated with the following error: The network is not present or not started." At first I think that this is due to the machine being in safe mode, then I check the time stamp and find that it was logged while in normal mode, so I Google the error and find a few things having to do with updates. I check the network adapter settings and verify that they are correct. Grasping at straws, I decide to create a private virtual network and connect the VM to it instead of leaving the adapter not connected, which still keeps the machine isolated from its physical counterpart. I then try to start the DNS service and voilà, it starts. I verify that it stays running and reset msconfig to load all services. I reboot and the darn thing comes up pretty as can be, everything running and clean error logs. At this point I stop the information store on both machines and perform a forklift transfer, restart the store on the virtual machine, and verify access using OWA on the local machine. I then shut down the physical box and switch the virtual network connection to external. I unblock port 25 on the firewall again and test mail flow. SUCCESS.

The moral of the story is that for SBS to work properly it needs to be connected to a switch, whether the switch is connected to anything else or not. Weekends like this make me wonder why I do this for a living. I hope this helps someone else avoid wasting their whole weekend.

Tuesday, October 29, 2013

Rogue DHCP Server


I finally found a resolution to a rogue DHCP server on a customer's network. Starting at random times after an SBS migration, the customer would have network problems. I quickly found that another DHCP server was giving out 192.168.2.0/24 IP addresses with itself as the gateway. The proper network is 192.168.1.0/24 with a SonicWall as the gateway. This of course caused the kind and generous Microsoft DHCP server to turn itself off and let the rogue DHCP server proceed to make a big mess of the network. Previously I had tried to identify the device by opening its IP in a web browser and by telnetting to it, to get an idea of what it might be. The webpage it gave was a blank page with the words "It Works!" and that was it; it would not answer telnet. My previous efforts to find it involved unplugging computers and devices as close to individually as possible with their network wiring, restarting the SBS DHCP server, and seeing if it stayed running. I narrowed it down to one switch, and then the rogue server would disappear and not start up again. This happened a couple of weeks in a row. I gave the customer a list of things to check, and about a week later he thought he had found it, because he disconnected a cable and the network worked fine from then on.
 

Till yesterday, when they started having more and more network problems. So today I left the SBS DHCP server alone after verifying that it was shut off, pulled an IP from the rogue device, and again tried the webpage and telnet. I got the same results. This time I decided to use ping -t to continuously ping the device as I disconnected cables one at a time. I found one cable where the ping would time out when I disconnected it and reply again when I reconnected it. So I left it disconnected, and within a minute or two the local know-it-all Mac graphics guy said he couldn't connect to anything. I hooked the cable back up and he was in business, so we went to his workstation, where they have a hub connecting 3 Macs to the network. We unhooked his cable and the pings timed out. I told him he had a DHCP server running on his Mac; he claimed up and down he didn't have anything running. So I asked him how to check services running on a Mac. He didn't know, but found out how. I went to a Windows machine and Googled DHCP on a Mac. When I went back to him, he had something up showing the services that were running, and one was BootP. I told him that was his DHCP server and asked him to check Internet Sharing. It turns out mister know-it-all Mac operator had turned on Internet Sharing, which turns on a DHCP server on a Mac. Hope this helps someone in a mixed environment where the webpage for the rogue DHCP server has nothing on it but "It Works!"
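For anyone who wants a quick sanity check before they start pulling cables, here is a minimal Python sketch of the test I was doing in my head: does the address a workstation was handed actually belong to the proper 192.168.1.0/24 network? The function name and the two sample addresses are made up for illustration; only the two subnets come from the story above.

```python
import ipaddress

# The legitimate LAN from the story; adjust for your own network.
EXPECTED_NETWORK = ipaddress.ip_network("192.168.1.0/24")


def lease_looks_rogue(leased_ip, expected_network=EXPECTED_NETWORK):
    """Return True when a leased address falls outside the expected subnet."""
    return ipaddress.ip_address(leased_ip) not in expected_network


# Hypothetical addresses: one from the proper scope, one from the rogue scope.
for ip in ("192.168.1.57", "192.168.2.14"):
    verdict = "possible rogue DHCP lease" if lease_looks_rogue(ip) else "ok"
    print(f"{ip}: {verdict}")
```

This only flags the symptom. Finding the device itself still came down to the continuous ping and unplugging cables one at a time.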

 

Monday, December 3, 2012

WSUS 3.0 SP2 SBS 2008

WSUS 3.0 SP2 must be manually installed and can take a fair amount of time to install. After it finishes installing it brings up the WSUS configuration wizard. Just cancel it there and it will use the previous settings.
If you don't cancel it there you will break the Console integration by changing one setting or another from the SBS 2008 defaults. Here is a link to the default settings needed for the Console integration to work.

Wednesday, October 31, 2012

Random Windows Server Backup Failures

The following is a modified repost of a TechNet article. The original, http://blogs.technet.com/b/asiasupp/archive/2011/08/01/windows-server-backup-failed-to-backup-with-error-0x81000101.aspx, contained an error of one too many zeros in the DWORD value.
 
I found the article while I was trying to troubleshoot what seemed to be completely random backup failures on an HP ML350 G5 SBS 2008 server. The backups would fail, and when I ran VSSAdmin List Writers, numerous writers were in a state of "Waiting for completion". The only way I could get a good backup was to reboot the server; a few days later it would fail again. Further troubleshooting with HP Insight Diagnostics showed that one of the drives in a mirror had failed: "The Read Write HARD error rate is above threshold". I contacted HP for a drive replacement and made the registry edit. I advised the customer that they would need to reboot that night so that they would get a good backup. They forgot to, but the backup was successful. I believe this was due to changing the timeout value. I believe this was the first time that I have seen a successful backup when the VSS writers were not all in a stable, error-free state. When I checked them the next day, they were all stable with no errors.
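Reading through a long writer list by eye gets old, so here is a rough Python sketch that picks out the writers that are not "Stable". It assumes the usual layout of the VSSAdmin List Writers output (a "Writer name:" line followed by an indented "State: [n] ..." line). The sample text and the function name are made up for illustration and are not output from the server in this post.

```python
import re

# Illustrative sample in the layout that `vssadmin list writers` prints.
# The writers and states below are invented for this example.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   State: [5] Waiting for completion
   Last error: No error
"""


def unstable_writers(text):
    """Return (writer name, state) for every writer whose state is not Stable."""
    results = []
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        name_match = re.match(r"Writer name: '(.+)'", line)
        if name_match:
            name = name_match.group(1)
            continue
        state_match = re.match(r"State: \[(\d+)\] (.+)", line)
        if state_match and name is not None:
            state = state_match.group(2).strip()
            if state.lower() != "stable":
                results.append((name, state))
    return results


for writer, state in unstable_writers(SAMPLE_OUTPUT):
    print(f"{writer}: {state}")
```

On a real server you would feed it the live text instead of the sample, for example from subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout at an elevated prompt.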
 
Symptom:

Sometimes Windows Server Backup fails to back up the data. The error is:

The shared restore point operation failed with error (0x81000101) The creation of a shadow copy has timed out. Try this operation again.

In the Event Viewer, the following error is logged:

The backup operation that started at '****' has failed because the Volume Shadow Copy Service operation to create a shadow copy of the volumes being backed up failed with following error code '2155348001'. Please review the event details for a solution, and then rerun the backup operation once the issue is resolved.
Cause:

Windows Server Backup is timing out during shadow copy creation since it is taking more than 10 minutes.

Resolution:


- Run regedit.exe and navigate to "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\SPP"

- Create a new Registry value of type DWORD with name "CreateTimeout"

- Set the value to 1200000 in decimal (2*10*60*1000 ms = 20 minutes, double the 10-minute timeout)
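Since the original article had one zero too many, it is worth double-checking the arithmetic. The Python sketch below computes the value from minutes and, on Windows only, writes it with the standard winreg module. The function names are my own, and the registry write is an untested sketch that assumes an elevated session; regedit as described above does the same job.

```python
MS_PER_MINUTE = 60 * 1000

# Registry path from the resolution steps above, relative to HKEY_LOCAL_MACHINE.
SPP_KEY = r"Software\Microsoft\Windows NT\CurrentVersion\SPP"


def create_timeout_ms(minutes):
    """Convert a timeout in minutes to the millisecond value for CreateTimeout."""
    return minutes * MS_PER_MINUTE


def set_create_timeout(minutes):
    """Write CreateTimeout as a DWORD (Windows only, needs administrative rights)."""
    import winreg  # imported here so the arithmetic helper runs on any OS

    value = create_timeout_ms(minutes)
    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, SPP_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, "CreateTimeout", 0, winreg.REG_DWORD, value)
    return value


print(create_timeout_ms(20))  # 20 minutes expressed in milliseconds
```

Seven digits, not eight: 20 minutes is 1200000 ms, and the extra zero would have made it 200 minutes.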


Monday, October 29, 2012

Server 2012 Hyper-V Failover


I won't get into too much detail on the various types of failover possible with Server 2012 Hyper-V. I just wanted to show how simple the failover of replicated machines is.

The first image shows the failover options on the machine with the running virtual machine, notice the "Planned Failover".
On the replica server we have "Failover" and "Test Failover". Test Failover creates a copy of the replicated VM and mounts it to test the viability of the VM. "Failover" should only be used when the source VM becomes unavailable.
Running a "Planned Failover" first runs through a prerequisite check to make sure all is well. A check box chooses whether to start the replica after the failover.
 It is always nice to see green check marks assuring that things are going well.
The source VM has been successfully failed over and the replica has been started. Now you can change the network settings to configure access to the failed-over VM.

Notice how the options have changed on the replica server. They are now what they were on the source server. This allows you to perform a planned failover back to the original source server.

Monday, October 22, 2012

Server 2012 Hyper-V Replication

One of the most exciting new features of Server 2012 is Hyper-V Replication. This post will include a basic tutorial on getting that feature working. In my test setup I have a Server 2012 that was upgraded from Server 2008 R2. This server hosts a VM of my SBS2011 production server. The upgrade from Server 2008 R2 went very smoothly and it was nice to not have to start from scratch.
    The second Server 2012 was built from scratch and the only role added was Hyper-V. The method that I chose for authentication was Kerberos since both Hyper-V Hosts were member servers. After following the steps to enable replication for a particular VM everything I tried resulted in Authentication failure. Looking at the Hyper-V logs on both machines I found numerous event ID 14050 errors.
Failed to register the service principal name 'Microsoft Virtual System Migration Service'.
Failed to register the service principal name 'Microsoft Virtual Console Service'.
Failed to register the service principal name 'Hyper-V Replica Service'.
I searched for possible causes and solutions, many of them listed in the following wiki: http://social.technet.microsoft.com/wiki/contents/articles/1340.hyper-v-troubleshooting-event-id-14050-vmms.aspx
I also made sure that I had at least one network adapter not being used by Hyper-V, and that all adapters had static IPs with the proper DNS servers entered in their configuration. Still no success, so I decided to add the DNS and Directory Services roles to the primary Hyper-V server. I did this for a couple of reasons: first, it always boots before my primary domain controller; second, I wanted another DNS server to allow Internet browsing when I have the SBS2011 VM down for any reason.
Voilà! Not sure why, but my 14050 errors were gone on both 2012 servers, replaced by 14052 events stating successful registration of SPNs, as well as 29290 events about updating of firewall rules. So on to replication!! The following screenshots show the basic process.

Right-click on the machine that you want to replicate and select Enable Replication.
 
Click on Specify Replica Server, browse, and choose the server.
 
 
 Specify Replica Server Port and Authentication Method.
Choose the VHDs to replicate. I unselected a couple due to disk size and the fact that they weren't crucial to this test. Note... you can only make this choice when enabling replication. You can't add them later without starting the process over.
Choose recovery points.
Choose how to transfer the initial image. I chose to send the initial copy over the network, all 500 GB of it. When to start the replication is also set here.
Review the summary and click Finish, hopefully with no errors.
At long last, no errors!!
The VM shows up on the replica server.
Replication status. Initial replication took only about 2.5 hours for 500 GB. I was impressed.
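To put that initial replication in perspective, here is the back-of-the-envelope math as a small Python sketch. It assumes "500 Gig" means 500 decimal gigabytes and takes the 2.5 hours at face value, so treat the result as a rough average and not a measurement. The function name is my own.

```python
def average_throughput(gigabytes, hours):
    """Return (MB/s, Mbit/s) using decimal units, where 1 GB = 1000 MB."""
    seconds = hours * 3600
    mb_per_second = gigabytes * 1000 / seconds
    return mb_per_second, mb_per_second * 8


mb_s, mbit_s = average_throughput(500, 2.5)
print(f"{mb_s:.1f} MB/s, about {mbit_s:.0f} Mbit/s")
```

That works out to roughly 55 MB/s sustained, a little under half of what a gigabit link can carry in theory.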
I will be posting more on other features such as Failover.

Tuesday, July 24, 2012

KB2596911 Breaks SharePoint and Backup on SBS08

Another update that breaks SharePoint and Backup, on SBS08 this time. A similar update-related failure with the same results occurred some time ago on SBS2011. The fix is very similar, just with a different path on SBS08. The path on 08 is C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\BIN; the path on 2011 is \14\bin. The command that needs to be run from that path at an administrative command prompt is

PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures

The result after running it should be

Successfully completed the SharePoint Products configuration.
Total number of configuration settings run: 6
Total number of successful configuration settings: 6
Total number of unsuccessful configuration settings: 0
Successfully stopped the configuration of SharePoint Products.
Configuration of the SharePoint Products has succeeded.

On the 08 server where I had the problem, I also had to restart the SQL Server VSS Writer service.
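If you capture the PSConfig output to a file or a variable, a few lines of Python can confirm the run matches the expected result shown above. This is only a sketch: the function name is made up, and it checks nothing more than the "unsuccessful" count and the final "has succeeded" line from the output quoted in this post.

```python
import re

# The expected output quoted earlier in this post.
SAMPLE_OUTPUT = """\
Successfully completed the SharePoint Products configuration.
Total number of configuration settings run: 6
Total number of successful configuration settings: 6
Total number of unsuccessful configuration settings: 0
Successfully stopped the configuration of SharePoint Products.
Configuration of the SharePoint Products has succeeded.
"""


def psconfig_succeeded(output):
    """Return True when no settings failed and the final success line is present."""
    match = re.search(
        r"Total number of unsuccessful configuration settings:\s*(\d+)", output
    )
    return (
        match is not None
        and int(match.group(1)) == 0
        and "has succeeded" in output
    )


print(psconfig_succeeded(SAMPLE_OUTPUT))
```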