Project Cluster
Purpose
The purpose of this documentation is to provide tips on working with Windows Server 2008 in a cluster environment with DFS (distributed file system) and NLB (network load balancing).

General Notes:
1. If you need to migrate applications used by web applications from Windows Server 2003 (and those were previously in the System32 folder), place the same applications in the SysWOW64 folder on 64-bit Windows Server 2008.
2. If you need HTTP Redirection and support for legacy ASP you will have to specifically install those components. Once completed you may have to restart IIS for the server to recognize the components.
3. When you grant an identity permissions on a website's file or folder, you should not need to repeat it manually on each server in the cluster, as long as you use an account that is not machine-specific. If you add a machine-specific account, replication will automatically attempt to replicate it, resulting in broken SAM data on the other servers in the cluster.
4. You may need to alter, or add a rule to, the anti-virus software running on each server in the cluster when developers update a website from a network path. What I've found is that developer changes will make it onto one server, but anti-virus software on the other servers will hold up replication (resulting in delayed replication, or even no replication, to the rest of the cluster).
5. Since you are in a cluster environment, you may have found out you can share the applicationHost.config file between the servers in the cluster (by having it in a separate location). While this expedites maintaining various IIS and website settings for all of the sites, the cluster can go down if that file is unreachable. To prevent this from happening, check out this documentation. Another good resource for setting up a cluster with a shared configuration can be found with this documentation.
6. If your applicationHost.config happens to be on a file server and that file server goes down (but you have another server to replace it), there are a few things to consider:
6A. The servers in the cluster will use the offline files local cache (even if you add the new file server and update the Shared Configuration to point to the new file server).
6B. It takes some work in order to force the servers in the cluster to stop using the offline files cache for the file server that is no longer present and get them to use the new file server.
6C. If you need to get rid of the old offline file server and use the new file server for offline files, read this documentation.
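Because the cluster depends on the shared applicationHost.config being reachable (notes 5 and 6), a periodic check can help. The following is a minimal sketch, not a definitive monitor: the function name `shared_config_status` and the UNC path in the usage note are hypothetical.

```python
import os


def shared_config_status(config_dir, filename="applicationHost.config"):
    """Return (ok, message) describing whether the shared config file is readable."""
    path = os.path.join(config_dir, filename)
    if not os.path.isdir(config_dir):
        return False, f"directory not reachable: {config_dir}"
    if not os.path.isfile(path):
        return False, f"file missing: {path}"
    try:
        # Read a single byte to confirm the file can actually be opened.
        with open(path, "rb") as handle:
            handle.read(1)
    except OSError as exc:
        return False, f"cannot read {path}: {exc}"
    return True, f"readable: {path}"
```

For example, `shared_config_status(r"\\fileserver\iisconfig")` (a made-up share). Keep note 6A in mind: when offline files are enabled, a successful read may come from the local cache, so this check alone does not prove the file server is up.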

Shared Configuration Information Update
There are some interesting caveats with sharing the applicationHost.config file that are not mentioned by the documentation linked in #5 above. Please see the steps below if you are still having trouble setting up offline files.
A. If the computer holding the files you want to share offline (the location of the applicationHost.config) runs Windows Server 2003, make sure RDP/Terminal Services is turned off on it.
B. Make sure the computer with the files you want to make available offline is NOT an application server (such as an IIS server).
B1. On Windows 2003 go to My Computer -> Tools -> Folder Options -> Offline Files -> Check the box to enable offline files.
C. On the computer that holds the files, right-click the shared folder, open its properties and select the "Sharing" tab. Then click the "Caching" button and make sure "All files and programs that users open from the share will be automatically available offline" is selected with "Optimized for performance" checked.
D. Access the share from the webserver or server needing access to the offline files on the computer (or server depending on how you have things set up).
E. On the server that accesses the offline files you will need to Enable Offline File Sharing (On Server 2008: Control Panel -> Offline Files).
E1. Run this from the command prompt: REG ADD "HKLM\System\CurrentControlSet\Services\CSC\Parameters" /v ReadOnlyCache /t REG_DWORD /d 1 /f
F. Restart the web server or server that is accessing the offline files. You will be prompted to do this after E1.
G. On the server that accesses the offline files, browse to the share and its subfolders in File Explorer so that those files are marked as available offline.
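To confirm that step E1 took effect, one option is to run `reg query "HKLM\System\CurrentControlSet\Services\CSC\Parameters" /v ReadOnlyCache` and inspect the result. The sketch below parses that output; the helper name `parse_reg_dword` is hypothetical, and the output layout it expects (name, type, hex value on one line) is an assumption based on the usual `reg query` format.

```python
import re


def parse_reg_dword(query_output, value_name):
    """Return the integer value of a REG_DWORD found in `reg query` output, or None."""
    pattern = re.compile(
        r"^\s*" + re.escape(value_name) + r"\s+REG_DWORD\s+(0x[0-9a-fA-F]+)\s*$",
        re.MULTILINE,
    )
    match = pattern.search(query_output)
    # reg.exe prints DWORD values in hexadecimal, e.g. 0x1.
    return int(match.group(1), 16) if match else None
```

A result of `1` would indicate the read-only cache setting from E1 is in place; `None` means the value was not found in the text passed in.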

Creating a New Website:
In the cluster environment there are more things that you have to do in order to create a website and have that website recognized on the network as detailed here. In addition don't forget, depending on how your network runs, to update primary and secondary DNS as well as any NAT tables, firewalls and so forth for new IP addresses associated to the website. HTTP access usually requires port 80 to be open and HTTPS port 443.
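Once the site exists and DNS, NAT and firewall changes are in place, a quick reachability test for ports 80 and 443 can catch a missed rule. This is a minimal sketch using plain TCP connections; the function names are hypothetical, and a successful connection only shows the port is open, not that the site responds correctly.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_web_ports(host, ports=(80, 443)):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_is_open(host, port) for port in ports}
```

Run it from a machine outside the cluster so the result reflects what clients actually see.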

  1. Access the IIS Manager.
  2. Create the application pool for the new website. If the site is 32-bit based, be sure to select classic mode; if it is ASP, select no managed code and a running identity of network service.
  3. Create the new website. Associate it with the application pool you just created and specify the internal IP address (binding) allocated to the new website.
  4. Don't forget to specify the physical path to the website.
  5. You're not going to get too far with the website until you add its new internal IP address (one for HTTP, and a separate IP address for HTTPS if SSL is used) to the NLB (network load balancing) manager.
  6. Select Cluster Properties.
  7. Select Cluster IP Addresses.
  8. Enter the IP address(es) and subnet mask.
  9. When completed you should see the IP address added. Note: in most cases you won't have to repeat the process on each server in the cluster.
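Before typing an address and subnet mask into the NLB manager, it can be worth sanity-checking them. The sketch below uses Python's standard `ipaddress` module; the function name `validate_cluster_ip` and the `existing` parameter are hypothetical, and passing these checks does not guarantee the cluster will accept the address.

```python
import ipaddress


def validate_cluster_ip(ip, mask, existing=()):
    """Return a list of problems; an empty list means the entry looks sane."""
    try:
        address = ipaddress.IPv4Address(ip)
    except ValueError as exc:
        return [f"bad IP address: {exc}"]
    try:
        # strict=False lets us pass a host address together with its mask.
        network = ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)
    except ValueError as exc:
        return [f"bad subnet mask: {exc}"]

    problems = []
    if network.prefixlen < 31 and address in (
        network.network_address,
        network.broadcast_address,
    ):
        problems.append(f"{ip} is the network or broadcast address of {network}")
    # 'existing' is assumed to hold addresses already on the cluster, as strings.
    if str(address) in set(existing):
        problems.append(f"{ip} is already in the cluster IP list")
    return problems
```

For instance, `validate_cluster_ip("10.0.0.25", "255.255.255.0")` returns an empty list, while a non-contiguous mask such as `255.0.255.0` is reported as a problem.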


Writing to a Website:
In the cluster environment you probably have things set up so that developers can access websites on the cluster from a network path (as opposed to giving developers direct access to each webserver to build or maintain websites). If the developers, or the group they belong to, have not been added with appropriate permissions, an access denied error will crop up.

  1. An access denied message is seen when a developer, or the group the developer belongs to, has not been added and given sufficient permissions.
  2. Open Share and Storage Management.
  3. Select the share that contains the website (or websites) and select Properties.
  4. Select the Permissions tab and click on Share Permissions.
  5. Add the developer or group and choose the appropriate permissions needed for developers to do their work.
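After changing the share permissions, a developer can verify write access from their own machine without touching site content. This is a minimal sketch; `can_write` is a hypothetical helper, and it only tests the account running the script.

```python
import tempfile


def can_write(path):
    """Return True if a temporary file can be created (and removed) under path."""
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix="write_test_"):
            pass
        return True
    except OSError:
        # Includes permission errors and unreachable or missing paths.
        return False
```

Note that on a replicated folder the short-lived test file may itself be picked up by replication, so run this sparingly.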


Servers in the Cluster and Network Connections:
Each server in the cluster needs to have two network connections associated with it. One is the normal network connection; the other is the connection that allows the servers to communicate with each other.

  1. Access network connections on each server in the cluster.
  2. The regular network connection is not using anything out of the ordinary.
  3. No sharing is needed by the regular network connection.
  4. The second connection is set up to handle network load balancing.
  5. No sharing is needed by the network load balancing connection.


The Core of Cluster Replication - DFS:
In the cluster environment, replication of changes made to a website is handled with DFS (remember, installing a Windows application on one server in a cluster is not replicated; you have to install it manually on each server). The process is a daisy chain: because a change will not always originate on a specific server, you have to specify how replication is handled for each server (you don't, however, have to log on to each server to do it).

  1. Access the DFS manager.
  2. Namespace Servers tab - General.
  3. The replication group contains the servers in the cluster environment.
  4. Under Connections you can see that when one server receives a change, it is to be replicated to the other server in the cluster.
  5. Under Replicated Folders you can see the physical folder whose contents are to be replicated.
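When replication seems delayed (for example, when anti-virus software holds it up), comparing the replicated folder on two servers shows exactly which files differ. The following is a minimal sketch that hashes both trees; the function names are hypothetical, and skipping a folder named `DfsrPrivate` is an assumption about where DFS Replication keeps its own working data.

```python
import hashlib
import os


def tree_digest(root, skip_dirs=("DfsrPrivate",)):
    """Map each relative file path under root to the SHA-256 of its contents."""
    digests = {}
    for folder, dirs, files in os.walk(root):
        # Prune folders we do not want to compare (assumed replication internals).
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            full = os.path.join(folder, name)
            sha = hashlib.sha256()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    sha.update(chunk)
            digests[os.path.relpath(full, root)] = sha.hexdigest()
    return digests


def replication_diff(source_root, replica_root):
    """Report files that are missing, extra, or different on the replica."""
    src, dst = tree_digest(source_root), tree_digest(replica_root)
    return {
        "missing_on_replica": sorted(set(src) - set(dst)),
        "extra_on_replica": sorted(set(dst) - set(src)),
        "different": sorted(p for p in set(src) & set(dst) if src[p] != dst[p]),
    }
```

The two roots would typically be the same replicated folder reached through each server's network path. Hashing every file is slow on large sites, so this suits spot checks rather than continuous monitoring.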


"Random" Loss of SSL Bindings:
In the cluster environment with a shared configuration (even if caching of the shared configuration is set up properly), it appears that a user-defined SSL binding will become disassociated from a website once in a while, leaving no webpages accessible via HTTPS. I have not conclusively identified the cause of "randomly" losing the SSL binding, but once you discover that webpages cannot be accessed via HTTPS, the steps below help with re-associating the SSL binding for a website.

  1. Locate the server in the cluster that has disassociated the SSL binding.
  2. Under "Edit Site" click on "Bindings".
  3. Select "https" and click the "Edit" button.
  4. The custom-selected SSL certificate will be shown as "active".
  5. Select the SSL certificate for the server, click "OK" and save.
  6. Wait a couple of minutes, then re-open the bindings page. Select "https" and click the "Edit" button.
  7. Select the custom SSL certificate, click "OK" and save.
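Since the binding loss is only noticed when HTTPS stops working, a scheduled check against each server can shorten the time to discovery. The sketch below attempts a TLS handshake against each host; the function names are hypothetical, and it assumes a lost binding shows up as a failed connection or handshake. Certificate validation is deliberately turned off because the goal is only to see whether anything answers with TLS.

```python
import socket
import ssl


def https_handshake_ok(host, port=443, timeout=5.0):
    """Return True if a TLS handshake with host:port completes."""
    context = ssl.create_default_context()
    # Only presence of a working binding is tested, not certificate validity.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:
        # ssl.SSLError and socket timeouts are both OSError subclasses.
        return False


def servers_missing_https(hosts, port=443):
    """Return the hosts for which the handshake failed."""
    return [host for host in hosts if not https_handshake_ok(host, port)]
```

To identify which server lost the binding, point `hosts` at each server's own address for the site rather than at the cluster address.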


Avoid Catastrophe with NLB:
Updating a server in an NLB cluster (such as through Windows Update) requires that you first make sure the server is not active in the NLB cluster. Otherwise, the entire NLB cluster can go offline (not reachable) if a Windows Update or other installation/change to that server ends up failing. You would think there would be an option in the NLB manager to prevent a host from automatically rejoining the cluster after it restarts, so that you could block the host from the NLB manager on any other host, but that is not the case.

If the server was still an active host in the NLB cluster when it restarted and a failure (such as a failed Windows Update) occurs it will take down the whole cluster and the NLB Manager will be unusable on all hosts. At this point manually access the host with the failure and perform the following steps:
  1. Physically log on to the failed host.
  2. Open the command-line as Administrator and enter: nlb.exe suspend
  3. (May not need to do if other hosts recover) Disconnect the failed host from the network.
  4. Open NLB Manager.
  5. Under "Network Load Balancing Clusters", select the host with the failure.
  6. Right-click on it and select "Host Properties".
  7. Under the tab "Host Parameters" locate the section "Initial host state".
  8. Change "Default state:" to "Suspended".
  9. Check the box "Retain suspended state after computer restarts".
  10. Click "OK".
  11. Check other hosts to make sure their NLB entries for the failed host are updated.
  12. Reconnect the failed host to the network (may not need to do if other hosts recovered). The failed host will not attempt to automatically rejoin the cluster on restarts, allowing you to troubleshoot what has gone wrong with a failed Windows Update or other installation/change.
If you know in advance of a Windows Update or other installation/change that may restart the server or require a restart, pat yourself on the back and follow these steps (if you could not tell, servers should in almost all cases NEVER be allowed to download and install Windows Updates automatically):
  1. Open NLB Manager.
  2. Under "Network Load Balancing Clusters", select the host that will be taken offline.
  3. Perform a drainstop on the host. Wait for it to stop which could take an hour or more.
  4. Right-click on the host and select "Host Properties".
  5. Under the tab "Host Parameters" locate the section "Initial host state".
  6. Change "Default state:" to "Suspended".
  7. Check the box "Retain suspended state after computer restarts".
  8. Click "OK".
  9. Another Option: You should also be able to open the command-line and enter the following line to suspend a host (I don't think this is retained after restarts though):
    nlb.exe suspend 1.2.3.4:ServerName (1.2.3.4 = IP address of cluster, ServerName = name of the host to suspend)
    -OR-
    nlb.exe suspend theCluster:2 (theCluster = name of the cluster, 2 = host number of the host to suspend in the cluster)
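When several hosts need to be suspended ahead of maintenance, the command lines above can be generated rather than typed. This is a minimal sketch that only uses the `nlb.exe suspend cluster:host` form shown above; the function names and the `dry_run` parameter are hypothetical, and actually running the commands requires an elevated prompt on a Windows host with NLB installed.

```python
import subprocess


def nlb_suspend_command(cluster, host):
    """Build the argument list for suspending one host in a cluster."""
    if not str(cluster).strip() or not str(host).strip():
        raise ValueError("cluster and host are both required")
    # cluster may be the cluster IP or name; host may be a name or host number.
    return ["nlb.exe", "suspend", f"{cluster}:{host}"]


def suspend_hosts(cluster, hosts, dry_run=True):
    """Return the commands for all hosts; execute them only when dry_run is False."""
    commands = [nlb_suspend_command(cluster, host) for host in hosts]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

With the default `dry_run=True` nothing is executed, so the output can be reviewed first. As noted above, a suspend issued this way may not be retained after a restart, so it complements rather than replaces the "Retain suspended state" setting.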

When things are back into normal operation you'll most likely want to change the "Default state:" to Started and uncheck the box "Retain suspended state after computer restarts" (mainly useful when power outages occur and you need to have things come back online as soon as possible).

The Interface is misconfigured:
In the cluster environment, "most" of the time when you add an IP address to the cluster's list of IP addresses (commonly for websites in the cluster), the whole process will go through just fine. However, once in a while one or more servers in the cluster will crash and restart themselves when they reject an IP address being added (yes, even if the IP is used nowhere on the network and the only place you have it defined is on the DNS server), with the bland message "The interface is misconfigured". In this case, hopefully you already had all the other servers in the cluster set up so they retain their suspended state if they restart. Once this happens, all of the servers in the cluster (if they were using a shared configuration) will revert to using locally cached copies.

Not only does this mean that you will NOT be able to modify the shared configuration and have the changes take effect, you should also find it obscenely difficult to add an application pool or create a website because the status will be "The object identifier does not represent a valid object (exception from hresult: 0x800710D8)", even on the server you had initially tried to add the IP address to. The steps below show the IP problem as it occurred, as well as how the issue was fixed. Ironically, after all of this was done, the original IP address (which caused the widespread failure) was successfully added without incident.

  1. Under normal circumstances the IP address of a new website is added to the NLB cluster.
  2. However, for whatever reason, the remaining servers in the cluster rejected the request and rebooted themselves, and the NLB manager then reported "The interface is misconfigured" for those servers.
  3. You may think you could go back to the server you originally added the IP address through and remove that IP. Unfortunately, the action will only be successful on that server; the rest of the servers in the cluster will ignore the request.
  4. Then you may think you could get on each server in the NLB cluster and remove the IP address there. Unfortunately that would be too simple a solution, so it is not possible.
  5. Instead, you must basically destroy and re-create the entire NLB cluster (if you have many servers, be prepared to spend a lot of time doing it).
  6. From the server which did not originally reject the addition of the IP address, remove each host (server) from the cluster.
  7. Be sure to write down how each server was set up in the cluster, port rules and so forth.
  8. Once all of the servers have been removed, begin re-adding them. The first step is to name the server and specify its network interfaces.
  9. From the next pane you will be able to set the Priority (rank in the cluster) and its default state.
  10. Set up the port rules (hopefully you had them written down somewhere if you didn't use defaults).
  11. At this point the server (host) should be part of the cluster again.
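The advice above to write down each host's setup before removing it can be made less error-prone by keeping the details in a small file. The sketch below stores them as JSON; every class and field name here is hypothetical and should be adapted to whatever the NLB manager dialogs show for your cluster.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PortRule:
    # Hypothetical fields; mirror the values shown in the port rule dialog.
    start_port: int
    end_port: int
    protocol: str
    filtering_mode: str
    affinity: str


@dataclass
class HostRecord:
    name: str
    priority: int
    default_state: str
    interfaces: List[str] = field(default_factory=list)
    port_rules: List[PortRule] = field(default_factory=list)


def save_records(records, path):
    """Write all host records to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([asdict(record) for record in records], handle, indent=2)


def load_records(path):
    """Read host records back, rebuilding the nested port rules."""
    with open(path, encoding="utf-8") as handle:
        raw = json.load(handle)
    records = []
    for item in raw:
        rules = [PortRule(**rule) for rule in item.pop("port_rules", [])]
        records.append(HostRecord(port_rules=rules, **item))
    return records
```

Keeping the file somewhere other than the cluster's shared storage means it is still available while the cluster is being rebuilt.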


The object identifier does not represent a valid object (exception from hresult: 0x800710D8):
In the cluster environment you may encounter this message when you create an application pool or a website. It has a variety of causes, one being the situation above (the trivial task of adding an IP address to the cluster). When that happens, you have to go through each server in the NLB cluster and restart a number of services. Once they have all been restarted, you should be able to create application pools and websites again.
NOTE
  1. If you find you can create an application pool successfully BUT when you create a website you get the object error, you may not need to restart IIS and other services -- see next bullet.
  2. If you have an SSL certificate installed on the web server you may want to try the following with the new website you just created:
    • In some cases, after adding a new website with a port 80 binding, I've found that the infamous "object invalid" message is displayed next to the website (on the pane which shows a listing of all of the websites). Oddly, if I add a port 443 binding (which means you should have some type of SSL certificate already installed on the server) to the new website, the "object invalid" message will go away and I don't have to restart a bunch of services for the new website to be accessible. At that point, removing the port 443 binding from the new website can be done and the new website would continue to operate fine.
  3. If the above does not work for the conditions specified then there may be no choice but to restart services as outlined below.

  1. Restart the IIS Admin Service on the server.
  2. Restart the Windows Process Activation Service (WAS) on the server.
  3. Move on to the next server in the cluster and repeat.

