Recently we ran into serious performance issues in one of our production environments. I tried several solutions. First I tweaked the ASP.NET configuration based on this tutorial. It helped for a while, but after several weeks the issues came back. After deeper investigation I noticed that the event log on one of the WFEs contained many errors like the following:
Application Server Administration job failed for service instance Microsoft.Office.Server.Search.Administration.SearchServiceInstance (0fb6066b-7fd5-4926-a28e-2348c0ae1256).

Reason: Access to the path 'C:\WINDOWS\system32\drivers\etc\HOSTS' is denied.

Techinal Support Details:
System.UnauthorizedAccessException: Access to the path 'C:\WINDOWS\system32\drivers\etc\HOSTS' is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileInfo.Delete()
   at Microsoft.Search.Administration.Security.HOSTSFile.CleanupDedicatedGathering(Hashtable HOSTSFileMappings, StringBuilder HOSTSComments, IEnumerable obsoleteHosts, String dedicatedName, Boolean isDirty)
   at Microsoft.Search.Administration.Security.HOSTSFile.ConfigureDedicatedGathering(SearchServiceInstance searchServiceInstance, SPServer dedicatedWebFrontEndServer, IList`1 previousWebApplicationHostNames)
   at Microsoft.Office.Server.Search.Administration.SearchServiceInstance.SynchronizeDefaultContentSource(IDictionary applications)
   at Microsoft.Office.Server.Search.Administration.SearchServiceInstance.Synchronize()
   at Microsoft.Office.Server.Administration.ApplicationServerJob.ProvisionLocalSharedServiceInstances(Boolean isAdministrationServiceJob)
I checked the 12/logs folder on this WFE and found that, alongside the regular SharePoint log files, there were a lot of files named HOSTS.yyyy.MM.dd.hh.mm.ss.mm (where the last mm stands for milliseconds). There were about 60,000 such files (from what I found, a directory should hold no more than about 30,000 files for acceptable performance). When I tried to open the folder in Windows Explorer, it hung for several seconds. That gave me an idea: since these hosts files sat in the same folder as the regular logs, SharePoint might stall when writing log records for the same reason that made Windows Explorer open the 12/logs folder so slowly. I removed all the hosts* files manually and started monitoring.
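The manual cleanup can also be scripted. Below is a minimal Python sketch, not the procedure I actually used: the function names are hypothetical, and it assumes the stray files can be recognized purely by a name starting with "HOSTS." (case-insensitive). It defaults to a dry run that only counts the matches.

```python
from pathlib import Path


def find_hosts_backups(log_dir):
    """Return files directly inside log_dir whose names start with 'HOSTS.'.

    The match is case-insensitive; regular log files are left out.
    """
    return sorted(
        p for p in Path(log_dir).iterdir()
        if p.is_file() and p.name.upper().startswith("HOSTS.")
    )


def remove_hosts_backups(log_dir, dry_run=True):
    """Count the HOSTS.* files in log_dir and delete them unless dry_run is set.

    Returns the number of matching files found.
    """
    backups = find_hosts_backups(log_dir)
    if not dry_run:
        for path in backups:
            path.unlink()
    return len(backups)
```

Call remove_hosts_backups with the full path to the 12/logs folder on the affected server. Run it with the default dry_run=True first to see how many files would go, then pass dry_run=False to delete them.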
The performance issues seemed to disappear, but the root cause remained: something was still copying a hosts* file into 12/logs every minute. Given the event log error above, the first suspect was Office SharePoint Server Search (not to be confused with Windows SharePoint Services Search). That turned out to be right: when I disabled the service, the files stopped appearing (I had to disable it rather than just stop it, because SharePoint restarts it every minute). But with the Search service disabled, search didn't work at all. Users got the following error when they clicked the Search button in the standard search box:
The search request was unable to connect to the Search Service
But why was the Search service copying hosts* files every minute? I noticed that the hosts file in the C:\WINDOWS\system32\drivers\etc folder was read-only. To explain why it had been made read-only, I need to describe the environment configuration. The production environment is a farm with several WFEs. Each WFE has 2 network cards and therefore 2 IP addresses (one per card). Our web application is extended so it can be used with both NTLM and FBA, and each extended web app uses its own IP address (via the bindings configuration in IIS Manager). This was done for performance reasons.
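The read-only state of the hosts file is easy to check from a script as well. Here is a small Python sketch (the helper name is hypothetical). It looks only at the owner write bit reported by os.stat, which on Windows reflects the file's read-only attribute. It does not evaluate NTFS permissions, so an "access denied" caused by ACLs would not show up here.

```python
import os
import stat


def is_read_only(path):
    """Return True if the owner write bit is cleared for the file at path."""
    mode = os.stat(path).st_mode
    return not (mode & stat.S_IWUSR)
```

On the affected WFE, is_read_only(r"C:\WINDOWS\system32\drivers\etc\hosts") would be expected to return True.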
The Search service runs on one of the WFEs (guess which one: yes, exactly the WFE where I found the event log errors and the many hosts* files in the 12/logs folder). Following the MS guide, it was configured to use a dedicated WFE for crawling: Configure a dedicated front-end Web server for crawling (again to improve performance by minimizing network traffic, since the crawler uses only the local server in this case). But the same article includes a note about potential problems with this approach:
Possible problems
In some cases, the timer service writes the incorrect IP address to your Hosts file. (For more information, see the blog post at http://go.microsoft.com/fwlink/?LinkId=135698.) This can cause problems ranging from inability to crawl content to inability to view sites, such as the Search Services Provider (SSP) or Central Administration site. The timer service can add an incorrect IP address to the Hosts file in cases such as the following:
- The server that you specified as your dedicated front-end Web server for crawling has multiple IP addresses assigned to one or more network cards.
- Your server farm is using network load balancing.
If either of these conditions is true, we recommend that you add the entries to the Hosts file directly instead of using the user interface to specify a dedicated front-end Web server for crawling.
As I wrote above, each WFE has several network cards, and that was the problem. The Search service tried to write incorrect IP addresses to the hosts file. But making the hosts file read-only had a side effect: the Search service copied it into the 12/logs folder every minute. That was the cause of the performance issues once too many files had accumulated in 12/logs.
The same article contains the solution to this problem: see Configure a dedicated front-end Web server for crawling by editing the Hosts file. This was quite a non-obvious cause of a performance hit. I hope this information helps you in your own investigations.
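Editing the hosts file by hand means adding one line per web application host name, each mapping the name to the IP address the crawler should use. The Python sketch below only formats such lines. The helper name, the IP address and the host names are all made up for illustration, so substitute the values from your own farm and follow the linked article for the actual procedure.

```python
def hosts_entries(ip_address, host_names):
    """Format one hosts-file line per host name: '<ip><TAB><host name>'."""
    return [f"{ip_address}\t{name}" for name in host_names]


# Hypothetical values for illustration only.
for line in hosts_entries("192.0.2.10", ["portal.example.com", "extranet.example.com"]):
    print(line)
```

The generated lines can then be pasted into C:\WINDOWS\system32\drivers\etc\hosts on the crawl server.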