Sunday, April 12, 2015

Create search crawl rules for SharePoint search service application via PowerShell

In one of my previous articles I showed how to exclude system pages like AllItems.aspx from search results: Exclude AllItems.aspx from search results in Sharepoint 2013. In this post I will show how to create search crawl rules via PowerShell. This is useful when you need to exclude a lot of content from search crawling and doing it manually would mean a lot of work (e.g. when you have restored a large content database from production, but don’t need to crawl all sites). Here is the script:

    # Ensure the SharePoint PowerShell snap-in is loaded
    if ($null -eq (Get-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue))
    {
        Add-PSSnapin "Microsoft.SharePoint.PowerShell"
    }

    # Load the rule definitions (the path is relative to the current directory)
    [xml]$xmlinput = Get-Content "CrawlRules.xml"

    foreach ($WebApplication in $xmlinput.SelectNodes("Build/WebApplication"))
    {
        foreach ($SearchService in $WebApplication.SelectNodes("SearchService"))
        {
            # Get the search service application by name
            $strServiceName = $SearchService.Name
            $spService = Get-SPEnterpriseSearchServiceApplication -Identity $strServiceName

            # Delete existing rules if <Rules Clear="True">
            $Rules = $SearchService.SelectNodes("Rules")
            $strClearRules = $Rules.ItemOf(0).Clear
            if ($strClearRules -eq "True")
            {
                $spRules = Get-SPEnterpriseSearchCrawlRule -SearchApplication $spService
                foreach ($spRule in $spRules)
                {
                    if ($null -ne $spRule)
                    {
                        Write-Host "Deleting rule:" $spRule.Path -ForegroundColor Yellow
                        $spRule.Delete()
                    }
                }
            }

            # Add new rules
            foreach ($CrawlRule in $SearchService.SelectNodes("Rules/Rule"))
            {
                # A missing attribute is treated as "False"
                $FollowComplexUrls = ($CrawlRule.FollowComplexUrls -eq "True")

                if ($CrawlRule.Type -eq "ExclusionRule")
                {
                    # In exclusion FollowComplexUrls actually means "Exclude complex URLs"
                    $FollowComplexUrls = !$FollowComplexUrls
                    New-SPEnterpriseSearchCrawlRule -Path $CrawlRule.URL `
                        -SearchApplication $spService `
                        -Type $CrawlRule.Type `
                        -FollowComplexUrls $FollowComplexUrls
                }
                else
                {
                    $CrawlAsHttp = ($CrawlRule.CrawlAsHttp -eq "True")
                    $SuppressIndexing = ($CrawlRule.SuppressIndexing -eq "True")

                    New-SPEnterpriseSearchCrawlRule -Path $CrawlRule.URL `
                        -SearchApplication $spService `
                        -Type $CrawlRule.Type `
                        -FollowComplexUrls $FollowComplexUrls `
                        -CrawlAsHttp $CrawlAsHttp `
                        -SuppressIndexing $SuppressIndexing
                }
            }
        }
    }

Rules are defined in the CrawlRules.xml file, which has the following structure:

    <?xml version="1.0" encoding="utf-8"?>
    <Build>
      <WebApplication>
        <SearchService Name="Search Service Application">
          <Rules Clear="True">
            <Rule URL="*://*/_layouts/*" Type="ExclusionRule" FollowComplexUrls="False" />
            <Rule URL="*://*/_catalogs/*" Type="ExclusionRule" />
            <Rule URL="*://*/_vti_bin/*" Type="ExclusionRule" />
            <Rule URL="*://*/forms/AllItems.aspx*" Type="ExclusionRule" />
            <Rule URL="*://*/forms/DispForm.aspx*" Type="ExclusionRule" />
            <Rule URL="*://*/forms/EditForm.aspx*" Type="ExclusionRule" />
            <Rule URL="*://*/forms/NewForm.aspx*" Type="ExclusionRule" />
          </Rules>
        </SearchService>
      </WebApplication>
    </Build>

As a result, the script will create exclusion rules for layouts pages, for pages from _catalogs and _vti_bin, and for the list forms AllItems.aspx, DispForm.aspx, EditForm.aspx and NewForm.aspx. Because Clear is set to "True", any existing crawl rules of the search service application are deleted first. If you have a lot of sites which should be excluded, you may generate this XML file programmatically and then pass it to the script above. This saves administrative work that would otherwise have to be done manually.
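For illustration, here is a minimal sketch of how such a file could be generated from a list of site URLs. It is only one possible approach, not part of the script above: the function name New-CrawlRulesXml, its parameters, the example site URLs and the choice to append "/*" to each URL are all assumptions made for this example. It uses only System.Xml, so it does not need the SharePoint snap-in.

```powershell
# Hypothetical helper (name and parameters are assumptions for this sketch):
# builds a CrawlRules.xml document with one exclusion rule per site URL.
function New-CrawlRulesXml {
    param(
        [Parameter(Mandatory = $true)][string[]]$SiteUrls,
        [string]$ServiceName = "Search Service Application",
        [bool]$ClearExisting = $false
    )

    $doc = New-Object System.Xml.XmlDocument
    [void]$doc.AppendChild($doc.CreateXmlDeclaration("1.0", "utf-8", $null))

    # Same element structure the script above expects:
    # Build/WebApplication/SearchService/Rules/Rule
    $build   = $doc.AppendChild($doc.CreateElement("Build"))
    $webApp  = $build.AppendChild($doc.CreateElement("WebApplication"))
    $service = $webApp.AppendChild($doc.CreateElement("SearchService"))
    $service.SetAttribute("Name", $ServiceName)
    $rules   = $service.AppendChild($doc.CreateElement("Rules"))
    $rules.SetAttribute("Clear", $ClearExisting.ToString())

    foreach ($url in $SiteUrls) {
        $rule = $rules.AppendChild($doc.CreateElement("Rule"))
        # Assumption: excluding a site means excluding everything below its URL
        $rule.SetAttribute("URL", $url.TrimEnd("/") + "/*")
        $rule.SetAttribute("Type", "ExclusionRule")
    }

    return $doc
}

# Example usage with made-up site URLs
$sites = @("http://intranet/sites/archive2013/", "http://intranet/sites/sandbox")
$doc = New-CrawlRulesXml -SiteUrls $sites

# XmlDocument.Save resolves relative paths against the .NET working directory,
# so build a full path from the current PowerShell location
$doc.Save((Join-Path (Get-Location).Path "CrawlRules.xml"))
```

Clear defaults to "False" in this sketch so that a generated file does not delete existing crawl rules unless you ask for it explicitly.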
