Friday, October 4, 2013

Problem with trimmed HTML content in the search index in SharePoint 2013

If you use rich HTML fields (fields with Type="HTML") in the content type of your publishing pages, you may find that after crawling the content of these fields is trimmed, i.e. shown as plain text. For example, suppose you have an authoring web application where you create publishing pages with custom rich HTML fields, and these pages are shown on a consumer web application through cross-site publishing. The page that displays the authoring content on the consumer web application contains several Catalog Item Reuse web parts, and the one bound to the custom rich HTML field may show its HTML content as plain text.

As you probably know, the Catalog Item Reuse web part retrieves its content from the search index. So there are two possible causes of the problem:

  • The content of rich HTML fields is trimmed during crawling;
  • The Catalog Item Reuse web part trims the HTML content.

Let's start with the first one and assume that we have the following field declaration:

    <Field ID="..."
        SourceID="http://schemas.microsoft.com/sharepoint/v3"
        DisplayName="MyField"
        Group="Custom fields"
        Type="HTML"
        RichText="TRUE"
        RichTextMode="FullHtml"
        Required="FALSE"
        Sealed="TRUE"
        StaticName="MyField"
        Name="MyField" />

First of all, try setting RichText="FALSE" in the field declaration. The side effect is that the out-of-the-box field controls on the authoring site will show the field content HTML-encoded, but the HTML won't be trimmed in the search index. There is a fairly simple workaround: create a custom control for displaying the field content on the authoring site, or decode the HTML in the standard control with JavaScript.

If that doesn't help, make sure the SourceID attribute is present in the field declaration, as shown above. Without it search won't work properly. In one of my previous posts I wrote about a similar problem with managed metadata (see Problem with not crawled managed metadata fields in Sharepoint 2013). For rich HTML fields, when the SourceID attribute is specified, SharePoint creates two crawled properties for your field: ows_MyField and ows_r_HTML_MyField. It also creates a managed property for the second one automatically. However, you should create your own managed property, map it to the first crawled property (ows_MyField), and use it in your queries (in my experience the automatically created one may not work).

To make the Catalog Item Reuse web part render HTML content properly, you need to specify its RenderFormat property:

    <property name="RenderFormat" type="string">&lt;Format Type=&quot;HTML&quot; /&gt;</property>

If you put unencoded XML there, you will get an error:

Exception calling "ImportWebPart" with "2" argument(s):
"The file you imported is not valid. Verify that the file is a Web Part description file (*.webpart or *.dwp) and that it contains well-formed XML."
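If you would rather generate the encoded property value than type the entities by hand, something like the following minimal sketch should do. It is only an illustration: it uses the standard System.Net.WebUtility class from .NET, and the class name EncodeRenderFormat is made up for this example.

```csharp
using System;
using System.Net;

// Hypothetical helper program, not part of SharePoint.
class EncodeRenderFormat
{
    static void Main()
    {
        // The raw XML that the RenderFormat property should end up containing.
        string rawFormat = "<Format Type=\"HTML\" />";

        // Encode the markup so that it can be placed inside the
        // <property> element of the .webpart file as plain text.
        string encoded = WebUtility.HtmlEncode(rawFormat);

        Console.WriteLine(encoded);
        // Prints: &lt;Format Type=&quot;HTML&quot; /&gt;
    }
}
```

The printed string is the same value that is used inside the property element above.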

It was interesting to find out why RenderFormat has to be specified in the above form. It was not documented anywhere, so I checked the code of the CatalogItemReuseWebPart.GetValueToRender method:

    private string GetValueToRender()
    {
        ...
        if (string.IsNullOrEmpty(this.RenderFormat))
        {
            return CatalogItemUtilities.GetRenderValueUsingPropertyType(propertyName,
                propertyValueFromResult);
        }
        return CatalogItemUtilities.FormatDataForRendering(propertyValueFromResult,
            SPHttpUtility.HtmlDecode(this.RenderFormat));
    }

As you can see, if the RenderFormat property is not empty, the method decodes it first and then passes it to the CatalogItemUtilities.FormatDataForRendering() method, which does the following:

    internal static string FormatDataForRendering(string data, string renderFormat)
    {
        ...
        XDocument document = XDocument.Parse(renderFormat,
            LoadOptions.PreserveWhitespace);
        XAttribute attribute = document.Root.Attribute("Type");
        ...
    }

I.e. it loads the RenderFormat property into an XDocument and tries to get the Type attribute of the root element. So the property must be encoded, and once decoded it must be well-formed XML. That was enough to guess the correct format for the property. Code is the best documentation :).
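The decode-then-parse sequence can be tried outside SharePoint with a small console program. This is only a sketch under assumptions: it swaps SPHttpUtility.HtmlDecode for the standard System.Net.WebUtility.HtmlDecode, the class and method names (RenderFormatCheck, GetFormatType) are made up, and it stops at reading the Type attribute because the rest of FormatDataForRendering is elided above.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Hypothetical stand-alone check, not the actual SharePoint code.
class RenderFormatCheck
{
    // Mimics the two steps shown above: HTML-decode the property value,
    // then parse it as XML and read the Type attribute of the root element.
    static string GetFormatType(string renderFormat)
    {
        // WebUtility.HtmlDecode stands in for SPHttpUtility.HtmlDecode here.
        string decoded = WebUtility.HtmlDecode(renderFormat);

        // Throws XmlException if the decoded value is not well-formed XML.
        XDocument document = XDocument.Parse(decoded, LoadOptions.PreserveWhitespace);

        XAttribute attribute = document.Root.Attribute("Type");
        return attribute == null ? null : attribute.Value;
    }

    static void Main()
    {
        // Encoded form, as written in the property element.
        Console.WriteLine(GetFormatType("&lt;Format Type=&quot;HTML&quot; /&gt;")); // prints HTML

        // Already decoded form: decoding is a no-op, parsing still succeeds.
        Console.WriteLine(GetFormatType("<Format Type=\"HTML\" />")); // prints HTML
    }
}
```

Both calls return the same Type value, because decoding a string that contains no entities leaves it unchanged.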
