Saturday, December 22, 2012

Empty DropDownList.SelectedValue property when list is populated via JavaScript

In ASP.NET, if you use a DropDownList control on a page and populate it via JavaScript (using the ClientID property of the list), you may face the following problem: during the postback the SelectedValue property will be empty. However, if you set it a second time and post back again, it will be set properly. This happens because the values added on the client side are not stored in the ViewState, so when the list is repopulated from the ViewState during the postback, the selected value becomes empty. Consider the following example:

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
    <script src="http://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.8.3.min.js" type="text/javascript"></script>

    <script type="text/javascript">
        function populate() {
            $("#<%= this.ddl.ClientID %>").append("<option value='1'>One</option>");
            $("#<%= this.ddl.ClientID %>").append("<option value='2'>Two</option>");
            $("#<%= this.ddl.ClientID %>").append("<option value='3'>Three</option>");
        }
    </script>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        <asp:DropDownList runat="server" ID="ddl" EnableViewState="False"/>
        <br/>
        <br/>
        <button onclick="populate();return false;">Populate</button>
        <br/>
        <br/>
        <asp:Button runat="server" ID="btn" Text="Postback"/>
    </div>
    </form>
</body>
</html>

If we click the Postback button, DropDownList.SelectedValue will be empty. Many forum threads suggest that in order to solve this you need to add a hidden field and synchronize your list with that field (saving the selected value whenever it changes). However, there is a simpler way: instead of SelectedValue, use Request.Form[ddl.UniqueID]. It will contain the correct selected value regardless of how the list was populated.
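
For example, here is a minimal sketch of a server-side handler for the page above (assuming the Postback button is wired up via OnClick="btn_Click"; the variable names are only for illustration):

<script runat="server">
    protected void btn_Click(object sender, EventArgs e)
    {
        // empty when the list was populated on the client side,
        // because its items were never stored in the ViewState
        string fromSelectedValue = this.ddl.SelectedValue;

        // contains the posted value, e.g. "2" if "Two" was selected,
        // regardless of how the list was populated
        string fromRequest = this.Request.Form[this.ddl.UniqueID];
    }
</script>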

Friday, December 21, 2012

Remove Sharepoint metadata from MS Office documents

In Sharepoint you may store files in document libraries. Along with the files themselves, it is possible to add metadata to each file; it is one of the ways to categorize content. On the new Sharepoint 2013 platform it is even more important because of the platform's focus on search-based solutions. Metadata values are stored differently for different file types:

  • for Office documents metadata is stored in the file itself. This includes the new Open XML formats (docx, xlsx, etc.) and the old formats (doc, xls, etc.);
  • for other documents metadata is stored in the content database (there are also several mentions on the web that you may change this behavior by installing some extensions to Sharepoint, but I didn’t find such extensions; if you know them, please share in the comments).

So, for example, when you copy a Word document (docx) from one document library to another (the document libraries may be located in different web applications or on different farms), the metadata will be preserved. But if you copy e.g. a pdf document, all metadata will be lost. In this article I will show how to clear Office files of their metadata. It can be useful when you have reorganized your content structure and want to start with a clean version, without inheriting the garbage of old metadata (which, if we talk about managed metadata, may even have been deleted in the new version).

First of all we need to understand how metadata is stored in Office documents. I recommend the following article: Document Information Panel and Document Properties in SharePoint Server 2010. It says that metadata is stored inside the “customXml section of the Open XML formats”.

However, theory doesn’t provide all the necessary information: in order to be able to remove the metadata we need to understand it more deeply. So for testing I created a docx file with some test content, uploaded it to a document library with a custom content type containing several managed metadata fields, and specified values in these fields. After that I opened the doclib in explorer view and copied the document back to the file system, changed the extension to zip and unpacked the content of the file. In the files inside the package I found that managed metadata is actually stored in 2 places:

  • item3.xml file inside customXml subfolder;
  • custom.xml file inside docProps subfolder.
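
If you want to check this on your own files programmatically instead of renaming them to zip, the parts of the package can be enumerated with the System.IO.Packaging API from the standard WindowsBase.dll. Here is a minimal sketch ("test.docx" is a placeholder path; note that the itemN.xml numbering may differ between documents):

using System;
using System.IO;
using System.IO.Packaging;

class ListParts
{
    static void Main()
    {
        // open the docx as an Open XML package and print the uri of every part;
        // /customXml/itemN.xml and /docProps/custom.xml should be among them
        using (Package package = Package.Open("test.docx", FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                Console.WriteLine(part.Uri);
            }
        }
    }
}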

Metadata is stored differently in these two files. In item3.xml it is stored like this:

<?xml version="1.0" encoding="utf-8"?>
<p:properties xmlns:p="http://schemas.microsoft.com/office/2006/metadata/properties"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:pc="http://schemas.microsoft.com/office/infopath/2007/PartnerControls">
  <documentManagement>
    <AuthorLogin xmlns="..." xsi:nil="true"/>
    <DocLanguage_Hidden xmlns="...">
      <Terms xmlns="...">
        <TermInfo xmlns="...">
          <TermName xmlns="...">English</TermName>
          <TermId xmlns="...">42f6e37f-06b6-4881-946d-fc945753adfa</TermId>
        </TermInfo>
      </Terms>
    </DocLanguage_Hidden>
    ...
  </documentManagement>
</p:properties>

For clarity I replaced the “http://schemas.microsoft.com/office/infopath/2007/PartnerControls” namespace with “...” in some of the tags. This example shows that the Language field contains the value English; note that the term id is stored together with the value.

In custom.xml the data is stored in the following way:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
    xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="ContentTypeId">
    <vt:lpwstr>...</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="9" name="DocLanguage">
    <vt:lpwstr>7;#English|42f6e37f-06b6-4881-946d-fc945753adfa</vt:lpwstr>
  </property>
  ...
</Properties>

Here we also see the value and the term id, but in a different format: the standard taxonomy field value format WssId;#Label|TermGuid, where 7 is the id of the hidden list item which corresponds to the term.

This investigation tells us that we need to remove the metadata from 2 places. But how can we do that, i.e. how can we remove the metadata from an Office file programmatically?

First of all we need to download the Open XML SDK and reference the following assembly from it: DocumentFormat.OpenXml.dll. We also need to reference the standard WindowsBase.dll (the code below additionally assumes using directives for System, System.IO, System.Linq, System.Xml.Linq and DocumentFormat.OpenXml.Packaging). The code which removes the metadata is below:

   1: using (var document = WordprocessingDocument.Open("test.docx", true))
   2: {
   3:     // delete from the custom properties (docProps/custom.xml) first
   4:     if (document.CustomFilePropertiesPart != null &&
   5:         document.CustomFilePropertiesPart.Properties != null)
   6:     {
   7:         document.CustomFilePropertiesPart.Properties.RemoveAllChildren();
   8:         document.CustomFilePropertiesPart.Properties.Save();
   9:     }
  10:
  11:     // then delete the custom xml part whose root element is "properties"
  12:     if (document.MainDocumentPart != null &&
  13:         document.MainDocumentPart.CustomXmlParts != null)
  14:     {
  15:         Func<CustomXmlPart, bool> predicate =
  16:             p =>
  17:             {
  18:                 using (var reader = new StreamReader(p.GetStream()))
  19:                 {
  20:                     var root = XElement.Load(reader);
  21:                     return (root.Name.LocalName == "properties");
  22:                 }
  23:             };
  24:
  25:         var propertiesPart = document.MainDocumentPart.CustomXmlParts
  26:             .FirstOrDefault(p => predicate(p));
  27:         if (propertiesPart != null)
  28:         {
  29:             document.MainDocumentPart.DeletePart(propertiesPart);
  30:         }
  31:     }
  32: }

Here we remove the metadata from custom.xml first (lines 4-9) and then from the custom xml part of the document, i.e. item3.xml (lines 12-31). Removing the custom xml part is a little more tricky, because you need to read the xml content from its stream in order to find the correct part (a single Office file may contain several such parts).
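
For convenience, the same logic can be wrapped into a small self-contained console program. Below is a sketch; the class name and the convention of passing file paths as command line arguments are my own, not part of the SDK:

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

class CleanMetadata
{
    static void Main(string[] args)
    {
        // each command line argument is treated as a path to a docx file
        foreach (string path in args)
        {
            using (var document = WordprocessingDocument.Open(path, true))
            {
                // step 1: clear the custom properties (docProps/custom.xml)
                if (document.CustomFilePropertiesPart != null &&
                    document.CustomFilePropertiesPart.Properties != null)
                {
                    document.CustomFilePropertiesPart.Properties.RemoveAllChildren();
                    document.CustomFilePropertiesPart.Properties.Save();
                }

                // step 2: delete the custom xml part with the "properties" root
                if (document.MainDocumentPart != null)
                {
                    var propertiesPart = document.MainDocumentPart.CustomXmlParts
                        .FirstOrDefault(p =>
                        {
                            using (var reader = new StreamReader(p.GetStream()))
                            {
                                return XElement.Load(reader).Name.LocalName == "properties";
                            }
                        });
                    if (propertiesPart != null)
                    {
                        document.MainDocumentPart.DeletePart(propertiesPart);
                    }
                }
            }
            Console.WriteLine("Cleaned " + path);
        }
    }
}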

Run this program on a file which contains metadata and then copy the file into another document library: all the metadata will be empty. Hope it will help you in your work.

Saturday, December 15, 2012

Problem with broken managed metadata fields when using Backup-SPSite/Restore-SPSite across Sharepoint farms

Backup-SPSite and Restore-SPSite are useful standard PowerShell cmdlets which allow you to copy a site collection from one web application to another. These web applications may be located on different farms, or they may be in the same farm but use different instances of the Managed metadata service application. In this case you will face the problem that all managed metadata fields are broken: they are greyed out and not editable.

The problem is caused by the different Managed metadata service application instances, or by the different term store ids to be more precise. If the target web application uses a different instance, all bindings become broken. (The instances may even have the same name if we are talking about different farms; on the same farm it is not possible to create 2 instances of the managed metadata service application with the same name.)

One possible solution which comes to mind is to use the same guids for term sets and terms on the source and the target. Unfortunately it won’t help with Restore-SPSite: the fields will be broken anyway. Even if you use an approach similar to the one described in the following article: Migrate SharePoint 2010 Managed Metadata to another farm or Service Application, the fields will still become broken, because the term store ids will be different anyway.

There are several possible solutions:

  1. Share the managed metadata service application across the farms. Here is an article which shows how to do that: How to publish a Managed Metadata Service for cross-farm consumption.
  2. Copy the term store from the source farm to the target farm using the Export-SPMetadataWebServicePartitionData and Import-SPMetadataWebServicePartitionData cmdlets. With this approach the term store id is preserved. See e.g. this post for an example: Managed Metadata, Taxonomy & More.
  3. Backup the managed metadata database in Sql Server and restore it on the target farm. Then create a new managed metadata service application instance which uses the restored database. It can be done from the UI (specify the name of the existing database when you create the new managed metadata service application on the target farm) or from PowerShell (using the New-SPMetadataServiceApplication or Set-SPMetadataServiceApplication cmdlets). Then associate your web application with the newly created managed metadata service application using service connections in Central administration.
  4. All the previous ways assume that you modify your existing target environment. If we are talking about a live production environment, these can be risky options. If you don’t want to change it, there is one more option: fix the broken managed metadata fields from the UI. Go to Site settings > Site columns (or Site settings > Content types > select a content type) and open the managed metadata field which should be fixed. Then specify the binding manually and click Ok (choose to propagate the changes to the lists when you do it). After that the field becomes working again. This way can also be automated, e.g. with a PowerShell script; see the sketch after this list.
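
For option 4, here is a minimal sketch of the rebinding logic via the server object model (a PowerShell script would call the same API almost line by line; the site url, group, term set and field names are placeholders for your own values):

using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Taxonomy;

class FixTaxonomyField
{
    static void Main()
    {
        using (var site = new SPSite("http://example.com"))
        {
            // find the term store and term set on the target farm
            var session = new TaxonomySession(site);
            TermStore store = session.TermStores[0];
            TermSet termSet = store.Groups["MyGroup"].TermSets["DocLanguage"];

            // rebind the site column to the term store/term set of this farm
            var field = (TaxonomyField)site.RootWeb.Fields["DocLanguage"];
            field.SspId = store.Id;        // id of the term store
            field.TermSetId = termSet.Id;  // id of the term set
            field.AnchorId = Guid.Empty;   // no anchor term
            field.Update(true);            // push the changes to the lists
        }
    }
}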

Hope this will help you if you face this problem.

Fix “No content databases in the web application were available to store your site collection” error when using the Restore-SPSite cmdlet

In this post I will describe one interesting side effect of SP1 for Sharepoint 2010. As you probably know, with this service pack it became possible to restore deleted site collections. See e.g. this post for details: SharePoint 2010 SP1 - Site Recycle Bin. But there is also one interesting side effect related to the Backup-SPSite/Restore-SPSite cmdlets, which allow you to backup and restore site collections. Suppose that you want to test how these backup/restore commands work in your case (I strongly recommend testing them in your particular scenario before using them in production, or even before suggesting them to customers: even though they are documented quite well, you may find issues which are unique to your case). You backed up a site collection and restored it on the target web application, and everything went smoothly the first time. Then you found some problems, tried to fix them, deleted the first restored site collection and tried to restore the site collection a second time (probably with the “Force” switch parameter), and this time you got the following error:

Restore-SPSite : The operation that you are attempting to perform cannot be completed successfully.  No content databases in the web application were available to store your site collection.  The existing content databases may have reached the maximum number of site collections, or be set to read-only, or be offline, or may already contain a copy of this site collection.  Create another content database for the Web application and then try the operation again.

It sounds very strange, because you are not even close to the limit of the content database capacity. So what is the reason?

The reason is in the internal processes which occur when you delete a site collection. In Sharepoint 2010 with SP1 the site collection is not really deleted the first time: as I said above, you still have the possibility to restore it from the recycle bin. And its id prevents you from further attempts to restore the same site collection into the same web application. What to do then?

First of all, check the list of deleted site collections using the Get-SPDeletedSite cmdlet. It may give you something like this:

PS C:\temp> Get-SPDeletedSite


WebApplicationId   : 35513b1d-408f-4643-b5f2-b417aaa252f0
DatabaseId         : 2bd59a50-7a94-4869-88af-ae1988be18fc
SiteSubscriptionId : 00000000-0000-0000-0000-000000000000
SiteId             : fc5ff8c9-2012-4fd8-b801-9a0efb0e71cd
Path               : /test
DeletionTime       : 14.12.2012 12:49:18

It means that there is a site collection with the “/test” relative url which you can still restore, and it prevents you from restoring your site collection a second time. We need to remove this site collection completely. It can be done in 2 steps. The first step is to use the Remove-SPDeletedSite cmdlet with the SiteId guid which you got from the Get-SPDeletedSite command above:

Remove-SPDeletedSite –Identity fc5ff8c9-2012-4fd8-b801-9a0efb0e71cd

After that Get-SPDeletedSite won't show this site collection anymore, but Restore-SPSite will still give the same error. The second step: go to Central administration > Monitoring > Job definitions and run the “Gradual Site Delete” job for the web application where you are trying to restore the site collection. Then wait some time (it depends on the size of your site collection) and try to restore the site collection again. This time it should work without problems.
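
If you prefer to script this step as well, the job can also be started via the server object model. Below is a sketch under the assumption that the job can be found by its “Gradual Site Delete” display name (the web application url is a placeholder):

using System;
using Microsoft.SharePoint.Administration;

class RunGradualSiteDelete
{
    static void Main()
    {
        // find the web application by url and start its Gradual Site Delete job
        SPWebApplication webApp = SPWebApplication.Lookup(new Uri("http://example.com"));
        foreach (SPJobDefinition job in webApp.JobDefinitions)
        {
            if (job.DisplayName.Contains("Gradual Site Delete"))
            {
                job.RunNow(); // queues the job for immediate execution
            }
        }
    }
}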

Saturday, December 8, 2012

Problem with robots.txt module for Orchard CMS and HTTP 404 Not found

Many modern CMSs allow you to specify the content of robots.txt manually; it is then available to search engines at the regular URL: http://example.com/robots.txt. In Orchard you can use the Robots module from the online gallery for that. This module adds a dynamic route for the robots.txt file:

public IEnumerable<RouteDescriptor> GetRoutes() {
    return new[] {
        new RouteDescriptor {
            Priority = 5,
            Route = new Route(
                "robots.txt",
                new RouteValueDictionary {
                    {"area", "SH.Robots"},
                    {"controller", "Robots"},
                    {"action", "Index"}
                },
                new RouteValueDictionary(),
                new RouteValueDictionary {
                    {"area", "SH.Robots"}
                },
                new MvcRouteHandler())
        },
    };
}

The content of robots.txt is stored in the CMS content database. You can change it from the admin panel in the Robots.txt section.

In some cases, after you install this module and enable the robots.txt feature, you will get an HTTP 404 Not found error when you try to access http://example.com/robots.txt. In Orchard, by default, all static files in the virtual folder give a 404 result. David Hayden described it in his post: Modifying Web.config to Serve Site.xml and Static Files with Orchard CMS.

However, in our case robots.txt is not a static file: it is a dynamic ASP.Net MVC route. So why can we still get a 404 error? It can happen if someone put a real static robots.txt file into the virtual folder. In this case the request is handled by IIS without ASP.Net MVC (real files have priority over dynamic routes), and you will get 404 for robots.txt because, as I said above, in Orchard all static files give 404 by default. In order to fix the issue, remove the static robots.txt file from the virtual folder.

Several useful SEO rules via IIS URL rewrite module

In this post I would like to share several useful rules for the IIS URL rewrite module which can improve the rating of your site in search engines. Before adding these rules to the web.config we need to install the URL rewrite extension on the web server, otherwise the site won’t work.

The first rule is a redirect with the 301 (Permanent redirect) http status code from the url without the www prefix to the url with www. Very often sites are available via both urls, e.g. http://www.example.com and http://example.com. Search engines may treat these as separate sites (most search engines can handle this case, but I would not suggest experimenting), and as the content on them is the same, the search rank may be decreased. In order to fix this problem we need to set up a permanent redirect from one address to the other. It is important to use a permanent redirect (301), because in this case the search engine knows that the url from which the redirect was made is not used anymore and can be deleted from the search index. Another way to perform a redirect is to use 302 (Moved temporarily), but in this case the page won’t be deleted from the index. In theory this can also impact the search rating, but I didn’t find exact evidence, so if you know more, please share it in the comments.

In order to add the redirection rule to a site running on IIS we will use the IIS URL rewrite module. You can add rules in several ways:

  • via the UI – in IIS manager select the site in question and then select URL Rewrite in the right panel (it appears after installation of the URL rewrite extension; note that you need to restart IIS manager after the installation, but there is no need to make an iisreset);
  • directly in the web.config using any text editor, e.g. notepad.

The UI is just a convenient interface for adding rules to the web.config; the result in both cases is the same: the rules are added to the web.config of your site. In this post I will show how to add rules to the web.config directly.

Rules are added to the <system.webServer> section under <rewrite>/<rules>. The redirection rule from no-www to www will look like this:

   1: <rule name="redirect_from_nowww_to_www" enabled="true" stopProcessing="true">
   2:   <match url=".*" />
   3:   <conditions>
   4:     <add input="{HTTP_HOST}" pattern="^example\.com$" />
   5:   </conditions>
   6:   <action type="Redirect" url="http://www.example.com/{R:0}" appendQueryString="true" redirectType="Permanent" />
   7: </rule>

This rule tells the rewrite module that it should be applied to all urls (line 2) whose host is “example.com” (line 4), and that the response should be redirected with the 301 status code to “http://www.example.com” (line 6). For matching urls we used regular expressions: {R:0} is a back reference in regexp terms, which in this example contains the url part after the host. Additionally we need to keep a possible query string so that existing functionality keeps working: see the appendQueryString="true" attribute.

After that you can run Fiddler and check that when you enter http://example.com in the browser, you are redirected to http://www.example.com with the 301 status. The rule will be applied to all resources on the site, including images and css.

Another useful rule is the redirect from addresses without a trailing slash to addresses with the slash: http://example.com/about –> http://example.com/about/. The reason is the same: search engines may treat these urls as different, and if they have the same content (in most cases they will), the search rank can suffer. In order to add this redirection you can use the following rule:

<rule name="add_trailing_slash" stopProcessing="true">
  <match url="(.*[^/])$" />
  <conditions>
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
  </conditions>
  <action type="Redirect" redirectType="Permanent" url="{R:1}/" />
</rule>

Here we used several conditions which mean that if a physical file or folder is requested (matchType="IsFile" and matchType="IsDirectory"), the rule should not be applied (negate="true"). Even so, this rule may cause problems, e.g. if you use a CMS which allows editing the content of robots.txt and sitemap.xml (files commonly used in SEO) dynamically, i.e. their content is stored in the content database and the routes are configured dynamically: http://example.com/robots.txt or http://example.com/sitemap.xml. In this case IIS won’t treat them as files and will add the trailing slash: http://example.com/robots.txt/. It may also cause problems on the login view or administrative views if you use ASP.Net MVC (some actions won’t work). The solution is to disable this rule for these views. In order to avoid the mentioned problems we need to add several additional conditions:

<add input="{REQUEST_FILENAME}" matchType="Pattern" negate="true" pattern="robots\.txt" />
<add input="{REQUEST_FILENAME}" matchType="Pattern" negate="true" pattern="sitemap\.xml" />
<add input="{REQUEST_URI}" matchType="Pattern" negate="true" pattern="^/admin.*" />
<add input="{REQUEST_URI}" matchType="Pattern" negate="true" pattern="^/users/.*" />
<add input="{REQUEST_URI}" matchType="Pattern" negate="true" pattern="^/packaging/.*" />

This example contains conditions for the administrative views of Orchard CMS: I excluded the internal urls “/admin”, “/users” and “/packaging”, which are used in Orchard, from the rule.

Note also that I used lowercase rule names with underscores; you may use more user friendly names for your rules, I’m just used to this syntax :) That’s all I wanted to write about in this article. Hope it will be useful for you.