Sunday, November 14, 2010

Retain object identity during export and import of subsites in SharePoint into a different location in the site collection hierarchy

As you know, the SharePoint object model contains several classes that allow you to export and import sites. The most important of them are SPExport and SPImport. If you aren't familiar with the content deployment API, I highly recommend reading Stefan Goßner's series of articles before this one: Deep Dive into the SharePoint Content Deployment and Migration API.

These classes allow you to export the content of a single SPWeb, or of a whole site collection, and import it into another location (possibly on another server). You can find a list of other features of content deployment here. In this post I want to describe a problem with retaining object identities when subsites are imported into a different location than they had in the source site.
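To make the starting point concrete, here is a minimal sketch of exporting a single subsite with SPExport. The site collection URL, the web path "site1/site2" and the folder C:\export\site2 are placeholders chosen for illustration, and the code has to run on a SharePoint server with a reference to Microsoft.SharePoint.dll.

```csharp
using Microsoft.SharePoint;
using Microsoft.SharePoint.Deployment;

class ExportSubsite
{
    static void Main()
    {
        // Placeholder URLs: adjust to your own site collection and subsite.
        using (SPSite site = new SPSite("http://example.com/sites/sitecollection"))
        using (SPWeb web = site.OpenWeb("site1/site2"))
        {
            // Export this web together with everything below it.
            SPExportObject exportObject = new SPExportObject();
            exportObject.Id = web.ID;
            exportObject.Type = SPDeploymentObjectType.Web;
            exportObject.IncludeDescendants = SPIncludeDescendants.All;

            SPExportSettings settings = new SPExportSettings();
            settings.SiteUrl = site.Url;
            settings.ExportMethod = SPExportMethodType.ExportAll;
            settings.FileLocation = @"C:\export\site2"; // placeholder output folder
            settings.FileCompression = false;           // keep plain XML files instead of cab files
            settings.ExportObjects.Add(exportObject);

            new SPExport(settings).Run();
        }
    }
}
```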

When a site is exported, we can choose between two alternatives for import:

  • import the site with retaining object identities;
  • import the site without retaining object IDs.

With the first method, all child objects of the exported site (lists, document libraries, list items, etc.) will have the same GUIDs and location after import as they had in the source site. This method has many advantages, as you don't need to do extra work after import. For example, you don't have the headache of retargeting Content Query Web Parts (if you aren't familiar with this problem, I recommend reading this post by Gary Lapointe: Retarget Content Query Web Part). You also won't have problems with retargeting lookup fields, etc. But along with the advantages this method has a limitation: you must import the exported subsite into the same URL location on the target site, relative to the site collection URL. I.e. if your site2 was located at http://example.com/sites/sitecollection/site1/site2, then it must be imported to the same location relative to the root site of the target site collection: http://example1.com/site1/site2. It cannot be imported, for example, to http://example1.com/site2 (where http://example1.com is the root site of the target site collection).
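As a rough sketch of the first method, an import with retained identities might be set up as follows. It assumes the export was written uncompressed to a folder; the target URL and the folder path are placeholders, and the code needs a SharePoint server with a reference to Microsoft.SharePoint.dll.

```csharp
using Microsoft.SharePoint.Deployment;

class RetainIdentityImport
{
    static void Main()
    {
        SPImportSettings settings = new SPImportSettings();
        settings.SiteUrl = "http://example1.com";    // placeholder target site collection
        settings.FileLocation = @"C:\export\site2";  // placeholder folder produced by SPExport
        settings.FileCompression = false;            // the export was not packed into cab files
        settings.RetainObjectIdentity = true;        // keep GUIDs and site-relative URLs

        SPImport import = new SPImport(settings);
        import.Run();
    }
}
```

With these settings the subsite ends up at the same site-relative URL it had in the source, which is exactly the limitation described above.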

The second method can be used when you need to copy objects into a different place in the site collection hierarchy (this is exactly the method used by the site manager copy operations and by the stsadm import operation). If you open the stsadm utility in Reflector and look at the SPImportOperation class, you will see:

    public override void Run(StringDictionary keyValues)
    {
        ...
        SPImportSettings settings;
        ...
        settings.RetainObjectIdentity = false;
        ...
    }

So it simply sets the RetainObjectIdentity property to false, unconditionally. But in contrast to the first method, you will get all the retargeting problems that the first method avoids.

This is not very good, as we have to accept a compromise whichever approach we use. But is there a way to get the advantages of both methods, i.e. retain object GUIDs and still be able to import a site into any place in the site collection hierarchy? Well, there is such a way, but I need to warn you that the approach described below is a workaround and is not supported by Microsoft. Keep that in mind when you decide whether to use it.

Before showing the workaround itself, I will describe the reason for the issue. The root of the problem is in the SPImport class, which can't properly import SPWebs that are children of other SPWebs (i.e. not children of the site collection root site) while preserving object identities. This happens because a web exported on its own becomes an orphan, as we don't import its parent. One way to handle this situation is to re-parent the exported web during import using the SPImport.Started event (see Deep Dive into the SharePoint Content Deployment and Migration API - Part 3). Unfortunately, this method doesn't work with RetainObjectIdentity = true. If you followed the link, you know that the args.RootObjects collection is used to identify all orphaned objects (where args is of type SPDeploymentEventArgs and is the parameter of the OnImportStarted handler). Let's see in Reflector how this collection is initialized in the SPImport class:

    public override void Run()
    {
        try
        {
            ...
            SPImportObjectCollection rootObjects = this.DeserializeRootObjectMap();
            this.OnStarted(new SPDeploymentEventArgs(SPDeploymentStatus.Started, base.ObjectsProcessed, base.ObjectsTotal, base.DataFileManager.DataFilePath, rootObjects));
            ...
        }
        catch (Exception exception3)
        {
            ...
        }
        ...
    }

So the RootObjects collection is the result of the DeserializeRootObjectMap() method. Let's look at this method as well:

    private SPImportObjectCollection DeserializeRootObjectMap()
    {
        SPImportObjectCollection objects = new SPImportObjectCollection();
        FileInfo rootObjectMapFile = base.DataFileManager.RootObjectMapFile;
        if (!rootObjectMapFile.Exists)
        {
            return objects;
        }
        if (this.Settings.RetainObjectIdentity)
        {
            return objects;
        }
        ...
    }

Quite impressive: if we use RetainObjectIdentity = true, an empty collection is always returned! That's why we can't re-parent orphaned objects using the Started event of the SPImport class.
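For reference, the re-parenting technique that this empty collection defeats looks roughly like the sketch below, modeled on the approach from Part 3 of Stefan Goßner's series. The URL and the folder path are placeholders, and the code needs a SharePoint server with a reference to Microsoft.SharePoint.dll.

```csharp
using System;
using Microsoft.SharePoint.Deployment;

class ReparentOnImport
{
    // Started handler: point every orphaned root object at a new parent web.
    static void OnImportStarted(object sender, SPDeploymentEventArgs args)
    {
        foreach (SPImportObject io in args.RootObjects)
        {
            io.TargetParentUrl = "http://example1.com"; // placeholder new parent web
        }
    }

    static void Main()
    {
        SPImportSettings settings = new SPImportSettings();
        settings.SiteUrl = "http://example1.com";    // placeholder target site collection
        settings.FileLocation = @"C:\export\site2";  // placeholder export folder
        settings.FileCompression = false;
        // With true here, args.RootObjects is empty and the loop above does nothing.
        settings.RetainObjectIdentity = false;

        SPImport import = new SPImport(settings);
        import.Started += new EventHandler<SPDeploymentEventArgs>(OnImportStarted);
        import.Run();
    }
}
```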

I found the following way to fix it. After a site is exported with the SPExport class, we have a bunch of XML files with object definitions (assuming we didn't use compressed export into cab files). These files contain a lot of relative URL paths of the exported site, like “/site1/site2”. You need to replace all such strings with the path of the target site. I.e. if you need to import site2 not under site1 on the target site collection but under the root site of the site collection, you need to replace all “/site1/site2” strings with “/site2” in all files you get after export. This is the trick: we replace the real parent of site2 (/site1) with the root site of the site collection (/). After that you can use SPImport with RetainObjectIdentity = true, and it will work as if site2 had been exported from a direct subsite of the root site of the site collection.
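The replacement step can be sketched with plain file I/O. ExportPathRewriter is a hypothetical helper name, not part of SharePoint. As assumptions of this sketch, it only touches *.xml files directly inside the export folder, matches paths case-sensitively, and writes the files back as UTF-8, so check that this fits your export before relying on it.

```csharp
using System;
using System.IO;

static class ExportPathRewriter
{
    // Replaces every occurrence of oldPath with newPath in all *.xml files
    // directly inside exportFolder. Returns the number of files that changed.
    public static int Rewrite(string exportFolder, string oldPath, string newPath)
    {
        int changed = 0;
        foreach (string file in Directory.GetFiles(exportFolder, "*.xml"))
        {
            string original = File.ReadAllText(file);
            // Plain ordinal replacement: it would also hit longer paths that
            // merely start with oldPath, e.g. "/site1/site2old".
            string updated = original.Replace(oldPath, newPath);
            if (updated != original)
            {
                File.WriteAllText(file, updated); // written back as UTF-8
                changed++;
            }
        }
        return changed;
    }
}
```

For the example above the call would be ExportPathRewriter.Rewrite(@"C:\export\site2", "/site1/site2", "/site2"), where the folder is a placeholder, followed by an SPImport run with RetainObjectIdentity = true. It is worth keeping a copy of the untouched export in case the rewritten files fail to import.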

As I already mentioned, this solution should be considered a workaround. There is no guarantee that it will work in all cases. However, it helped me solve a real-life issue on one of my projects: with it I was able to copy sites into any place in the site collection hierarchy using the SPExport and SPImport classes and avoid the problems with retargeting SharePoint artifacts. I hope it will be useful in your own SharePoint development as well.
