Friday, October 14, 2011

Support for Scandinavian characters in SharePoint URLs

People often ask me, both on forums and at work, whether Scandinavian characters (ä, ö, å) are supported in SharePoint URLs. To have a reference for the answer, I decided to write it up in this post.

These characters belong to the so-called high-ASCII character set (http://www.lorem.biz/htmlcodes.asp, characters with codes from 128 to 255). According to the TechNet article, SharePoint supports them, but the URL must be encoded when you access resources whose names contain such characters:

“If you have non-standard ASCII characters, such as high-ASCII or double-byte Unicode characters, in the SharePoint URL, each of those characters is URL-encoded into two or more ASCII characters when they are passed to the Web browser. Thus, a URL with many high-ASCII characters or double-byte Unicode characters can become longer than the original un-encoded URL.”
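To get a feel for how much longer an encoded URL becomes, here is a small Python sketch (Python is only my choice for illustration, and the path is a made-up example). It uses the standard `urllib.parse.quote` function, which encodes the text as UTF-8 and percent-encodes every byte outside the unreserved set. Each of ä, ö and å is two bytes in UTF-8, so each one turns into six ASCII characters:

```python
from urllib.parse import quote


def encoded_length_report(path):
    """Return the percent-encoded path with its length before and after encoding."""
    # "/" is kept as-is because it separates path segments;
    # every other reserved or non-ASCII character is percent-encoded as UTF-8.
    encoded = quote(path, safe="/")
    return encoded, len(path), len(encoded)


encoded, before, after = encoded_length_report("/sites/möten/åtgärder.aspx")
print(encoded)        # /sites/m%C3%B6ten/%C3%A5tg%C3%A4rder.aspx
print(before, after)  # 26 41
```

Three Scandinavian letters add 15 characters to the URL, which matters wherever URL length is limited.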

It is also important to note that the article states that “SharePoint Foundation 2010 adheres to the standards for URL encoding that are defined in The Internet Engineering Task Force (IETF) RFC 3986”.
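The relevant rule in RFC 3986 is short: inside a single path segment or query value, only the unreserved characters (ASCII letters, digits, `-`, `.`, `_` and `~`, section 2.3) are safe to leave as they are, and section 2.5 recommends encoding other text as UTF-8 and percent-encoding each resulting byte. Below is a minimal hand-written Python sketch of that rule; `percent_encode` is my own illustrative name, not a SharePoint or library API, and the loop checks it against the standard library's `urllib.parse.quote`:

```python
from urllib.parse import quote

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text):
    """Percent-encode one path segment or query value following RFC 3986."""
    parts = []
    # Section 2.5: encode the text as UTF-8 first, then look at each byte.
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in UNRESERVED:
            parts.append(char)
        else:
            # Section 2.1: "%" followed by two hex digits, uppercase preferred.
            parts.append("%{:02X}".format(byte))
    return "".join(parts)


for name in ["Åtgärder", "Möten 2011", "a/b"]:
    # safe="" so that quote() also encodes "/", just like percent_encode does.
    assert percent_encode(name) == quote(name, safe="")
    print(name, "->", percent_encode(name))
```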

Another useful reference on SharePoint URLs is the article “Information about the characters that you cannot use in sites, folders, and files in SharePoint”.

But, as always with SharePoint, even when the documentation says something should work, there is no 100% guarantee that it will work in all cases. For example, somewhere in the internals of SharePoint 2007 I found a comment by a Microsoft developer saying that URLs in that particular place should always contain only Latin characters (it was in one of the document library schemas; I will update this post once I remember exactly where it was).

My recommendation is to avoid localized URLs with non-Latin characters whenever possible. Although they may help SEO (for example, your pages may appear in search results for queries in the local language), you may run into unpleasant issues during development.
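One way to follow this recommendation, also suggested in the comments, is to transliterate names into Latin characters before they become part of a URL. The Python sketch below assumes two hypothetical mapping tables: `EXPANDED` follows the German-style convention (ö -> oe, ü -> ue, ß -> ss) and `SIMPLE` just drops the diacritics (ä, ö, å -> a, o, a). Neither table nor the `to_latin_slug` helper is a SharePoint feature; either approach needs a custom solution.

```python
import re
import unicodedata

# Hypothetical mapping tables, for illustration only.
# EXPANDED: German-style convention (ö -> oe, ü -> ue, ß -> ss).
EXPANDED = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
# SIMPLE: just drop the diacritic (ä, ö, å -> a, o, a).
SIMPLE = {"ä": "a", "ö": "o", "å": "a"}


def to_latin_slug(title, table):
    """Turn a title into a lowercase, Latin-only URL segment."""
    # Normalize to precomposed form so that the table keys actually match.
    text = unicodedata.normalize("NFC", title).lower()
    # Explicit replacements first, so they win over the generic fallback below.
    for src, dst in table.items():
        text = text.replace(src, dst)
    # Fallback for characters not in the table: decompose (é -> e + accent)
    # and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep only ASCII letters and digits; collapse everything else into single hyphens.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


print(to_latin_slug("Möten och Åtgärder", SIMPLE))  # moten-och-atgarder
print(to_latin_slug("Größe über Öl", EXPANDED))     # groesse-ueber-oel
```

The explicit table runs before the generic accent stripping so that, with `EXPANDED`, ö becomes "oe" rather than just "o".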

6 comments:

  1. AFAIK, there's also a commonly accepted convention of 'transliterating' Western European characters in URLs, which keeps a URL human-readable (no %xx sequences and so on) and seems to be recognized by search engines. E.g. ö -> oe, ü -> ue, ß -> ss, etc.

  2. Hi Rodion,
    in simpler solutions, (ä, ö, å) are replaced with just (a, o, a). However, both approaches require a custom solution.

  3. I wonder why nobody has built a short-link service for SharePoint documents yet? Not the kind of services used for Twitter, but one that works locally (on the farm).

  4. DkmS,
    good afternoon. There are some homemade tools, such as http://spurlshortener.codeplex.com. Apparently there is no particular need for it, which is probably why such tools are not widespread.

  5. Is there a standard conversion of these characters (ö, ä, ü, è, é) for a URL? I am passing data that contains these characters via a query string; if there is an encoding I can decode on the SharePoint side, that would be great. I don't want to change the spellings, for example from ö to oe, as this will cause problems later in the process. Thanks in advance.

  6. Hi Rian,
    if you pass data with high-ASCII characters in a query string and don't want to modify them, use the HttpUtility.UrlEncode method (http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx). It encodes such characters into "character-entity equivalents", i.e. percent-encoded UTF-8 bytes: ö -> %C3%B6, ä -> %C3%A4, ü -> %C3%BC, è -> %C3%A8, é -> %C3%A9. To decode the query string, use the HttpUtility.UrlDecode method.

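The round trip from the last reply can be sketched in Python for illustration. Here `urllib.parse.urlencode` and `parse_qs` play the role of `HttpUtility.UrlEncode` and `HttpUtility.UrlDecode` (they are Python standard library functions, not the .NET ones), and the parameter names and values are made up. The loop at the top checks the per-character encodings listed in that reply.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Encodings listed in the reply (UTF-8 bytes, percent-encoded).
EXPECTED = {"ö": "%C3%B6", "ä": "%C3%A4", "ü": "%C3%BC", "è": "%C3%A8", "é": "%C3%A9"}

for char, encoded in EXPECTED.items():
    assert quote(char) == encoded
    assert unquote(encoded) == char


def build_query(params):
    """Encode a dict of values into a query string (UTF-8, spaces become "+")."""
    return urlencode(params)


def read_query(query):
    """Decode a query string back into a dict, keeping the first value of each key."""
    # parse_qs() returns a list per key because a key may appear more than once.
    return {key: values[0] for key, values in parse_qs(query).items()}


query = build_query({"name": "Jörg Müller", "city": "Genève"})
print(query)              # name=J%C3%B6rg+M%C3%BCller&city=Gen%C3%A8ve
print(read_query(query))  # {'name': 'Jörg Müller', 'city': 'Genève'}
```

The original spellings survive the round trip unchanged, so nothing has to be transliterated.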