We have used Azure Table storage as a NoSQL data store for quite a while and gathered some experience which may be useful to share with the community. First of all I want to say that I like Azure Table storage: it is very simple to use (there are SDKs for different platforms), flexible with its dynamic data schema for tables (although this must be a common thing in the NoSQL world), fast (if you use it correctly and avoid full table scan scenarios) and last but not least - it is very cheap.
However, at the same time it has a number of drawbacks which may be significant depending on the amount of data you need to process and the performance you need to achieve. I collected them in the following table:
| Cons | Comments |
| --- | --- |
| No LIKE operator | Can't search by substring. You need to introduce complex data duplication patterns with different PartitionKey/RowKey values to avoid full table scans and performance issues |
| Only 2 system indexed columns (PartitionKey/RowKey), no possibility to add your own indexes | Can't add custom indexed fields for improving query performance |
| No full pagination support at the API level | It is possible to limit the size of the returned result set, but not to skip a number of rows |
| No native backup options | There are 3rd-party (including open source) solutions which can be used, but no built-in support from MS |
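The pagination limitation deserves a short illustration: the segmented query API lets you cap the page size and walk forward with a continuation token, but there is no way to skip N rows. A minimal sketch using the Microsoft.Azure.Cosmos.Table SDK (the `table` instance and table contents are assumed to exist elsewhere):

```csharp
// Sketch: paging through a table with continuation tokens.
// Take(100) caps the page size, but there is no "skip" - you can
// only walk pages forward via the continuation token.
var query = new TableQuery<DynamicTableEntity>().Take(100);
TableContinuationToken token = null;
do
{
    var segment = await table.ExecuteQuerySegmentedAsync(query, token);
    foreach (var entity in segment.Results)
    {
        Console.WriteLine(entity.RowKey);
    }
    token = segment.ContinuationToken; // null when there are no more pages
} while (token != null);
```

To implement classic "jump to page N" navigation you would have to walk all preceding pages and count rows yourself.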
As I said, I personally like Azure Table storage. But due to these drawbacks you may also consider other options for data storage, e.g. Azure CosmosDB, which is more expensive but also more powerful at the same time.
As you probably know, an Azure Function app is provisioned with a Storage account behind it which is used for the internal needs of the function app (e.g. storing compiled DLLs of Azure Functions in blob storage of this storage account, or storing App Insights logs in tables). If we need to get the instance of the storage account associated with an Azure Function app, we may use the following PowerShell:
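A sketch of such a script, assuming the Az PowerShell module is installed and you are logged in; the resource group and app names are placeholders:

```powershell
# Placeholders - substitute your own values
$resourceGroup = "my-resource-group"
$functionAppName = "my-function-app"

# Get the function app and read the AzureWebJobsDashboard app setting,
# which contains the connection string of the associated storage account
$app = Get-AzWebApp -ResourceGroupName $resourceGroup -Name $functionAppName
$setting = $app.SiteConfig.AppSettings | Where-Object { $_.Name -eq "AzureWebJobsDashboard" }

# Extract the storage account name from the connection string via regex
$accountName = [regex]::Match($setting.Value, "AccountName=([^;]+)").Groups[1].Value

# Get the actual storage account instance
$storageAccount = Get-AzStorageAccount | Where-Object { $_.StorageAccountName -eq $accountName }
```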
In the above code we first get the function app instance using the Get-AzWebApp cmdlet and find the AzureWebJobsDashboard application setting, which contains the connection string of the associated storage account. After that we extract the storage account name from the connection string using a regex, and finally call Get-AzStorageAccount to get the actual storage account instance. Hope it will help someone.
One known pain point for developers who work with Azure Table storage is the lack of the SQL LIKE operator, which would allow queries with StartsWith, EndsWith or Contains conditions. There is another NoSQL database available in Azure - CosmosDB. It is not free (Table storage is also not free, but it is very cheap), but with some caveats it supports the LIKE operator.
The first thing you should do when creating a new CosmosDB instance is to select the underlying API:
At the moment of writing this post the following APIs were available:
- Core (SQL)
- MongoDB
- Cassandra
- Azure Table
- Gremlin
If you create CosmosDB with the Azure Table API, it will look the same as native Table storage and will have the same 6 comparison operators available for queries: <, <=, >, >=, =, <>. I.e. still no LIKE operator.
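As a side note, even without LIKE these range operators let you emulate a StartsWith (prefix) search, because keys are compared lexicographically. A sketch with the Microsoft.Azure.Cosmos.Table SDK (the prefix is a made-up example):

```csharp
// Sketch: emulating "RowKey StartsWith prefix" with range operators.
// Upper bound = the prefix with its last character incremented, so the
// filter matches exactly the keys that begin with the prefix.
string prefix = "user_";
string upperBound = prefix.Substring(0, prefix.Length - 1) +
                    (char)(prefix[prefix.Length - 1] + 1);
string filter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, prefix),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, upperBound));
var query = new TableQuery<DynamicTableEntity>().Where(filter);
```

This only helps for prefix matches; EndsWith and Contains still require full table scans or data duplication.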
But if you create it with the Core (SQL) API, it will look different (it will work with "documents" instead of tables and rows) and, more importantly, will allow you to use a SQL-like API and queries from the Microsoft.Azure.Cosmos NuGet package. E.g. if we have a Groups container, we may fetch groups which have a particular user as a member using the following query (for simplicity we assume here that members are stored as a JSON-serialized string):
string userName = ...;
// Never concatenate user input into the query text - use a parameter.
// CosmosDB SQL supports LIKE, and CONCAT can build the '%...%' pattern.
var queryDefinition = new QueryDefinition(
    "SELECT * FROM c WHERE c.members LIKE CONCAT('%', @userName, '%')")
    .WithParameter("@userName", userName);
var queryResultSetIterator = this.container.GetItemQueryIterator<Group>(queryDefinition);
var items = new List<Group>();
while (queryResultSetIterator.HasMoreResults)
{
    // Each ReadNextAsync call fetches the next page of results
    var currentResultSet = await queryResultSetIterator.ReadNextAsync();
    foreach (var item in currentResultSet)
    {
        items.Add(item);
    }
}
So if you need a NoSQL database with support for the LIKE operator, consider using CosmosDB with the Core (SQL) API.
As you probably know, Table storage is the basic NoSQL database option available in Azure. It is very cheap and used internally in many Azure components (e.g. in Azure Functions or bots). On the other hand, CosmosDB is a powerful NoSQL database engine available in Azure which supports several different APIs. You need to select this API when creating the CosmosDB instance, and it can't be changed after that:
- Core (SQL)
- MongoDB
- Cassandra
- Azure Table
- Gremlin
I.e. if you have an existing application which already uses Table storage, you may create CosmosDB with the Azure Table API and switch to it without rewriting application code (in theory, at least: e.g. if your app targets .NET Framework you will most probably need to retarget it to .NET Standard or .NET Core, but that is a topic for a different post).
In our product we use Table storage, and we decided to compare the performance of CosmosDB (with the Azure Table API) against native Table storage. The following data was used for testing:

- a table with about 100K rows
- a query against the system PartitionKey field: PartitionKey eq {guid} (this is a fast query by itself, because PartitionKey is an indexed field)
- each query returned about 300 rows out of the overall 100K rows
- execution time was measured

The same test was run 10 times in order to get the average execution time, for CosmosDB and Table storage separately. Here are the results:
| Test run # | CosmosDB (msec) | Table storage (msec) |
| --- | --- | --- |
| 1 | 3639 | 2825 |
| 2 | 3249 | 2008 |
| 3 | 3626 | 1613 |
| 4 | 3016 | 1813 |
| 5 | 2764 | 1942 |
| 6 | 2840 | 1713 |
| 7 | 4704 | 1776 |
| 8 | 2975 | 2036 |
| 9 | 3353 | 1808 |
| 10 | 2833 | 1699 |
| Avg (msec) | 3299.9 | 1923.3 |
I.e. the average execution time for native Table storage was lower than for CosmosDB with the Azure Table API.
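The measurement itself can be sketched like this (Microsoft.Azure.Cosmos.Table SDK; the `table` instance and the partition guid are placeholders). Since the Azure Table API is wire-compatible with Table storage, the same code can be pointed at either service just by swapping the connection string:

```csharp
// Sketch: measure how long a PartitionKey query takes.
string partitionGuid = "..."; // guid of the partition under test
var filter = TableQuery.GenerateFilterCondition(
    "PartitionKey", QueryComparisons.Equal, partitionGuid);
var query = new TableQuery<DynamicTableEntity>().Where(filter);

var watch = System.Diagnostics.Stopwatch.StartNew();
var rows = table.ExecuteQuery(query).ToList(); // ToList forces full enumeration
watch.Stop();
Console.WriteLine($"{rows.Count} rows in {watch.ElapsedMilliseconds} msec");
```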
As you probably know, the Table API used by Azure Table storage is limited to the following operators which can be used in queries:

- =
- <>
- >
- <
- <=
- >=
We needed to run an analogue of the SQL NOT IN operator against the RowKey column:
RowKey NOT IN (id1, id2, …, id_n)
where these ids were guids represented as strings. In order to do that with the operators above, we needed to use the not-equal operator (<>) and construct a query like this:
RowKey <> id1 AND RowKey <> id2 AND … RowKey <> id_n
The problem is that this list of guids didn't have a fixed size and could potentially be big. So it was interesting to find out whether there is a limit on the maximum query length, and if yes - what the exact value of this maximum is.
For testing I created a test app and generated conditions over guids as shown below (in addition to the mentioned conditions on RowKey we needed to get groups from a particular partition only, so this extra condition was added as well; the exact partition name is not important - we only need to know that its length was 8 symbols, as it will be used in the overall query length calculation below):
var tableClient = storageAccount.CreateCloudTableClient(new TableClientConfiguration());
var table = tableClient.GetTableReference("MyTable");

// Condition on the partition (8-symbol partition name)
var queryConditions = new List<string>();
queryConditions.Add(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Test1234"));

// Add the RowKey <> guid conditions
int numOfConditions = 50;
for (int i = 0; i < numOfConditions; i++)
{
    queryConditions.Add(TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.NotEqual, Guid.NewGuid().ToString().ToLower()));
}

string filter = CombineUsingAnd(queryConditions);
var query = new TableQuery().Where(filter);

var watch = System.Diagnostics.Stopwatch.StartNew();
var result = table.ExecuteQuery(query).ToList();
watch.Stop();
public static string CombineUsingAnd(List<string> queryConditions)
{
    if (queryConditions == null || queryConditions.Count == 0)
    {
        return string.Empty;
    }

    string filter = string.Empty;
    foreach (string queryCondition in queryConditions)
    {
        if (string.IsNullOrEmpty(queryCondition))
        {
            continue;
        }
        if (string.IsNullOrEmpty(filter))
        {
            filter = queryCondition;
        }
        else
        {
            // CombineFilters produces "(left) and (right)"
            filter = TableQuery.CombineFilters(filter, TableOperators.And, queryCondition);
        }
    }
    return filter;
}
E.g. here is how the query looks with conditions on 50 guids:
((((((((((((((((((((((((((((((((((((((((((((((((((PartitionKey eq 'Teamwork') and (RowKey ne '60712ffe-061b-4a46-8402-01381177c4da')) and (RowKey ne '85694313-2776-485c-833d-ae78c185adba')) and (RowKey ne 'dac6f7fd-1198-486c-b3ea-e90337ee9711')) and (RowKey ne '5faf85cb-4662-4843-87f6-ac6dc20b04e1')) and (RowKey ne '8f524940-9822-46e1-b961-59a2e8fa51b4')) and (RowKey ne 'bacd765e-64d3-464a-a57e-bb30eeca0c2f')) and (RowKey ne '75ba6126-f529-4b30-ae41-882f2b223bff')) and (RowKey ne '5e9950ac-e611-4d58-8c91-8fcf16f2886a')) and (RowKey ne '8f08bd7b-7d6e-4c21-8b94-d5ca2a15f849')) and (RowKey ne 'a965f6f6-f03c-4b4a-91be-98481a35ba11')) and (RowKey ne '5858de75-addb-4f5a-acc1-6f9f55078b4f')) and (RowKey ne '9ada1d04-dd03-4b1f-ae26-bc51b7f35ec3')) and (RowKey ne 'cbb976e7-87c9-4b83-8007-355f3ce1061d')) and (RowKey ne '59e06da8-f343-4a98-9382-0bbf1ce21d3f')) and (RowKey ne 'ea0763d9-cb64-4750-a7e0-c1869ca82a15')) and (RowKey ne '926106f9-bf64-42db-b27f-be9bcb45907e')) and (RowKey ne '3cdcaf1e-ff36-4243-b865-9424e1f46427')) and (RowKey ne '8fcfc4ca-7510-4fa6-95d7-3ff2999de509')) and (RowKey ne '21b04d3a-32e0-4a59-900a-30babfa8fbd2')) and (RowKey ne 'df181ee8-4dc0-4232-b2bc-7e13dc8f2abf')) and (RowKey ne '6a0038d0-40a4-4a11-b626-9850b8178caa')) and (RowKey ne 'ffb78a96-85c9-4648-aba7-3baa56be6c1f')) and (RowKey ne '0c56afd8-164e-4574-9054-8824e6fc1969')) and (RowKey ne '51d95602-cfee-40bf-b9e3-b0f935083742')) and (RowKey ne 'f286d1fb-b9bb-4850-a911-58a77ffc7991')) and (RowKey ne '50280089-e61b-4874-8f99-8dc437f74181')) and (RowKey ne '05f72847-3936-4efc-9902-540a9d9c5ca8')) and (RowKey ne '066fcbe4-afee-46ff-b6d2-18e072f850a0')) and (RowKey ne '602f7cd4-7697-49ca-af51-e96c70eefed5')) and (RowKey ne '6858f832-a58e-4bcb-b57b-7c40d32cbcb6')) and (RowKey ne 'f9e2cc4f-a579-46f4-820b-e7d8ab9711c8')) and (RowKey ne 'cbb68c38-7e73-463b-8839-bef040803c37')) and (RowKey ne '1bf0d7ff-ece6-4ef1-9d48-ae424feea108')) and (RowKey ne 'de1dbdbf-8a75-40b4-bb9b-81e95ff4a542')) and (RowKey ne 
'31f5c94e-3eea-4c5a-8e60-8899eddc7829')) and (RowKey ne '94bc24a8-28e4-4b6d-965e-9360104bff21')) and (RowKey ne '93d7f84e-eb4b-4c7d-826b-196e8b46255e')) and (RowKey ne 'bd0d0989-7256-4ffc-8904-6135955b8aae')) and (RowKey ne '9c38865a-913c-468c-bd26-25e606f9b73a')) and (RowKey ne 'c2f9b899-76c1-4b04-aa68-fb3f903d81d4')) and (RowKey ne '9b478051-78b2-4a67-a25f-6cc09251ec33')) and (RowKey ne '908f80ea-e31f-4a85-9ff8-8cb80b601434')) and (RowKey ne '7775f8d4-2b9c-4903-a191-b54d214e380a')) and (RowKey ne '55b24a87-3259-4ca4-8869-f6d68f9c95c3')) and (RowKey ne '0476263d-cf8a-49b3-89f1-0bc8df9fd906')) and (RowKey ne 'f7a45c30-eba7-4944-b08b-a875d4b871e0')) and (RowKey ne '4136dc3c-10d3-4d58-afb9-23d8b422822e')) and (RowKey ne 'b6a40f8e-5284-4cbd-9356-2296df415650')) and (RowKey ne 'd63ecb62-5f48-4314-8a15-08c0b62ee835')) and (RowKey ne 'e3acdc4f-70a5-4a73-b023-1845ccf9ea51')
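The filter lengths in the results follow a simple formula, assuming (as the measured lengths confirm) that GenerateFilterCondition emits the bare condition (`PartitionKey eq 'Teamwork'` is 26 characters) and each CombineFilters call wraps both sides in parentheses and joins them with ` and ` (a quoted-guid RowKey condition is 48 characters, plus 9 characters of parentheses and glue, i.e. 57 characters per condition). A quick sanity check of this arithmetic (a sketch, not part of the original test app):

```csharp
// Verify: filter length = 26 (PartitionKey condition) + 57 per RowKey condition
foreach (int n in new[] { 50, 100, 112, 113, 114 })
{
    Console.WriteLine($"{n} conditions -> {26 + 57 * n} characters");
}
// 50 -> 2876, 100 -> 5726, 112 -> 6410, 113 -> 6467, 114 -> 6524
```

These computed values match the filter lengths measured below exactly.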
Testing results are summarized in the following table:
| Num of conditions | Filter length | Query elapsed time |
| --- | --- | --- |
| 50 | 2876 | 12075 |
| 100 | 5726 | 10550 |
| 112 | 6410 | 10361 |
| 113 | 6467 | 10967 |
| 114 | 6524 | Bad request |
| > 114 | | Bad request |
So the maximum Table storage query length is somewhere between 6467 and 6524 symbols. For us this knowledge was enough to proceed. Hope this information will help someone.