Experience using Azure Cosmos DB in a commercial project
In this article, we will share with you our experience using the Azure Cosmos DB database service in a commercial project. We will tell you what the DB is for, and the nuances we came across throughout development.
In this article we will share with you our experience using the Azure Cosmos DB database service in a commercial project. We will tell you what the DB is for, and the nuances we came across throughout development.
What is Azure Cosmos DB?
Azure Cosmos DB is a commercial, globally distributed database service with a multi-model paradigm, provided as a PaaS solution. It is the next generation of Azure DocumentDB.
The database was developed in 2017 at Microsoft Corporation with the participation of Dr.Sci. Leslie Lamport (winner of the Turing Award 2013 for a fundamental contribution to the theory of distributed systems, the developer of LaTex, the creator of the TLA + specification).
The main characteristics of Azure Cosmos DB are:
- Non-relational database
- Storage of documents in the JSON format
- Horizontal scaling with the ability to select geographic regions
- Multi-model data paradigm: key-value, document, graph, family of columns
- Low latency for 99% of queries: less than 10 ms for read operations and less than 15 ms for (indexed) write operations
- Designed for high throughput
- Ensures availability, consistency of data, delay at SLA level of 99.999%
- Configurable throughput
- Automatic replication (master-slave)
- Automatic data indexing
- Configurable levels of consistency of data Five different levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual);
Notice the dependence of different levels of consistency on availability, performance, and consistency of data.
The graph is up – availability, productivity; graph down – consistency.
- For a convenient transition to Cosmos DB from your database, there are many APIs for accessing data: SQL, JavaScript, Gremlin, MongoDB, Cassandra, Azure Blob
- Customizable firewall
- Configurable DB size
The problem that we solved
Thousands of sensors located around the world transmit notifications every N seconds. These notifications should be stored in the database, and allow for searching and display in the UI of the system operator.
Customer requirements:
-
Using the Microsoft technology stack, including the Azure cloud
-
Throughput of 100 requests per second
-
Notifications do not have a clear structure and could be further expanded
-
For critical notifications, processing speed is important
-
High fault tolerance of the system
Based on customer requirements, we were perfectly suited for a non-relational, globally distributed, reliable commercial database.
If you look at databases similar to Cosmos DB, you would find Amazon DynamoDB, Google Cloud Spanner. But Amazon DynamoDB is not globally distributed, and Google Cloud Spanner has fewer levels of consistency and types of data models (only a table view, a relational one).
For these reasons, we settled on Azure Cosmos DB. Azure Cosmos DB SDK for .NET was used to interact with the database, since the backend was written in .NET.
The nuances we encountered
-
Database Management Tools
In order to start using the database, first of all, you need to select a tool to manage it. We used Azure Cosmos DB Data Explorer in the Azure portal and DocumentDbExplorer. Also, there is a utility Azure Storage Explorer.
-
Configuring Database Collections
In Cosmos DB, each database consists of collections and documents.
Customizable characteristics of the collection, which should be noted:
- Size of collection: fixed or unlimited
- Bandwidth per request units per second RU/s (from 400 RU/s)
- Indexing policy (Including or excluding documents and paths to and from the index and from it, configuring various index types, setting up index update modes)
Example of a typical index:
{ "id": "datas", "indexingPolicy": { "indexingMode": "consistent", "automatic": true, "includedPaths": [ { "path": "/*", "indexes": [ { "kind": "Range", "dataType": "Number", "precision": -1 }, { "kind": "Hash", "dataType": "String" }, { "kind": "Spatial", "dataType": "Point" } ] } ], "excludedPaths": [] } }
To search for a substring in string fields, you need to use the Hash-index ("kind": "Hash").
-
Database transactions
In the database, transactions are implemented at the stored procedures level (Execution of a stored procedure is an atomic operation). Stored procedures are written in JavaScript.
var helloWorldStoredProc = { id: "helloWorld", body: function () { var context = getContext(); var response = context.getResponse(); response.setBody("Hello, World"); } }
-
Database change channel
Change Feed listens for changes in the collection. When there are changes in the collection documents, the database throws a change event to all the subscribers of this channel.
We used Change Feed to track changes to the collection. When creating a channel, you must first create an auxiliary AUX collection, which coordinates the processing of the change channel for several work roles.
-
Database restrictions
- Absence of bulk operations (used stored procedures for mass deletion, updating of documents)
- No partial update of the document
- There is no operation SKIP (complexity of the implementation of pagination). To implement the pagination in requests to receive notifications, we used the parameters RequestContinuation (reference to the last element as a result of issuance) and MaxItemCount (the number of items returned from the database). By default, results are returned in packets (no more than 100 items and no more than 1 MB in each package). The number of items returned can be increased to 1000 by using the MaxItemCount parameter
-
Processing the Error 429
When the collection Throughput reaches a maximum, the database starts throwing an error "429 Too Many Requests". To process it, you can use the RetryOptions setting in the SDK, where MaxRetryAttemptsOnThrottledRequests is the number of attempts to execute the query, and MaxRetryWaitTimeInSeconds is the total time that connection attempts are attempted.
-
Forecasting the cost of using the database
To predict the cost of using the database, we used the online calculator RU / s. In the base plan, 1 query unit for a 1 KB element corresponds to a simple GET command by reference to itself or the identifier of that element.
Resume
Azure Cosmos DB is easy to use, easily and flexibly configured through the Azure portal. A lot of APIs for accessing data allow you to quickly transition to Cosmos DB. You do not need to get the database administrator to maintain the database. Financial guarantees of SLA, global horizontal scaling make this database very attractive in the market. It is great for use in corporate and global applications that require high demands on fault tolerance and throughput. We at WaveAccess continue to use Cosmos DB in our projects.
To get your custom Azure Media Services based project, contact us!
Let us tell you more about our projects!
Сontact us:
hello@wave-access.com