Global corporations with a presence in major world markets such as Europe, Asia, and the Americas need to comply with the “data sovereignty” requirements of the countries in which they operate. Data sovereignty requires that data collected or processed in a country is subject to that country’s laws and must remain within its borders.
AWS has built cloud data centers in countries across the globe to provide a better experience for local users and, more importantly, to help customers comply with data sovereignty requirements. The AWS Global Infrastructure spans 69 Availability Zones within 22 geographic regions around the world, and AWS has announced plans for 16 more Availability Zones and 5 more AWS Regions in Indonesia, Italy, Japan, South Africa, and Spain.
AWS also complies with the General Data Protection Regulation (GDPR), the EU regulation on data protection and privacy in the European Union (EU). The GDPR aims primarily to give individuals control over their personal data. AWS provides tools and services to build GDPR-compliant infrastructure. Given our geographic reach, it is very likely that there is an AWS Region close to where you want to deploy your applications that also satisfies the data sovereignty requirements.
Pricing for AWS services varies by region. In regions where the cost of land, fiber, electricity, and taxes is lower, we pass those savings on to our customers. In this blog post, we’ll discuss how to determine the cost of your applications in different regions. This in turn will allow you to price your applications depending on where they are deployed.
The cost of an application in a region depends on the services that comprise the application. For example, if an application uses only AWS Lambda, its cost in the US and Ireland is the same, since Lambda pricing in the US and Ireland is identical. However, an application using Amazon Kinesis Data Streams will incur costs for that service that are 20% higher in Frankfurt than in the US. Hence, you need to factor in all the services that comprise your application to determine its true cost in a region.
In this post, I will walk you through an example that details the cost calculations for a modern “serverless data lake” application across different regions.
Serverless Data Lake
First, let us look at the typical components of a “serverless data lake” architecture. In the data lake architecture shown below, data from the IoT sensors is ingested into Amazon Kinesis Data Streams, decoded by AWS Lambda, and the decoded data is saved through Amazon Kinesis Data Firehose into an Amazon S3 bucket. The decoded data is further transformed by leveraging the massively parallel capability of Amazon EMR and stored in the “enriched” bucket. Finally, Amazon Athena is leveraged by the business users to analyze data in the “S3 enriched Data lake” using standard SQL (structured query language).
To build the cost model, let’s understand how the different services are priced.
Amazon Kinesis Data Streams pricing is based on the number of shard hours, the number of PUT payload units (where each unit covers up to 25 KB of a record), and the data retention duration (from one to seven days).
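To make the shard-hour and payload arithmetic concrete, here is a minimal Python sketch. The workload figures, the 730 hours-per-month average, and the per-million PUT payload unit rate are my assumptions for illustration; only the 25 KB unit size and the $0.015 US shard-hour price come from this post. Extended retention charges are left out.

```python
import math

PUT_PAYLOAD_UNIT_KB = 25   # payload unit size stated in the text
HOURS_PER_MONTH = 730      # assumption: average number of hours in a month


def put_payload_units(record_kb: float) -> int:
    """Number of 25 KB PUT payload units consumed by a single record."""
    return max(1, math.ceil(record_kb / PUT_PAYLOAD_UNIT_KB))


def kinesis_monthly_cost(shards: int, records_per_month: int, record_kb: float,
                         shard_hour_price: float,
                         price_per_million_units: float) -> float:
    """Shard-hour charge plus PUT payload unit charge for one month."""
    shard_cost = shards * HOURS_PER_MONTH * shard_hour_price
    units = records_per_month * put_payload_units(record_kb)
    put_cost = units / 1_000_000 * price_per_million_units
    return shard_cost + put_cost


# Hypothetical workload: 10 shards, 100 million records of 30 KB each per month.
# 0.015 is the US shard-hour price quoted in this post; 0.014 per million
# PUT payload units is an assumed placeholder rate, not taken from the post.
kinesis_cost = kinesis_monthly_cost(10, 100_000_000, 30, 0.015, 0.014)
print(f"Estimated Kinesis Data Streams cost: ${kinesis_cost:.2f}")
```

Note that a 30 KB record counts as two payload units, so message size can matter as much as message count.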
AWS Lambda pricing is determined by the memory allocated to the function, as well as how long it takes to execute the function, and is typically measured in GB-seconds.
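As a rough illustration of how GB-seconds add up, the sketch below multiplies allocated memory by execution time and invocation count. The function names, the workload, and the per-GB-second rate are assumptions for this example, and any per-request charge is not modeled; check the Lambda pricing page for current rates.

```python
def lambda_gb_seconds(memory_mb: int, avg_duration_ms: float,
                      invocations: int) -> float:
    """Total GB-seconds consumed: memory (GB) x duration (s) x invocations."""
    memory_gb = memory_mb / 1024
    duration_s = avg_duration_ms / 1000
    return memory_gb * duration_s * invocations


def lambda_compute_cost(gb_seconds: float, price_per_gb_second: float) -> float:
    """Compute charge only; request charges are intentionally excluded."""
    return gb_seconds * price_per_gb_second


# Hypothetical decoder function: 512 MB, 200 ms average, 10 million calls/month.
gb_seconds = lambda_gb_seconds(512, 200, 10_000_000)

# 0.0000166667 USD per GB-second is an assumed placeholder rate.
lambda_cost = lambda_compute_cost(gb_seconds, 0.0000166667)
print(f"{gb_seconds:,.0f} GB-seconds, about ${lambda_cost:.2f}")
```

Halving either the memory allocation or the execution time halves the GB-seconds, which is why tuning the decoder function has a direct effect on this line item.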
Amazon Kinesis Data Firehose pricing is based on the amount of data ingested and is $0.029/GB in us-east-1.
Amazon S3 pricing depends on the storage class used. Assuming the S3 Standard storage class, the cost is $0.023/GB/month in us-east-1.
Amazon Athena pricing depends on the amount of data scanned and is $5/TB in us-east-1.
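The three volume-based services (Kinesis Data Firehose, S3, and Athena) can be estimated the same way: quantity times unit price. Here is a minimal sketch, assuming first-tier us-east-1 rates and a made-up workload; the function name and the volumes are hypothetical, and tiered discounts and request charges are ignored.

```python
def data_volume_monthly_cost(ingested_gb: float, stored_gb: float,
                             scanned_tb: float, *,
                             firehose_per_gb: float = 0.029,
                             s3_per_gb_month: float = 0.023,
                             athena_per_tb: float = 5.0) -> dict:
    """Monthly cost per service for the volume-priced parts of the data lake.

    Default rates are us-east-1 first-tier prices; S3 assumes the Standard
    storage class. Pass different rates to model another region.
    """
    return {
        "firehose": ingested_gb * firehose_per_gb,
        "s3": stored_gb * s3_per_gb_month,
        "athena": scanned_tb * athena_per_tb,
    }


# Hypothetical month: 1,000 GB ingested, 5,000 GB stored, 20 TB scanned.
costs = data_volume_monthly_cost(1_000, 5_000, 20)
for service, cost in costs.items():
    print(f"{service}: ${cost:.2f}")
```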
Now let us assume that our monthly costs to decode and enrich data from a million IoT sensors in us-east-1 are as follows. This is fictitious sample data. Your costs will be quite different, since they are a function of the message size and of the amount of processing required to decode and enrich the data so that it can be consumed by business users.
Table #1: Sample cost table
We now want to understand the cost associated with deploying the same “data lake” across different regions, based on our business needs. Since the pricing for services varies by region, I have created Table #2 (below), which highlights the price differences for the various services by region. The pricing information used in the table is up to date as of the publication date of this post. You can always check the latest pricing on the pricing page of each service.
In the table below, I have introduced a column called “price multiple”. The price multiple expresses the cost of a service in a region relative to its cost in the US. For example, the Kinesis Data Streams cost per shard hour in the US is $0.015/hr, whereas the price in Frankfurt is $0.018/hr. Hence, the price multiple for Kinesis Data Streams in Frankfurt is $0.018 / $0.015 = 1.20.
Table #2: Price multiple across regions for services
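The price multiple is simply the regional unit price divided by the US unit price. A minimal sketch using the Kinesis Data Streams shard-hour prices quoted above (the function name is my own, and rounding to two decimals is an assumption about how the table is presented):

```python
def price_multiple(regional_price: float, us_price: float) -> float:
    """Regional unit price relative to the US unit price, rounded to 2 places."""
    if us_price <= 0:
        raise ValueError("us_price must be positive")
    return round(regional_price / us_price, 2)


# Shard-hour prices from the text: $0.018 in Frankfurt, $0.015 in the US.
frankfurt_kinesis_multiple = price_multiple(0.018, 0.015)
print(frankfurt_kinesis_multiple)
```

A multiple above 1 means the service costs more in that region than in the US, a multiple of exactly 1 means the prices are identical, and a multiple below 1 means the region is cheaper for that service.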
Now that we know the price multiples for the various services relative to the US, we can apply them to the us-east-1 costs from Table #1 to arrive at the monthly cost in each region. This is shown in Table #3 below.
Table #3: Global pricing model
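The calculation behind a table like this one can be sketched in a few lines: multiply each service’s US cost by its regional price multiple, then compare the totals. All dollar amounts below are hypothetical placeholders, not the values from the tables in this post. The 1.20 multiple for Kinesis Data Streams in Frankfurt is the one derived earlier; the Lambda multiple of 1.00 is an assumption for illustration.

```python
def regional_costs(us_costs: dict, multiples: dict) -> dict:
    """Scale each service's US monthly cost by its regional price multiple.

    Services missing from `multiples` are assumed to cost the same as in the US.
    """
    return {svc: cost * multiples.get(svc, 1.0) for svc, cost in us_costs.items()}


def percent_difference(regional_total: float, us_total: float) -> float:
    """How much costlier (+) or cheaper (-) a region is than the US, in percent."""
    return (regional_total / us_total - 1) * 100


# Hypothetical US monthly costs per service (placeholders).
us_costs = {"kinesis_data_streams": 1000.0, "lambda": 500.0}

# Frankfurt multiples: 1.20 comes from the text, 1.00 is an assumption.
frankfurt_multiples = {"kinesis_data_streams": 1.20, "lambda": 1.00}

frankfurt_costs = regional_costs(us_costs, frankfurt_multiples)
us_total = sum(us_costs.values())
frankfurt_total = sum(frankfurt_costs.values())
difference = percent_difference(frankfurt_total, us_total)
print(f"US: ${us_total:.2f}, Frankfurt: ${frankfurt_total:.2f} ({difference:+.1f}%)")
```

Because the overall difference is a cost-weighted average of the per-service multiples, the services that dominate your bill drive the regional difference.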
As you can see in the table above, our data lake built using Kinesis Data Streams, Lambda, Kinesis Data Firehose, S3, EC2-EMR, and Athena will be 12% costlier in Tokyo than in the United States. These numbers are directly governed by your original costs from the first table. If you are deploying an application in Europe, you will see that your application cost in Ireland is lower than in Frankfurt, and that could be a factor in determining where you choose to deploy your data lake.
In this blog post, we have shown you a detailed example of how to calculate the cost of your application in different regions. The cost of global deployments depends on the services that comprise your application. In general, service costs in the US tend to be lower than in other regions. Within Europe, you will see that service costs in Ireland are lower than those in Frankfurt. Hopefully, this cost model example will help you evaluate where to deploy your application most cost-effectively.