20 JUN, 2023

Cloud Computing: storage and security

This post was originally a report written for an assignment as part of The University of Auckland Computer Science 727 Cryptographic Management. As such, it may read and present somewhat differently from my usual posts, which target an easy-reading general audience. In-text citations are referenced accordingly.

This report will provide a high-level overview of data encryption at rest, with regard to cloud computing storage. Firstly, the differences between transmitted and stored data will be explained, followed by some specific challenges faced by cloud computing for storage. Lastly, some methods that are used to secure cloud computing storage will be discussed, using Google Cloud as an example.

Data encryption domains

Data encryption is often discussed in the context of transmission, which differs from storage in several key ways. Data is typically transmitted in small amounts, as the constraint is often the limited bandwidth of the intermediary medium, such as an internet connection. Stored data is much larger, often by several orders of magnitude. For example, an average website's transmitted size is around 2,000 kilobytes, while an average smartphone's storage capacity is around 100 gigabytes - nearly five orders of magnitude larger [1, 2]. Another key difference is in accessibility: transmitted data must be intercepted in flight, at some point along the transmission path. This is a non-trivial problem, as an attacker must be in the right place at the right time to gain access to the encrypted data. Meanwhile, a key advantage of cloud storage is its accessibility, which is also a key disadvantage for its security. Cloud storage must be available to its users anywhere in the world with an internet connection, which also makes it available to attackers anywhere in the world with an internet connection. The last key difference is in the useful lifespan of the data. Data in transmission is typically only of temporary or immediate use, such that breaking the encryption will not provide any meaningful advantage to an attacker if it takes a year to do so. Data in storage typically has a longer useful lifespan, often stretching to a decade or more, such that breaking the encryption after a year will still provide a significant advantage to an attacker.
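As a quick check of the size comparison above, the ratio can be worked out directly. This is only a back-of-envelope sketch: it assumes decimal units (1 kilobyte = 1,000 bytes) and uses the two rounded figures quoted in the text.

```python
import math

# Figures quoted in the text; decimal units are an assumption.
website_bytes = 2_000 * 10**3   # about 2,000 kilobytes per page load
phone_bytes = 100 * 10**9       # about 100 gigabytes of phone storage

ratio = phone_bytes / website_bytes
orders_of_magnitude = math.log10(ratio)

print(f"storage is {ratio:,.0f} times larger")
print(f"that is about {orders_of_magnitude:.1f} orders of magnitude")
```

The result is a factor of 50,000, or roughly 4.7 orders of magnitude.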

Cloud computing storage

Some more specific aspects of cloud computing storage that are key for security include its public availability. Some cloud storage data is publicly available to anyone, but must remain securely encrypted, which allows attackers to freely access the encrypted data. An example of this would be encrypted firmware for proprietary hardware devices delivered to public users without further authentication, such that the hardware verifies that the firmware is valid and decrypts it. As mentioned earlier, the data must be globally accessible, which allows attackers from around the world to attempt access. As access requires only an internet connection, attackers can remain relatively remote and anonymous, further masking their activities. As part of this globalisation, and due to the fundamental structure of the service model of cloud computing, clients do not own or manage the hardware that provides the cloud computing services they use. This makes it difficult for clients, or for a third party, to audit the hardware, given its global distribution. A final key challenge to cloud computing storage is the ownership of data.

Data ownership

Is data a separate entity? The distinction between software and hardware has been debated several times during the digital age, and the legal status of software has now been relatively established. However, data is not so protected at this time, as it is neither designed hardware nor algorithmic software, but merely pieces of information. As we are increasingly conscious of our digital footprint of data, the legal protections and rights of that data are increasingly important to determine. This is particularly true with regard to ownership, as most other issues and protections follow on from determining ownership. One possible solution is that data is owned by whoever owns the physical medium upon which it is stored. Whilst this is a very simple solution, it does not scale to businesses and cloud computing. It would not be viable for a cloud services provider to own its clients' data simply by providing storage for it. Another possibility is that data should be owned by the source, whoever generated it. Whilst this initially sounds reasonable, and has analogies to the physical world, it has severe implications for the digital world. This has the opposite issue of the prior situation, where users of a service own their data, despite it being used by an intermediary entity, and then stored on a cloud storage provider. As such, the intermediary would be incapable of fully owning the data that users on their platform generate. Whilst this is a contended subject, the European Union has sought to enforce some protections over data in this manner, via the General Data Protection Regulation [3]. However, there are limitations to be considered, such as the precise definition of the source of the data. Is a mobile application a source of data, or is the person who owns the mobile device the source of the data? Can an organization be a source of data, or must it be attributable to specific persons?

These questions are as yet unanswered and must be explored before appropriate legislation can be pursued. A final hurdle of data ownership is the global nature of cloud computing, and the internet. Laws are local to countries, whilst the internet mostly transcends physical boundaries, and data is ethereal except for the medium upon which it is stored. If data is owned by its source, and yet is stored in another country, the laws and regulations by which it is owned must extend beyond the original borders of the country. This will likely lead to international disputes that set data protection against a country's sovereignty, for which there is no simple solution but diplomacy. Do we need passports and immigration and customs control for data?

Challenges for cloud providers

A cloud computing provider must be especially diligent with their security practices, as they represent a large attack surface, both with their global infrastructure and the multiple clients that they serve. Their infrastructure must present as a homogeneous entity, such that security is uniform across its surface, otherwise an attacker will be able to exploit a weakness. Furthermore, if a client suffers a breach of security, this breach, and all its effects, must be constrained to that single client, such that other clients are not compromised. These are challenging requirements to enact at scale, particularly with the varying infrastructure available across the globe and the possibilities of intrusion that internet-accessible services face. Furthermore, the cloud provider itself must be trustworthy to its clients, by providing trustworthy hardware environments in which to store data. It must also provide secure and trustworthy APIs by which to work with the stored data. Finally, it must ensure that employees with potential access to key infrastructure or client data are trustworthy, such that they would not compromise the security of the whole system. The need for trust in these three aspects can be reduced by the implementation of homomorphic encryption algorithms, which allow data to be processed without decryption [4].
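To make "processed without decryption" concrete, the following is a minimal sketch of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. It is not the scheme any particular cloud provider uses, and the primes are deliberately tiny toy values chosen for readability; a real deployment would use a vetted library and much larger keys.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Build a Paillier key pair from two primes (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1 is used
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random blinding factor
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, private_key, c: int) -> int:
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Add two plaintexts using only their ciphertexts."""
    return (c1 * c2) % (n * n)

# The key holder encrypts two values.
n, private_key = keygen(1009, 1013)
c_a = encrypt(n, 1200)
c_b = encrypt(n, 345)

# An untrusted party adds them without ever seeing 1200 or 345.
c_sum = add_encrypted(n, c_a, c_b)

# Only the key holder can read the result.
total = decrypt(n, private_key, c_sum)
print(total)
```

Paillier supports addition only. Fully homomorphic schemes support arbitrary computation, but at a much higher computational cost.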

Organisational security risks

A key risk to organisational security is malicious behaviour by insiders, such as employees or clients. One countermeasure, as previously mentioned, is homomorphic encryption. It allows employees some level of manual access to data, without decrypting the data first. This means that the encryption keys can be securely stored, and need not be accessible to insiders at all, without restricting the general operations of the provider. A further countermeasure is the careful design and application of Access Control Policies. Some examples are to minimise the amount of access to specific data via privilege levels for insiders, and to implement activity logging with suitable warnings for unusual or inappropriate accesses.
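The two access control examples above, privilege levels and activity logging, can be sketched in a few lines. The privilege names, the `Resource` type, and the `check_access` function are all hypothetical and chosen purely for illustration; a real provider would rely on its Identity and Access Management system instead.

```python
import logging
from dataclasses import dataclass
from enum import IntEnum

class Privilege(IntEnum):
    """Hypothetical privilege levels; a higher value means more access."""
    SUPPORT = 1
    OPERATOR = 2
    SECURITY_ADMIN = 3

@dataclass(frozen=True)
class Resource:
    name: str
    required: Privilege   # minimum level needed to touch this resource

audit_log = logging.getLogger("audit")

def check_access(user: str, level: Privilege, resource: Resource) -> bool:
    """Allow access only at or above the required level, and log every attempt."""
    allowed = level >= resource.required
    if allowed:
        audit_log.info("ALLOW %s -> %s", user, resource.name)
    else:
        # Denied attempts are logged as warnings so they can be reviewed.
        audit_log.warning(
            "DENY %s (%s) -> %s (needs %s)",
            user, level.name, resource.name, resource.required.name,
        )
    return allowed

key_store = Resource("key-store", Privilege.SECURITY_ADMIN)
print(check_access("alice", Privilege.SECURITY_ADMIN, key_store))
print(check_access("bob", Privilege.SUPPORT, key_store))
```

Logging allowed accesses as well as denied ones matters here, since a malicious insider may well hold legitimate privileges.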

Physical security risks

Due to the global infrastructure requirements of providers, appropriate security policies must also be enacted for each physical location. Such policies would include the deployment of guards, key card access to restricted areas, and monitored alarm systems.

Compliance and auditing risks

For cloud providers operating in New Zealand, the government has jurisdiction and thus applies the New Zealand Privacy Act - your data is your data. The details of where data is stored, and under which jurisdiction, are likely specified in the service agreement as determined by the cloud services provider. Some providers, such as Amazon Web Services, allow the region of data storage to be explicitly specified by the client [5].

Data security risks

Confidential virtual machines are a key method that cloud providers can use to ensure that the environment provided to their clients is trustworthy and secure, regardless of the underlying hardware or operating system environment. Some examples of this are Microsoft Azure DCasv5-series and ECasv5-series confidential VMs [6]. Google Cloud also provides a confidential computing environment via their confidential VM configuration of the Compute Engine VM [7]. Access control is another key tenet of data security, with accepted industry solutions having been implemented in most areas. These include multi-factor authentication as part of a strong Identity and Access Management system, to be implemented by both the cloud provider and the client users. Lastly, strong encryption of the data at rest is of utmost importance. Microsoft Azure, Amazon Web Services, and Google Cloud use the industry-standard AES-256 encryption scheme [8, 9, 10]. However, any encryption scheme is useless without an appropriately secure method of storing and managing the encryption keys. One popular strategy is envelope encryption, in which key encryption keys are used to encrypt the data encryption keys that are in turn applied to blocks of data at rest. This is used by Microsoft Azure, Google Cloud, and Amazon Web Services [8, 9, 11].
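As an illustration of the multi-factor authentication mentioned above, one widely used second factor is the time-based one-time password (TOTP, RFC 6238), as generated by authenticator apps. The sketch below is a minimal implementation under the RFC's default parameters (HMAC-SHA1, 30-second steps). The cited sources do not state which second factors each provider supports, so this is an example of the technique and not a description of any provider's system.

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, at_time=None, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time) // step                    # number of elapsed time steps
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def verify(secret: bytes, submitted: str, at_time=None) -> bool:
    """Compare in constant time to avoid leaking digits through timing."""
    return hmac.compare_digest(totp(secret, at_time), submitted)

# Shared secret from the RFC 6238 test vectors; a real secret is random.
shared_secret = b"12345678901234567890"
code = totp(shared_secret, at_time=59, digits=8)
print(code)
```

The server and the user's device share the secret once at enrolment. After that, a stolen password alone is not enough to log in.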

Example (Google Cloud)

This section will explore the encryption process and key management mechanisms as implemented by Google Cloud, which are similar to those used by Microsoft Azure and Amazon Web Services. The first step of this process is to separate data into blocks, with each block encrypted by an individual Data Encryption Key (DEK). These blocks of data can then be distributed and stored amongst the cloud provider's infrastructure according to their own internal methods. This is graphically represented in Figure 1.

[Figure 1: encryption at rest diagram]

Figure sourced from Google Cloud

The individual Data Encryption Keys are then stored near the blocks of data they protect, and encrypted with a Key Encryption Key (KEK). This allows for fast access to individual data blocks, as the keys are stored close to the data, in a distributed manner that allows for parallelisation. The Key Encryption Keys are also collated, so that they can be stored in a centralised Key Management System (KMS). Figure 2 details this process.

Figure sourced from Google Cloud

Access to the Key Management System is managed by the Identity and Access Management system as implemented by the provider or client. This allows clients to generate or even store their own master keys, with the data storage implementation details and individual Data Encryption Keys abstracted away by the provider.
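The steps described in this section can be sketched end to end: data is split into blocks, each block gets its own DEK, each DEK is wrapped by a KEK, and the KEK never leaves an access-controlled key service. This is a structural illustration only, not Google Cloud's implementation. Python's standard library has no AES, so `seal` and `unseal` are a toy stand-in cipher (a SHAKE-256 keystream plus an HMAC tag) where a real system would use AES-256. `ToyKMS`, `store`, `load`, and the tiny block size are all hypothetical names and values.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16  # tiny for illustration; real blocks are far larger

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for AES-256 (NOT for real use): nonce + ciphertext + tag."""
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def unseal(key: bytes, sealed: bytes) -> bytes:
    nonce, ciphertext, tag = sealed[:16], sealed[16:-32], sealed[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = hashlib.shake_256(key + nonce).digest(len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))

class ToyKMS:
    """Hypothetical central key service: holds KEKs and never releases them."""
    def __init__(self):
        self._keks = {}
        self._allowed = {}  # kek_id -> principals permitted to use it

    def create_kek(self, kek_id: str, principals) -> None:
        self._keks[kek_id] = secrets.token_bytes(32)
        self._allowed[kek_id] = set(principals)

    def _authorise(self, kek_id: str, principal: str) -> None:
        if principal not in self._allowed.get(kek_id, ()):
            raise PermissionError(f"{principal} may not use {kek_id}")

    def wrap(self, kek_id: str, principal: str, dek: bytes) -> bytes:
        self._authorise(kek_id, principal)
        return seal(self._keks[kek_id], dek)

    def unwrap(self, kek_id: str, principal: str, wrapped: bytes) -> bytes:
        self._authorise(kek_id, principal)
        return unseal(self._keks[kek_id], wrapped)

def store(kms: ToyKMS, kek_id: str, principal: str, data: bytes) -> list:
    """Split data into blocks; keep each wrapped DEK beside its ciphertext."""
    blocks = []
    for i in range(0, len(data), BLOCK_SIZE):
        dek = secrets.token_bytes(32)  # fresh 256-bit DEK per block
        wrapped_dek = kms.wrap(kek_id, principal, dek)
        blocks.append((wrapped_dek, seal(dek, data[i:i + BLOCK_SIZE])))
    return blocks

def load(kms: ToyKMS, kek_id: str, principal: str, blocks: list) -> bytes:
    return b"".join(
        unseal(kms.unwrap(kek_id, principal, wrapped), ciphertext)
        for wrapped, ciphertext in blocks
    )

kms = ToyKMS()
kms.create_kek("client-a", {"alice"})
message = b"customer records that must stay private"
stored = store(kms, "client-a", "alice", message)
recovered = load(kms, "client-a", "alice", stored)
print(recovered == message)
```

Because only wrapped DEKs are stored alongside the data, an attacker who copies the storage blocks learns nothing without also passing the access check on the key service. Rotating a KEK means re-wrapping small keys, not re-encrypting all of the data.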

Conclusion

This report has detailed the key practical and legal issues that apply to cloud computing storage providers. The cryptographic implementation, including key management and distribution, allows providers and clients to mitigate the vulnerabilities of a single provider serving multiple clients, whilst confidential virtual machines provide a secure computing and storage environment. Practical issues of physical security, secure data storage, and access control have known working solutions, whilst the legal debate over data ownership and sovereignty remains unresolved.

References

Specific sources

  1. Page Weight | 2022 | The Web Almanac by HTTP Archive
  2. Report: The average smartphone storage crossed 100GB in 2020
  3. General Data Protection Regulation - Wikipedia
  4. Homomorphic encryption - Wikipedia
  5. Global Infrastructure
  6. DCasv5 and ECasv5 series confidential VMs | Microsoft Learn
  7. Confidential Computing concepts | Confidential VM | Google Cloud
  8. Azure Data Encryption-at-Rest - Azure Security | Microsoft Learn
  9. Default encryption at rest | Documentation | Google Cloud
  10. The importance of encryption and how AWS can help | AWS Security Blog
  11. Use envelope encryption with AWS KMS keys - Financial Services Industry Lens

General sources