Effective Cloud Incident Response: Fundamentals and Key Considerations

March 30, 2023

Human error behind misconfigurations, a host of insecure remote access issues, exposed business credentials with reused passwords and unpatched vulnerabilities have all contributed to a significant increase in cloud security incidents.

Many organizations don’t foresee what it will take to protect their data and operations after a move to the cloud. We have addressed common misconceptions in our SMB Guide to Cloud Security and explored specific issues with container security, but what happens when an incident takes place in a cloud environment?

This article helps cloud service customers (CSCs) better understand common challenges in cloud incident response through the entire intrusion lifecycle.

The Cloud Incident Response Framework

Organizations used to the access and visibility they have with on-premises networks may find themselves losing time, missing critical steps or prolonging disruption if they haven’t accounted for cloud-specific factors when responding to an incident. Incident response planning is always a best practice, but in the context of a cloud incident, it is absolutely critical.

Some factors driving the differences are the shared responsibility model employed by cloud service providers (CSPs) and the diverse service model options themselves, such as Infrastructure-as-a-Service (IaaS) and Software-as-a-Service (SaaS), not to mention the need to know how data and applications hosted across various CSPs all fit together.

Similar to traditional incident response planning, cloud incident response guidance typically outlines the potential risks to the organization; what constitutes an incident; and the steps, resources, and priorities for addressing them. The SANS Institute outlines six steps for traditional incident response: prepare, identify, contain, eradicate, recover, and review lessons learned.

The Cloud Incident Response (CIR) Framework, developed by the Cloud Security Alliance (CSA), is a representative set of guidance that follows generally the same steps as above but covers them in the context of cloud-specific risks and response considerations. These include, but are not limited to, critical areas of focus in cloud computing; business continuity planning for cloud computing security risks; cloud outage risks; cloud outage incident response (COIR) categories; and COIR detection and analysis, as well as containment, eradication and recovery activities before and after, for both cloud service providers and cloud service customers.

IR and the Cloud Management Plane

In Kroll’s experience, threat actors typically seek the servers or user accounts that will provide the greatest access and control over the network. For this reason, domain controllers and domain administrator accounts are the primary targets in on-premises networks. For organizations that have moved to the cloud, the so-called cloud management plane serves this purpose. Also known as the management console, administrator console or control plane, the cloud management plane is the web-accessible center for accessing and managing all the services and applications in the customer’s cloud instance. If access is not strictly limited or protected, an actor has the proverbial “keys to the kingdom.”

With access to the cloud management plane, actors can carry out a wide range of unauthorized activities in a matter of minutes, including:

Creating unauthorized user accounts and elevating privileges for existing accounts
Creating virtual machines
Maliciously wiping out databases and deleting buckets
Completely taking over websites
Injecting malicious code into e-commerce websites
Executing cryptomining malware
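
Several of the activities above leave traces in management-plane audit logs. The following is a minimal sketch, assuming AWS CloudTrail-style records in which the `eventName` field carries the API action name. The watchlist, the helper name `flag_management_events` and the sample records are illustrative assumptions, not a complete detection rule set.

```python
from collections import Counter

# Assumed watchlist: AWS API action names as they appear in the CloudTrail
# "eventName" field, mapped to the activity they could indicate.
HIGH_RISK_EVENTS = {
    "CreateUser": "new user account",
    "AttachUserPolicy": "privilege change",
    "RunInstances": "virtual machine creation",
    "DeleteBucket": "bucket deletion",
    "DeleteDBInstance": "database deletion",
}


def flag_management_events(records):
    """Return (identity, event name, reason) for each watchlisted record."""
    findings = []
    for record in records:
        name = record.get("eventName")
        if name in HIGH_RISK_EVENTS:
            identity = record.get("userIdentity", {}).get("arn", "unknown")
            findings.append((identity, name, HIGH_RISK_EVENTS[name]))
    return findings


# Hypothetical, heavily simplified records for illustration only.
BOB = "arn:aws:iam::111122223333:user/bob"
sample_records = [
    {"eventName": "DescribeInstances", "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    {"eventName": "CreateUser", "userIdentity": {"arn": BOB}},
    {"eventName": "RunInstances", "userIdentity": {"arn": BOB}},
    {"eventName": "DeleteBucket", "userIdentity": {}},
]

findings = flag_management_events(sample_records)
# Count flagged actions per identity to spot accounts doing many risky things.
per_identity = Counter(identity for identity, _, _ in findings)
for identity, name, reason in findings:
    print(f"{identity}: {name} ({reason})")
```

A watchlist like this only surfaces candidates for review. Many of these actions are routine for administrators, so in practice the results would need to be compared against expected change activity.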

Visibility, Logs and Cloud Costs

A common cloud misconception is thinking that the initial expense will be the final cost. While cloud implementations certainly offer valuable efficiencies and flexibility, security measures add costs that leaders might not anticipate.

For example, cloud costs are generally tied to data usage. From a cloud incident response perspective, network activity logs are crucial for detecting and responding to malicious events, but enabling high-volume sources such as AWS CloudTrail data event logging or VPC Flow Logs increases costs significantly and quickly. Faced with mounting monthly bills, organizations may be tempted to take the risk of turning these logs off, or of never enabling in-depth logging in the first place.

Ultimately, the choice of which logs to use should be defined by organizational and regulatory needs along with risk-based priorities. Companies should also consider how, where and for how long they will store logs. Additionally, they need to assess how they will parse and examine that volume of data, either regularly or in the event of an incident.
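
To make the trade-off between log volume, retention and cost concrete, here is a rough back-of-the-envelope sketch. The prices, volumes and source names are hypothetical placeholders, not actual CSP pricing, and the function `monthly_log_cost` is an illustrative helper rather than part of any provider’s tooling.

```python
def monthly_log_cost(gb_per_day, retention_days, ingest_price_per_gb, storage_price_per_gb_month):
    """Estimate the steady-state monthly cost of one log source.

    Ingestion is charged on the volume produced in a 30-day month. Storage is
    charged on the volume held once the retention window is full
    (daily volume multiplied by retention days).
    """
    ingest = gb_per_day * 30 * ingest_price_per_gb
    stored_gb = gb_per_day * retention_days
    storage = stored_gb * storage_price_per_gb_month
    return round(ingest + storage, 2)


# Hypothetical prices: replace with figures from your provider's price list.
INGEST_PRICE = 0.50   # per GB ingested
STORAGE_PRICE = 0.03  # per GB stored per month

# Hypothetical volumes and retention periods for two log sources.
log_sources = {
    "network flow logs": {"gb_per_day": 40, "retention_days": 90},
    "data access events": {"gb_per_day": 10, "retention_days": 365},
}

estimates = {
    name: monthly_log_cost(cfg["gb_per_day"], cfg["retention_days"], INGEST_PRICE, STORAGE_PRICE)
    for name, cfg in log_sources.items()
}
for name, cost in estimates.items():
    print(f"{name}: about {cost:.2f} per month")
```

Running the numbers this way for each candidate log source makes it easier to weigh cost against the regulatory and risk-based priorities described above, rather than switching logging off wholesale.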

Without adequate logs, the organization is forced to consider alternative methods to gain visibility across its environment. A lack of visibility means incidents have a higher chance of becoming serious compromises. Additionally, the lack of relevant logs severely hampers forensic investigators in determining the full extent of actors’ activities, forcing organizations to remediate more than might otherwise be necessary.

Building a Cloud Incident Response Plan

As we noted earlier, responding to a cloud incident requires understanding the differences between the visibility and control you have with on-premises resources and what you have in the cloud. This understanding is even more important given that many organizations run a hybrid model, with assets often intersecting in both spheres.

Using the intrusion lifecycle model developed by Kroll, it becomes apparent how cloud incident response means looking at potential and actual attacks from a new vantage point. For example, threat actors are aware of common ways that cloud database access can be misconfigured and regularly scout organizations for these errors.

Meanwhile, many vulnerabilities in cloud platforms go unpublicized, i.e., they do not appear in the Common Vulnerabilities and Exposures (CVE) database, because remedying them is considered the responsibility of CSPs, not end users. Although CSPs are typically quick to address known vulnerabilities, fixes often require some final action on the part of their clients. If the notification does not reach the correct person, there is generally no other channel to make the organization aware of the issue, leaving the door open for actors to gain access, perform reconnaissance and deploy toolkits undetected.

Finally, at the risk of belaboring the point, without robust logging of cloud activity, incident responders may not have the visibility they need to fully investigate an incident.

Responding to cloud incidents requires knowledge and a purpose-built plan that accounts for the diversity and layered inner workings of the organization’s cloud environment.

Common Cloud Security Incidents Mapped to the MITRE ATT&CK® Cloud Matrix

MITRE ATT&CK combines an open-source repository of known adversarial tactics, techniques and procedures (TTPs) with a framework (matrix) that organizes TTPs for each phase of a cyberattack. It not only covers an entire network ecosystem but also offers subsets that apply to specific types of environments, such as the MITRE ATT&CK Cloud Matrix, which focuses on cloud infrastructure.

The MITRE ATT&CK Cloud Matrix and associated attack methods can be further broken down according to the type of enterprise cloud services, including Office 365, Azure AD, Google Workspace, and general SaaS and IaaS. We’ll leverage the cloud matrix to explore a few common cloud security incidents our team has investigated many times:

Phishing via Public Cloud Infrastructure

Most email defense systems rely on a series of signals to gauge the trustworthiness of a message and of the links and attachments it contains. Aware of this, attackers leverage public cloud services such as AWS, Google Docs or OneDrive to disguise links or host spoofed web pages. Because these massive cloud services are inherently trusted, such content can often bypass email filters.

OAuth Office 365 Phishing Attacks (aka Consent Phishing)

Users are tricked into granting permissions to malicious Office 365 (O365) OAuth applications. These apps are registered with an OAuth 2.0 provider and request access to the users’ O365 accounts. Once consent is granted, attackers can take over the target’s Microsoft accounts and make API calls through the malicious OAuth app to gain access to email, files, contacts and a variety of resources hosted on SharePoint, OneDrive and other cloud storage spaces.
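
One way to look for this kind of abuse is to review which applications users have consented to and what those applications can reach. The sketch below assumes a list of consent grants has already been exported into plain dictionaries. The field names, the sample grants and the `review_grants` helper are hypothetical, and the scope watchlist is an assumption built from commonly requested Microsoft Graph delegated permissions.

```python
# Assumed watchlist of delegated permission scopes that give broad access to
# mail, files or contacts, or that keep access alive through refresh tokens.
RISKY_SCOPES = {
    "Mail.Read",
    "Mail.ReadWrite",
    "Files.ReadWrite.All",
    "Contacts.Read",
    "offline_access",
}


def review_grants(grants, approved_publishers):
    """Return grants from unapproved publishers that hold risky scopes."""
    flagged = []
    for grant in grants:
        risky = sorted(RISKY_SCOPES.intersection(grant["scopes"]))
        if risky and grant["publisher"] not in approved_publishers:
            flagged.append({"app": grant["app"], "user": grant["user"], "risky_scopes": risky})
    return flagged


# Hypothetical exported grants for illustration only.
sample_grants = [
    {"app": "Calendar Sync", "publisher": "Contoso IT", "user": "alice",
     "scopes": ["Calendars.Read", "offline_access"]},
    {"app": "Doc Viewer Pro", "publisher": "unknown-publisher", "user": "bob",
     "scopes": ["Mail.Read", "Files.ReadWrite.All", "offline_access"]},
    {"app": "Weather Widget", "publisher": "unknown-publisher", "user": "carol",
     "scopes": ["User.Read"]},
]

flagged = review_grants(sample_grants, approved_publishers={"Contoso IT"})
for item in flagged:
    print(f"{item['app']} (consented by {item['user']}): {', '.join(item['risky_scopes'])}")
```

The allowlist of approved publishers is a design choice for this sketch: it keeps sanctioned applications out of the results so that reviewers can focus on unfamiliar apps holding broad permissions.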

Valid Credential Abuse

This type of attack is especially impactful for IaaS platforms, which can grant resources access to other resources depending on the policy applied. In recent years, Kroll has observed threat actors leveraging vulnerabilities within code repositories and code analysis tools that have helped them identify and reuse authentication tokens or credentials to access cloud management planes. In one Kroll engagement, our investigators identified that actors leveraged one of these vulnerabilities to exfiltrate authentication tokens (normally used to authenticate to various applications used by the organization) as well as a personal access token associated with an account, which they used to clone the organization’s code repositories.
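
Because exposed tokens and keys in code are a common starting point for this kind of abuse, many teams scan repositories for credential-like strings. The following is a minimal sketch of that idea. The two patterns reflect commonly cited formats for AWS access key IDs and GitHub personal access tokens and are assumptions, not an exhaustive or authoritative list. The helper `find_secrets` and the sample text are hypothetical.

```python
import re

# Assumed patterns based on commonly cited credential formats. Dedicated
# secret scanners cover many more formats and reduce false positives.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_personal_access_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text):
    """Return (line number, pattern label) for each credential-like match.

    Only the location and type are reported, so the secret itself is not
    copied into scan output.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


# Hypothetical file content. The access key is a documentation-style example
# value and the token is a dummy string, not real credentials.
sample_text = "\n".join([
    "region = us-east-1",
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",
    "token = ghp_" + "a" * 36,
])

secret_findings = find_secrets(sample_text)
for line_number, label in secret_findings:
    print(f"line {line_number}: possible {label}")
```

Any credential found this way should be treated as compromised and rotated, since removing it from the current version of a file does not remove it from the repository history.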

MFA Bombing

Multifactor authentication (MFA) bombing is a related tactic where actors flood an account-holder with prompts to permit authentication, counting on user error or fatigue to gain access.
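
A burst of prompts to a single account within a short period is one signal that can point to MFA bombing. The sketch below counts prompts per user in a sliding time window. The event format, the thresholds and the `find_mfa_bursts` helper are hypothetical assumptions and would need tuning against an organization’s real authentication logs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_mfa_bursts(events, max_prompts=5, window=timedelta(minutes=10)):
    """Return users who received more than max_prompts prompts within any window."""
    recent = defaultdict(deque)  # user -> prompt times still inside the window
    flagged = set()
    for event in sorted(events, key=lambda e: e["time"]):
        queue = recent[event["user"]]
        queue.append(event["time"])
        # Drop prompts that have fallen out of the sliding window.
        while event["time"] - queue[0] > window:
            queue.popleft()
        if len(queue) > max_prompts:
            flagged.add(event["user"])
    return sorted(flagged)


# Hypothetical events: alice receives eight prompts one minute apart,
# while bob receives three prompts an hour apart.
start = datetime(2023, 3, 30, 9, 0)
sample_events = [{"user": "alice", "time": start + timedelta(minutes=i)} for i in range(8)]
sample_events += [{"user": "bob", "time": start + timedelta(hours=i)} for i in range(3)]

bursts = find_mfa_bursts(sample_events)
print(bursts)
```

The threshold and window are deliberately parameters: a helpdesk-heavy environment may see legitimate repeated prompts, so the values should reflect what is normal for the organization.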

Best Practices for Incident Response in the Cloud

While traditional incident response may be a good starting point in some instances, Kroll experts offer some considerations to enhance cloud-specific incident response:

Know Where Your Data and Applications Reside
Build a detailed map of where your data sits, including the programs and applications that your organization is running. This inventory will help teams more completely and efficiently address the dependencies and scope involved in a particular incident.
Know Who is Responsible for Responding
This related step requires documenting who will be responsible—your organization and/or the CSP(s)—when an incident involves certain data or programs. The overall shared responsibility model and many service model options can lead to critical delays if this hasn’t been determined beforehand.
Outline Existing and Responder Technical Protocols
Predetermine and document the steps to follow upon detecting or suspecting an issue, and train IT and information security teams in them, including when and how to disconnect network assets and what the escalation points are. As part of this exercise, teams should audit and document existing access controls as well as define how to provide responders with the access required to investigate.
Prioritize Logging as Much as Possible
Logs are crucial tools for helping responders in their investigations. Beyond that, should the event involve third-party data (e.g., data for employees, patients or clients), logs can be especially helpful in demonstrating that your organization was able to clearly track suspicious or unauthorized activities and forensically confirm the scope of unauthorized access.
Proactively Establish Relationships With Providers of Specialty Services
Security incidents in the cloud can be more complex given the various service models and network dependencies at work. In preparing your incident response plan, research and proactively establish relationships with service providers who have specific experience with cloud-related events, including investigations, forensics and recovery/remediation, as well as crisis communications and cyber insurance.
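
The first two considerations above, knowing where data resides and who is responsible for responding, can be captured together in a simple inventory. The following is a minimal sketch of one possible structure. The field names, the sample assets, the contact addresses and the helper functions are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    provider: str
    service_model: str  # "IaaS", "PaaS" or "SaaS"
    data_types: tuple   # kinds of data the asset holds
    responder: str      # who leads response: "customer", "provider" or "shared"
    contact: str        # first escalation point for this asset


def assets_in_scope(inventory, data_type):
    """Return every asset that holds the given type of data."""
    return [asset for asset in inventory if data_type in asset.data_types]


def escalation_contacts(inventory, data_type):
    """Return the unique contacts to notify when that data type is involved."""
    return sorted({asset.contact for asset in assets_in_scope(inventory, data_type)})


# Hypothetical inventory entries for illustration only.
inventory = [
    Asset("customer-db", "Provider A", "IaaS", ("client", "payment"), "customer", "secops@example.com"),
    Asset("hr-portal", "Provider B", "SaaS", ("employee",), "shared", "hr-it@example.com"),
    Asset("file-share", "Provider B", "SaaS", ("client", "employee"), "shared", "secops@example.com"),
]

client_assets = [asset.name for asset in assets_in_scope(inventory, "client")]
client_contacts = escalation_contacts(inventory, "client")
print(client_assets, client_contacts)
```

Keeping the responsible party and escalation contact alongside each asset means that, when an incident touches a particular type of data, responders can determine scope and ownership from one lookup instead of reconstructing it under pressure.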

Complementary ways to assess and reveal gaps or weaknesses in cloud security or cloud incident response include vulnerability and penetration testing and incident response tabletop exercises.

Don’t miss our next article, which focuses specifically on incident response best practices in the AWS environment.

From managing over 3,000 incidents annually, many of which involve client data and operations in the cloud, Kroll has a unique, real-world perspective on what actors are looking for and ways to block their attacks. To learn more about creating a cloud-specific incident response plan or validating and testing an existing plan, contact us today.
