Security Review for SaaS Systems

What & How to Look at Security at Every Level

Mar 10, 2023

I am often asked by CEOs and CTOs to do security reviews of their SaaS products, as securing them is critical for their users, companies, and often legal compliance.

This article covers how I approach these reviews, what I look at, how I look at it, and how to best help SaaS developers improve security over time.

These are not meant to be exhaustive lists of things, as there are many of those for various areas, but more about the most obvious things I’ve run into recently on various systems — they are almost all low-hanging fruit, but sadly, I often find them hanging!

Getting Started

The overall goal of any engagement and review is to improve system security, usually via a fairly in-depth look at the code, infrastructure, configurations, tooling, and 3rd party components of existing products.

This can cover many diverse areas, and of course somewhat depends on areas of existing concern, prior weaknesses or exploits, and projected needs.

I like to look at the following areas, in the following ways — basically working from the bottom up in layers:

Infrastructure

At the base of it all is infrastructure: usually the cloud environment where the system deploys, typically AWS, but possibly GCP, Azure, Heroku, etc.

Modern cloud infrastructure is surprisingly challenging to secure, especially over time in the face of constant changes, upgrades, and improvements.

The following focuses on AWS, with AWS terminology, but applies anywhere.

I generally use the console and a few tools, maybe a CLI or two, etc. to look around for these obvious challenges. Sometimes I’ll use more in-depth analyzers to really look at things like policy holes, but I can find a lot in just 30 minutes of focused wandering around.

General

Broadly, I’m looking for general issues that cause problems on the cloud, or for missing good practices, such as:

Account — Is the core account set up correctly, with proper contacts? I often find empty contacts, which makes it hard to recover from a lost password, a departed employee, etc., and even allows the account to be stolen if someone can be coerced into listing a former employee or non-employee as a contact.
Security Questions — These are often not set, which is probably fine, but it is better to set them to random answers. They also need to be changed if an employee with root access leaves.
Naming — Do all the objects have tags and/or names? If not, it shows sloppy management and will make the system very hard to manage in the future.
Config — AWS Config tracks infrastructure configuration history, which is very helpful for auditing and tracking any changes, especially from malicious actors, but also from mistakes, manual or unanticipated changes, etc.
Security Hub — AWS has a nice basic set of checks and tools in the Security Hub, which should be enabled.
Security Tools — AWS has a number of tools that ideally should be in use, such as CloudTrail, GuardDuty, Inspector, Macie, and Detective. At a minimum, CloudTrail should be enabled to track AWS-level activity.
SIEM — Using a SIEM system is fairly advanced, but should be considered for larger systems and when there are heightened security requirements.

IAM

The IAM system is at the heart of cloud security, as it literally controls all of the users, permissions, and access to the system, for both human users and APIs or third-party tools. As such, IAM configuration and use is critical to good security.

At its heart, IAM is simple, just users, groups, policies, and roles, and its use should be kept as simple as possible, but no simpler! That means, simple, well-aligned permissions and roles, not just admin rights for everyone!

Root User — Are people using or sharing the root user? That should never happen; the root user should be used very rarely, only for key account-level changes that require it. The root user should also have MFA enabled, ideally with a backed-up app (not Google Authenticator), as resetting the root MFA is difficult (and requires good contacts, see above).
MFA — Generally, MFA should be enabled on all console users. It’s an annoying, but critical, part of securing basic user access to the system. MFA should be used where possible on API operations, too, though this is more complex.
Password Policies — Strong policies should be set, with a length of at least 12 and all character requirements enabled. Users should be strongly encouraged to use completely random passwords.
Access via Groups — All access, especially for administrators, should be via groups, not by applying direct policies to users. This makes management much easier, especially removing admin access when necessary. Only very specific policies should be attached to users, and even then, it’s often better to create a group first, which can clearly document why and how a permission is given.
Excess Permission — Users are often given way more access than they need, such as admin rights for some simple tasks, or excess rights on a service such as EC2 just to do one thing. This is especially true for API users, such as Terraform or GitHub Actions. Access should be reviewed and limited to what’s actually needed.
Stale Users — Old users are sometimes left in the system, in part to support old tooling with APIs, or just forgotten as employees left (or maybe they helped out for a while after departure). These should be purged regularly, and if they must be retained for automation, their keys and passwords should be changed.
Unused Keys — It’s common to create API keys when creating users, but keys should only be created when and if needed. AWS can show when they were last used, so if it’s been a long time, they should be deleted.
Unused Users — Some engineers have a user but rarely or never use it. If an account is never used, or is no longer used, it should be deactivated.
Unused Policies — Some customers have old, unused policies which clutter the account and can easily get used by accident, causing problems and possibly granting excess permissions. They should be deleted.
IAM Scanner — There are some good scanning tools for IAM that can deeply scan more complex accounts and permissions structures for problems. These should be used, if possible.
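As a minimal sketch of the Unused Keys check above, the function below flags keys that were never used or have sat idle too long. The record fields (`key_id`, `last_used`) and the 90-day threshold are my own assumptions for illustration; in practice the data would come from something like the IAM credential report.

```python
from datetime import datetime, timedelta, timezone


def find_stale_keys(keys, now, max_idle_days=90):
    """Return the IDs of keys never used, or idle longer than max_idle_days.

    `keys` is a list of dicts with hypothetical fields:
      key_id    - the access key ID
      last_used - a timezone-aware datetime, or None if never used
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for key in keys:
        last_used = key.get("last_used")  # None means the key was never used
        if last_used is None or last_used < cutoff:
            stale.append(key["key_id"])
    return stale


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        {"key_id": "AKIAEXAMPLEOLD", "last_used": datetime(2022, 1, 1, tzinfo=utc)},
        {"key_id": "AKIAEXAMPLENEVER", "last_used": None},
    ]
    for key_id in find_stale_keys(sample, now=datetime.now(utc)):
        print("review or delete:", key_id)
```

Passing `now` in explicitly keeps the check easy to test and makes the cutoff date obvious in an audit report.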

VPC

The overall network architecture is important for system operation, and also for security. VPCs are pretty simple in most cases (at least until you do complex peering, etc.), though there are a few basics to consider:

Public vs. Private Subnets — Most VPC networks include, at a minimum, a set of public subnets and a separate set of private subnets. The only real difference is routing, but they provide a basic way to separate public-facing services from private backend services. This is good, and more complex systems often have multiple sets of private networks, typically for specific services or protections, such as authorization, databases, logging or management, etc. All resources should be reviewed for their network placement to ensure they are in the right place.
Security Groups — Every SG should be reviewed for scope and accuracy, especially for too many open ports, overly broad CIDR blocks, etc. Also make sure the rules have descriptions and make sense.
Outbound Access — Not all instances and services need outbound access in some systems, especially for protected backend services such as authorization. Most systems default to fully open outbound, but this should be reviewed, though if all outbound is blocked, consider how software and OS will be updated, etc. (less of an issue with more ephemeral instances, containers, etc.)
Peering Controls — If your VPC is peered with other networks, especially with third parties, ensure there are very tight access controls, in part so that an intrusion at the third party cannot reach your resources.
Unused EIPs — Many systems have unused EIPs left over from prior services, tests, etc. These may still be in the whitelists of other important services, and thus should be released (and removed from those whitelists) to avoid accidental or future reuse.
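The Security Groups review above is easy to approximate in a few lines. This is only a sketch: it assumes each group is a dict shaped like the entries returned by the EC2 DescribeSecurityGroups API (`GroupId`, `IpPermissions`, `IpRanges`, and so on), and the idea that only ports 80 and 443 may be open to the world is my own assumption, not a rule from AWS.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def audit_security_group(group, allowed_public_ports=(80, 443)):
    """Return human-readable findings for one security group dict."""
    findings = []
    group_id = group["GroupId"]
    for perm in group.get("IpPermissions", []):
        proto = perm.get("IpProtocol")
        from_port = perm.get("FromPort")  # absent for "all traffic" rules
        to_port = perm.get("ToPort")
        ranges = [(r.get("CidrIp"), r.get("Description"))
                  for r in perm.get("IpRanges", [])]
        ranges += [(r.get("CidrIpv6"), r.get("Description"))
                   for r in perm.get("Ipv6Ranges", [])]
        for cidr, description in ranges:
            if cidr in OPEN_CIDRS:
                # Only a single, explicitly allowed port may face the world.
                single_allowed = (from_port == to_port
                                  and from_port in allowed_public_ports)
                if proto == "-1" or not single_allowed:
                    findings.append(
                        f"{group_id}: {proto} {from_port}-{to_port} open to {cidr}")
            if not description:
                findings.append(f"{group_id}: rule for {cidr} has no description")
    return findings
```

Run over every group in the account, this gives a quick list of rules to question; it does not replace reading the rules to see whether they make sense.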

EC2

There are many security elements on EC2 and instances, though most are linked to OS access, updates, and hardening. All the usual Linux & Windows best practices apply, though there are a few cloud-related items:

Key Pairs — How many of these exist, and who has them? They must be highly protected in all cases. Often, ex-employees still have access to these keys, which, coupled with public SSH access, can be a serious risk.
SSH Access — How are instances accessed? A public IP with no restrictions is the worst case; access should be limited to fixed IPs, or better, a single SSH bastion, or even better, a VPN system like Tailscale. Clouds also have newer tools for more secure access, such as Session Manager on AWS or IAP tunnels on GCP.
Instance Roles — What instance roles do the instances have, and what permissions are granted? Instances usually have neglected roles or roles with far too much access, which, coupled with any SSH or application vulnerabilities, can lead to serious system risks.

RDS

Databases have a long list of best security practices, including in the config, users, and code, but for the cloud itself:

Root User — Some apps just create a single user at setup time and use it for everything, which is a very bad practice. Root users should be protected and saved for only critical uses that require root access.
IAM Integration — On clouds that support it, using IAM users instead of native users and passwords can increase security, auditability, etc.
Public IP — RDS often defaults to enabling a public IP on the RDS databases, a fact easily missed by engineers setting up the system. This is a huge security risk as it opens the DB directly to attacks over the Internet. This should never be enabled unless there are very specific reasons and protections in place.
Security Groups — The security group for the RDS DBs should be very tight, usually only opening to the application servers or private subnets that will need access.
Remote Access — Engineers and DBAs often need remote access to RDS DBs for troubleshooting and other work. This should be very strictly limited and provided over a dedicated SSH, VPN, or other secure gateway. On clouds that support it, such as GCP, connecting via IAP tunneling is even more secure. Remote tools such as Retool, Bubble, Airtable, etc. should also use these methods.
Backups — Of course, databases should be backed up regularly, and ideally, recovery tested into a new instance periodically.
Versions — Systems are often trapped on older versions of database software, for real or perceived reasons, but this opens up possible security risks, so any non-current version should be evaluated and ideally upgraded — for MySQL, this is rarely a problem and usually easy to test.
Third-Party Users — Some systems use third-party tools such as Retool or Bubble to access their database. They should use dedicated users, ideally with only read-only permissions.
Logging — RDS logging is not always enabled and pointed to the cloud’s logging infrastructure (e.g. CloudWatch). This should be enabled, as logs are very useful for troubleshooting and tracking possible hacks or security issues (such as failed login attempts).
Secure DB — Some DBs have a secure DB process or script (e.g. mysql_secure_installation for MySQL) that should be used to help harden the DB by removing anonymous users, dropping sample DBs, etc.
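A few of the RDS checks above (public IP, backups, logging) can be sketched as a simple function. It assumes each instance is a dict shaped like the entries returned by the RDS DescribeDBInstances API; the wording of the findings is my own.

```python
def audit_db_instance(db):
    """Return a list of findings for one RDS instance description dict."""
    findings = []
    if db.get("PubliclyAccessible"):
        findings.append("publicly accessible")
    # A retention period of 0 means automated backups are turned off.
    if db.get("BackupRetentionPeriod", 0) == 0:
        findings.append("automated backups disabled")
    if not db.get("EnabledCloudwatchLogsExports"):
        findings.append("no logs exported to CloudWatch")
    return findings


if __name__ == "__main__":
    sample = {
        "DBInstanceIdentifier": "example-db",  # hypothetical instance
        "PubliclyAccessible": True,
        "BackupRetentionPeriod": 7,
        "EnabledCloudwatchLogsExports": ["error"],
    }
    for finding in audit_db_instance(sample):
        print(sample["DBInstanceIdentifier"], "-", finding)
```

This only covers settings visible from the cloud side; database users, versions, and third-party access still need a look inside the database itself.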

S3

S3 and similar bucket systems are very powerful additions to cloud systems and enable all sorts of nice data sharing.

However, they are often misconfigured (or not configured), and thus oversharing, especially in complex systems with many buckets, complicated policies, and constant changes. The challenges of some cloud bucket security consoles and practices (e.g. AWS) don’t help either.

A few things to check:

Public Access — The most basic is documenting which buckets have public read or write access and ensuring that’s correct. This type of audit should happen very frequently.
Versioning — For reasonably sized buckets, versioning can and should be used to protect against possible overwrites or mistakes. Of course, a hacker can overwrite 10 or 100 times and lose the version history, but this is a good way to roll back and recover from accidents or malicious changes.
MFA Delete — For valuable data, such as backups, MFA delete should be enabled. This forces an extra layer of protection so hackers or mistakes in IaC tools like Terraform can’t easily delete important files.
Logs — For all but very high-volume buckets, access logs should be enabled so it’s clear who is accessing which files and when, which can be useful during or after an attack.
CORS — For buckets serving static assets, configure CORS rules to help protect assets from being loaded elsewhere. This is not perfect protection but helps reduce casual use and theft.
CSP — For buckets serving static JavaScript apps like React.js, etc., enable Content-Security-Policy (CSP) headers which then control where other elements of the app can be loaded from. Since the index.html of the app often comes from these buckets, it controls the overall app security policy. This can be set as part of custom headers.
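For the Public Access audit above, one small piece can be sketched by scanning a bucket policy document for statements that allow any principal. This is a rough first pass under my own assumptions: it ignores `Condition` blocks, ACLs, and the account-level Block Public Access settings, all of which also affect whether a bucket is really public.

```python
import json


def public_statements(policy_json):
    """Return the Sids of Allow statements whose principal is the wildcard '*'."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be in a list
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # Principal is either "*" or a dict such as {"AWS": "*"} or {"AWS": [...]}.
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        values = aws if isinstance(aws, list) else [aws]
        if "*" in values:
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged
```

Anything this flags should be matched against the list of buckets that are supposed to be public; anything not on that list deserves immediate attention.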

CloudFront

CloudFront is a very good CDN, with lots of security options and configurations, but there are a few basic things:

Logging — Access logging should be enabled to track what is accessed from where, and when.
HTTP — Plain HTTP should not be enabled on the CDN; serve HTTPS only, or redirect HTTP to HTTPS.
Geographic Restrictions — While unusual, consider these to help avoid hacking from problematic countries, and if customers are really only USA or single-country-based, consider stronger limits.
WAF — Consider enabling the WAF for basic protections.

There are, of course, many, many more cloud services, but the same general concepts and checks apply to them, too.

Source Code

The system source code is obviously the heart of any system, and often where the most non-obvious vulnerabilities lie. This is especially true with recent hacker interest in supply chain vulnerabilities and corruption, which can bite nearly any company.

There are many guides on good programming practices, but at a higher level, some key items:

Framework — The most basic security protections are provided by the system framework, such as Laravel, Spring Boot, Rails, etc. These have evolved to support best practices in most areas, from password hashing to CSRF protection to session handling, database injection, and much more. Ensure all these protections are enabled and in use, as many can get accidentally disabled by junior developers trying to make things work.
OWASP Top 10 — Developers should ensure every system avoids the current OWASP Top 10 Application Security Risks. Frameworks (see above) handle most of this, but care should be taken to avoid accidentally creating any of these risks. Others, such as A06:2021 on vulnerable and outdated components, are rising in risk and need direct attention (below).
OWASP Cheatsheet — OWASP also publishes very useful cheatsheets for popular technologies, such as Java, PHP, Ruby, etc., and also for the various clouds, databases like MySQL, etc. and specific processes or technologies such as authorization, JWT tokens, etc. Developers should ensure their work and systems follow all these best practice recommendations.
Supply Chain Security — This is an increasingly challenging and risky area, ensuring all the dozens (or hundreds) of components we use in our applications are secure (and not tampered with). There is a range of options here, but some key steps are:
- Keep up to Date — This is most important, using your existing tools like npm, maven, etc. to keep your libraries up to date to avoid any recent vulnerabilities. There are pros and cons of version pinning, but try to stay on the latest versions.
- Scanners — There are a number of vulnerability scanners, including those built into GitHub and other tools, that can highlight known vulnerable components which need immediate attention.
- Signatures & Pinning — If you pin versions, consider adding the newer security signatures and other methods to ensure your components are valid and not tampered with. This can complicate a lot of build and delivery processes but may be worth it for important software.
Static Code Checker — Developers should be using a lint-like tool for all their code, enforcing best practices that both avoid bugs and help eliminate common security errors. In addition, consider static security checkers such as the Fluid Attacks static scanner (recommended by Google for their partners).
Dynamic Tests — Periodic dynamic security testing (DAST), such as with OWASP ZAP, should be run on the application to look for run-time issues, including cookies, headers, fields, functions, data handling, etc.
Validate Everything from the Client — Many apps accept data from the client and use it without tightly validating it, for length, values, regex, encoding, etc. And while many apps do check these things for standard input fields, they don’t check values coming from cookies, tokens, or other ‘non-data’ sources. Developers should validate everything, all the time.
Public Paths — Many apps have too many public paths, where various APIs or pages should be protected by the authentication system, but are not. These can leak valuable data, expose other vulnerabilities, etc., and should be carefully reviewed and ideally configured in one single place (hard in some frameworks like Spring Boot that use distributed annotations).
Session Management — Most frameworks do a good job of basic session handling, signing or encrypting cookies, etc. but fail at invalidating sessions (logging the user out) on password changes or when risky behavior is detected. Long-running hacked sessions can be a real danger and should be invalidated on changes.
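To illustrate the "validate everything" point above, here is a minimal sketch of one validator applied to every client-supplied value, whether it arrives in a form field, a query string, or a cookie. The field names, length limits, and patterns are hypothetical; a real app would define its own, ideally through the framework's validation layer.

```python
import re

# Hypothetical per-field rules: (maximum length, allowed pattern).
RULES = {
    "username": (32, re.compile(r"[A-Za-z0-9_.-]{3,32}")),
    "page": (4, re.compile(r"[0-9]{1,4}")),           # query parameter
    "locale": (5, re.compile(r"[a-z]{2}(-[A-Z]{2})?")),  # cookie value
}


def validate(name, value):
    """Return value unchanged if it passes its rule, else raise ValueError."""
    if name not in RULES:
        raise ValueError(f"unexpected field: {name}")
    max_len, pattern = RULES[name]
    if not isinstance(value, str) or len(value) > max_len:
        raise ValueError(f"bad length or type for {name}")
    # fullmatch requires the whole value to match, not just a prefix.
    if pattern.fullmatch(value) is None:
        raise ValueError(f"bad format for {name}")
    return value
```

Rejecting unknown field names, rather than passing them through, keeps the rule table as the single list of what the server is willing to accept.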

App

Application logic can, of course, be full of challenges, but a few have a direct impact on security, especially around passwords, such as:

Password Policies — An amazing number of systems handle this poorly. Generally, you want long, complex passwords that users are not required to change periodically.
MFA — Multi-Factor Authentication has become quite standard and should be implemented wherever appropriate, noting that SMS MFA in particular is sometimes vulnerable to hacking and should not always be trusted.
Re-Auth/MFA on Critical Function — Apps should re-authenticate or validate an MFA code before doing critical functions, such as changing passwords, adding users or changing security configurations, deleting data, etc.
Search Data Leakage — Ensure that app search systems don’t allow full wildcard searches (*) that will often return all data, all users, etc. Also make sure the data returned does not leak IDs such as user IDs, internal IDs, etc.
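The search leakage item above can be sketched as two small helpers: one rejects queries that are effectively wildcards, the other returns only an allow-list of fields so IDs never leave the server. The field names, wildcard characters, and three-character minimum are assumptions for illustration only.

```python
# Hypothetical allow-list of fields that are safe to return to the client.
PUBLIC_FIELDS = ("display_name", "title")

# Characters treated as wildcards or padding, not as real search content.
WILDCARD_CHARS = "*%?_ "


def check_search_query(query, min_chars=3):
    """Return the trimmed query, or raise ValueError if it is too broad."""
    stripped = query.strip()
    meaningful = [c for c in stripped if c not in WILDCARD_CHARS]
    if len(meaningful) < min_chars:
        raise ValueError("search term too broad")
    return stripped


def to_public_result(record):
    """Copy only allow-listed fields, dropping user IDs, internal IDs, etc."""
    return {k: record[k] for k in PUBLIC_FIELDS if k in record}
```

An allow-list is safer here than removing known ID fields, since new internal fields added later stay hidden by default.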

Repositories & Deployment

Modern CI/CD systems are great at managing application builds, tests, and deployment, but are increasingly targeted by hackers since they are often misconfigured or insecure.

Below are items for GitHub, which also apply to most other CI/CD and/or repository systems:

Use Groups — Developers (users) should be in groups and all permissions assigned via the group, not individually.
Limited Admins — Most developers do not need admin rights on the GitHub repo or CI/CD system. They can be Contributors, and systems like GitHub are increasing their permission granularity to further enable proper permissions.
Disable Forking — No need for this on private repos.
Protect Branches — Protect your key branches (usually main, dev, etc.) and make sure most developers are not Admins (which can override).
Action Secrets — Never hard-code secrets in Actions; always use the Action secret system.
Action Permissions — Restrict Action permissions, usually to Read-Only (unless they commit updates or assets).
Action Read-Only — Actions can be set read-only early in the workflow definition, which helps avoid issues from later malicious actions.
Dependabot — GitHub has a good free dependency analyzer that can notify users of old or risky dependencies. This should be enabled and used.
Fine-Grained Access Tokens — This new GitHub feature allows narrowly scoped tokens for remote access by developers, CI/CD, 3rd party tools, etc.
Policy Tools — There are newer security policy tools for GitHub, such as Safe Settings and Allstar, that should be evaluated.

Secrets

Credential and secrets management is a challenge for all developers. There are many ways to do it, but the most important is that secrets remain secret, i.e. are never committed to code repos, shared insecurely, or exposed anywhere to anyone unauthorized.

Not in Code — Secrets should never be in the code or the related config or property files. Secrets should be loaded from environment vars, secrets managers, etc.
Production — Secrets for production systems should not be available to most developers, in that they should be stored in production secrets managers and obtained at run-time.
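As a minimal sketch of the "Not in Code" rule above, the helper below reads a secret from the environment and fails fast if it is missing, so nothing falls back to a hard-coded default. The variable name `DB_PASSWORD` is hypothetical, and a production system would more likely fetch the value from a secrets manager at run-time.

```python
import os


def require_secret(name, environ=os.environ):
    """Return the named secret from the environment, or raise if it is unset.

    The error message includes only the variable name, never a value.
    """
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value


if __name__ == "__main__":
    try:
        db_password = require_secret("DB_PASSWORD")  # hypothetical name
        print("DB_PASSWORD loaded")  # never print the secret itself
    except RuntimeError as err:
        print(err)
```

Accepting `environ` as a parameter makes the function easy to test without touching the real process environment.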

Browser

For most apps, the browser is the all-important interface, application engine, and security barrier for many functions. And while improving, it’s often neglected, or has its powerful security controls misconfigured.

Console Errors — Related to security, many apps have a lot of browser messages and errors. This makes it very hard to know what is a real error and what is a sign of possible hackers or hacked libraries. Developers should try hard to run clean in all common browsers, and collect & review console logs to look for errors, CORS blocks, JS errors (that can be exploited), etc.
Cookies — Apps should set the HttpOnly flag on all cookies, especially security-related ones, to prevent them from being accessed by JavaScript.
Session Cookie Expiration — Many apps that use session cookies set them as session-only, which the browser should discard, but browsers often retain them as part of session restore or tab reload, such that a hacker could steal them. Systems should set short expirations on session cookies and have a refresh process for a seamless user experience.
File Upload Checks — Apps should very carefully validate file names and file extensions, and most importantly, separately verify the file is really the type the user says it is (often via file type tools that actually read the file headers). Otherwise, hackers can send scripts with .jpeg extensions, etc. Filenames should also be sanitized, so if they are stored locally, they don’t cause issues with directories, illegal characters, etc.
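The File Upload Checks item above can be sketched with two helpers: one sanitizes the filename, the other compares the first bytes of the file against well-known signatures for its claimed extension. The list of allowed types is my own assumption, and a header check like this is only a first filter, not proof that a file is harmless.

```python
import os
import re

# Leading byte signatures for a few common types (the allowed set is hypothetical).
MAGIC = {
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".pdf": (b"%PDF-",),
}


def safe_filename(name):
    """Strip directories and replace anything outside a conservative charset."""
    base = os.path.basename(name.replace("\\", "/"))
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned or "upload"


def content_matches_extension(filename, head):
    """True if the file's first bytes match a signature for its extension.

    `head` is the first few bytes of the uploaded file. Unknown extensions
    are rejected rather than trusted.
    """
    ext = os.path.splitext(filename)[1].lower()
    signatures = MAGIC.get(ext)
    return signatures is not None and head.startswith(signatures)
```

With this in place, a shell script renamed to `photo.jpeg` fails the content check because it does not start with the JPEG signature.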

HTTPS & Headers

HTTP is the key transport for nearly every application and must be secure, literally end-to-end. But SSL, headers, etc. are often neglected, or are set up initially but then atrophy over time.

Header Analyzer — There are several analyzers, such as the Mozilla Observatory, which should be used to do an overall check of the HTTP header situation and problems.
SSL Analyzer — There are several analyzers, such as the SSL Labs tool, which should be used to check the SSL certificates, ciphers, and other key configuration items.
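DNS CAA — Still fairly rare, but consider setting DNS Certification Authority Authorization (CAA) records, which limit which certificate authorities may issue certificates for your domain, to help limit man-in-the-middle or other possible break-ins to the SSL stream.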
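A very small part of what a header analyzer does can be sketched locally, which is handy in a CI test. The list of expected headers below is my own short assumption, not the full set any particular analyzer checks.

```python
# A short, assumed list of headers to expect on every HTML response.
EXPECTED_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "Referrer-Policy",
)


def missing_security_headers(headers):
    """Return the expected headers absent from a response-headers mapping.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


if __name__ == "__main__":
    response_headers = {  # hypothetical response
        "Content-Type": "text/html",
        "strict-transport-security": "max-age=63072000",
    }
    for header in missing_security_headers(response_headers):
        print("missing:", header)
```

This only checks presence, not whether the values are sensible, so the full analyzers are still worth running regularly.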

Summary

There are many, many more things to look at, both in general and for specific languages, clouds, frameworks, industries, etc., but this is a quick start on the fairly obvious things to look at first.

I’m Steve Mushero, fractional CTO for early-stage startups — I help CEOs and CTOs build confidence in their product, processes, and people. See more at SteveMushero.com and my profile on LinkedIn.