DevOps for Early-Stage Startups
Technical things to think about when you get funding
Introduction
This guide is about the technical, procedural, system, and security issues around software development and deployment that you should think about as your startup gets started, especially once you have developers building & deploying code.
Most startups race to build prototypes and MVPs and to win users, since all of that propels them toward the all-important funding they need to keep going.
However, most also miss many key things they need along the way, in ways that cause real problems later, from intellectual property to code possession to security, performance, technical debt, and much more.
This guide is about how to balance and manage those things in a prioritized way to help ensure short and long-term success.
Target Audience
This guide is for very early-stage startups, around the time you have your initial angel or pre-seed funding, and importantly, have a few developers working and some code running.
This guide is meant to be a shared resource between the CEO and the senior technical roles, which may vary based on the company’s founders, who is most technical, etc. The CEO should be making sure those in charge of technology are aware of and managing the issues raised in this guide. Likewise, the technical folks should be on top of all these issues as early as possible.
Obviously, each startup and startup team is different, and some may be well-versed in these issues, while others are less so. This guide is mostly for those who are not already deeply experienced in these topics. It’s also for founders who come from larger organizations, where much of this was done for them, and who can use it as a starting point.
Typical Scenarios
Most startups start with a couple of founders and an idea, then proceed to quickly build something to prove that idea, get users and feedback, get funding, and then take over the world.
If one or more of the founders is technical, they’ll often do the first coding, building the prototype, or perhaps enough to get an MVP, but often external developers are involved very early on. These days, those external resources are often part-time and/or from off-shore outsourcing companies that purport to deliver more features and code at a lower cost.
These part-time and 3rd party resources are fine, but they almost always fail to follow the best development or DevOps practices, especially on security, ownership, and automation, making it very hard to reliably develop and deliver software in the early days.
Software Development
For most early-stage tech startups, software development is the core process they must master, and the sooner they get on the right road, the sooner they can quickly & consistently deliver features to their users.
The trick is to do this in a way that matches your stage and resources, as Google’s processes are hardly useful to a 3-person company. However, there are many basic good practices that even a single developer should adopt from the beginning, as this really avoids later problems and makes development both better and faster.
Getting Started
Writing code is, in many ways, simple — get a private GitHub account, a development environment, and write code! And you should start that way, as getting going is important, but don’t forget to also chip away at other suitable best practices, outlined below.
Security, Users, and Keys
Before we get started, a quick word about users, passwords, and all that. The absolute rule, to be observed above all others, is NEVER SHARE usernames or accounts among your team. Many startups share users & passwords, including with their 3rd party developers or part-time staff, and this creates endless security risks and other nightmares later. Do not do it.
For EVERY service, system, and tool you use, create a separate username and random password (or key) for it. Save those in a password manager such as 1Password. This will make your life so much easier later, especially when your early teams leave and go on to other things. In 1Password and similar systems, you can create different vaults, such as for developers, finance, etc. and put the root/admin credentials in a separate, tightly-controlled vault only open to founders & very key staff.
And when someone leaves, make SURE to disable these users in every system — this is very hard to do in practice, as you will easily have dozens of systems to keep track of even in your first few months. But at least be sure to remove users from your code, build, docs, and infrastructure systems ASAP when they leave. And check those systems every month or so for lingering users that need purging.
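The monthly check for lingering users can be partly scripted. Below is a minimal sketch, assuming you can export a list of usernames from each system and keep a roster of current staff; the function name and the data layout are hypothetical, and real service accounts (CI bots and the like) would need to be on the roster too.

```python
def find_stale_users(active_staff, system_users):
    """Return {system: [usernames]} for accounts that are not on the roster.

    active_staff: iterable of usernames that should still have access.
    system_users: dict mapping a system name to the usernames found in it.
    """
    roster = {name.lower() for name in active_staff}
    stale = {}
    for system, users in system_users.items():
        # Compare case-insensitively, since systems differ in how they store names.
        leftovers = sorted(u for u in users if u.lower() not in roster)
        if leftovers:
            stale[system] = leftovers
    return stale
```

Feeding it the exported user lists each month gives you a short list of accounts to review and disable, rather than relying on memory.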
On a related note, you must never, ever let developers hard-code security credentials, passwords, API keys, or other secrets in the code or anything that is checked in, ever. Just never do it. Always use external configuration, such as environment variables (usually the easiest to start with), or a secrets manager. This is such a common mistake made when people are in a hurry that you must continually guard against it (and later automate checking).
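As an illustration of external configuration, here is a minimal sketch that reads secrets from environment variables and fails loudly when one is missing. The variable names DATABASE_URL and MAIL_API_KEY are assumptions for the example, not something your stack requires.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is unset."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise MissingSecretError(f"Required environment variable {name} is not set")
    return value


def load_settings(environ=None):
    """Collect all secrets in one place so nothing is hard-coded elsewhere."""
    return {
        "database_url": require_env("DATABASE_URL", environ),  # hypothetical name
        "mail_api_key": require_env("MAIL_API_KEY", environ),  # hypothetical name
    }
```

Failing at startup when a value is missing is usually kinder than discovering it on the first request in production.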
Two & Multi-Factor Authentication (2FA/MFA)
You should enable and use 2FA wherever possible, especially for internal use and developers. This may not be available on all services and can get annoying if it’s needed 25 times per day, so you have to balance security vs. annoyance factor. But at least enable it on key infrastructure and administrative accounts, i.e. any user that can cause damage.
GitHub
The first thing you need is a git repository, which today is usually done on github.com, though some people prefer the more powerful gitlab.com system; they are quite similar, and both have free plans you can start with.
Whatever you use, make sure it’s a PRIVATE repository, with SEPARATE users for each person, system, or product that will access it.
You can decide how many repositories you need, but separate ones for the frontend, backend, and infrastructure are common when getting started (this greatly depends on your tech stack).
Control access via GitHub/GitLab users in groups (e.g. ‘developers’) so you can easily remove access when they depart. You generally do not need to grant individual permissions, except for GitHub Actions which need higher level permissions.
Branching
When and how to branch your code is often controversial, with no best way, though feature branches and merges to dev, test, and main are probably the simplest to start with.
Regardless, if you branch, try to name branches and merges according to a standard, usually and especially including the task or ticket number when you are using a task manager such as Jira (which you should be doing). This will greatly ease the management of the build, merge, deployment, and tracking processes over time.
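A naming standard is easier to keep when something checks it. The sketch below assumes one possible convention, type/TICKET-123-short-description, which is purely an example; the pattern and function name are hypothetical, and long-lived branches such as dev, test, and main would be exempted.

```python
import re

# Assumed convention: <type>/<TICKET>-<number>-<lowercase-words>
BRANCH_PATTERN = re.compile(
    r"^(feature|bugfix|hotfix)/([A-Z][A-Z0-9]+-\d+)-[a-z0-9]+(?:-[a-z0-9]+)*$"
)


def ticket_from_branch(branch):
    """Return the ticket key embedded in a branch name, or None if it does not conform."""
    match = BRANCH_PATTERN.match(branch)
    return match.group(2) if match else None
```

The same ticket key can then be reused in merge titles and release notes, which is what makes the later tracking easy.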
README.md
Fall in love with README files in your Git repo — write them for key developer things, especially how to set up a new development & deployment environment for each new developer; this will save much pain later. Also create a README for the system environment variables, feature flags, and other options that are needed to run and configure it. Keeping these up to date will save a lot of time.
Tooling
Broadly, your tooling is up to you and has to fit your technology stack, comfort level, and where or how you will be deploying (more on that below). But to get started, start with the tooling you know such as a development environment, possibly a build environment like Jenkins or other CI/CD, Docker if you’ll be using it (you probably should), and so on. More on this, below.
Deployment
It is very important to be deploying your application to a cloud environment as early as possible, ideally right from day one. Do not just test or look at the system on developers’ laptops, as this will hide all sorts of problems that will slow you down later, especially as the team grows and others need to see or test the system.
You may not have the resources to build a full beautiful deployment system right away, but do get your developers in the mode of committing, merging, building, and deploying to some server somewhere as that process will force lots of best practices early on.
This is especially important with 3rd party developers — do not let them develop for weeks or months before building, deploying, and showing you a running system. It’s easy to let them build for a while and promise things later, but you must see things really running as early as possible.
This also forces the team to work on how and where things will be developed, including how to configure the different environments, avoid hard-coded elements, etc. Otherwise, they’ll have to figure this out later, leading to significant delays and cleanup.
Slack
Everyone uses Slack, often for everything. In concert with email and perhaps a few other whiteboard or documentation systems, it’s often the core day-to-day communication channel. This is especially true for today’s remote workforces, as Slack becomes the common water cooler for everyone.
Much has been written about Slack and how to use it, and for early-stage companies, it’s probably wise to use it as much as possible, with channels by department and team.
The threads feature is especially helpful for discussions, though it gets overwhelming when there are dozens of things going on, so try to close out threads as quickly as possible.
Keep in mind that Slack is a little hard to search, depending on your subscription plan, and is not as good as email for summarizing and documenting the how and why of various decisions, especially years later when you want to look back at something. This means summarizing things in an email from time to time, especially when the Slack discussion was in a small team, but you then want to tell others about it.
Many of the tech teams’ tools, such as CI/CD, monitoring, etc. will integrate with Slack, which can be a good idea but can also get noisy. However, they are easy to turn on/off, so if it makes sense, go ahead and enable a few as you go.
You might consider SlackOps, which often uses robots and cute tools to do things automatically, but these are hard to do well or securely and are best left to much later.
Documentation
Decide early on where you want to maintain developer and system documentation. This can vary a lot, from markdown files in the repo to Google Docs to Wiki systems like Notion.so. Ideally you can stick to one place and format for docs, though these tend to vary over time. The most important thing is to get used to writing things down: the system’s physical and logical architecture, how things work, a team directory, the build and deployment process, etc.
YAGNI — You Aren’t Going to Need It
You should start doing lots of best practices and building up things as you go, as outlined below, BUT you should not overbuild or get stuff you don’t need. This principle is known as YAGNI, or you aren’t going to need it: avoid the sin of overbuilding infrastructure, over-designing features that are way out in the future, etc. Don’t over-build, and don’t over-engineer.
Ownership
It’s critical that you set up and own all your own SaaS and service subscriptions, especially for anything related to software development, code, and cloud. It’s very easy for your part-time or 3rd party developers or others to do this for you, in their own accounts to just get going, but that’s a very bad idea, as they can withhold access, etc. if there is a payment or other type of dispute. Do not let others own anything or you’ll likely regret it. No personal services, period.
If developers or 3rd parties ask for services, SaaS tools, etc., get a list of what they need and then set up your own accounts, creating users for them while you retain (and keep secret) all the root or admin passwords — this can be a challenge for non-tech founders, so get trusted help if you need it.
Architecture
Software and system architecture is highly variable, depending greatly on the problem being solved, where the system will be deployed, the scale of the system, the experience of the team, and the technology stack involved.
Below is some general advice, though the overall rule is KISS, keeping it simple. The simpler, more mainstream, and straightforward your system is, the easier it will be to build, manage, and evolve. In fact, the more boring your code and systems are, the better.
Programming Languages
Choosing a programming language is often a religious war, but generally, you should choose from a very small list of the most common languages and environments. Using a popular language will make it much, much easier to hire, find tools, use 3rd parties and 3rd party libraries, etc.
There are many sexy languages to probably avoid, such as Erlang, Smalltalk, Kotlin, Haskell, or other cool but very hard-to-staff languages. And while Java is good, you should probably avoid it unless you have a very good reason, such as enterprise software.
PHP is still common, but Go is taking over for lots of applications (though not always for web frameworks). C#, Ruby, Perl, and others have faded from the scene for new systems, and Python is not as popular as it once was (though still the standard in data science and Machine Learning).
In practice, most new systems are built in JavaScript/TypeScript, Go, Python, PHP, and/or Java.
Regardless, try to use a good framework, like Spring Boot for Java, Laravel for PHP, etc. that provides dozens of built-in services and best practices, especially for security, database handling, logging, etc.
Also, try VERY hard to standardize on one backend language (your frontend will almost certainly be JavaScript). Do not let developers choose different languages for different services or functions, or based on their backgrounds, as this creates an unmanageable (and un-staffable) mess for small companies.
This means picking something and sticking to it, even if it’s not great for every use or role you have (e.g. I’ve written a lot of batch systems in PHP; not because it’s good for that, but because our core system was PHP, and thus our developers needed only one skill set; it worked & scaled just fine).
Note mobile applications for the iPhone and Android are whole separate ecosystems, with their own languages, tools, and processes that are outside the scope of this guide. As always, find and follow best practices.
Esoteric Services
Avoid fancy new or esoteric services like the plague. This includes the latest databases and other systems that are on the bleeding, or even leading, edge, unless they add some very unique value to your solution (unlikely).
New and esoteric services rarely live up to their claims, but they really complicate software development and especially deployment and operations. They are also hard to hire for, and no one has experience in how to deal with them, especially later at scale.
MicroServices
Avoid microservices in early-stage systems, as they are not worth it. They greatly complicate development, testing, deployment, and troubleshooting, and are not something you should generally mess with.
Build a so-called monolith, or at least only break up your system into large macro chunks that make sense, but do not overdo these splits. Wait until you have product-market fit, a large team, and more complex needs.
Never Build What You Can Buy
You should also not build what you can buy, especially in these days of plentiful SaaS and other services that are everywhere, often for free or at low cost. Almost everything we used to have to build is now at your fingertips. And you can always build your own later when your scale or needs exceed what’s easily buyable (by which time you’ll have more resources to build it).
This includes infrastructure and services, but also very core application things like authentication, mail sending, IoT fleet management, and much more.
Off-Shoring Development
Many startups outsource some or even all of their early-stage development, and many push this offshore to save money. This is fine, but there are some additional considerations when using outside development companies.
Code Ownership & Control
A critical element of any 3rd party contract and arrangement involving software is who owns the code and who controls access to it. Most contracts are pretty good at who owns it, but be sure you own it for real, and the 3rd party or offshoring company has no rights to it, now or ever (i.e. they can’t use it in other projects, etc.)
More important is where the code is and who controls it. It’s very common for 3rd party developers to set up your first git code repository, owned & controlled by them, but this is a very bad idea as they can withhold access, etc. if there is a payment or other dispute.
Always make sure the code resides in your repository and your account, at github.com, gitlab.com, etc. There should be no exceptions to this, right from day one.
Tooling
As mentioned above, to get started, just get going with a private code repository (e.g. GitHub), a development environment, and basic tools (like Gradle, Maven, webpack, etc.), and go from there.
Development Processes
Different companies will use different processes to build software, based on the founders’ experience, but all will generally be so-called Agile processes. This means rapid and flexible software development and very frequent deployment, with very short development cycles, from hours to days, so new code is designed, built, and deployed much more rapidly than in traditional processes.
The key is to be Agile, and to use a very simple process that works for you. Many companies start with Scrum processes, but these are falling out of favor due to their complexity and overhead. Instead, try a simple Kanban system, which maintains simple lists of the next high-priority tasks in sorted order, frequently updating the tasks and priorities.
Development Tracking
It’s important to track and manage your development work, even for a single developer. Plus, you’ll find it much easier to expand your team and scale if you start off with a good system. By far the most common is the Jira system, which creates simple issues or tickets for everything you want to do: features, bugs, enhancements, etc. and gives them a priority, owner, assignments, etc. Other popular systems include Trello, Asana, and Pivotal. You can also use GitHub Issues for basic bug and issue tracking.
Use this system religiously to enter & track ideas and every task, as you can document how and why things happened, bug notes, etc., and the issue numbers are used throughout the development system (e.g. branches, merges, and releases) to know what was done and fixed when, where, and by whom.
CI/CD Systems
The CI/CD (Continuous Integration / Continuous Delivery-Deployment) concept means different things to different people, and can be used in different ways. As a startup, you want to focus on a few core tasks you need to be done — the goal is a simple one-click centralized build and deployment system — this will greatly speed development, adding developers, plus it forces all sorts of best practices along the way.
CI — Continuous Integration
CI systems continually merge in, build, and ideally test your code as it’s committed to your repositories. This forces developers to deal with conflicts, failures, and problems on a near real-time basis, while things are fresh in their minds, with the goal to ensure the code is always as clean, stable, and functional as it can be.
Developers should be committing and merging their code as often as possible, usually at least once per day for commits, and every day or two for merging. This forces them to deal with other developers’ code, conflicts, etc. right away.
Initially, developers will each build in various ways on their own laptops, but this should be rapidly standardized with build tools such as Maven, Gradle, etc., which everyone then uses.
And while developers will use this locally, you’ll want to migrate the process to a centralized build system as soon as you can, connected to your git repo, so the process is run automatically for all new code as it’s checked in or merged. Note developers will also continue to build locally for many years to come, as they need this to do testing.
Automated builds are usually done via an automated build system, such as Jenkins, CircleCI, GitHub Actions, TravisCI, and others. These platforms are the key to your CI/CD system.
Be sure you set these systems up correctly and securely, as there are many common mistakes made that create vulnerabilities over time.
Docker & Containers
Developers will likely be using Docker Containers for much of their work in modern systems. This is a complex area, but generally, containers are a very good thing, vastly simplifying the build, configuration, and deployment processes for systems today. In fact, most systems use containers to build & test the application, which is then deployed via containers.
Code Checking
More advanced CI systems will run a number of code-checking tools to help check for common mistakes, security issues, formatting problems, etc. All this varies highly by language and technology stack, but it’s good to start introducing the basics as soon as you can. It’s especially important to scan the code for hard-coded security credentials and keys, which should NEVER, EVER be in the code, configuration, or any checked-in file.
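To give a feel for what a credential scan does, here is a deliberately small sketch that flags a few common shapes of secrets in text. The patterns are illustrative assumptions and far from complete; a dedicated scanning tool will catch much more, so treat this as a picture of the idea rather than a replacement.

```python
import re

# Illustrative patterns only; real scanners ship with far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_assignment": re.compile(
        r"(?i)(?:password|passwd|secret|api_?key|token)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return a list of (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Run over each changed file in CI, a non-empty result would fail the build so the secret never reaches the main branch.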
Supply Chain Checking
Supply Chain Security is an important new and developing area that you’ll see mentioned. This is making sure the myriad of 3rd party libraries and code you use in your software is safe, secure, and up to date. Over time, this becomes very important.
This is a complicated and evolving area, and many services such as GitHub have some capabilities in this area. Overall, this is something to look at relatively early in your process, especially for the more challenging ecosystems such as JavaScript (which nearly everyone uses).
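One small, concrete piece of supply chain hygiene is knowing which dependencies float to whatever version is newest. The sketch below assumes a JavaScript project with a standard package.json and flags any dependency not set to an exact version; the function name is hypothetical, and a lockfile plus a proper dependency scanner remain the real safeguards.

```python
import json
import re

# An exact version such as 18.2.0 or 1.0.0-beta.1; ranges like ^4.17.21 or * do not match.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$")


def loose_dependencies(package_json_text):
    """Return {package: version_spec} for dependencies that are not pinned exactly."""
    manifest = json.loads(package_json_text)
    loose = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                loose[name] = spec
    return loose
```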
Code Testing
Developers should include unit and other tests in their code, and ideally there is also higher-level feature, white/black-box, and other testing, too. The CI system normally runs as many of these tests as is practical (at least all the unit tests), though this is a complex area. At a minimum, set it up to run the unit tests on every commit or merge.
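For anyone who has not seen one, a unit test is just a small piece of code that checks another small piece of code. Here is a minimal sketch using Python’s built-in unittest module; the apply_discount function is a made-up example, not part of any real system.

```python
import unittest


def apply_discount(price_cents, percent):
    """Return the price after a whole-number percentage discount (hypothetical example)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_zero_discount_changes_nothing(self):
        self.assertEqual(apply_discount(1000, 0), 1000)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 150)


def run_tests():
    """Run the test case and return the result object, as a CI step would inspect it."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a CI job the equivalent is typically a single command such as python -m unittest, with the build failing if any test fails.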
There are also many third-party sites and tools that can run tests, especially for the UI, etc. but these get expensive and complicated, so their use depends on your situation.
CD — Continuous Delivery-Deployment
The second part of CI/CD is the delivery or deployment phase, usually bundled in the same CI/CD tool, and focused on deploying the built and tested application to its environments, such as for dev, testing, or eventually, production.
The normal progression is to build every code merge and deploy it to a development environment as part of Integration testing, then later deploy it to a QA or testing environment for users to look at, and finally, to production for real users. This may vary based on your situation.
Ideally, you will be deploying the exact same version of the application, container, etc. in all these environments. This is often not practical, as the dev/test versions are usually built with debug flags, JavaScript comments, etc. that you don’t want in production, but you should still strive to build and deploy versions that are as identical as you can make them, to ease production troubleshooting.
Developer Data & 3rd Party API Services
Modern applications rarely stand alone, instead depending at least on a database or two (and their data), and often many 3rd party services (APIs) as part of best practices. However, this greatly complicates the developer experience, as they often need most, if not all, of these services available to build and test their code each day.
For 3rd party APIs, there are several ways to manage them, and a very important aspect is to have DEVELOPMENT accounts, keys, and environments for all your 3rd party services. This is so developers don’t need production service access, and to make sure they are never seeing, nor modifying, production data (and maintaining GDPR and other data compliance).
Teams also usually find a very wide range of 3rd party services for any particular need, ranging from nearly free to expensive enterprise-grade services. Try to pick quickly, but wisely, leaving a bit of room for expansion, but also try to use what’s simple, inexpensive, and easy-to-implement today. You can evaluate and upgrade later, but getting going now on a budget is usually best.
Note, developers may also need mock services, such as email sending, so they can test sending emails or notifications that never actually go to real users (which would be embarrassing). Services like Mailtrap work well for this.
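The same idea also works inside the code itself. Below is a minimal sketch of a stand-in mailer that records messages instead of sending them; the class and function names are hypothetical, and it assumes your application passes its mailer in rather than creating it internally.

```python
class RecordingMailer:
    """Test double that keeps messages in memory instead of sending them."""

    def __init__(self):
        self.outbox = []

    def send(self, to, subject, body):
        self.outbox.append({"to": to, "subject": subject, "body": body})


def send_welcome(mailer, email):
    """Application code only depends on an object with a send() method."""
    mailer.send(email, "Welcome!", "Thanks for signing up.")
```

In dev and test you hand the application a RecordingMailer and inspect its outbox; in production you hand it the real client with the same send() shape.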
Local Services & Data
Developers often run a small number of local services, such as databases, and usually in Docker Containers. The availability of containers on the laptop (and plenty of RAM) has greatly simplified this process and is popular for running things like MySQL, ElasticSearch, and many other 3rd party products.
The challenge is how to get good data (and data structures) into these local databases — ideally, the code has automatic structure migrations for this (which is needed for production and automated testing, anyway).
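Automatic structure migrations can be quite simple at heart. The sketch below uses SQLite from the Python standard library to show the core idea: an ordered list of changes plus a table recording which ones have run. The table and column names are assumptions for illustration, and most frameworks ship a more complete migration tool you should prefer.

```python
import sqlite3

# Ordered schema changes; each runs exactly once per database.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]


def migrate(conn):
    """Apply any migrations newer than the recorded version; return the final version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            current = version
    conn.commit()
    return current
```

Because running it twice is harmless, the same function can bring a fresh laptop database, a CI test database, and production up to the same structure.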
For local development databases, some companies use a copy of the production database for development and testing, but this is a very bad practice, as it creates security risks, and often violates rules such as GDPR and CCPA regulations on data privacy.
It’s much better to either use mock test data or use a scrubbed and anonymized (remove all PII data, e.g. names, addresses, emails) database extracted from production. This is sometimes hard to do for every developer, so many teams also use shared services, below.
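Scrubbing can start out as a simple per-row transformation applied while exporting data. Here is a minimal sketch that replaces assumed PII fields (name, email, address) with salted hashes; the field list and function names are hypothetical. Note that hashing like this is pseudonymization rather than true anonymization, so check it against your actual compliance requirements.

```python
import hashlib

PII_FIELDS = ("name", "email", "address")  # assumed column names


def pseudonym(value, salt):
    """Return a short, stable token derived from the value and a secret salt."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def scrub_row(row, salt):
    """Return a copy of the row with PII fields replaced by tokens."""
    clean = dict(row)
    for field in PII_FIELDS:
        if clean.get(field) is not None:
            token = pseudonym(str(clean[field]), salt)
            if field == "email":
                # Keep an email-like shape so validation code still works.
                clean[field] = f"{token}@example.invalid"
            else:
                clean[field] = f"{field}-{token}"
    return clean
```

Using the same salt for a whole export keeps values consistent across tables, so joins on email still line up in the scrubbed copy.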
Shared Services
More complex systems often need bigger services that are hard or impractical to have on every developer’s laptop. In this case, teams will usually deploy shared databases, etc. to a private development environment that developers can reach via VPNs, ssh tunnels, or great networking tools such as Tailscale. A development environment is usually provisioned for this, so the developers and the deployed development stage code share the same data.
Developers also often need public 3rd party APIs, such as for billing, sending mail, authentication, mapping, and many other things. As noted above, try to create dev/test environments on those 3rd party sites (the best ones allow this), with separate users or keys for each developer, ideally with limited permissions for only dev/test data.
Testing & QA
You need to test your software, or your users will be testing it for you. The challenge for early-stage startups is good testing resources are expensive, and automated testing is complex and expensive.
The best scenario is a dedicated testing team that tests each feature, bug fix, etc. after each is merged in and deployed to the dev or test system, providing feedback and status info via the ticketing system such as Jira.
However, the usual early-stage scenario is no staff nor money for QA, so the product owner, management, and other staff often test as best they can. That’s fine, as long as they document the bugs and how to reproduce them, etc.
Regardless, it’s critical that developers test their fixes and features as best they can. Sadly, many do not test very well, leaving bugs to be found by testers, product people, or users, and delaying real fixes by days, weeks, or more. You should insist that developers test fairly extensively, and follow up with them when bugs are subsequently found, in an effort to avoid future recurrences.
One recent useful tool for UI testing that can be used by nearly anyone is RainForestQA, which can do basic login, click, feature test, data entry, etc. Like any automated QA tool, you’ll need to work to keep the tests up to date with all the frequent changes your team makes.
Infrastructure & Deployment
Once you have some working code and an application to deploy, you need some place to actually run it, usually on cloud infrastructures such as AWS, Azure, or Google GCP.
This can start simple, but will rapidly get complicated as you get near any sort of production deployment, with a lot of small, and not so small, items that need to come together correctly to deploy a system.
More importantly, once some infrastructure is built, it will rapidly and often change as you grow, find new needs, change elements of your tech stack, etc. This change is usually where the problems start to creep in, with security, cost, and stability usually becoming issues.
Getting Started
Start with a set of cloud accounts that you own (not your 3rd party developers, nor employees). The exact structure will vary by cloud, but one account is probably good to start, with provisions for at least three or four completely separate environments: dev, test, and production, plus maybe staging later.
To get things going, teams often manually create a dev environment to get some VMs, Docker, databases, etc. up and running on day one so they can deploy and test stuff. That’s fine, but you should move to a separate, and carefully controlled, set of environments as soon as possible, ideally in a month or two, and definitely before you do production deployments.
As always, give each developer and 3rd party system or API separate users and credentials, so you can later control their access, deactivate when they depart, etc.
You might also set some cost alerts on the account so you have some sense of money being spent, and in case a mistake suddenly sets up a $5,000–10,000 per month resource you’re unaware of.
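Your cloud provider’s built-in budget alerts are the right tool here, but the arithmetic behind a useful alert is worth seeing. The sketch below projects month-end spend from spend so far using a simple linear extrapolation, which is an assumption that will misjudge bursty workloads; the function names are hypothetical.

```python
import calendar
from datetime import date


def projected_month_spend(spend_to_date, today):
    """Extrapolate spend so far linearly to the end of the current month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def over_budget(spend_to_date, today, monthly_budget):
    """True if the projected month-end spend exceeds the budget."""
    return projected_month_spend(spend_to_date, today) > monthly_budget
```

Alerting on the projection rather than the running total means a runaway resource gets noticed in the first days of the month, not at the end.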
Infrastructure as Code
Modern infrastructure is managed as code, using the IaC model, or infrastructure-as-code, usually using either a cloud’s native system, e.g. CloudFormation on AWS, or an industry-standard cross-cloud tool like the ever-popular Terraform from HashiCorp.
Ideally, one of your DevOps engineers familiar with the tools and the chosen cloud will start to code, build, and test some infrastructure, usually for the dev system first, but sometimes starting with the production system and working backwards (especially if there is a simple manual development system already in use).
This process will both end up with a nice, controlled, documented system and force a number of issues to the surface, such as overall security, URL and SSL certificate lists, per-service resource requirements, database options, secrets management, log collection, networking challenges, and monitoring plans. All of these are best solved early and often, lest they blow up into major challenges just when you want to go live or have important milestones to meet.
Also, making frequent changes to the environments and expanding them for dev, test, and production will force the configurations and code to improve as it deals with assumptions, changes, update issues, etc.
Secrets Management
Managing secrets in cloud environments can be quite challenging. It’s very important to get this right or at least well-secured, or else hackers or others can easily compromise, or even destroy your entire system. Note that secrets include all users, passwords, keys, authentication data, and most critical configuration data such as host names, IP addresses, database names, etc. If you don’t want it broadcast on the Internet, it’s a secret.
Generally, try to use the secrets system your cloud provider has, and find ways to integrate it with your run-time environment. This can be challenging, but most systems can at least use environment variables, which can be connected with secrets managers in a variety of ways.
Whatever you do, never, ever hard-code secrets in your source code (and use tools to alert you if you do), but also keep them out of configuration, deployment, CI/CD, and other files and systems. Keep them in one place and centrally managed, if possible, or at least spread to only your CI/CD and cloud systems, and keep it simple as complexity is the enemy of security.
Note you can also use more sophisticated products like Vault from HashiCorp, but they are often complex to use, manage, and secure correctly, and thus are best left to later phases when you need more powerful tools.
Kubernetes
You will undoubtedly be pushed to use Docker containers and Kubernetes very early on. This may or may not make sense, depending on your team’s experience level and the size or complexity of what you are building.
Generally, you probably don’t need Kubernetes, and if you do need it, you probably should not try to run it yourself. There are many cloud-managed options available now that remove most of the burden, but not always the overhead of designing, configuring, and running a fairly complex system. Kubernetes is a bear to run and manage, so even if you use it, really try hard to avoid running and managing it yourself.
A lot also depends on your architecture. For example, if you have a basic web or SaaS application with a JavaScript front-end and a backend consisting mostly of APIs from a single codebase, it’s often easier to launch with a few backend containers, a static front-end, and a couple of load balancers, and that’s it. Your code can run in various ways, e.g. on VMs, Docker, or various simple container deployment systems such as Amazon Elastic Container Service or Google Cloud Run.
That type of simple deployment will leverage all the benefits of containers and fairly dynamic run-time environments without the complex overhead of Kubernetes, which you can always move to over time as your needs grow.
Monitoring, Logs, Tracing
Once your system is deployed and has any users, you’ll need to monitor and manage it. Monitoring means many things to many people, but you should strive for basic monitoring, log collection, and ideally some distributed tracing (usually much harder to do and often not needed).
Start with the built-in cloud services on your platform of choice, but realize most of these are not very capable, and are often hard to use effectively. It’s usually better to use a full-stack service such as Datadog that can collect and combine these all into one place, especially if you have a very small (or non-existent) team. Just watch the cost, especially on logs, as it can easily reach thousands of dollars per month if you have debug code spewing logs everywhere.
Monitoring
Monitoring can get overly complex and unwieldy, as folks like to monitor everything. Start by keeping it simple and focused on key metrics that impact your users or the system. Basic monitoring services will automatically pick up things like running out of memory or disk space, and you should add a 3rd party service to monitor your public web app and APIs, as that’s what the users are looking at, and you need to know if they are down or broken.
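For a sense of what an external check on your public web app or API does, here is a small sketch using only the Python standard library. The thresholds and the ok/slow/down labels are assumptions for illustration; a hosted monitoring service adds scheduling, multiple probe locations, and alerting on top of this.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Fetch `url` once and return (status_code_or_None, elapsed_seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # the server answered, but with a 4xx/5xx
    except OSError:
        status = None  # DNS failure, refused connection, timeout, etc.
    return status, time.monotonic() - started


def classify(status, elapsed, max_seconds=2.0):
    """Turn one probe result into 'ok', 'slow', or 'down'."""
    if status is None or status >= 400:
        return "down"
    if elapsed > max_seconds:
        return "slow"
    return "ok"
```

In use, something like `classify(*probe("https://app.example.com/health"))` would run every minute from outside your own cloud, where the URL is a hypothetical health endpoint.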
You’ll likely want basic service monitoring such as for MySQL databases, to at least see queries per second and other load or scaling metrics that may be helpful at some point.
Beyond that, some tech stacks, notably Java, need extra monitoring of the JVM, especially heap size and usage. Many developers run with inadequate default settings, so it’s easy to overload or exhaust the server resources and not even know it.
Logs
Work hard to get logs from your applications into a centralized collection system, as this will help developers and operations teams really understand and troubleshoot what is going on, especially in production when unexpected things happen.
Good logging is an art form, but get developers in the habit of good logs from the beginning, using JSON formatted data with useful messages plus context such as what server, user, URL, task, etc. This will really help solve problems quickly, especially as you won’t have much of a support or operations team early on.
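As one possible shape for such logs, the sketch below uses Python’s standard logging module to emit one JSON object per line. The context field names (server, user, url, task) mirror the list above but are otherwise an assumption; adapt them to whatever your log collection system indexes.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Context keys copied into each entry when the caller supplies them.
CONTEXT_FIELDS = ("server", "user", "url", "task")


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "time": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("orders").info("payment failed", extra={"user": "u-123", "url": "/api/pay"})` then produces a line that a central log system can filter by user or URL without any text parsing.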
Tracing
Tracing is more advanced than monitoring or logging. Often discussed under the broader label of observability, it is about knowing what’s going on inside your application. This is especially useful when there are many services or other moving parts, which make it very hard to know where and why something failed.
Look at Honeycomb.io as the top tool in this area.
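To make the idea of a trace concrete, here is a toy sketch in plain Python. It is not how any particular tracing product works, and every name in it is made up for illustration. Each unit of work is recorded as a span, and nested spans share one trace ID so a failure can be located within the whole request.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

_current_span = contextvars.ContextVar("current_span", default=None)
FINISHED_SPANS = []  # a real system would export these to a tracing backend


@contextmanager
def span(name):
    """Record the timing, parent, and any error for one unit of work."""
    parent = _current_span.get()
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "error": None,
    }
    token = _current_span.set(record)
    started = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - started) * 1000
        _current_span.reset(token)
        FINISHED_SPANS.append(record)
```

Wrapping a request handler in `with span("checkout"):` and its database or API calls in inner spans gives a tree of timings per request, which is the raw material tracing tools visualize.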
APM, JS Errors & RUM
While monitoring and logging are important for your backend services, today’s modern applications tend to have very complex front-end systems, usually based on JavaScript. Since this code actually runs on the users’ laptops or phones, it’s very hard to get good error reporting or troubleshooting info.
To solve this, a number of tools and technologies have evolved, mostly focused on the end user’s experience and what’s happening in the browser on the user’s device.
These include the broad area of Application Performance Monitoring (APM), JavaScript error reporting, and Real User Monitoring (RUM) tools, from companies like Datadog. Note RUM has now become End-user experience monitoring (EUEM), which has evolved into Digital experience monitoring (DEM).
Many of these services have also merged, with new acronyms, and they can get overly complicated, but try to get basic error reporting, response time tracking, and some form of user screen recording, which can be invaluable for troubleshooting (especially with consumer-level users).
Company IT Infrastructure
Aside from your core technology, you also need to think about your general IT and things like email, websites, and especially DNS, as the latter directly intersects with your core systems.
DNS / Domains
You can register your domain anywhere, though GoDaddy seems the most popular these days and is well-known. Note that some domain endings, such as “.io” have separate registrars you have to use.
For DNS, you usually start with your registrar’s DNS, like GoDaddy, so you can get started, get email working, get your website up, etc. Then when you have cloud infrastructure, such as at AWS, you should move DNS there, e.g. to AWS Route53.
Running DNS on the cloud is more complex, but is a good investment as you’ll have good and proper control over time, which you’ll need as your system grows. Note the best way to manage systems like Route53 is via Terraform, so you can include comments on who, when, and why things were added (and deleted), and version them in GitHub for record-keeping.
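The value of keeping DNS in version-controlled code can be sketched in a few lines. The Python below is only a stand-in for the idea; Terraform does this declaratively and far more thoroughly. All record names, values, and notes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    rtype: str
    value: str
    note: str = ""  # who added it, when, and why

    @property
    def key(self):
        # Notes are documentation only; they do not affect comparison.
        return (self.name, self.rtype, self.value)


# The desired state lives in Git, so every change has an author and a reason.
DESIRED = [
    Record("example.com", "MX", "10 mail.example.net", "hypothetical: company email"),
    Record("app.example.com", "CNAME", "lb.example.net", "hypothetical: production load balancer"),
]


def plan(desired, current):
    """Return (to_create, to_delete) by comparing name, type, and value."""
    desired_keys = {record.key for record in desired}
    current_keys = {record.key for record in current}
    to_create = [r for r in desired if r.key not in current_keys]
    to_delete = [r for r in current if r.key not in desired_keys]
    return to_create, to_delete
```

Reviewing a plan like this before applying it is what catches accidental deletions, and the notes answer the "why does this record exist?" question years later.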
G Suite / Google Workspace
Many companies use Google Workspace, as it’s flexible, inexpensive, and familiar. You can manage your documents, email, groups, calendar, and nearly everything else there via its web interface. It’s a very easy place to start and get going quickly, usually within an hour.
Email Users & Groups
When setting up new emails for employees, use a standard first.last@domain.com format for simplicity, which also makes it much easier later to share documents, to know what email goes to who, etc.
Very importantly, create groups for EVERY new service you buy, such as godaddy@domain.com for domains at GoDaddy or aws@domain.com for your root AWS account. Then have that group include whoever needs to know about that particular service. This makes it very easy to control your groups, distribution, etc. The group email is also the user ID you use for admin-level access on those third parties, so at GoDaddy, you use godaddy@domain.com as the admin/root account and then add other users as needed.
Company Culture
There are many books and blogs on general company culture, but fewer on technical, security, and development culture. Even those are generally focused on hiring, engineering management, motivating the team, etc.
This guide is more concerned with the few key tenets you should instill in your team as a general rule, that have a significant impact on your technology, product, and operations.
Security
Instill a security culture right from the beginning, and from the top down. This means everyone, from the CEO to the office cleaning staff, adheres to basic principles, and any significant violation gets an audience with the CEO.
Users & Passwords
This means having clear, separate users, passwords, keys, etc. for every person, with no sharing. It also means NEVER sharing users and passwords in emails or other persistent message channels. If you absolutely must send a credential, separate the user and password into different channels (e.g. send the username in email and the password via SMS). Even better is a central password system like 1Password.
Also, never let anyone store passwords insecurely, such as in public text files, source code, etc. Build a culture that these secrets are secret, always.
Laptop Security
As a startup, you usually won’t be able to use expensive enterprise-level network, computer, and phone management tools, so the best you can do is follow best practices, build a security culture, and generally remind folks of key points.
Educate every employee on computer security, including strong passwords, phishing scams, and always locking their computer when they are away from it. This helps create a culture of security without being overly paranoid.
Also, for developers (and probably all employees), ensure their laptop disks are encrypted, backed up (with encryption), and always updated. Remind users when new macOS, iOS, Windows, and Android updates come out so they update that day. This is a continual reminder of the company’s focus and attention to security, both for the company and for them.
Data Security
Your data, especially information you have on your customers or users, should be protected at all times. Don’t let employees just send it around in any format or treat it carelessly. Obviously, you have to balance this with convenience and getting work done, but build a culture of data protection from the beginning.
This is especially important if you have consumer data and are subject to regulations such as the GDPR (Europe) or CCPA (California), which strictly control how users’ data is handled.
Do Things Right
Startups are masters of shortcuts and doing things as quickly, cheaply, and effectively as possible. That’s fine, but there is also a balance of doing things right, or at least realizing and recognizing that something is a shortcut or likely to cause problems later.
Try hard to generally do things right, including technically, whenever possible. Teams should appreciate this and go the extra mile to get it done more or less correctly. This then extends to following all the other rules, such as security, stability, etc. If you allow lots of shortcuts in some areas, some folks will start taking them in key areas like security, too.
Code Quality
Part of doing things right means writing good code. This can be hard under time pressure, but is very important and often saves a lot of time in the end. Ensure your teams create well-structured, well-documented, logical, and easy-to-follow code from day one, as this will really help all who join later, and greatly speed future development.
You can do this with code reviews, examples, etc., but it really comes from the top, when technical leaders exhibit good quality and hold others to it, too. This may also involve quality tools that help enforce good standards, formatting, naming, exception and error handling, etc., as this helps instill a quality culture from the beginning (for example, Golang development shops usually get this directly from Go’s built-in quality requirements).
Documentation
Create a culture of good documentation right from the beginning. This is hard, and there is a balance of good and useful vs. dumb and wasteful, but try to get basic docs like design, code, and diagrams in place early.
This will pay off over time, especially as things change and new team members join, as it dramatically reduces their time to be effective. It also greatly improves troubleshooting and customer support when teams actually know how their systems work and how to fix them.
This also means dedicating resources or time to update documentation, ideally as the code changes or right after a deployment. Get developers and others in this habit as you start.
Conclusion
This guide has covered the technical, procedural, system, cultural and security issues around software development and deployment that you should think about as your startup gets started, especially once you have developers building & deploying code.
This is a continual work in progress, but I hope it’s been useful, and feedback is always welcome.
I’m Steve Mushero — Global Technologist & Entrepreneur, working from Silicon Valley, helping startups build better software & systems, faster. Ask me how I can help you, too.
Checklist
Below is a basic checklist of key things small startups should think about and implement for baseline technology development and system management:
- Own your source code repository, on GitHub or GitLab
- Own all your accounts on all services, clouds, tools, etc.
- Limit 3rd party (outsource) access to systems
- Separate users in every system for every user
- Separate users in every system for services
- Build separate development and testing environments that closely match production
- Use an automated CI/CD system, including tests
- Use Infrastructure as Code to manage your cloud environments
- Manage secrets carefully in development and deployment
- Include security everywhere
- Use monitoring and logging everywhere, including JS error tools
- Document all the time
- Build at least basic QA from the beginning
- Don’t build what you don’t need
- Always buy when you can (vs. building)
- Use the simplest architecture you can (KISS)
- Use very mainstream programming languages and tools
- Avoid esoteric services and the latest tools
- Avoid Kubernetes unless you really need it, then let others manage it
- Use a ticket/issue tracking system for all development work
- Never share passwords in email or other persistent messages
- Set up and use Google Workspace for basic docs and email
- Set up good DNS and eventually move it to a cloud
- Build key security, quality, and documentation into the culture