Somebody once gave me the advice that my first hire as a Security Lead shouldn't be an in-house penetration tester, because that's the one part of security you can reliably outsource.
I was initially skeptical of this, because I have been that in-house penetration tester. But as I reflected on those jobs, I realized that every time, my role quickly shifted away from penetration testing toward other work. An external penetration test generally covered that need pretty well.
As a company grows, the need to bring certain functions in-house becomes apparent. But if a startup models its security team on an enterprise's, it will miss that certain functions are better suited to being outsourced at the earlier stages of a company.
As a rule of thumb, any security role which requires someone who is a dedicated specialist can generally be outsourced at an early stage startup. Many exceptions apply, and your mileage may vary.
Penetration testing should be a relatively small part of a Security Engineer's role at an early-stage company. It costs far less, and is far more effective, to hire a team of experts once or twice per year to come break your product apart.
An internal hire should instead focus on the things that require human relationships and a deeper understanding of the context, processes, people, and strategies. This includes reviewing design documents, technical proposals, and pull requests. It also means triaging vulnerabilities reported by sources with less context, such as outside firms or automated tooling, so that engineers only spend time on valid issues.
At Vanta we use Doyensec for our penetration tests.
Responding to security alerts is an operationally intensive task. Doing so at night and over the weekend can be brutal, unless you're ready to staff a Security Operations Center for 24/7/365 coverage. Even then, burnout is a hazard of the job.
You can significantly reduce this burden by using a managed detection and response service. They'll triage your alerts, run their own detections, and have a human verify each finding at any hour, day or night. Then you and your team only get paged when there is a verified incident to handle.
At Vanta, we're using Red Canary for this. I've also heard good things about Expel.
Note: Your team should still run tabletop exercises to prepare for handling real-world incidents detected by these managed services.
You may find that your company is ready to start thinking about security, but you aren't sure whether you should hire your first full-time security employee yet.
Some companies, like Latacora, come in at this stage. They build out a full-fledged security program, and eventually hire a security lead for you to take it over.
This can be a great option with a ton of flexibility.
To sum things up, if you can find a high-quality managed service provider for the role you're considering, you should take that option seriously. Using a third party can give you far more agility. At the early stages of a company, things can change in big ways very quickly, and the more agile your security program is, the better.
As someone responsible for security at a startup, you need to prepare for black swan events: things that almost never happen, but when they do, it's a very big deal.
Security breaches, major outages, and natural disasters fall into this category. Due to their low frequency, it takes a long time for both teams and individuals to accumulate real-world experience handling them.
One tactic that is a favorite of security and reliability teams alike is the tabletop exercise.
At Vanta, we think of a tabletop exercise like a game of Dungeons and Dragons. You get a bunch of people together at a table (physically or in the Zoom metaverse), and they all pretend to go on an adventure. In this case, the adventure is responding to a black swan event.
When done right, you'll learn things about your capabilities, your knowledge gaps, and your team that you didn't know before. Then you can fix those things before the real black swan event arrives. It also helps your team gain experience handling these low-frequency events long before they've spent decades in the trenches together.
You'll see a lot of references to BC/DR (business continuity / disaster recovery) and security incident response when learning about tabletop exercises. Some compliance frameworks specifically call for tabletop exercises covering either or both of these areas. There is a lot of overlap between these concepts, but there are some distinctions as well.
Business continuity refers to your ability to continue business operations during a major event. For example, you may have invested in highly available infrastructure to ensure that your systems aren't taken offline by an outage in one region.
Disaster recovery is about coming back online after a major disaster. For example, if your database server is destroyed, how effectively can you recover backups and restore service?
Security incident response typically focuses on your ability to respond to a malicious adversary who has compromised some system that is important to your security.
Generally speaking, you'll find the first two grouped together as BC/DR.
There are scenarios that would cover all three of these concepts. If an attacker leverages an exposed AWS credential to delete your database, you would require all of these capabilities to respond effectively.
A tabletop exercise typically involves one or more people who run the scenario, a team of responders, a note-taker, and potentially some observers.
The people responsible for devising a scenario will generally plan ahead of time and map it out. It can be as simple or as complex as they'd like. At a high level, though, it starts with a prompt that mirrors how a real-world incident would come to light.
For example, "An attacker gets malware on an employee's laptop" is not a very compelling scenario. It gives too much away from the start. As a responding team, you're never going to have such a high-confidence first signal without making way too many assumptions.
"The on-call Security Engineer received a Slack notification from our anti-malware tool showing that 5 different endpoints are reporting malware infections" is a much better place to start. These alerts may be false positives, and the scenario could have nothing to do with malware. It could even be a hyper-targeted red herring. Either way, your responders have some questions to answer.
Now that you've got a scenario in mind, you can schedule a meeting and get everyone together at the table. The group running the exercise is responsible for acting out the part of "reality" for the responders.
You read out the prompt, and rather than actually digging into logs or other systems, the responders tell you what they want to do, and you describe the result.
For example, they may say "I message each of the employees in question on Slack to ask them if they recently downloaded any new software" and you might respond "One responds no, three are not online, and one says they just joined so they downloaded all of the stuff recommended in the wiki."
Another responder might ask, "What are the roles of these employees?" You shouldn't answer this. A responder can't just ask for facts; they need to ask for the effect of their actions. If they don't know how they would normally find this information, you shouldn't provide it.
A better question might be, "I look up each employee in the HR system (or Vanta) to find their role. What does it say?" Even here you can be skeptical and ask, "Do you have access to that information?" The goal isn't to be difficult; it's to probe the questions that will actually matter in a real incident. If your responders don't have access to information they find important, that's exactly the type of thing a tabletop exercise can help you discover.
After your responders discover the root cause (or don't), deploy a fix (or don't), and recover (or don't), you can end the exercise. Now you should spend some time talking about what happened, why you responded this way or that, and what lessons came out of the exercise.
You should leave the meeting with a list of action items that will make the next incident response more effective, whether it's real or fake.
When Google runs these exercises, I'm confident they make them as realistic as possible and coordinate across many teams and time zones. You don't need to go that far at a startup. The goal here is to find the low-hanging fruit in your response capabilities, fix it, and iterate.
For your first exercise, come up with a scenario that is simple and plausible. You can also time-box the exercise to a portion of an hour-long meeting. This keeps everyone fully engaged and leaves some time for discussion at the end.
It's a small thing, but take your role as "dungeon master" seriously, and expect your responding team to take their roles just as seriously. Whoever is running the scenario should keep it to themselves until the exercise has begun. Still, make sure it's fun. Be creative with your answers, and don't be afraid to joke around a little.
Make sure that someone present, ideally not any of the participants or the person running the scenario, is taking detailed notes. Every question asked by the responding team, any insights that come up in the moment, and ideas for after the exercise should be noted in raw form. After the exercise you can review them together to pull out lessons learned and specific action items.
If you're the one running the scenario, you should anticipate some of the most likely paths your responders will take. Throw in a few twists and turns, and potentially some red herrings. If your scenario involves an AWS outage, check which of your vendors would also be affected. You don't have to research everything in depth, but prepare enough that your responders have to overcome some plausible challenges.
If there are easy paths to end the exercise, make sure that you have a plausible reason for those paths to be blocked, e.g., "You try to roll back the bad code change, but the deployment platform is down."
It's possible to craft your scenario so that it covers both BC/DR and IR. This is especially useful if compliance with a standard depends on you covering both of those topics in annual exercises.
For the best results, though, it helps to focus. If you're running a BC/DR exercise, you can spend more of your time on the difficulties of maintaining and recovering services during an incident. If you're running a security incident response exercise, you can spend more time dealing with the presence of an active adversary.
Consider who you'll need to communicate with. Customers, legal, IR firms, law enforcement, etc. are all fair game. Just don't dial 911 for real.
At the end of the day, the easiest way to get good at these is to schedule the meeting, get everyone in the room, and start. The first couple of times you do this, you'll learn a lot about the process itself, and you'll be able to identify some of the low-hanging fruit in your response process too.
Here are some example scenarios to get you started: