Network misconfigurations aren’t simply inconvenient mistakes that might disrupt operations for a short while; they can be serious security threats. Gartner predicted that, through 2020, 99% of firewall breaches would be caused by misconfigurations, while our cloud security survey found that 42.5% of respondents had experienced a network or application outage in the preceding 12 months, with the two main causes being operational or human error in managing devices, and device configuration changes.
Misconfigurations have hit the headlines in recent years. In July 2019, it was announced that the personal information of more than 100 million Capital One customers had been stolen, thanks to an internal misconfiguration. In 2017, millions of Verizon customer records were left exposed because an Amazon S3 storage server was accidentally left unprotected, while marketing analytics firm Alteryx suffered a similar breach of 123 million customer records the same year.
A single change to a network device can have far-reaching effects on your business: creating security holes for cybercriminals to exploit, preventing you from passing crucial regulatory and compliance audits, and causing costly outages that can bring your business to a standstill.
As such, the people who own security within an organization – security managers, network managers and so on – need to strike a delicate balance between protecting the organization’s assets and ensuring business continuity by enabling critical applications to function normally. In the midst of all that, the risk of misconfiguration caused by human error is a constant concern.
How can a device be misconfigured?
Let’s take a closer look at how such misconfigurations can occur – and why their impact can be so devastating.
Imagine network traffic is being filtered by a particular firewall. The organization needs to allow traffic from a new web server to reach a database server. This looks like a simple task: all the manager needs to do is log in to the firewall’s command line and add a new line to permit the traffic. There will already be a large list of firewall rules in place, of course – perhaps thousands.
Now imagine that in this new line, the manager in question types ‘neq’ instead of ‘eq’. It’s a single character. It’s very easy to do.
But in doing so, not only have they ensured that the web server cannot access the database it needs – that is, the single aim of the task has not been achieved – but they have also opened up that database to traffic from any other service on the network. Tens of thousands of ports could be exposed, as the sketch below illustrates.
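To see how much difference that one character makes, here is a minimal Python sketch of a simplified port-matching rule. The rule model and the database port (1433) are illustrative assumptions, not any vendor’s actual syntax:

```python
# Minimal sketch: how a one-character operator typo ('eq' -> 'neq') widens exposure.
# The rule model and port numbers are illustrative, not a real firewall syntax.

def rule_matches(operator: str, rule_port: int, packet_port: int) -> bool:
    """Return True if a packet's destination port satisfies the rule's port test."""
    if operator == "eq":    # permit only this exact port
        return packet_port == rule_port
    if operator == "neq":   # permit every port EXCEPT this one
        return packet_port != rule_port
    raise ValueError(f"unknown operator: {operator}")

DB_PORT = 1433  # hypothetical database listener port

# Intended rule: permit web-server traffic to the database on port 1433 only.
intended = sum(rule_matches("eq", DB_PORT, p) for p in range(1, 65536))
# Mistyped rule: 'neq' permits traffic to every port except 1433.
mistyped = sum(rule_matches("neq", DB_PORT, p) for p in range(1, 65536))

print(f"ports permitted with 'eq':  {intended}")   # 1
print(f"ports permitted with 'neq': {mistyped}")   # 65534
```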
If a similar incident happens in the cloud, it can mean that traffic from anywhere on the internet is allowed to reach a particular network service without passing through a firewall, so the potential for criminal exploits is enormous. Furthermore, if the original application is still working, it can be very difficult to identify that anything is wrong – until a breach actually happens.
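One common way to catch this class of cloud exposure is to scan for rules whose source is 0.0.0.0/0. The following is a hedged sketch using the AWS boto3 EC2 client; it assumes credentials and a region are already configured, and it simply flags any security-group rule open to the whole internet:

```python
# Hedged sketch: flag AWS security-group rules that expose a service to the whole
# internet (0.0.0.0/0). Assumes boto3 credentials and region are already configured;
# field names follow the EC2 describe_security_groups response format.
import boto3

ec2 = boto3.client("ec2")

for sg in ec2.describe_security_groups()["SecurityGroups"]:
    for perm in sg.get("IpPermissions", []):
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                print(
                    f"Security group {sg['GroupId']} allows ports "
                    f"{perm.get('FromPort', 'all')}-{perm.get('ToPort', 'all')} "
                    f"from anywhere on the internet"
                )
```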
Another cause of misconfigurations is a clean-up of security policies that accidentally removes a rule still required by a critical application – once again, this can take a long time to find and resolve (a simple pre-clean-up check is sketched below). Misconfigurations can also stem from miscommunication between different teams in the organization: one team asking for a particular network change, and another team misinterpreting that request, for example.
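As a simple illustration of the clean-up problem, a pre-removal check like the one below (with hypothetical rule IDs and hit counts) can separate genuinely unused rules from ones still carrying traffic for a critical application:

```python
# Illustrative sketch: before a policy clean-up, check candidate rules against
# recent hit counts so a rule still used by a critical application is not removed.
# The rule IDs and hit counts are hypothetical placeholder data.

recent_hits = {          # rule ID -> hits seen in the last 90 days (e.g. from logs)
    "rule-101": 0,
    "rule-102": 48213,   # still carrying traffic for a critical application
    "rule-103": 0,
}

cleanup_candidates = ["rule-101", "rule-102", "rule-103"]

safe_to_remove = [r for r in cleanup_candidates if recent_hits.get(r, 0) == 0]
needs_review   = [r for r in cleanup_candidates if recent_hits.get(r, 0) > 0]

print("Safe to remove:", safe_to_remove)   # ['rule-101', 'rule-103']
print("Needs review:  ", needs_review)     # ['rule-102']
```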
How can organizations avoid misconfigurations?
There are many strategies for avoiding misconfigurations. Some have to do with methodology – carefully separating out the duties involved in managing the network, and designing and implementing change requests. Some have to do with processes, ensuring the right levels of checks, tests and validations after a network change. Some have to do with human resources, ensuring you have the right expertise in-house.
The problem is that all of these approaches are resource-heavy. They take time and slow down the business, at a time when agility and scalability are more important than ever before.
So, as with so many business challenges where the goal is speed and accuracy combined, the solution has to be automation.
How does this work in practice? A comprehensive automation solution should begin by mapping all of the traffic flows within the organization – but, crucially, from a business application perspective. In other words, it should focus on business enablement – which applications does the organization need to operate, and which traffic flows do those applications depend on?
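In practice, that mapping can be as simple as a data structure tying each business application to the flows it depends on. The sketch below uses hypothetical application names, tiers and ports purely for illustration:

```python
# Sketch of an application-centric flow map: each business application is described
# by the traffic flows it depends on. Application names, tiers and ports are
# hypothetical examples.

app_flows = {
    "online-store": [
        {"src": "web-tier", "dst": "db-tier", "port": 1433, "proto": "tcp"},
        {"src": "web-tier", "dst": "payments-gw", "port": 443, "proto": "tcp"},
    ],
    "hr-portal": [
        {"src": "intranet", "dst": "hr-db", "port": 5432, "proto": "tcp"},
    ],
}

def flows_for(app: str):
    """Return the traffic flows a given business application depends on."""
    return app_flows.get(app, [])

print(flows_for("online-store"))
```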
Next, it should define a risk matrix that covers the connectivity between the different zones within the business network. In turn, this means that when a network change request is made, for example to support a new application or amend an existing one, the solution can determine automatically whether the change should be approved, or whether it creates a new risk that might need manual checking and intervention.
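A minimal version of that triage logic might look like the following sketch, where the zone names, risk levels and auto-approval threshold are all illustrative assumptions:

```python
# Sketch of a zone-to-zone risk matrix used to triage change requests automatically.
# Zone names and risk levels are illustrative assumptions.

RISK_MATRIX = {
    ("internet", "dmz"):      "medium",
    ("internet", "internal"): "high",
    ("dmz", "internal"):      "medium",
    ("internal", "internal"): "low",
}

AUTO_APPROVE = {"low"}   # anything riskier is escalated for manual review

def triage(src_zone: str, dst_zone: str) -> str:
    """Decide whether a requested flow can be approved automatically."""
    risk = RISK_MATRIX.get((src_zone, dst_zone), "high")  # unknown pairs treated as high
    return "auto-approve" if risk in AUTO_APPROVE else f"manual review ({risk} risk)"

print(triage("internal", "internal"))   # auto-approve
print(triage("internet", "internal"))   # manual review (high risk)
```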
From there, the solution should manage the change – either adding a new security rule, modifying an existing rule, or escalating the change request to a member of the team. A fully automated solution can use the security devices’ APIs to push changes without human intervention, provided pre-determined risk levels have not been exceeded. Finally, a validation phase needs to take place, with checks run on every device in the new traffic flow’s path.
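Putting those steps together, the sketch below models the implement-and-validate loop. The DeviceClient class and its methods are hypothetical stand-ins for a real device API, not an actual SDK:

```python
# End-to-end sketch of the change workflow: push an approved rule through a
# (hypothetical) device API, then validate every device along the new flow's path.
# DeviceClient and its methods are illustrative stand-ins, not a real SDK.

class DeviceClient:
    def __init__(self, name: str):
        self.name = name
        self.rules = []

    def add_rule(self, rule: dict) -> None:
        """Stand-in for an API call that pushes a new rule to the device."""
        self.rules.append(rule)

    def permits(self, rule: dict) -> bool:
        """Stand-in for a post-change check that the flow is actually allowed."""
        return rule in self.rules

def implement_change(path: list, rule: dict, risk_ok: bool) -> str:
    if not risk_ok:
        return "escalated to a team member for manual review"
    for device in path:                 # push the change on every device in the path
        device.add_rule(rule)
    failed = [d.name for d in path if not d.permits(rule)]   # validation phase
    return "validated on all devices" if not failed else f"validation failed: {failed}"

path = [DeviceClient("edge-fw"), DeviceClient("core-fw")]
rule = {"src": "web-tier", "dst": "db-tier", "port": 1433, "proto": "tcp"}
print(implement_change(path, rule, risk_ok=True))
```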
Security and scalability in one
Device misconfigurations create security and operational issues, and automation is critical to preventing them. A business-driven approach to security device configuration starts with the applications themselves: taking control of what each application needs and holistically managing security policies in line with those needs. A comprehensive, intelligent automation solution can not only ensure operational continuity; it can also shore up security and make it easy to achieve and continually demonstrate regulatory compliance.