In multiprotocol label switching (MPLS) VPN security discussions, the general statement often heard is, “MPLS is not secure, because a simple operator mistake (such as the misconfiguration of a route target) can break VPN isolation.” Such statements display some fundamental misunderstandings, which this white paper will attempt to explain.
Another similar example may illustrate why operational mistakes are not an argument against a certain technology. Assume an operator makes a mistake in a firewall configuration that accidentally leaves a security hole open. Nobody would argue that firewalls are insecure due to such an act. Since the operator has the authority to make changes, the operator implicitly has the authority to make mistakes. These examples help to show why operational problems are a different category of security. Strictly speaking, you cannot trust your network operators, which can present a very difficult problem.
Three Components of Security
Security depends on three components, each of which is independent of the others:
Architecture (or algorithm) set in place: This is the formal specification. In cryptography it is the algorithm itself; in the case of MPLS VPNs, it is the formal specification (as defined in RFC 4364).
Implementation of the architecture or algorithm: Refers to how the architecture or algorithm is actually being implemented. Programming mistakes, such as buffer overflows, can affect this component.
Operation thereof: This includes operator issues, such as choosing weak passwords on routers or workstations, or accidental disclosure of a shared key. For example, configurations could be sent to untrusted third parties.
Note that these components are not specific to networking or even computing. Physical security has the same three fundamental properties, and it can fail in any of them. A door lock, for example, can have weaknesses in the design (for instance, constructed using the wrong material), manufacturing mistakes (for instance, not fixed properly to the door), or operational mistakes (for instance, leaving the key under the doormat).
The main conclusion is that an operational mistake, such as a misconfiguration in an MPLS provider edge (PE) router, does not automatically imply that the architecture is insecure. Misconfigurations can happen in any technology, which means operational security measures need to be in place to catch such issues.
There are two types of operational security problems:
Accidental misconfigurations: These are accidental in nature, and are by far the most frequent type of operational issue. Mistyping a value (such as the route target in MPLS VPNs) is one example; forgetting statements in a firewall configuration is another.
Deliberate misconfigurations: These are deliberate in nature but vary in their degree of maliciousness. For example, violation of the security policy to allow an operator's home system access through the corporate firewall is not as likely to be as severe as acts of sabotage by a disgruntled employee.
The impact of misconfigurations of either type can range from little or no effect to catastrophic. This is especially true in the case of accidental misconfigurations, where there is a reasonable chance that the true extent of the resulting security breach is not even discovered. It should also be noted that only a minor fraction of possible misconfigurations will actually result in a security breach. Naturally, deliberate misconfigurations will be more likely to result in a breach, since there is malicious intent with a goal to break the security policy.
Currently, in the case of MPLS VPNs, the biggest concern in the industry is accidental misconfigurations. The likelihood that an operator mistypes a route target or makes other configuration mistakes cannot be overlooked. This type of mistake could cause a given VPN site to become part of another VPN, breaking the separation of both VPNs. When this happens by accident, it is unlikely that either side will discover the true nature of the incident. The misplaced site will usually realize that it can no longer reach its business applications, while the other VPN may not notice the breach at all, unless there is address space overlap or some routing issue caused by the new prefixes. This issue is a serious concern for VPN users.
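The kind of mistake described above can be caught by comparing the route targets configured on a PE router with the intended policy. The following is a minimal sketch in Python, not a vendor tool: the VRF names, the route target values, the `VrfPolicy` data model, and the `audit_vrf` helper are all hypothetical, and the step of extracting the configured values from a device is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VrfPolicy:
    """Intended route targets for one VPN (hypothetical data model)."""
    imports: frozenset
    exports: frozenset


# Intended policy: each customer VPN uses its own route target.
# All names and values here are illustrative assumptions.
POLICY = {
    "CUSTOMER_A": VrfPolicy(frozenset({"65000:100"}), frozenset({"65000:100"})),
    "CUSTOMER_B": VrfPolicy(frozenset({"65000:200"}), frozenset({"65000:200"})),
}


def audit_vrf(name, configured_imports, configured_exports, policy):
    """Return a list of findings for one VRF; an empty list means no deviation."""
    intended = policy.get(name)
    if intended is None:
        return [f"{name}: VRF is not in the intended policy"]

    cfg_imp = set(configured_imports)
    cfg_exp = set(configured_exports)
    findings = []
    for rt in sorted(cfg_imp - intended.imports):
        findings.append(f"{name}: unexpected import route target {rt}")
    for rt in sorted(intended.imports - cfg_imp):
        findings.append(f"{name}: missing import route target {rt}")
    for rt in sorted(cfg_exp - intended.exports):
        findings.append(f"{name}: unexpected export route target {rt}")
    for rt in sorted(intended.exports - cfg_exp):
        findings.append(f"{name}: missing export route target {rt}")
    return findings


# An operator mistyped the import route target of CUSTOMER_A as 65000:200,
# which would make the site import the routes of CUSTOMER_B.
for finding in audit_vrf("CUSTOMER_A", ["65000:200"], ["65000:100"], POLICY):
    print(finding)
```

A check like this only detects the deviation after the change has been made. It does not prevent the mistake, so it complements the operational measures discussed in this paper and does not replace them.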
Operational Security Measures
The typical reaction when looking for a solution to a security problem is to look for features to configure. It is important to understand that operational problems cannot be fully solved by features, because the person making the misconfiguration may also remove the feature that is meant to protect against misconfigurations. Operational problems require operational solutions, and operational competence of the organization. Operational solutions include:
Operational security policy: There should be clear guidelines on what operators are allowed to do and what they are not allowed to do. Escalation paths need to be defined that outline the steps to follow if an operator does not have the authorization required for a specific action. The operational security policy should clearly define the responsibilities and authorization, as well as disciplinary actions in case of breaches. The policy also acts as a deterrent against deliberate misconfigurations.
Change management process: Every company running a network should create precise processes that define and control how changes to the network are executed. The state of the hardware, operating system, and configurations should be monitored, and all changes should be logged and executed in a controlled way. The logs should be evaluated and checked for potential misconfigurations. The logs can also be used to demonstrate a deliberate breach of the operational security policy. (For this, the concept of dual control is important and is discussed below.)
Access control: It is good practice to restrict access to network devices. Access restrictions are traditionally implemented in networks via AAA authentication. This measure is usually in place, but in many networks too many operators have access to network devices. Restricting this number to the minimum number of operators necessary reduces the risk.
Authorization: The access an operator has should be restricted to the minimum access needed for the operator to do their job. In most cases it is not a good idea for all operators to have full-enable access (level 15) to devices. This practice can be more difficult to implement; however, simple distinctions, for example, who can and cannot enter configuration mode, go a long way.
Dual control: Security control and network control should not be the responsibility of the same group. Ideally, a security group controls who has access to what, and a network group executes the configuration actions. Typically the logs are controlled by the security group. This way it is much harder to deliberately misconfigure devices, since the security team could recognize a misconfiguration in the log files.
Secure and verify: All of the above measures are active attempts to detect a change in the network, such as a configuration change. It is also possible to detect policy violations by analyzing the traffic on the network, or the state of dynamic information such as routing tables, ARP tables, etc. For example, intrusion detection systems can create alerts when flows are seen on the network that do not correspond to the policy. There are many other ways to monitor for traffic anomalies. For example, Cisco IOS NetFlow can be instrumental in detecting misrouted packets on the network and routing tables can be checked for missing or unknown routing prefixes.
Automation: It is generally recommended to automate processes and procedures, specifically recurring verification processes, because humans tend to overlook details in log files and similar processes. Automated processes are also less likely to make mistakes, although if a mistake does happen, it is often systematic and therefore easily detectable.
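As an illustration of an automated, recurring verification step, the routing-table check mentioned above can be reduced to a set comparison between the prefixes a VPN is expected to carry and the prefixes actually observed. This is a minimal sketch in Python under stated assumptions: the function name `compare_prefixes` and the example prefixes are hypothetical, and collecting the observed prefixes from the routers is assumed to be done by other tooling.

```python
import ipaddress


def _sort_key(network):
    # Sort by IP version first so IPv4 and IPv6 networks never get compared directly.
    return (network.version, int(network.network_address), network.prefixlen)


def compare_prefixes(expected, observed):
    """Return (missing, unknown) prefixes as two sorted lists of strings.

    missing: expected in the routing table but not observed.
    unknown: observed in the routing table but not expected.
    """
    exp = {ipaddress.ip_network(p) for p in expected}
    obs = {ipaddress.ip_network(p) for p in observed}
    missing = [str(n) for n in sorted(exp - obs, key=_sort_key)]
    unknown = [str(n) for n in sorted(obs - exp, key=_sort_key)]
    return missing, unknown


# Hypothetical example: one expected site prefix has disappeared and a
# prefix from another address plan has shown up in the VPN routing table.
expected = ["10.1.0.0/16", "10.2.0.0/16"]
observed = ["10.1.0.0/16", "192.168.50.0/24"]

missing, unknown = compare_prefixes(expected, observed)
print("missing prefixes:", missing)
print("unknown prefixes:", unknown)
```

Run on a schedule, a comparison of this kind can raise an alert on either list being non-empty, which addresses the case where neither VPN notices a misplaced site on its own.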
It can be very difficult to implement a comprehensive operational security environment, and some measures (such as dual control) can require a certain organizational size to work properly. The goal should be to carry out incremental improvements to the overall operations process. For example, precise command-level authorization schemes can be difficult to deploy and expensive to operate in large networks. Other parts of the operations process are much easier to enforce. One such mechanism is a simple form of dual control: sending all access and configuration logs to a separate log server, to which the network operators do not have access, is a step toward discouraging deliberate misconfigurations of network devices.
Extending the network to third parties, by either outsourcing parts of the network or certain network management aspects, or by providing extranets, requires third parties to comply with the operational security measures. This adds significant complexity to the operational security policy.
Defense in Depth
The key issue with many operational control functions is that they may not always prevent mistakes from happening. They may make it harder for mistakes to occur, but a large part focuses on the detection of the mistakes after they occur. This may, to a large extent, solve deliberate misconfigurations because an engineer would probably not violate the security policy if it is known that the “attack” can be detected and traced back to the engineer. But it is not always possible to proactively avoid mistakes. Obviously this causes security concerns.
Many organizations consider additional security measures, so that the overall system is more resistant against misconfigurations.
To maintain separation when the network core does not provide full separation, potentially due to a misconfiguration, IPsec may be considered in addition to MPLS VPN. GET VPN is a variant of IPsec, which is particularly suited to run in addition to MPLS VPN. If an organization runs an IPsec VPN on top of an MPLS VPN, operator mistakes on the MPLS core will not break the separation of the VPN, because it is additionally protected by IPsec. However, this poses an additional cost and operational burden. Some organizations choose to deploy two independent firewalls with different operational groups, so that no single mistake or misconfiguration can affect overall security.
The use of several layers of security is called “defense in depth” and is a common model in security deployments. However, adding additional security layers should not be done without a proper risk analysis. It is important to understand the threats, their impact on the organization, and the cost of the additional security measures.
A risk analysis should determine whether the cost of the additional security measures is in balance with the cost of the actual risk without the additional security measures. In other words, a risk analysis should determine whether the risk’s impact is large enough to justify the extra cost of the additional counter measures. However, such a risk analysis should account for the entire network including all of its assets and current counter measures. A proper risk analysis requires significant resources.
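The cost comparison at the heart of such a risk analysis can be sketched with simple arithmetic: estimate the expected annual loss with and without the additional measure, and compare the reduction with the annual cost of the measure. The following Python sketch is illustrative only. The function names and all figures are hypothetical assumptions, and a real risk analysis would cover every asset and existing countermeasure, as noted above.

```python
def expected_annual_loss(incidents_per_year, cost_per_incident):
    """Expected loss per year: estimated frequency times estimated impact."""
    return incidents_per_year * cost_per_incident


def measure_is_justified(incidents_per_year, cost_per_incident,
                         residual_incidents_per_year, annual_cost_of_measure):
    """True if the reduction in expected loss exceeds the cost of the measure."""
    before = expected_annual_loss(incidents_per_year, cost_per_incident)
    after = expected_annual_loss(residual_incidents_per_year, cost_per_incident)
    return (before - after) > annual_cost_of_measure


# Hypothetical figures: one VPN separation breach every ten years at a cost
# of 2,000,000 per breach, reduced to one every hundred years by an
# additional encryption layer.
for annual_cost in (150_000, 250_000):
    verdict = measure_is_justified(0.1, 2_000_000, 0.01, annual_cost)
    print(f"annual cost {annual_cost}: justified = {verdict}")
```

With these assumed numbers the reduction in expected loss is roughly 180,000 per year, so the measure pays off at an annual cost of 150,000 but not at 250,000. The estimates of frequency and impact dominate the result, which is one reason a proper risk analysis requires significant resources.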
Complexity and Security
The complexity of a network makes operational mistakes and security violations more likely. This applies both to the network architecture and to the methods that are in place to protect the network. From a security perspective, less complex configurations are usually preferred.
This perspective also applies to the operational management of the network. Very complex operational procedures are more likely to cause problems. For example, under a very complex operational procedure an operator group may not have the required privileges to carry out an emergency operation. Under stress, the immediate reaction in such cases is to disable some security checks.
There is no clear guideline on what is “too complex,” as this also depends on the operational model of the enterprise. The threshold will be different for a highly skilled team than for a first-line support team.
The key message is that adding operational measures (for example, command-level authorization) or security measures (such as IPsec) increases the complexity of the network, and in some cases may actually result in lower security, because the network becomes too complex to maintain.
Regulation and Compliance
An increasing number of regulations and industry standards, such as PCI, HIPAA, and Sarbanes-Oxley, require certain operational security measures. Currently these regulations are the main drivers for many operational security measures. Precise access control and authorization, as well as logging, are key requirements of most compliance standards.
Companies considering operational security measures should verify which regulations apply to their business, and what each regulation requires.
Cisco Products Covering Operational Security
Even though operational security is a process, and less feature or product driven, there are a number of Cisco products that address operational security:
Cisco's range of intrusion detection and prevention products: IDS/IPS products can alert when traffic is detected that violates the operational security policy.
Cisco IOS: Contains a number of features that help with operational security. For example, CLI views restrict what actions a user can perform, and login enhancements provide information on unsuccessful login attempts. (http://www.cisco.com/go/ios/)
Conclusion
Operational mistakes can break security policies and are a major concern for both service providers and enterprises. Most operational mistakes cannot be completely avoided; however, it is possible to reduce the risk of a mistake. The ability to detect a mistake and trace it back to its source could also deter insiders from making malicious misconfigurations or help to quickly detect operator mistakes.
Industry compliance regulations require certain operational security measures. Network operators should check which regulation applies and verify that the required measures are in place.
It is often possible to provide additional security measures that remain effective even when an operational mistake occurs. However, before implementing additional security measures, a formal risk analysis should be performed to balance the cost of the additional measures against the cost of the risk incurred due to operational weaknesses.
Michael Behringer (email@example.com) Distinguished System Engineer
This document is part of the Cisco Security portal. Cisco provides the official information contained on the Cisco Security portal in English only.
This document is provided on an “as is” basis and does not imply any kind of guarantee or warranty, including the warranties of merchantability or fitness for a particular use. Your use of the information in the document or materials linked from the document is at your own risk. Cisco reserves the right to change or update this document without notice at any time.