Data Sharing for Energy Delivery Systems
Data and information sharing is an effective strategy that can help reduce cybersecurity threats in Industrial Control Systems (ICS). When an attack is detected, a signature can be developed and shared with others to prevent the spread of the attack and enable proactive deployment of defenses. Energy Delivery Systems (EDS) can also benefit from such threat intelligence sharing schemes. To this end, the DoE has created E-ISAC (Electricity Information Sharing and Analysis Center) which encourages participants to share cybersecurity-related information such as indicators of compromise and audit logs to help other participants identify threats early and respond quickly. However, by using these logs a clever adversary (a competing vendor or an attacker compromising the shared log database) may be able to extract sensitive information pertaining to the internal infrastructure and topology of a service provider or enterprise and use this information for negative publicity, competitive advantage, or targeted attacks. Hence, participants may be reluctant to share their data via these platforms. Additionally, it is not clear how threat intelligence generated at a particular EDS can be generalized and applied to other EDSs where the infrastructure could be different and the attack manifests itself differently.
In light of these challenges, a promising solution is to share machine learning classifiers instead of entire logs and other threat intelligence data. Participating vendors would simply query the classifier to determine whether their infrastructure is vulnerable to the threat it models, without having access to any sensitive information. With sufficient training data, the classifier would be able to generalize the underlying attack patterns and respond to queries from different EDSs with different topologies and infrastructure. However, one challenge is that classifiers have been shown to leak information about their training data under repeated querying. The extent of the leakage is governed by the underlying nature of the training data and the types of models being shared. Countermeasures such as learning with differential privacy have been proposed in the literature, but they mostly focus on individual privacy, not enterprise confidentiality. A more detailed investigation of the space is needed to provide meaningful security guarantees for EDSs.
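The leakage concern can be made concrete with a toy membership-inference sketch. Everything below is illustrative (a hypothetical model and made-up numbers, not EDS data): an overfit classifier is unusually confident on examples it was trained on, and an attacker who can query it repeatedly can exploit that.

```python
def membership_advantage(confidence, members, nonmembers, threshold=0.9):
    """Guess 'member of the training set' whenever model confidence is high.
    Attack advantage = TPR - FPR; 0 means the attacker learns nothing."""
    tpr = sum(confidence(x) > threshold for x in members) / len(members)
    fpr = sum(confidence(x) > threshold for x in nonmembers) / len(nonmembers)
    return tpr - fpr

# Toy stand-in for a shared classifier that overfits its training set.
train_set = {1, 2, 3, 4}
confidence = lambda x: 0.99 if x in train_set else 0.60

print(membership_advantage(confidence, members=[1, 2, 3, 4],
                           nonmembers=[5, 6, 7, 8]))  # 1.0: total leakage
```

A real attack would use shadow models rather than a fixed threshold, but the advantage metric is the same.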
This activity focuses on investigating and developing secure data and information sharing schemes for EDSs based on classifier sharing. The focus will be on understanding the information leaked by shared classifiers and developing models that preserve the confidentiality of the party that trained them. To train the models, we propose using audit logs from EDSs, supplemented with traditional network audit logs wherever EDS data is missing (given the similarities between the two, the classifier might be able to generalize its internal models). This is a large space, involving the study of secure machine learning models, an understanding of confidentiality in the context of EDSs, and the development of protection mechanisms that reduce the associated risk to the confidentiality of the training data.
The research activity breaks down into the following phases:
Phase 1: Collect audit data, such as network data generated in EDSs. Augment it with traditional network logs where needed.
Phase 2: Investigate machine learning models feasible for EDSs and train classifiers on the collected data. Explore the types of models that are common in enterprise networks and ICSs and leverage that information where needed.
Phase 3: Investigate potential confidential information that can be inferred from the machine learning models.
Phase 4: Develop requirements for EDS confidentiality and standards that categorize data and information into levels based on their sensitivity.
Phase 5: Develop algorithms to assess confidential information inference risk.
Phase 6: Develop protection measures to eliminate or reduce information leakage.
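As a rough illustration of Phases 1 and 2, the sketch below turns parsed audit-log records into feature vectors and fits a classifier. The log schema (`bytes`, `port`, `proto`) is assumed, and nearest-centroid is a deliberately simple stand-in for whatever model Phase 2 ultimately selects.

```python
def featurize(record):
    """Map one parsed audit-log record to a numeric feature vector."""
    return [
        record["bytes"] / 1500.0,        # payload size, MTU-normalized
        record["port"] / 65535.0,        # destination port
        1.0 if record["proto"] == "tcp" else 0.0,
    ]

def fit_centroids(labeled_records):
    """Fit one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for record, label in labeled_records:
        vec = featurize(record)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def predict(centroids, record):
    """Label of the nearest centroid (squared Euclidean distance)."""
    vec = featurize(record)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

train = [({"bytes": 60, "port": 445, "proto": "tcp"}, "attack"),
         ({"bytes": 1400, "port": 443, "proto": "tcp"}, "benign")]
model = fit_centroids(train)
print(predict(model, {"bytes": 80, "port": 445, "proto": "tcp"}))  # attack
```

Note that the trained `model` here is just two centroids, which already hints at the leakage problem Phase 3 studies: each centroid is an average of confidential training records.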
The ultimate goal is to create mechanisms for protecting enterprise confidentiality. This includes developing machine learning models that protect their training data and cannot be used to infer information pertaining to the enterprise or EDS that generated the classifier. The results of the activity will be algorithms and systems that help achieve these goals. The tools that will be developed should become part of the data sharing approach that is the focus of the E-ISAC and the DoE. For example, an organization could use the tools to assess the inference risks of sharing its machine learning models, and then use them to reduce those risks before publishing or otherwise sharing the models. The activity will also explore the possibility of extending STIX/TAXII with custom objects and properties to facilitate the sharing of machine learning classifiers.
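A STIX/TAXII extension of this kind might look like the sketch below. The `x-shared-classifier` type and every property on it are hypothetical proposals, not part of the STIX specification; only the `type`/`spec_version`/`id`/`created` scaffolding and the `x-` custom-object prefix follow existing STIX 2.1 conventions.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical STIX 2.1 custom object for sharing a trained classifier.
classifier_obj = {
    "type": "x-shared-classifier",
    "spec_version": "2.1",
    "id": f"x-shared-classifier--{uuid.uuid4()}",
    "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    "name": "ICS lateral-movement detector",
    "model_format": "onnx",                   # assumed serialization format
    "model_ref": "artifact--<artifact-id>",   # STIX Artifact holding weights
    "training_data_sensitivity": "redacted",
    "leakage_risk_score": 0.12,               # hypothetical Phase 5 output
}
print(json.dumps(classifier_obj, indent=2))
```

Carrying a leakage-risk score alongside the model would let recipients (and the publisher) apply a sharing policy before the classifier ever leaves the enterprise.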
The activity will allow us to better understand and answer challenges pertaining to secure data sharing in EDSs. The main questions are: What information can be extracted by reviewing the logs of an enterprise such as firewall and server access logs? What information is leaked by a classifier trained on such data? Is that leak meaningful/damaging to the EDS and a cause for concern? Can we develop metrics and standards to quantify the leaky nature of a classifier? How can we improve the confidentiality assurances of such classifiers without excessively reducing their effectiveness? Do these classifiers actually detect the types of attacks specific to EDS? Can we build more flexible data sharing schemes such as generating synthetic data that can be used to train classifiers?
Cybersecurity-related information such as indicators of compromise, forensic artifacts/samples, and incident reports is currently shared among stakeholders in EDSs via platforms such as E-ISAC. However, the manner in which this information is shared has two primary disadvantages. First, the shared information may compromise enterprise confidentiality by revealing sensitive details (such as the infrastructure) of the distributor, which discourages participants from sharing data. Second, given that attacks evolve continuously and may exhibit different patterns for different topologies and infrastructure, each recipient of the threat intelligence still requires considerable effort to first map the problem onto their particular infrastructure and then develop detection/mitigation tools.
As machine learning applications are widely used in EDSs to thwart evolving attacks, there is room to address these problems by sharing machine learning classifiers and models instead of actual logs and reports. This could minimize leakage while providing a classifier that can be used across diverse topologies and infrastructures without mapping or specialized tuning.
Reference the research activity fact sheet (PDF) for an extended gap analysis and bibliography. References noted throughout the Summary Statement are listed in this PDF fact sheet.
How does this research activity address the Roadmap to Achieve Energy Delivery Systems Cybersecurity?
The proposed research ties into the roadmap in several ways. We highlight some below:
Build a Culture of Security: With our focus on enterprise confidentiality and secure data sharing, we are enabling a data sharing approach that encourages vendors to protect data and enterprise confidentiality while still sharing threats and other beneficial information with peer vendors in an effective and meaningful way. Participants will develop an understanding of how data can leak from seemingly harmless logs or classifiers, and of the best practices and techniques for preventing this unwanted leakage. This will raise awareness among administrators and inculcate a culture rooted in security when it comes to data sharing.
Assessing and Monitoring Risks: One goal of this activity is to develop metrics and standards for quantifying and measuring the EDS-specific confidentiality risks associated with various data sharing schemes. As mentioned previously, classifiers have been shown to leak information, and although the leakage is less severe than sharing a full network or access log, we currently lack mathematically sound mechanisms to quantify and compare the degree of leakage. Once metrics are in place to compute the “strength” of various approaches and machine learning models, a more informed decision can be made as to the best choice for secure data sharing in a given context.
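One candidate for such a metric is membership-inference advantage, which admits a known analytic bound when training satisfies epsilon-differential privacy (advantage ≤ e^ε − 1). The sketch below compares hypothetical measured advantages against that bound; the model names and all measured numbers are made up for illustration.

```python
import math

def dp_advantage_bound(epsilon):
    """Upper bound on membership-inference advantage for an
    epsilon-differentially-private training algorithm: e^eps - 1."""
    return math.exp(epsilon) - 1.0

# Hypothetical audit results for three candidate shared models.
measured = {"no-dp": 0.42, "dp-eps-0.5": 0.21, "dp-eps-0.1": 0.04}
epsilons = {"no-dp": None, "dp-eps-0.5": 0.5, "dp-eps-0.1": 0.1}

for name, adv in measured.items():
    eps = epsilons[name]
    bound = dp_advantage_bound(eps) if eps is not None else float("inf")
    print(f"{name}: measured advantage {adv:.2f}, DP bound {bound:.2f}")
```

An EDS-specific metric would need to weight advantage by the sensitivity level of the data involved (the Phase 4 categorization), since not every leaked bit is equally damaging to the enterprise.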
Manage Incidents: A classifier that models a particular attack will allow vendors to check the vulnerability of their own infrastructure against the vectors specific to that threat. If a vendor determines that it is vulnerable, it can proactively make changes and deploy defenses before the attack even begins. At times, however, the classifier will predict that the attack will succeed, and upon further inspection a vendor might find that its infrastructure has already been compromised (the attackers in the Ukraine incident broke in and stayed dormant for six months before making a move). In response, the vendor could quickly flush out the attackers, patch any vulnerabilities, and put up defenses. Additionally, the vendor could change a few parameters pertaining to the infrastructure and query the classifier again to determine whether the modified infrastructure is still vulnerable. If the classifier reports a reasonable degree of security against the attack, the vendor could then make those changes in its physical infrastructure. This approach lets vendors better manage current incidents and check how different changes to the underlying infrastructure and topology will help prevent future ones.
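The re-query loop described above might look like the following sketch. The classifier, the infrastructure parameters, and the acceptance threshold are all hypothetical placeholders.

```python
# Hypothetical "what-if" loop: perturb infrastructure parameters and
# re-query the shared classifier until the predicted attack-success
# score drops below an acceptable threshold.
def what_if_search(classifier, baseline, candidate_changes, threshold=0.2):
    """Return (safe_config, score), or (None, baseline_score) if none qualify."""
    for change in candidate_changes:
        config = {**baseline, **change}
        score = classifier(config)
        if score < threshold:
            return config, score
    return None, classifier(baseline)

# Toy classifier: rates exposed SMB on a flat (unsegmented) network as risky.
risk = lambda c: 0.9 if (c["smb_exposed"] and not c["segmented"]) else 0.1

baseline = {"smb_exposed": True, "segmented": False}
changes = [{"segmented": True}, {"smb_exposed": False}]
safe, score = what_if_search(risk, baseline, changes)
print(safe, score)  # segmenting the network already brings the score to 0.1
```

Each candidate change here is cheap to evaluate against the classifier before any costly modification is made to the physical infrastructure.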
Sustain Security Improvements: One notable near-term goal specified in the EDS cybersecurity Roadmap is the timely sharing of threat information. The fundamental need is to quickly disseminate information about an attack to all stakeholders in order to minimize damage and contain its spread. Our classifier-sharing framework is closely aligned with this goal: we aim to develop technologies and tools that together enable easy, timely sharing of such information. Additionally, the approach minimizes duplicated technology development, as participating vendors can simply update a published classifier by training it on their own incident data for the benefit of the rest of the community.