
Cyber risk and cybersecurity: a systematic review of data availability

  • Open access
  • Published: 17 February 2022
  • Volume 47, pages 698–736 (2022)


Frank Cremer, Barry Sheehan, Michael Fortmann, Arash N. Kia, Martin Mullins, Finbarr Murphy & Stefan Materne


Cybercrime is estimated to have cost the global economy just under USD 1 trillion in 2020, indicating an increase of more than 50% since 2018. With the average cyber insurance claim rising from USD 145,000 in 2019 to USD 359,000 in 2020, there is a growing necessity for better cyber information sources, standardised databases, mandatory reporting and public awareness. This research analyses the extant academic and industry literature on cybersecurity and cyber risk management with a particular focus on data availability. From a preliminary search resulting in 5219 cyber peer-reviewed studies, the application of the systematic methodology resulted in 79 unique datasets. We posit that the lack of available data on cyber risk poses a serious problem for stakeholders seeking to tackle this issue. In particular, we identify a lacuna in open databases that undermine collective endeavours to better manage this set of risks. The resulting data evaluation and categorisation will support cybersecurity researchers and the insurance industry in their efforts to comprehend, metricise and manage cyber risks.


Introduction

Globalisation, digitalisation and smart technologies have escalated the propensity and severity of cybercrime. Whilst it is an emerging field of research and industry, the importance of robust cybersecurity defence systems has been highlighted at the corporate, national and supranational levels. The impacts of inadequate cybersecurity are estimated to have cost the global economy USD 945 billion in 2020 (Maleks Smith et al. 2020 ). Cyber vulnerabilities pose significant corporate risks, including business interruption, breach of privacy and financial losses (Sheehan et al. 2019 ). Despite the increasing relevance for the international economy, the availability of data on cyber risks remains limited. The reasons for this are many. Firstly, it is an emerging and evolving risk; therefore, historical data sources are limited (Biener et al. 2015 ). It could also be due to the fact that, in general, institutions that have been hacked do not publish the incidents (Eling and Schnell 2016 ). The lack of data poses challenges for many areas, such as research, risk management and cybersecurity (Falco et al. 2019 ). The importance of this topic is demonstrated by the announcement of the European Council in April 2021 that a centre of excellence for cybersecurity will be established to pool investments in research, technology and industrial development. The goal of this centre is to increase the security of the internet and other critical network and information systems (European Council 2021 ).

This research takes a risk management perspective, focusing on cyber risk and considering the role of cybersecurity and cyber insurance in risk mitigation and risk transfer. The study reviews the existing literature and open data sources related to cybersecurity and cyber risk. This is the first systematic review of data availability in the general context of cyber risk and cybersecurity. By identifying and critically analysing the available datasets, this paper supports the research community by aggregating, summarising and categorising all available open datasets. In addition, further information on datasets is attached to provide deeper insights and support stakeholders engaged in cyber risk control and cybersecurity. Finally, this research paper highlights the need for open access to cyber-specific data, without price or permission barriers.

The identified open data can support cyber insurers in their efforts towards sustainable product development. To date, traditional risk assessment methods have been untenable for insurance companies due to the absence of historical claims data (Sheehan et al. 2021). These high levels of uncertainty mean that cyber insurers are more inclined to overprice cyber risk cover (Kshetri 2018). Combining external data with insurance portfolio data therefore seems essential to improve the evaluation of the risk and thus lead to risk-adjusted pricing (Bessy-Roland et al. 2021). This argument is also supported by the fact that some re/insurers report that they are working to improve their cyber pricing models (e.g. by creating or purchasing databases from external providers) (EIOPA 2018). Figure 1 provides an overview of the pricing tools and factors considered in the estimation of cyber insurance, based on the findings of EIOPA (2018) and the research of Romanosky et al. (2019). In the figure, the term cyber risk encompasses cyber risks of all types and their potential impact.

Fig. 1 An overview of the current cyber insurance informational and methodological landscape, adapted from EIOPA (2018) and Romanosky et al. (2019)

Besides the advantage of risk-adjusted pricing, the availability of open datasets helps companies benchmark their internal cyber posture and cybersecurity measures. The research can also help to improve risk awareness and corporate behaviour. Many companies still underestimate their cyber risk (Leong and Chen 2020 ). For policymakers, this research offers starting points for a comprehensive recording of cyber risks. Although in many countries, companies are obliged to report data breaches to the respective supervisory authority, this information is usually not accessible to the research community. Furthermore, the economic impact of these breaches is usually unclear.

As well as the cyber risk management community, this research also supports cybersecurity stakeholders. Researchers are provided with an up-to-date survey of the peer-reviewed literature on available datasets, showing where these datasets have been used. For example, this includes datasets that have been used to evaluate the effectiveness of countermeasures in simulated cyberattacks or to test intrusion detection systems. This reduces a time-consuming search for suitable datasets and ensures a comprehensive review of those available. Through the dataset descriptions, researchers and industry stakeholders can compare and select the most suitable datasets for their purposes. In addition, it is possible to combine datasets from one source in the context of cybersecurity or cyber risk. This supports efficient and timely progress in cyber risk research and is beneficial given the dynamic nature of cyber risks.

Cyber risks are defined as “operational risks to information and technology assets that have consequences affecting the confidentiality, availability, and/or integrity of information or information systems” (Cebula et al. 2014). Prominent cyber risk events include data breaches and cyberattacks (Agrafiotis et al. 2018). The increasing exposure and potential impact of cyber risk have been highlighted in recent industry reports (e.g. Allianz 2021; World Economic Forum 2020). Cyberattacks on critical infrastructures are ranked fifth in the World Economic Forum's Global Risk Report. Ransomware, malware and distributed denial-of-service (DDoS) attacks are examples of the evolving modes of cyberattack. One example is the ransomware attack on the Colonial Pipeline, which shut down the 5500-mile pipeline system that delivers 2.5 million barrels of fuel per day, and critical liquid fuel infrastructure, from oil refineries to states along the U.S. East Coast (Brower and McCormick 2021). These and other cyber incidents have led the U.S. to strengthen its cybersecurity and introduce, among other things, a public body to analyse major cyber incidents and make recommendations to prevent a recurrence (Murphey 2021a). Another example of the scope of cyberattacks is the ransomware NotPetya in 2017. The damage amounted to USD 10 billion, as the ransomware exploited a vulnerability in the Windows operating system, allowing it to spread independently worldwide across networks (GAO 2021). In the same year, the ransomware WannaCry was launched by cybercriminals. The attack encrypted user data on Windows systems and demanded a ransom in Bitcoin cryptocurrency (Smart 2018). The victims included the National Health Service in Great Britain. As a result, ambulances were redirected to other hospitals because of failing information technology (IT) systems, leaving people in need of urgent assistance waiting. An estimated 19,000 treatment appointments were cancelled, and losses reached GBP 92 million (Field 2018). Throughout the COVID-19 pandemic, ransomware attacks increased significantly, as working-from-home arrangements increased vulnerability (Murphey 2021b).

Besides cyberattacks, data breaches can also cause high costs. Under the General Data Protection Regulation (GDPR), companies are obliged to protect personal data and safeguard the data protection rights of all individuals in the EU area. The GDPR allows data protection authorities in each country to impose sanctions and fines on organisations they find in breach. “For data breaches, the maximum fine can be €20 million or 4% of global turnover, whichever is higher” (GDPR.EU 2021). Data breaches often involve a large amount of sensitive data that has been accessed without authorisation by external parties, and are therefore considered important for information security due to their far-reaching impact (Goode et al. 2017). A data breach is defined as a “security incident in which sensitive, protected, or confidential data are copied, transmitted, viewed, stolen, or used by an unauthorized individual” (Freeha et al. 2021). Depending on the amount of data, the extent of the damage caused by a data breach can be significant, with the average cost being USD 392 million (IBM Security 2020).
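The fine ceiling quoted above ("€20 million or 4% of global turnover, whichever is higher") reduces to a simple maximum; the following one-liner is our own illustration, not part of the regulation or the paper, and the function name and example turnover are hypothetical:

```python
def max_gdpr_fine(global_turnover_eur: float) -> float:
    """Upper bound of a GDPR data-breach fine: EUR 20 million or
    4% of global annual turnover, whichever is higher (GDPR.EU 2021)."""
    return max(20_000_000.0, 0.04 * global_turnover_eur)

# For a firm with EUR 1 billion turnover, 4% of turnover (EUR 40 million)
# exceeds the EUR 20 million floor.
print(max_gdpr_fine(1_000_000_000))
```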

This research paper reviews the existing literature and open data sources related to cybersecurity and cyber risk, focusing on the datasets used to improve academic understanding and advance the current state-of-the-art in cybersecurity. Furthermore, important information about the available datasets is presented (e.g. use cases), and a plea is made for open data and the standardisation of cyber risk data for academic comparability and replication. The remainder of the paper is structured as follows. The next section describes the related work regarding cybersecurity and cyber risks. The third section outlines the review method used in this work and the process. The fourth section details the results of the identified literature. Further discussion is presented in the penultimate section and the final section concludes.

Related work

Due to the significance of cyber risks, several literature reviews have been conducted in this field. Eling (2020) reviewed the existing academic literature on the topic of cyber risk and cyber insurance from an economic perspective. A total of 217 papers with the term ‘cyber risk’ were identified and classified into different categories. As a result, open research questions were identified, showing that research on cyber risks is still in its infancy because of their dynamic and emerging nature. Furthermore, the author highlights that particular focus should be placed on the exchange of information between public and private actors. An improved information flow could help to measure the risk more accurately, thus making cyber risks more insurable, and help risk managers to determine the right level of cyber risk for their company. In the context of cyber insurance data, Romanosky et al. (2019) analysed the underwriting process for cyber insurance and revealed how cyber insurers understand and assess cyber risks. For this research, they examined 235 American cyber insurance policies that were publicly available and looked at three components (coverage, application questionnaires and pricing). The authors state in their findings that many of the insurers used very simple, flat-rate pricing (based on a single calculation of expected loss), while others used more parameters, such as the asset value of the company (or company revenue), standard insurance metrics (e.g. deductible, limits) and the industry, in the calculation. This is in keeping with Eling (2020), who states that an increased amount of data could help cyber risk to be measured more accurately and thus become more insurable. Similar research on cyber insurance and data was conducted by Nurse et al. (2020). The authors examined cyber insurance practitioners' perceptions and the challenges they face in collecting and using data.
In addition, gaps were identified during the research where further data are needed. The authors concluded that cyber insurance is still in its infancy and that several questions remain unanswered (for example, cyber valuation, risk calculation and recovery). They also pointed out that a better understanding of data collection and use in cyber insurance would be invaluable for future research and practice. Bessy-Roland et al. (2021) came to a similar conclusion. They proposed a multivariate Hawkes framework to model and predict the frequency of cyberattacks, using a public dataset with characteristics of data breaches affecting U.S. industry. In their conclusion, the authors argue that although an insurer has better knowledge of its own cyber losses, that knowledge rests on a small dataset, so combining it with external data sources seems essential to improving the assessment of cyber risks.

Several systematic reviews have been published in the area of cybersecurity (Kruse et al. 2017; Lee et al. 2020; Loukas et al. 2013; Ulven and Wangen 2021). In these papers, the authors concentrated on a specific area or sector in the context of cybersecurity. This paper adds to this extant literature by focusing on data availability and its importance to risk management and insurance stakeholders. With a priority on healthcare and cybersecurity, Kruse et al. (2017) conducted a systematic literature review. The authors identified 472 articles with the keywords ‘cybersecurity and healthcare’ or ‘ransomware’ in the databases Cumulative Index of Nursing and Allied Health Literature, PubMed and ProQuest. Articles were eligible for this review if they satisfied three criteria: (1) they were published between 2006 and 2016, (2) the full-text version of the article was available, and (3) the publication was a peer-reviewed or scholarly journal. The authors found that technological development and federal policies (in the U.S.) are the main factors exposing the health sector to cyber risks. Loukas et al. (2013) conducted a review with a focus on cyber risks and cybersecurity in emergency management. The authors provided an overview of cyber risks in communication, sensor, information management and vehicle technologies used in emergency management and showed areas for which there is still no solution in the literature. Similarly, Ulven and Wangen (2021) reviewed the literature on cybersecurity risks in higher education institutions. For the literature review, the authors used the keywords ‘cyber’, ‘information threats’ or ‘vulnerability’ in connection with the terms ‘higher education’, ‘university’ or ‘academia’. A similar literature review with a focus on Internet of Things (IoT) cybersecurity was conducted by Lee et al. (2020).
The review revealed that qualitative approaches focus on high-level frameworks, and quantitative approaches to cybersecurity risk management focus on risk assessment and quantification of cyberattacks and impacts. In addition, the findings presented a four-step IoT cyber risk management framework that identifies, quantifies and prioritises cyber risks.

Datasets are an essential part of cybersecurity research, underlined by the following works. Ilhan Firat et al. ( 2021 ) examined various cybersecurity datasets in detail. The study was motivated by the fact that with the proliferation of the internet and smart technologies, the mode of cyberattacks is also evolving. However, in order to prevent such attacks, they must first be detected; the dissemination and further development of cybersecurity datasets is therefore critical. In their work, the authors observed studies of datasets used in intrusion detection systems. Khraisat et al. ( 2019 ) also identified a need for new datasets in the context of cybersecurity. The researchers presented a taxonomy of current intrusion detection systems, a comprehensive review of notable recent work, and an overview of the datasets commonly used for assessment purposes. In their conclusion, the authors noted that new datasets are needed because most machine-learning techniques are trained and evaluated on the knowledge of old datasets. These datasets do not contain new and comprehensive information and are partly derived from datasets from 1999. The authors noted that the core of this issue is the availability of new public datasets as well as their quality. The availability of data, how it is used, created and shared was also investigated by Zheng et al. ( 2018 ). The researchers analysed 965 cybersecurity research papers published between 2012 and 2016. They created a taxonomy of the types of data that are created and shared and then analysed the data collected via datasets. The researchers concluded that while datasets are recognised as valuable for cybersecurity research, the proportion of publicly available datasets is limited.

The main contributions of this review and what differentiates it from previous studies can be summarised as follows. First, as far as we can tell, it is the first work to summarise all available datasets on cyber risk and cybersecurity in the context of a systematic review and present them to the scientific community and cyber insurance and cybersecurity stakeholders. Second, we investigated, analysed, and made available the datasets to support efficient and timely progress in cyber risk research. And third, we enable comparability of datasets so that the appropriate dataset can be selected depending on the research area.

Methodology

Process and eligibility criteria

The structure of this systematic review is inspired by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework (Page et al. 2021), and the search was conducted from 3 to 10 May 2021. Due to the continuous development of cyber risks and their countermeasures, only articles published in the last 10 years were considered. In addition, only articles written in English and published in peer-reviewed journals were included. As a final criterion, only articles that make use of one or more cybersecurity or cyber risk datasets met the inclusion criteria. Specifically, eligible studies presented new or existing datasets, used them to develop or validate methods, or analysed them in an economic context and examined their effects. The criterion was fulfilled if the abstract clearly stated that one or more datasets were used. A detailed explanation of this selection criterion can be found in the ‘Study selection’ section.
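Read as a predicate, the criteria above amount to a conjunction of four checks. A minimal sketch follows, where the record format and field names are our own hypothetical encoding, not the authors':

```python
from datetime import date

SEARCH_DATE = date(2021, 5, 10)  # the search was conducted 3-10 May 2021

def meets_criteria(article: dict, today: date = SEARCH_DATE) -> bool:
    """Hypothetical encoding of the review's four inclusion criteria."""
    return (
        today.year - article["year"] <= 10          # published in the last 10 years
        and article["language"] == "English"        # written in English
        and article["peer_reviewed_journal"]        # peer-reviewed journal
        and article["uses_dataset"]                 # uses >= 1 cyber dataset
    )
```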

Information sources

In order to cover a complete spectrum of literature, various databases were queried to collect relevant literature on the topic of cybersecurity and cyber risks. Due to the spread of related articles across multiple databases, the literature search was limited to the following four databases for simplicity: IEEE Xplore, Scopus, SpringerLink and Web of Science. This is similar to other literature reviews addressing cyber risks or cybersecurity, including Sardi et al. ( 2021 ), Franke and Brynielsson ( 2014 ), Lagerström (2019), Eling and Schnell ( 2016 ) and Eling ( 2020 ). In this paper, all databases used in the aforementioned works were considered. However, only two studies also used all the databases listed. The IEEE Xplore database contains electrical engineering, computer science, and electronics work from over 200 journals and three million conference papers (IEEE 2021 ). Scopus includes 23,400 peer-reviewed journals from more than 5000 international publishers in the areas of science, engineering, medicine, social sciences and humanities (Scopus 2021 ). SpringerLink contains 3742 journals and indexes over 10 million scientific documents (SpringerLink 2021 ). Finally, Web of Science indexes over 9200 journals in different scientific disciplines (Science 2021 ).

A search string was created and applied to all databases. To make the search efficient and reproducible, the following search string with Boolean operators was used in all databases: cybersecurity OR cyber risk AND dataset OR database. To ensure uniformity of the search across all databases, some adjustments had to be made for the respective search engines. In Scopus, for example, the Advanced Search was used, and the field code ‘Title-ABS-KEY’ was integrated into the search string. For IEEE Xplore, the search was carried out with the search string in the Command Search across ‘All Metadata’. In the Web of Science database, the Advanced Search was used; here, the search had to be carried out in individual steps. The first search was carried out with the terms cybersecurity OR cyber risk with the field tag Topic (TS=), and the second search with dataset OR database. These searches were then combined, which delivered the articles for review. For SpringerLink, the search string was used in the Advanced Search under the category ‘Find the resources with all of the words’. This search returned 5219 studies. After applying the eligibility criteria (period, language and only scientific journals), 1581 studies were identified in the databases:

IEEE Xplore: 364

Scopus: 135

Springer Link: 548

Web of Science: 534

An overview of the process is given in Fig. 2. After combining the results from the four databases and removing duplicates, 854 articles remained.
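The merge-and-deduplicate step itself is mechanical; below is a minimal sketch using made-up records keyed by DOI (the field names and example records are hypothetical, not the review's data):

```python
# Merge hits from several databases and drop duplicate articles,
# using a (hypothetical) DOI field as the unique key.
def deduplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        key = rec["doi"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

combined = [
    {"doi": "10.1000/a", "source": "Scopus"},
    {"doi": "10.1000/A", "source": "Web of Science"},  # duplicate of the first
    {"doi": "10.1000/b", "source": "SpringerLink"},
]
print(len(deduplicate(combined)))  # 2
```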

Fig. 2 Literature search process and categorisation of the studies

Study selection

In the final step of the selection process, the articles were screened for relevance. Due to the large number of results, the abstracts were analysed first to determine whether the article was relevant for the systematic review. An article fulfilled the criterion if its abstract made clear that it contributed to datasets or databases with regard to cyber risks or cybersecurity. Specifically, the criterion was considered met if the abstract referred to datasets addressing the causes or impacts of cyber risks, or measures in the area of cybersecurity. In this process, the number of articles was reduced to 288. The articles were then read in their entirety, and an expert panel of six people decided whether they should be included. This led to a final number of 255 articles. The years in which the articles were published and the exact numbers can be seen in Fig. 3.

Fig. 3 Distribution of studies

Data collection process and synthesis of the results

For the data collection process, various data were extracted from the studies, including the names of the respective creators, the name of the dataset or database and the corresponding reference. It was also determined where the data came from. In the context of accessibility, it was determined whether access is free, controlled, available for purchase or not available. It was also determined when the datasets were created and the time period referenced. The application type and domain characteristics of the datasets were identified.

Results

This section analyses the results of the systematic literature review. The previously identified studies are divided into three categories: datasets on the causes of cyber risks, datasets on the effects of cyber risks and datasets on cybersecurity. The classification is based on the intended use of the studies and makes it easier for stakeholders to find the appropriate datasets. The categories are evaluated individually. Although complete information is available for a large proportion of datasets, this is not true for all of them. Accordingly, the abbreviation N/A has been inserted in the respective fields to indicate that this information could not be determined by the time of submission. The term ‘use cases in the literature’ in the following and supplementary tables refers to the application areas in which the corresponding datasets were used in the literature, i.e. the topic areas on which the researchers conducted their research; since some datasets were used across disciplines, the corresponding lists of use cases are longer. Before each category is discussed in the next sections, Fig. 4 provides an overview of the number of datasets found and their year of creation. Figure 5 then shows the relationship between studies and datasets in the period under consideration. Figure 6 shows the distribution of studies, their use of datasets and their creation date. The number of datasets used is higher than the number of studies because studies often used several datasets (Table 1).

Fig. 4 Distribution of dataset results

Fig. 5 Correlation between the studies and the datasets

Fig. 6 Distribution of studies and their use of datasets

Most of the datasets (up to 58.2%) were generated in the U.S. Canada and Australia rank next, with 11.3% and 5% of all the reviewed datasets, respectively.

Additionally, to make the datasets more valuable to the cyber insurance industry, an assessment of each dataset's applicability for cyber insurers has been provided. This ‘Use Case Assessment’ covers the use of the data in different analyses, in the calculation of cyber insurance premiums, and in the design of cyber insurance contracts or additional customer services. To guard against direct hyperlinks breaking in the future, references point to the datasets' main websites (the nearest stable resource point). These main pages also contain further information on the datasets and on different versions related to the operating systems. The references were chosen so that practitioners get the best overview of the respective datasets.

Cause datasets

This section presents selected articles that use the datasets to analyse the causes of cyber risks. The datasets help identify emerging trends and allow pattern discovery in cyber risks. This information gives cybersecurity experts and cyber insurers the data to make better predictions and take appropriate action. For example, if certain vulnerabilities are not adequately protected, cyber insurers will demand a risk surcharge leading to an improvement in the risk-adjusted premium. Due to the capricious nature of cyber risks, existing data must be supplemented with new data sources (for example, new events, new methods or security vulnerabilities) to determine prevailing cyber exposure. The datasets of cyber risk causes could be combined with existing portfolio data from cyber insurers and integrated into existing pricing tools and factors to improve the valuation of cyber risks.
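The surcharge logic described above can be made concrete; the sketch below is purely illustrative, with hypothetical function names, factors and figures (the paper prescribes no such formula):

```python
# Purely illustrative: a multiplicative surcharge for inadequately
# protected vulnerabilities on top of a base premium. The factor
# names and figures are hypothetical, not taken from the review.
def risk_adjusted_premium(base_premium: float,
                          unpatched_vulns: int,
                          surcharge_per_vuln: float = 0.05) -> float:
    """Each inadequately protected vulnerability adds a 5% surcharge."""
    return base_premium * (1 + surcharge_per_vuln * unpatched_vulns)

# A EUR 10,000 base premium with three unpatched vulnerabilities
print(risk_adjusted_premium(10_000, 3))
```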

A portion of these datasets consists of several taxonomies and classifications of cyber risks. Aassal et al. (2020) propose a new taxonomy of phishing characteristics based on the interpretation and purpose of each characteristic. In comparison, Hindy et al. (2020) presented a taxonomy of network threats and the impact of current datasets on intrusion detection systems. A similar taxonomy was suggested by Kiwia et al. (2018). The authors presented a cyber kill chain-based taxonomy of banking Trojan features. The taxonomy was built on a real-world dataset of 127 banking Trojans collected from December 2014 to January 2016 by a major U.K.-based financial organisation.

In the context of classification, Aamir et al. ( 2021 ) showed the benefits of machine learning for classifying port scans and DDoS attacks in a mixture of normal and attack traffic. Guo et al. ( 2020 ) presented a new method to improve malware classification based on entropy sequence features. The evaluation of this new method was conducted on different malware datasets.

To reconstruct attack scenarios and draw conclusions based on the evidence in the alert stream, Barzegar and Shajari (2018) used the DARPA2000 and MACCDC 2012 datasets for their research. Giudici and Raffinetti (2020) proposed a rank-based statistical model aimed at predicting the severity levels of cyber risk. The model used cyber risk data from the University of Milan. In contrast to the previous datasets, Skrjanc et al. (2018) used the older KDD99 dataset to monitor large-scale cyberattacks using a Cauchy clustering method.

Amin et al. ( 2021 ) used a cyberattack dataset from the Canadian Institute for Cybersecurity to identify spatial clusters of countries with high rates of cyberattacks. In the context of cybercrime, Junger et al. ( 2020 ) examined crime scripts, key characteristics of the target company and the relationship between criminal effort and financial benefit. For their study, the authors analysed 300 cases of fraudulent activities against Dutch companies. With a similar focus on cybercrime, Mireles et al. ( 2019 ) proposed a metric framework to measure the effectiveness of the dynamic evolution of cyberattacks and defensive measures. To validate its usefulness, they used the DEFCON dataset.

Due to the rapidly changing nature of cyber risks, it is often impossible to obtain all information on them. Kim and Kim (2019) proposed an automated dataset generation system called CTIMiner that collects threat data from publicly available security reports and malware repositories. They released a dataset to the public containing about 640,000 records from 612 security reports published between January 2008 and 2019. A similar approach is proposed by Kim et al. (2020), using a named entity recognition system to extract core information from cyber threat reports automatically. They created a 498,000-tag dataset during their research.

Within the framework of vulnerabilities and cybersecurity issues, Ulven and Wangen ( 2021 ) proposed an overview of mission-critical assets and everyday threat events, suggested a generic threat model, and summarised common cybersecurity vulnerabilities. With a focus on hospitality, Chen and Fiscus ( 2018 ) proposed several issues related to cybersecurity in this sector. They analysed 76 security incidents from the Privacy Rights Clearinghouse database. Supplementary Table 1 lists all findings that belong to the cyber causes dataset.

Impact datasets

This section outlines selected findings from the cyber impact datasets. For cyber insurers, these datasets can form an important information basis, as they can be used to calculate cyber insurance premiums, evaluate specific cyber risks, formulate inclusions and exclusions in cyber policy wordings, and re-evaluate and supplement the data collected so far on cyber risks. For example, information on financial losses can help to better assess the loss potential of cyber risks. Furthermore, the datasets can provide insight into how frequently these cyber risks occur. The new datasets can be used to close data gaps that were previously bridged with very approximate estimates, or to generate new findings.

Eight studies addressed the costs of data breaches. For instance, Eling and Jung ( 2018 ) reviewed 3327 data breach events from 2005 to 2016 and identified an asymmetric dependence of monthly losses by breach type and industry. The authors used datasets from the Privacy Rights Clearinghouse for analysis. The Privacy Rights Clearinghouse datasets and the Breach Level Index database were also used by De Giovanni et al. ( 2020 ) to describe relationships between data breaches and bitcoin-related variables using the cointegration methodology. A related study obtained its data from the Department of Health and Human Services, covering healthcare facilities reporting data breaches, and from a national database of technical and organisational infrastructure information. Also in the context of data breaches, Algarni et al. ( 2021 ) developed a comprehensive, formal model that estimates the two components of security risks: breach cost and the likelihood of a data breach within 12 months. For their survey, the authors used two industry reports from the Ponemon Institute and Verizon. To illustrate the scope of data breaches, Neto et al. ( 2021 ) identified 430 major data breach incidents among more than 10,000 incidents. The database created is available and covers the period 2018 to 2019.
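Severity analyses like those above typically begin by fitting a heavy-tailed distribution to observed losses. As a hedged illustration of that first modelling step, the sketch below fits a log-normal from the moments of log-losses; the loss figures are synthetic draws, not Privacy Rights Clearinghouse records, and the parameter values are hypothetical.

```python
import math
import random
import statistics

random.seed(1)

# Synthetic breach losses in USD; real studies would substitute recorded
# loss amounts (e.g. from the Privacy Rights Clearinghouse) here.
losses = [math.exp(random.gauss(11.0, 1.5)) for _ in range(1000)]

# Method-of-moments fit on the log scale: a log-normal's parameters are
# simply the mean and standard deviation of the log-losses.
log_losses = [math.log(x) for x in losses]
mu = statistics.fmean(log_losses)
sigma = statistics.stdev(log_losses)

# Implied mean severity under the fitted log-normal: exp(mu + sigma^2 / 2).
mean_severity = math.exp(mu + sigma ** 2 / 2)
print(f"mu={mu:.2f}, sigma={sigma:.2f}, implied mean loss={mean_severity:,.0f}")
```

In practice, tail diagnostics and alternative distributions (Pareto, generalised Pareto) would be compared before such a fit is used for pricing.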

With a direct focus on insurance, Biener et al. ( 2015 ) analysed 994 cyber loss cases from an operational risk database and investigated the insurability of cyber risks based on predefined criteria. For their study, they used data from the company SAS OpRisk Global Data. Similarly, Eling and Wirfs ( 2019 ) looked at a wide range of cyber risk events and actual cost data using the same database. They identified cyber losses and analysed them using methods from statistics and actuarial science. Using a similar reference, Farkas et al. ( 2021 ) proposed a method for analysing cyber claims based on regression trees to identify criteria for classifying and evaluating claims. Similar to Chen and Fiscus ( 2018 ), the dataset used was the Privacy Rights Clearinghouse database. Within the framework of reinsurance, Moro ( 2020 ) analysed index-based information technology activity to examine whether index-parametric reinsurance coverage could be offered to a cedant, using data from a Symantec dataset.

Paté-Cornell et al. ( 2018 ) presented a general probabilistic risk analysis framework for cybersecurity that can be specified for a given organisation. The results are distributions of losses from cyberattacks, with and without the considered countermeasures, in support of risk management decisions based both on past data and anticipated incidents. The data used were from the Common Vulnerabilities and Exposures database and from confidential access to a database of cyberattacks on a large, U.S.-based organisation. A different conceptual framework for cyber risk classification and assessment was proposed by Sheehan et al. ( 2021 ). This framework showed the importance of proactive and reactive barriers in reducing companies’ exposure to cyber risk and quantifying the risk. Another approach to cyber risk assessment and mitigation was proposed by Mukhopadhyay et al. ( 2019 ). They estimated the probability of an attack using generalised linear models, predicted the security technology required to reduce the probability of cyberattacks, and used gamma and exponential distributions to best approximate the average loss data for each malicious attack. They also calculated the expected loss due to cyberattacks, calculated the net premium that a cyber insurer would need to charge, and suggested cyber insurance as a strategy to minimise losses. They used the CSI-FBI survey (1997–2010) to conduct their research.
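The pricing logic Mukhopadhyay et al. describe (an attack probability estimated by a GLM, a gamma-distributed severity, an expected loss, then a loaded premium) reduces to a short calculation. All numbers below are hypothetical placeholders for illustration, not values from the study.

```python
# Attack frequency: in the cited work this probability comes from a
# generalised linear model; here it is simply assumed.
attack_prob = 0.18                          # P(successful attack per year)

# Severity: gamma-distributed loss with shape k and scale theta,
# so E[loss | attack] = k * theta.
gamma_shape, gamma_scale = 2.0, 75_000.0
mean_severity = gamma_shape * gamma_scale

# Pure premium = frequency x severity; net premium adds a loading
# for expenses and risk margin.
expected_loss = attack_prob * mean_severity
loading = 0.25
net_premium = expected_loss * (1 + loading)

print(f"Expected annual loss: {expected_loss:,.0f}")
print(f"Net premium:          {net_premium:,.0f}")
```

The same structure scales to a portfolio: per-policy frequencies from the GLM, per-class severity distributions, and a loading calibrated to the insurer's capital requirements.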

In order to highlight the lack of data on cyber risks, Eling ( 2020 ) conducted a literature review in the areas of cyber risk and cyber insurance. Available information on the frequency, severity and dependency structure of cyber risks was filtered out. In addition, open questions for future cyber risk research were identified. Another example of data collection on the impact of cyberattacks is provided by Sornette et al. ( 2013 ), who used a database of newspaper articles, press reports and other media to provide a predictive method to identify triggering events and potential accident scenarios and to estimate their severity and frequency. A similar approach to data collection was used by Arcuri et al. ( 2020 ) to gather an original sample of global cyberattacks from newspaper reports sourced from the LexisNexis database. This collection is also used and applied to the fields of dynamic communication and cyber risk perception by Fang et al. ( 2021 ). To create a dataset of cyber incidents and disputes, Valeriano and Maness ( 2014 ) collected information on cyber interactions between rival states.

To assess trends and the scale of economic cybercrime, Levi ( 2017 ) examined datasets from different countries and their impact on crime policy. Pooser et al. ( 2018 ) investigated the trend in cyber risk identification from 2006 to 2015 and company characteristics related to cyber risk perception. The authors used a dataset of various reports from cyber insurers for their study. Walker-Roberts et al. ( 2020 ) investigated the spectrum of risk of a cybersecurity incident taking place in the cyber-physical-enabled world using the VERIS Community Database. The datasets of impacts identified are presented below. Due to overlap, some may also appear in the causes dataset (Supplementary Table 2).

Cybersecurity datasets

General intrusion detection

General intrusion detection systems account for the largest share of countermeasure datasets. For companies or researchers focused on cybersecurity, the datasets can be used to test their own countermeasures or obtain information about potential vulnerabilities. For example, Al-Omari et al. ( 2021 ) proposed an intelligent intrusion detection model for predicting and detecting attacks in cyberspace, which was applied to dataset UNSW-NB 15. A similar approach was taken by Choras and Kozik ( 2015 ), who used machine learning to detect cyberattacks on web applications. To evaluate their method, they used the HTTP dataset CSIC 2010. For the identification of unknown attacks on web servers, Kamarudin et al. ( 2017 ) proposed an anomaly-based intrusion detection system using an ensemble classification approach. Ganeshan and Rodrigues ( 2020 ) showed an intrusion detection system approach, which clusters the database into several groups and detects the presence of intrusion in the clusters. In comparison, AlKadi et al. ( 2019 ) used a localisation-based model to discover abnormal patterns in network traffic. Hybrid models have been recommended by Bhattacharya et al. ( 2020 ) and Agrawal et al. ( 2019 ); the former is a machine-learning model based on principal component analysis for the classification of intrusion detection system datasets, while the latter is a hybrid ensemble intrusion detection system for anomaly detection using different datasets to detect patterns in network traffic that deviate from normal behaviour.

Agarwal et al. ( 2021 ) used three different machine learning algorithms in their research to find the most suitable for efficiently identifying patterns of suspicious network activity. The UNSW-NB15 dataset was used for this purpose. Kasongo and Sun ( 2020 ), with a feed-forward deep neural network (FFDNN), and Keshk et al. ( 2021 ), with a privacy-preserving anomaly detection framework, among others, also used the UNSW-NB15 dataset as part of intrusion detection systems. The same dataset and others were used by Binbusayyis and Vaiyapuri ( 2019 ) to identify and compare key features for cyber intrusion detection. Atefinia and Ahmadi ( 2021 ) proposed a deep neural network model to reduce the false positive rate of an anomaly-based intrusion detection system. Fossaceca et al. ( 2015 ) focused in their research on the development of a framework that combined the outputs of multiple learners in order to improve the efficacy of network intrusion detection, and Gauthama Raman et al. ( 2020 ) presented a search algorithm based on support vector machines to improve detection performance and the false alarm rate of intrusion detection techniques. Ahmad and Alsemmeari ( 2020 ) targeted extreme learning machine techniques due to their strong capabilities in classification problems and in handling huge volumes of data. They used the NSL-KDD dataset as a benchmark.
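The studies above share a common workflow: labelled flow records, a train/test split, and a classifier evaluated on held-out data. The sketch below illustrates only the shape of that pipeline, with synthetic flows and a deliberately simple nearest-centroid classifier standing in for the deep and ensemble models cited; none of the feature values come from UNSW-NB15 or NSL-KDD.

```python
import random

random.seed(0)

# Synthetic stand-ins for flow records: (duration, bytes, packets) -> label.
def make_flow(label):
    if label == "normal":
        return [random.gauss(1.0, 0.3), random.gauss(500, 100), random.gauss(8, 2)], label
    return [random.gauss(0.1, 0.05), random.gauss(60, 20), random.gauss(2, 1)], label

data = [make_flow("normal") for _ in range(200)] + [make_flow("attack") for _ in range(200)]
random.shuffle(data)
train_set, test_set = data[:300], data[300:]

# Train: the "model" is just the per-class mean feature vector (centroid).
def centroid(rows):
    return [sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0]))]

centroids = {lab: centroid([x for x, y in train_set if y == lab])
             for lab in ("normal", "attack")}

# Predict: assign each flow to the nearest centroid (squared Euclidean distance).
def predict(x):
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

accuracy = sum(predict(x) == y for x, y in test_set) / len(test_set)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Benchmark evaluations additionally standardise features, tune hyperparameters and report false alarm rates alongside accuracy, which is where the cited models differentiate themselves.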

With reference to prediction, Bakdash et al. ( 2018 ) used datasets from the U.S. Department of Defence to predict cyberattacks by malware. This dataset consists of weekly counts of cyber events over approximately seven years. Another prediction method was presented by Fan et al. ( 2018 ), which showed an improved integrated cybersecurity prediction method based on spatial-time analysis. Also, with reference to prediction, Ashtiani and Azgomi ( 2014 ) proposed a framework for the distributed simulation of cyberattacks based on high-level architecture. Kirubavathi and Anitha ( 2016 ) recommended an approach to detect botnets, irrespective of their structures, based on network traffic flow behaviour analysis and machine-learning techniques. Dwivedi et al. ( 2021 ) introduced a multi-parallel adaptive technique to utilise an adaption mechanism in the group of swarms for network intrusion detection. AlEroud and Karabatis ( 2018 ) presented an approach that used contextual information to automatically identify and query possible semantic links between different types of suspicious activities extracted from network flows.

Intrusion detection systems with a focus on IoT

In addition to general intrusion detection systems, a proportion of studies focused on IoT. Habib et al. ( 2020 ) presented an approach for converting traditional intrusion detection systems into smart intrusion detection systems for IoT networks. To enhance the process of diagnostic detection of possible vulnerabilities within an IoT system, Georgescu et al. ( 2019 ) introduced a method that uses a named entity recognition-based solution. With regard to IoT in the smart home sector, Heartfield et al. ( 2021 ) presented a detection system that is able to autonomously adjust the decision function of its underlying anomaly classification models to a smart home’s changing conditions. Another intrusion detection system was suggested by Keserwani et al. ( 2021 ), which combined Grey Wolf Optimization and Particle Swarm Optimization to identify various attacks on IoT networks. They used the KDD Cup 99, NSL-KDD and CICIDS-2017 datasets to evaluate their model. Abu Al-Haija and Zein-Sabatto ( 2020 ) provided a comprehensive development of a new intelligent and autonomous deep-learning-based detection and classification system for cyberattacks in IoT communication networks that leverages the power of convolutional neural networks, abbreviated as IoT-IDCS-CNN (IoT-based Intrusion Detection and Classification System using Convolutional Neural Network). To evaluate the development, the authors used the NSL-KDD dataset. Biswas and Roy ( 2021 ) recommended a model that identifies malicious botnet traffic using novel deep-learning approaches such as artificial neural networks, gated recurrent units and long short-term memory models. They tested their model with the Bot-IoT dataset.

With a more forensic background, Koroniotis et al. ( 2020 ) submitted a network forensic framework, which described the digital investigation phases for identifying and tracing attack behaviours in IoT networks. The suggested work was evaluated with the Bot-IoT and UNSW-NB15 datasets. With a focus on big data and IoT, Chhabra et al. ( 2020 ) presented a cyber forensic framework for big data analytics in an IoT environment using machine learning. Furthermore, the authors mentioned different publicly available datasets for machine-learning models.

A stronger focus on mobile phones was exhibited by Alazab et al. ( 2020 ), who presented a classification model that combined permission requests and application programme interface calls. The model was tested with a malware dataset containing 27,891 Android apps. A similar approach was taken by Li et al. ( 2019a , b ), who proposed a reliable classifier for Android malware detection based on factorisation machine architecture and the extraction of Android app features from manifest files and source code.

Literature reviews

In addition to the different methods and models for intrusion detection systems, various literature reviews on the methods and datasets were also found. Liu and Lang ( 2019 ) proposed a taxonomy of intrusion detection systems that uses data objects as the main dimension to classify and summarise machine learning and deep learning-based intrusion detection literature. They also presented four different benchmark datasets for machine-learning detection systems. Ahmed et al. ( 2016 ) presented an in-depth analysis of four major categories of anomaly detection techniques, which include classification, statistical, information theory and clustering. Hajj et al. ( 2021 ) gave a comprehensive overview of anomaly-based intrusion detection systems. Their article gives an overview of the requirements, methods, measurements and datasets that are used in an intrusion detection system.

Within the framework of machine learning, Chattopadhyay et al. ( 2018 ) conducted a comprehensive review and meta-analysis on the application of machine-learning techniques in intrusion detection systems. They also compared different machine learning techniques in different datasets and summarised the performance. Vidros et al. ( 2017 ) presented an overview of characteristics and methods in automatic detection of online recruitment fraud. They also published an available dataset of 17,880 annotated job ads, retrieved from the use of a real-life system. An empirical study of different unsupervised learning algorithms used in the detection of unknown attacks was presented by Meira et al. ( 2020 ).

New datasets

Kilincer et al. ( 2021 ) reviewed different intrusion detection system datasets in detail. They had a closer look at the UNSW-NB15, ISCX-2012, NSL-KDD and CIDDS-001 datasets. Stojanovic et al. ( 2020 ) also provided a review of datasets and their creation for use in advanced persistent threat detection in the literature. Another review of datasets was provided by Sarker et al. ( 2020 ), who focused on cybersecurity data science as part of their research and provided an overview from a machine-learning perspective. Avila et al. ( 2021 ) conducted a systematic literature review on the use of security logs for data leak detection. They recommended a new classification of information leaks based on the GDPR principles, identified the most widely used publicly available datasets for threat detection, and described the attack types in the datasets and the algorithms used for data leak detection. Tuncer et al. ( 2020 ) presented a bytecode-based detection method consisting of feature extraction using local neighbourhood binary patterns. They chose a byte-based malware dataset to investigate the performance of the proposed local neighbourhood binary pattern-based detection method. With a different focus, Mauro et al. ( 2020 ) gave an experimental overview of neural-based techniques relevant to intrusion detection. They assessed the value of neural networks using the Bot-IoT and UNSW-NB15 datasets.

Another category of results in the context of countermeasure datasets comprises those that were presented as new. Moreno et al. ( 2018 ) developed a database of 300 security-related accidents from European and American sources. The database contained cybersecurity-related events in the chemical and process industry. Damasevicius et al. ( 2020 ) proposed a new dataset (LITNET-2020) for network intrusion detection. The dataset is a new annotated network benchmark dataset obtained from a real-world academic network. It presents real-world examples of normal and under-attack network traffic. With a focus on IoT intrusion detection systems, Alsaedi et al. ( 2020 ) proposed new benchmark IoT/IIoT datasets for assessing intrusion detection system-enabled IoT systems. Also in the context of IoT, Vaccari et al. ( 2020 ) proposed a dataset focusing on message queue telemetry transport protocols, which can be used to train machine-learning models. To evaluate the performance of machine-learning classifiers, Mahfouz et al. ( 2020 ) created a dataset called Game Theory and Cybersecurity (GTCS). A dataset containing 22,000 malware and benign samples was constructed by Martin et al. ( 2019 ). The dataset can be used as a benchmark to test algorithms for Android malware classification and clustering techniques. In addition, Laso et al. ( 2017 ) presented a dataset created to investigate how data and information quality estimates enable the detection of anomalies and malicious acts in cyber-physical systems. The dataset contained various cyberattacks and is publicly available.

In addition to the results described above, several other studies were found that fit into the category of countermeasures. Johnson et al. ( 2016 ) examined the time between vulnerability disclosures. Using another vulnerability database, the Common Vulnerabilities and Exposures (CVE) database, Subroto and Apriyana ( 2019 ) presented an algorithmic model that uses big data analysis of social media and statistical machine learning to predict cyber risks. A similar database but with a different focus, the Common Vulnerability Scoring System, was used by Chatterjee and Thekdi ( 2020 ) to present an iterative data-driven learning approach to vulnerability assessment and management for complex systems. Using the CICIDS2017 dataset to evaluate the performance, Malik et al. ( 2020 ) proposed a control plane-based orchestration for varied, sophisticated threats and attacks. The same dataset was used in another study by Lee et al. ( 2019 ), who developed an artificial security information event management system based on a combination of event profiling for data processing and different artificial network methods. To exploit the interdependence between multiple time series, Fang et al. ( 2021 ) proposed a statistical framework. In order to validate the framework, the authors applied it to a dataset of enterprise-level security breaches from the Privacy Rights Clearinghouse and Identity Theft Center database. Another framework with a defensive aspect was recommended by Li et al. ( 2021 ) to increase the robustness of deep neural networks against adversarial malware evasion attacks. Sarabi et al. ( 2016 ) investigated whether and to what extent business details can help assess an organisation's risk of data breaches, and how the distribution of risk across different types of incidents can inform policies for protection, detection and recovery from different forms of security incidents. They used data from the VERIS Community Database.

Datasets that have been classified into the cybersecurity category are detailed in Supplementary Table 3. Due to overlap, records from the previous tables may also be included.

Discussion

This paper presented a systematic literature review of studies on cyber risk and cybersecurity that used datasets. Within this framework, 255 studies were fully reviewed and then classified into three different categories. From these studies, 79 datasets were consolidated and analysed, and their key information was extracted through a filtering process. This information was recorded in a table and enhanced with further information as part of the literature analysis, making it possible to create a comprehensive overview of the datasets. For example, each dataset entry contains a description of where the data came from and how the data has been used to date. This allows different datasets to be compared and the appropriate dataset for a given use case to be selected. This research certainly has limitations, so our selection of datasets cannot necessarily be taken as representative of all available datasets related to cyber risks and cybersecurity. For example, the literature searches were conducted in four academic databases, so only datasets used in that literature were found. Many research projects also used old datasets that may no longer reflect current developments. In addition, the data are often focused on only one observation and are limited in scope; the datasets can often only be applied to specific contexts and are subject to further limitations (e.g. region, industry, operating system). Regarding applicability, it is unfortunately not possible to make a clear statement on the extent to which the datasets can be integrated into academic or practical applications, or how much effort this requires. Finally, it should be noted that this is an overview of currently available datasets, which are subject to constant change.

Due to the lack of datasets on cyber risks in the academic literature, additional datasets on cyber risks were integrated as part of a further search. The search was conducted on the Google Dataset search portal. The search term used was ‘cyber risk datasets’. Over 100 results were found. However, due to the low significance and verifiability, only 20 selected datasets were included. These can be found in Table 2  in the “ Appendix ”.

The results of the literature review and datasets also showed that there continues to be a lack of available, open cyber datasets. This lack of data is reflected in cyber insurance, for example, as it is difficult to set a risk-based premium without a sufficient database (Nurse et al. 2020 ). The global cyber insurance market was estimated at USD 5.5 billion in 2020 (Dyson 2020 ). When compared to the USD 1 trillion global losses from cybercrime (Maleks Smith et al. 2020 ), it is clear that there exists a significant cyber risk awareness challenge for both the insurance industry and international commerce. Without comprehensive, high-quality data on cyber losses, it can be difficult to estimate potential losses from cyberattacks and price cyber insurance accordingly (GAO 2021 ). For instance, the average cyber insurance loss increased from USD 145,000 in 2019 to USD 359,000 in 2020 (FitchRatings 2021 ). Cyber insurance is an important risk management tool to mitigate the financial impact of cybercrime. This is particularly evident in the impact on different industries. In the Energy & Commodities financial markets, a ransomware attack on the Colonial Pipeline led to a substantial impact on the U.S. economy. As a result of the attack, about 45% of the U.S. East Coast was temporarily unable to obtain supplies of diesel, petrol and jet fuel. This caused the average price in the U.S. to rise 7 cents to USD 3.04 per gallon, the highest in seven years (Garber 2021 ). In addition, Colonial Pipeline confirmed that it paid a USD 4.4 million ransom to a hacker gang after the attack. Another ransomware attack occurred in the healthcare and government sector. The victim of this attack was the Irish Health Service Executive (HSE). A ransom payment of USD 20 million was demanded from the Irish government to restore services after the hack (Tidy 2021 ).
In the car manufacturing sector, Miller and Valasek ( 2015 ) initiated a cyberattack that resulted in the recall of 1.4 million vehicles and cost manufacturers EUR 761 million. The risk that arises in the context of these events is the potential for the accumulation of cyber losses, which is why cyber insurers are not expanding their capacity. An example of this accumulation of cyber risks is the NotPetya malware attack, which originated in Russia, struck in Ukraine, and rapidly spread around the world, causing at least USD 10 billion in damage (GAO 2021 ). These events highlight the importance of proper cyber risk management.

This research provides cyber insurance stakeholders with an overview of cyber datasets. Cyber insurers can use the open datasets to improve their understanding and assessment of cyber risks. For example, the impact datasets can be used to better measure financial impacts and their frequencies. These data could be combined with existing portfolio data from cyber insurers and integrated with existing pricing tools and factors to better assess cyber risk valuation. Although most cyber insurers have sparse historical cyber policy and claims data, they remain too small at present for accurate prediction (Bessy-Roland et al. 2021 ). A combination of portfolio data and external datasets would support risk-adjusted pricing for cyber insurance, which would also benefit policyholders. In addition, cyber insurance stakeholders can use the datasets to identify patterns and make better predictions, which would benefit sustainable cyber insurance coverage. In terms of cyber risk cause datasets, cyber insurers can use the data to review their insurance products. For example, the data could provide information on which cyber risks have not been sufficiently considered in product design or where improvements are needed. A combination of cyber cause and cybersecurity datasets can help establish uniform definitions to provide greater transparency and clarity. Consistent terminology could lead to a more sustainable cyber market, where cyber insurers make informed decisions about the level of coverage and policyholders understand their coverage (The Geneva Association 2020).

In addition to the cyber insurance community, this research also supports cybersecurity stakeholders. The reviewed literature can be used to provide a contemporary, contextual and categorised summary of available datasets. This supports efficient and timely progress in cyber risk research and is beneficial given the dynamic nature of cyber risks. With the help of the described cybersecurity datasets and the identified information, a comparison of different datasets is possible. The datasets can be used to evaluate the effectiveness of countermeasures in simulated cyberattacks or to test intrusion detection systems.

Conclusion

In this paper, we conducted a systematic review of studies on cyber risk and cybersecurity databases. We found that most of the datasets are in the field of intrusion detection and machine learning and are used for technical cybersecurity aspects. Available datasets on cyber risks were comparatively under-represented. Due to the dynamic nature of cyber risk and the lack of historical data, assessing and understanding it is a major challenge for cyber insurance stakeholders. To address this challenge, a greater density of cyber data is needed to support cyber insurers in risk management and researchers working on cyber risk-related topics. With reference to ‘Open Science’ FAIR data (Jacobsen et al. 2020 ), mandatory reporting of cyber incidents could help improve cyber understanding, awareness and loss prevention among companies and insurers. Through greater availability of data, cyber risks can be better understood, enabling researchers to conduct more in-depth research into these risks. Companies could incorporate this new knowledge into their corporate culture to reduce cyber risks. For insurance companies, this would have the advantage that all insurers would share the same understanding of cyber risks, which would support sustainable risk-based pricing. In addition, common definitions of cyber risks could be derived from new data.

The cybersecurity databases summarised and categorised in this research could provide a different perspective on cyber risks that would enable the formulation of common definitions in cyber policies. The datasets can help companies addressing cybersecurity and cyber risk as part of risk management assess their internal cyber posture and cybersecurity measures. The paper can also help improve risk awareness and corporate behaviour, and provides the research community with a comprehensive overview of peer-reviewed datasets and other available datasets in the area of cyber risk and cybersecurity. This approach is intended to support the free availability of data for research. The complete tabulated review of the literature is included in the Supplementary Material.

This work provides directions for several paths of future work. First, there are currently few publicly available datasets for cyber risk and cybersecurity. The older datasets that are still widely used no longer reflect today's technical environment. Moreover, they can often only be used in one context, and the scope of the samples is very limited. It would be of great value if more publicly available datasets reflected current environmental conditions. This could help intrusion detection systems to consider current events and thus achieve a higher success rate. It could also compensate for the disadvantages of older datasets by collecting larger quantities of samples and making this contextualisation more widespread. Another area of research may be the integration and adaptability of cybersecurity and cyber risk datasets. For example, it is often unclear to what extent datasets can be integrated into or adapted to existing data. For cyber risks and cybersecurity, it would be helpful to know what requirements need to be met to use the datasets appropriately. In addition, it would certainly be helpful to know whether datasets can be modified for use in cyber risk or cybersecurity contexts. Finally, the ability for stakeholders to identify machine-readable cybersecurity datasets would be useful, because it would allow for even clearer delineations and comparisons between datasets. Due to the lack of publicly available datasets, concrete benchmarks often cannot be applied.


Aamir, M., S.S.H. Rizvi, M.A. Hashmani, M. Zubair, and J. Ahmad. 2021. Machine learning classification of port scanning and DDoS attacks: A comparative analysis. Mehran University Research Journal of Engineering and Technology 40 (1): 215–229. https://doi.org/10.22581/muet1982.2101.19 .


Aamir, M., and S.M.A. Zaidi. 2019. DDoS attack detection with feature engineering and machine learning: The framework and performance evaluation. International Journal of Information Security 18 (6): 761–785. https://doi.org/10.1007/s10207-019-00434-1 .

Aassal, A. El, S. Baki, A. Das, and R.M. Verma. 2020. An in-depth benchmarking and evaluation of phishing detection research for security needs. IEEE Access 8: 22170–22192. https://doi.org/10.1109/ACCESS.2020.2969780 .

Abu Al-Haija, Q., and S. Zein-Sabatto. 2020. An efficient deep-learning-based detection and classification system for cyber-attacks in IoT communication networks. Electronics 9 (12): 26. https://doi.org/10.3390/electronics9122152 .

Adhikari, U., T.H. Morris, and S.Y. Pan. 2018. Applying Hoeffding adaptive trees for real-time cyber-power event and intrusion classification. IEEE Transactions on Smart Grid 9 (5): 4049–4060. https://doi.org/10.1109/tsg.2017.2647778 .

Agarwal, A., P. Sharma, M. Alshehri, A.A. Mohamed, and O. Alfarraj. 2021. Classification model for accuracy and intrusion detection using machine learning approach. PeerJ Computer Science . https://doi.org/10.7717/peerj-cs.437 .

Agrafiotis, I., J.R.C.. Nurse, M. Goldsmith, S. Creese, and D. Upton. 2018. A taxonomy of cyber-harms: Defining the impacts of cyber-attacks and understanding how they propagate. Journal of Cybersecurity 4: tyy006.

Agrawal, A., S. Mohammed, and J. Fiaidhi. 2019. Ensemble technique for intruder detection in network traffic. International Journal of Security and Its Applications 13 (3): 1–8. https://doi.org/10.33832/ijsia.2019.13.3.01 .

Ahmad, I., and R.A. Alsemmeari. 2020. Towards improving the intrusion detection through ELM (extreme learning machine). CMC Computers Materials & Continua 65 (2): 1097–1111. https://doi.org/10.32604/cmc.2020.011732 .

Ahmed, M., A.N. Mahmood, and J.K. Hu. 2016. A survey of network anomaly detection techniques. Journal of Network and Computer Applications 60: 19–31. https://doi.org/10.1016/j.jnca.2015.11.016 .

Al-Jarrah, O.Y., O. Alhussein, P.D. Yoo, S. Muhaidat, K. Taha, and K. Kim. 2016. Data randomization and cluster-based partitioning for Botnet intrusion detection. IEEE Transactions on Cybernetics 46 (8): 1796–1806. https://doi.org/10.1109/TCYB.2015.2490802 .

Al-Mhiqani, M.N., R. Ahmad, Z.Z. Abidin, W. Yassin, A. Hassan, K.H. Abdulkareem, N.S. Ali, and Z. Yunos. 2020. A review of insider threat detection: Classification, machine learning techniques, datasets, open challenges, and recommendations. Applied Sciences—Basel 10 (15): 41. https://doi.org/10.3390/app10155208 .

Al-Omari, M., M. Rawashdeh, F. Qutaishat, M. Alshira’H, and N. Ababneh. 2021. An intelligent tree-based intrusion detection model for cyber security. Journal of Network and Systems Management 29 (2): 18. https://doi.org/10.1007/s10922-021-09591-y .

Alabdallah, A., and M. Awad. 2018. Using weighted Support Vector Machine to address the imbalanced classes problem of Intrusion Detection System. KSII Transactions on Internet and Information Systems 12 (10): 5143–5158. https://doi.org/10.3837/tiis.2018.10.027 .

Alazab, M., M. Alazab, A. Shalaginov, A. Mesleh, and A. Awajan. 2020. Intelligent mobile malware detection using permission requests and API calls. Future Generation Computer Systems—the International Journal of eScience 107: 509–521. https://doi.org/10.1016/j.future.2020.02.002 .

Albahar, M.A., R.A. Al-Falluji, and M. Binsawad. 2020. An empirical comparison on malicious activity detection using different neural network-based models. IEEE Access 8: 61549–61564. https://doi.org/10.1109/ACCESS.2020.2984157 .

AlEroud, A.F., and G. Karabatis. 2018. Queryable semantics to detect cyber-attacks: A flow-based detection approach. IEEE Transactions on Systems, Man, and Cybernetics: Systems 48 (2): 207–223. https://doi.org/10.1109/TSMC.2016.2600405 .

Algarni, A.M., V. Thayananthan, and Y.K. Malaiya. 2021. Quantitative assessment of cybersecurity risks for mitigating data breaches in business systems. Applied Sciences (switzerland) . https://doi.org/10.3390/app11083678 .

Alhowaide, A., I. Alsmadi, and J. Tang. 2021. Towards the design of real-time autonomous IoT NIDS. Cluster Computing—the Journal of Networks Software Tools and Applications . https://doi.org/10.1007/s10586-021-03231-5 .

Ali, S., and Y. Li. 2019. Learning multilevel auto-encoders for DDoS attack detection in smart grid network. IEEE Access 7: 108647–108659. https://doi.org/10.1109/ACCESS.2019.2933304 .

AlKadi, O., N. Moustafa, B. Turnbull, and K.K.R. Choo. 2019. Mixture localization-based outliers models for securing data migration in cloud centers. IEEE Access 7: 114607–114618. https://doi.org/10.1109/ACCESS.2019.2935142 .

Allianz. 2021. Allianz Risk Barometer. https://www.agcs.allianz.com/content/dam/onemarketing/agcs/agcs/reports/Allianz-Risk-Barometer-2021.pdf . Accessed 15 May 2021.

Almiani, M., A. AbuGhazleh, A. Al-Rahayfeh, S. Atiewi, and Razaque, A. 2020. Deep recurrent neural network for IoT intrusion detection system. Simulation Modelling Practice and Theory 101: 102031. https://doi.org/10.1016/j.simpat.2019.102031

Alsaedi, A., N. Moustafa, Z. Tari, A. Mahmood, and A. Anwar. 2020. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8: 165130–165150. https://doi.org/10.1109/access.2020.3022862 .

Alsamiri, J., and K. Alsubhi. 2019. Internet of Things cyber attacks detection using machine learning. International Journal of Advanced Computer Science and Applications 10 (12): 627–634.

Alsharafat, W. 2013. Applying artificial neural network and eXtended classifier system for network intrusion detection. International Arab Journal of Information Technology 10 (3): 230–238.

Google Scholar  

Amin, R.W., H.E. Sevil, S. Kocak, G. Francia III., and P. Hoover. 2021. The spatial analysis of the malicious uniform resource locators (URLs): 2016 dataset case study. Information (switzerland) 12 (1): 1–18. https://doi.org/10.3390/info12010002 .

Arcuri, M.C., L.Z. Gai, F. Ielasi, and E. Ventisette. 2020. Cyber attacks on hospitality sector: Stock market reaction. Journal of Hospitality and Tourism Technology 11 (2): 277–290. https://doi.org/10.1108/jhtt-05-2019-0080 .

Arp, D., M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C.E.R.T. Siemens. 2014. Drebin: Effective and explainable detection of android malware in your pocket. In Ndss 14: 23–26.

Ashtiani, M., and M.A. Azgomi. 2014. A distributed simulation framework for modeling cyber attacks and the evaluation of security measures. Simulation 90 (9): 1071–1102. https://doi.org/10.1177/0037549714540221 .

Atefinia, R., and M. Ahmadi. 2021. Network intrusion detection using multi-architectural modular deep neural network. Journal of Supercomputing 77 (4): 3571–3593. https://doi.org/10.1007/s11227-020-03410-y .

Avila, R., R. Khoury, R. Khoury, and F. Petrillo. 2021. Use of security logs for data leak detection: A systematic literature review. Security and Communication Networks 2021: 29. https://doi.org/10.1155/2021/6615899 .

Azeez, N.A., T.J. Ayemobola, S. Misra, R. Maskeliunas, and R. Damasevicius. 2019. Network Intrusion Detection with a Hashing Based Apriori Algorithm Using Hadoop MapReduce. Computers 8 (4): 15. https://doi.org/10.3390/computers8040086 .

Bakdash, J.Z., S. Hutchinson, E.G. Zaroukian, L.R. Marusich, S. Thirumuruganathan, C. Sample, B. Hoffman, and G. Das. 2018. Malware in the future forecasting of analyst detection of cyber events. Journal of Cybersecurity . https://doi.org/10.1093/cybsec/tyy007 .

Barletta, V.S., D. Caivano, A. Nannavecchia, and M. Scalera. 2020. Intrusion detection for in-vehicle communication networks: An unsupervised Kohonen SOM approach. Future Internet . https://doi.org/10.3390/FI12070119 .

Barzegar, M., and M. Shajari. 2018. Attack scenario reconstruction using intrusion semantics. Expert Systems with Applications 108: 119–133. https://doi.org/10.1016/j.eswa.2018.04.030 .

Bessy-Roland, Y., A. Boumezoued, and C. Hillairet. 2021. Multivariate Hawkes process for cyber insurance. Annals of Actuarial Science 15 (1): 14–39.

Bhardwaj, A., V. Mangat, and R. Vig. 2020. Hyperband tuned deep neural network with well posed stacked sparse AutoEncoder for detection of DDoS attacks in cloud. IEEE Access 8: 181916–181929. https://doi.org/10.1109/ACCESS.2020.3028690 .

Bhati, B.S., C.S. Rai, B. Balamurugan, and F. Al-Turjman. 2020. An intrusion detection scheme based on the ensemble of discriminant classifiers. Computers & Electrical Engineering 86: 9. https://doi.org/10.1016/j.compeleceng.2020.106742 .

Bhattacharya, S., S.S.R. Krishnan, P.K.R. Maddikunta, R. Kaluri, S. Singh, T.R. Gadekallu, M. Alazab, and U. Tariq. 2020. A novel PCA-firefly based XGBoost classification model for intrusion detection in networks using GPU. Electronics 9 (2): 16. https://doi.org/10.3390/electronics9020219 .

Bibi, I., A. Akhunzada, J. Malik, J. Iqbal, A. Musaddiq, and S. Kim. 2020. A dynamic DL-driven architecture to combat sophisticated android malware. IEEE Access 8: 129600–129612. https://doi.org/10.1109/ACCESS.2020.3009819 .

Biener, C., M. Eling, and J.H. Wirfs. 2015. Insurability of cyber risk: An empirical analysis. The   Geneva Papers on Risk and Insurance—Issues and Practice 40 (1): 131–158. https://doi.org/10.1057/gpp.2014.19 .

Binbusayyis, A., and T. Vaiyapuri. 2019. Identifying and benchmarking key features for cyber intrusion detection: An ensemble approach. IEEE Access 7: 106495–106513. https://doi.org/10.1109/ACCESS.2019.2929487 .

Biswas, R., and S. Roy. 2021. Botnet traffic identification using neural networks. Multimedia Tools and Applications . https://doi.org/10.1007/s11042-021-10765-8 .

Bouyeddou, B., F. Harrou, B. Kadri, and Y. Sun. 2021. Detecting network cyber-attacks using an integrated statistical approach. Cluster Computing—the Journal of Networks Software Tools and Applications 24 (2): 1435–1453. https://doi.org/10.1007/s10586-020-03203-1 .

Bozkir, A.S., and M. Aydos. 2020. LogoSENSE: A companion HOG based logo detection scheme for phishing web page and E-mail brand recognition. Computers & Security 95: 18. https://doi.org/10.1016/j.cose.2020.101855 .

Brower, D., and M. McCormick. 2021. Colonial pipeline resumes operations following ransomware attack. Financial Times .

Cai, H., F. Zhang, and A. Levi. 2019. An unsupervised method for detecting shilling attacks in recommender systems by mining item relationship and identifying target items. The Computer Journal 62 (4): 579–597. https://doi.org/10.1093/comjnl/bxy124 .

Cebula, J.J., M.E. Popeck, and L.R. Young. 2014. A Taxonomy of Operational Cyber Security Risks Version 2 .

Chadza, T., K.G. Kyriakopoulos, and S. Lambotharan. 2020. Learning to learn sequential network attacks using hidden Markov models. IEEE Access 8: 134480–134497. https://doi.org/10.1109/ACCESS.2020.3011293 .

Chatterjee, S., and S. Thekdi. 2020. An iterative learning and inference approach to managing dynamic cyber vulnerabilities of complex systems. Reliability Engineering and System Safety . https://doi.org/10.1016/j.ress.2019.106664 .

Chattopadhyay, M., R. Sen, and S. Gupta. 2018. A comprehensive review and meta-analysis on applications of machine learning techniques in intrusion detection. Australasian Journal of Information Systems 22: 27.

Chen, H.S., and J. Fiscus. 2018. The inhospitable vulnerability: A need for cybersecurity risk assessment in the hospitality industry. Journal of Hospitality and Tourism Technology 9 (2): 223–234. https://doi.org/10.1108/JHTT-07-2017-0044 .

Chhabra, G.S., V.P. Singh, and M. Singh. 2020. Cyber forensics framework for big data analytics in IoT environment using machine learning. Multimedia Tools and Applications 79 (23–24): 15881–15900. https://doi.org/10.1007/s11042-018-6338-1 .

Chiba, Z., N. Abghour, K. Moussaid, A. Elomri, and M. Rida. 2019. Intelligent approach to build a Deep Neural Network based IDS for cloud environment using combination of machine learning algorithms. Computers and Security 86: 291–317. https://doi.org/10.1016/j.cose.2019.06.013 .

Choras, M., and R. Kozik. 2015. Machine learning techniques applied to detect cyber attacks on web applications. Logic Journal of the IGPL 23 (1): 45–56. https://doi.org/10.1093/jigpal/jzu038 .

Chowdhury, S., M. Khanzadeh, R. Akula, F. Zhang, S. Zhang, H. Medal, M. Marufuzzaman, and L. Bian. 2017. Botnet detection using graph-based feature clustering. Journal of Big Data 4 (1): 14. https://doi.org/10.1186/s40537-017-0074-7 .

Cost Of A Cyber Incident: Systematic Review And Cross-Validation, Cybersecurity & Infrastructure Agency , 1, https://www.cisa.gov/sites/default/files/publications/CISA-OCE_Cost_of_Cyber_Incidents_Study-FINAL_508.pdf (2020).

D’Hooge, L., T. Wauters, B. Volckaert, and F. De Turck. 2019. Classification hardness for supervised learners on 20 years of intrusion detection data. IEEE Access 7: 167455–167469. https://doi.org/10.1109/access.2019.2953451 .

Damasevicius, R., A. Venckauskas, S. Grigaliunas, J. Toldinas, N. Morkevicius, T. Aleliunas, and P. Smuikys. 2020. LITNET-2020: An annotated real-world network flow dataset for network intrusion detection. Electronics 9 (5): 23. https://doi.org/10.3390/electronics9050800 .

De Giovanni, A.L.D., and M. Pirra. 2020. On the determinants of data breaches: A cointegration analysis. Decisions in Economics and Finance . https://doi.org/10.1007/s10203-020-00301-y .

Deng, L., D. Li, X. Yao, and H. Wang. 2019. Retracted Article: Mobile network intrusion detection for IoT system based on transfer learning algorithm. Cluster Computing 22 (4): 9889–9904. https://doi.org/10.1007/s10586-018-1847-2 .

Donkal, G., and G.K. Verma. 2018. A multimodal fusion based framework to reinforce IDS for securing Big Data environment using Spark. Journal of Information Security and Applications 43: 1–11. https://doi.org/10.1016/j.jisa.2018.10.001 .

Dunn, C., N. Moustafa, and B. Turnbull. 2020. Robustness evaluations of sustainable machine learning models against data Poisoning attacks in the Internet of Things. Sustainability 12 (16): 17. https://doi.org/10.3390/su12166434 .

Dwivedi, S., M. Vardhan, and S. Tripathi. 2021. Multi-parallel adaptive grasshopper optimization technique for detecting anonymous attacks in wireless networks. Wireless Personal Communications . https://doi.org/10.1007/s11277-021-08368-5 .

Dyson, B. 2020. COVID-19 crisis could be ‘watershed’ for cyber insurance, says Swiss Re exec. https://www.spglobal.com/marketintelligence/en/news-insights/latest-news-headlines/covid-19-crisis-could-be-watershed-for-cyber-insurance-says-swiss-re-exec-59197154 . Accessed 7 May 2020.

EIOPA. 2018. Understanding cyber insurance—a structured dialogue with insurance companies. https://www.eiopa.europa.eu/sites/default/files/publications/reports/eiopa_understanding_cyber_insurance.pdf . Accessed 28 May 2018

Elijah, A.V., A. Abdullah, N.Z. JhanJhi, M. Supramaniam, and O.B. Abdullateef. 2019. Ensemble and deep-learning methods for two-class and multi-attack anomaly intrusion detection: An empirical study. International Journal of Advanced Computer Science and Applications 10 (9): 520–528.

Eling, M., and K. Jung. 2018. Copula approaches for modeling cross-sectional dependence of data breach losses. Insurance Mathematics & Economics 82: 167–180. https://doi.org/10.1016/j.insmatheco.2018.07.003 .

Eling, M., and W. Schnell. 2016. What do we know about cyber risk and cyber risk insurance? Journal of Risk Finance 17 (5): 474–491. https://doi.org/10.1108/jrf-09-2016-0122 .

Eling, M., and J. Wirfs. 2019. What are the actual costs of cyber risk events? European Journal of Operational Research 272 (3): 1109–1119. https://doi.org/10.1016/j.ejor.2018.07.021 .

Eling, M. 2020. Cyber risk research in business and actuarial science. European Actuarial Journal 10 (2): 303–333.

Elmasry, W., A. Akbulut, and A.H. Zaim. 2019. Empirical study on multiclass classification-based network intrusion detection. Computational Intelligence 35 (4): 919–954. https://doi.org/10.1111/coin.12220 .

Elsaid, S.A., and N.S. Albatati. 2020. An optimized collaborative intrusion detection system for wireless sensor networks. Soft Computing 24 (16): 12553–12567. https://doi.org/10.1007/s00500-020-04695-0 .

Estepa, R., J.E. Díaz-Verdejo, A. Estepa, and G. Madinabeitia. 2020. How much training data is enough? A case study for HTTP anomaly-based intrusion detection. IEEE Access 8: 44410–44425. https://doi.org/10.1109/ACCESS.2020.2977591 .

European Council. 2021. Cybersecurity: how the EU tackles cyber threats. https://www.consilium.europa.eu/en/policies/cybersecurity/ . Accessed 10 May 2021

Falco, G. et al. 2019. Cyber risk research impeded by disciplinary barriers. Science (American Association for the Advancement of Science) 366 (6469): 1066–1069.

Fan, Z.J., Z.P. Tan, C.X. Tan, and X. Li. 2018. An improved integrated prediction method of cyber security situation based on spatial-time analysis. Journal of Internet Technology 19 (6): 1789–1800. https://doi.org/10.3966/160792642018111906015 .

Fang, Z.J., M.C. Xu, S.H. Xu, and T.Z. Hu. 2021. A framework for predicting data breach risk: Leveraging dependence to cope with sparsity. IEEE Transactions on Information Forensics and Security 16: 2186–2201. https://doi.org/10.1109/tifs.2021.3051804 .

Farkas, S., O. Lopez, and M. Thomas. 2021. Cyber claim analysis using Generalized Pareto regression trees with applications to insurance. Insurance: Mathematics and Economics 98: 92–105. https://doi.org/10.1016/j.insmatheco.2021.02.009 .

Farsi, H., A. Fanian, and Z. Taghiyarrenani. 2019. A novel online state-based anomaly detection system for process control networks. International Journal of Critical Infrastructure Protection 27: 11. https://doi.org/10.1016/j.ijcip.2019.100323 .

Ferrag, M.A., L. Maglaras, S. Moschoyiannis, and H. Janicke. 2020. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. Journal of Information Security and Applications 50: 19. https://doi.org/10.1016/j.jisa.2019.102419 .

Field, M. 2018. WannaCry cyber attack cost the NHS £92m as 19,000 appointments cancelled. https://www.telegraph.co.uk/technology/2018/10/11/wannacry-cyber-attack-cost-nhs-92m-19000-appointments-cancelled/ . Accessed 9 May 2018.

FitchRatings. 2021. U.S. Cyber Insurance Market Update (Spike in Claims Leads to Decline in 2020 Underwriting Performance). https://www.fitchratings.com/research/insurance/us-cyber-insurance-market-update-spike-in-claims-leads-to-decline-in-2020-underwriting-performance-26-05-2021 .

Fossaceca, J.M., T.A. Mazzuchi, and S. Sarkani. 2015. MARK-ELM: Application of a novel Multiple Kernel Learning framework for improving the robustness of network intrusion detection. Expert Systems with Applications 42 (8): 4062–4080. https://doi.org/10.1016/j.eswa.2014.12.040 .

Franke, U., and J. Brynielsson. 2014. Cyber situational awareness–a systematic review of the literature. Computers & security 46: 18–31.

Freeha, K., K.J. Hwan, M. Lars, and M. Robin. 2021. Data breach management: An integrated risk model. Information & Management 58 (1): 103392. https://doi.org/10.1016/j.im.2020.103392 .

Ganeshan, R., and P. Rodrigues. 2020. Crow-AFL: Crow based adaptive fractional lion optimization approach for the intrusion detection. Wireless Personal Communications 111 (4): 2065–2089. https://doi.org/10.1007/s11277-019-06972-0 .

GAO. 2021. CYBER INSURANCE—Insurers and policyholders face challenges in an evolving market. https://www.gao.gov/assets/gao-21-477.pdf . Accessed 16 May 2021.

Garber, J. 2021. Colonial Pipeline fiasco foreshadows impact of Biden energy policy. https://www.foxbusiness.com/markets/colonial-pipeline-fiasco-foreshadows-impact-of-biden-energy-policy . Accessed 4 May 2021.

Gauthama Raman, M.R., N. Somu, S. Jagarapu, T. Manghnani, T. Selvam, K. Krithivasan, and V.S. Shankar Sriram. 2020. An efficient intrusion detection technique based on support vector machine and improved binary gravitational search algorithm. Artificial Intelligence Review 53 (5): 3255–3286. https://doi.org/10.1007/s10462-019-09762-z .

Gavel, S., A.S. Raghuvanshi, and S. Tiwari. 2021. Distributed intrusion detection scheme using dual-axis dimensionality reduction for Internet of things (IoT). Journal of Supercomputing . https://doi.org/10.1007/s11227-021-03697-5 .

GDPR.EU. 2021. FAQ. https://gdpr.eu/faq/ . Accessed 10 May 2021.

Georgescu, T.M., B. Iancu, and M. Zurini. 2019. Named-entity-recognition-based automated system for diagnosing cybersecurity situations in IoT networks. Sensors (switzerland) . https://doi.org/10.3390/s19153380 .

Giudici, P., and E. Raffinetti. 2020. Cyber risk ordering with rank-based statistical models. AStA Advances in Statistical Analysis . https://doi.org/10.1007/s10182-020-00387-0 .

Goh, J., S. Adepu, K.N. Junejo, and A. Mathur. 2016. A dataset to support research in the design of secure water treatment systems. In CRITIS.

Gong, X.Y., J.L. Lu, Y.F. Zhou, H. Qiu, and R. He. 2021. Model uncertainty based annotation error fixing for web attack detection. Journal of Signal Processing Systems for Signal Image and Video Technology 93 (2–3): 187–199. https://doi.org/10.1007/s11265-019-01494-1 .

Goode, S., H. Hoehle, V. Venkatesh, and S.A. Brown. 2017. USER compensation as a data breach recovery action: An investigation of the sony playstation network breach. MIS Quarterly 41 (3): 703–727.

Guo, H., S. Huang, C. Huang, Z. Pan, M. Zhang, and F. Shi. 2020. File entropy signal analysis combined with wavelet decomposition for malware classification. IEEE Access 8: 158961–158971. https://doi.org/10.1109/ACCESS.2020.3020330 .

Habib, M., I. Aljarah, and H. Faris. 2020. A Modified multi-objective particle swarm optimizer-based Lévy flight: An approach toward intrusion detection in Internet of Things. Arabian Journal for Science and Engineering 45 (8): 6081–6108. https://doi.org/10.1007/s13369-020-04476-9 .

Hajj, S., R. El Sibai, J.B. Abdo, J. Demerjian, A. Makhoul, and C. Guyeux. 2021. Anomaly-based intrusion detection systems: The requirements, methods, measurements, and datasets. Transactions on Emerging Telecommunications Technologies 32 (4): 36. https://doi.org/10.1002/ett.4240 .

Heartfield, R., G. Loukas, A. Bezemskij, and E. Panaousis. 2021. Self-configurable cyber-physical intrusion detection for smart homes using reinforcement learning. IEEE Transactions on Information Forensics and Security 16: 1720–1735. https://doi.org/10.1109/tifs.2020.3042049 .

Hemo, B., T. Gafni, K. Cohen, and Q. Zhao. 2020. Searching for anomalies over composite hypotheses. IEEE Transactions on Signal Processing 68: 1181–1196. https://doi.org/10.1109/TSP.2020.2971438

Hindy, H., D. Brosset, E. Bayne, A.K. Seeam, C. Tachtatzis, R. Atkinson, and X. Bellekens. 2020. A taxonomy of network threats and the effect of current datasets on intrusion detection systems. IEEE Access 8: 104650–104675. https://doi.org/10.1109/ACCESS.2020.3000179 .

Hong, W., D. Huang, C. Chen, and J. Lee. 2020. Towards accurate and efficient classification of power system contingencies and cyber-attacks using recurrent neural networks. IEEE Access 8: 123297–123309. https://doi.org/10.1109/ACCESS.2020.3007609 .

Husák, M., M. Zádník, V. Bartos, and P. Sokol. 2020. Dataset of intrusion detection alerts from a sharing platform. Data in Brief 33: 106530.

IBM Security. 2020. Cost of a Data breach Report. https://www.capita.com/sites/g/files/nginej291/files/2020-08/Ponemon-Global-Cost-of-Data-Breach-Study-2020.pdf . Accessed 19 May 2021.

IEEE. 2021. IEEE Quick Facts. https://www.ieee.org/about/at-a-glance.html . Accessed 11 May 2021.

Kilincer, I.F., F. Ertam, and S. Abdulkadir. 2021. Machine learning methods for cyber security intrusion detection: Datasets and comparative study. Computer Networks 188: 107840. https://doi.org/10.1016/j.comnet.2021.107840 .

Jaber, A.N., and S. Ul Rehman. 2020. FCM-SVM based intrusion detection system for cloud computing environment. Cluster Computing—the Journal of Networks Software Tools and Applications 23 (4): 3221–3231. https://doi.org/10.1007/s10586-020-03082-6 .

Jacobs, J., S. Romanosky, B. Edwards, M. Roytman, and I. Adjerid. 2019. Exploit prediction scoring system (epss). arXiv:1908.04856

Jacobsen, A. et al. 2020. FAIR principles: Interpretations and implementation considerations. Data Intelligence 2 (1–2): 10–29. https://doi.org/10.1162/dint_r_00024 .

Jahromi, A.N., S. Hashemi, A. Dehghantanha, R.M. Parizi, and K.K.R. Choo. 2020. An enhanced stacked LSTM method with no random initialization for malware threat hunting in safety and time-critical systems. IEEE Transactions on Emerging Topics in Computational Intelligence 4 (5): 630–640. https://doi.org/10.1109/TETCI.2019.2910243 .

Jang, S., S. Li, and Y. Sung. 2020. FastText-based local feature visualization algorithm for merged image-based malware classification framework for cyber security and cyber defense. Mathematics 8 (3): 13. https://doi.org/10.3390/math8030460 .

Javeed, D., T.H. Gao, and M.T. Khan. 2021. SDN-enabled hybrid DL-driven framework for the detection of emerging cyber threats in IoT. Electronics 10 (8): 16. https://doi.org/10.3390/electronics10080918 .

Johnson, P., D. Gorton, R. Lagerstrom, and M. Ekstedt. 2016. Time between vulnerability disclosures: A measure of software product vulnerability. Computers & Security 62: 278–295. https://doi.org/10.1016/j.cose.2016.08.004 .

Johnson, P., R. Lagerström, M. Ekstedt, and U. Franke. 2018. Can the common vulnerability scoring system be trusted? A Bayesian analysis. IEEE Transactions on Dependable and Secure Computing 15 (6): 1002–1015. https://doi.org/10.1109/TDSC.2016.2644614 .

Junger, M., V. Wang, and M. Schlömer. 2020. Fraud against businesses both online and offline: Crime scripts, business characteristics, efforts, and benefits. Crime Science 9 (1): 13. https://doi.org/10.1186/s40163-020-00119-4 .

Kalutarage, H.K., H.N. Nguyen, and S.A. Shaikh. 2017. Towards a threat assessment framework for apps collusion. Telecommunication Systems 66 (3): 417–430. https://doi.org/10.1007/s11235-017-0296-1 .

Kamarudin, M.H., C. Maple, T. Watson, and N.S. Safa. 2017. A LogitBoost-based algorithm for detecting known and unknown web attacks. IEEE Access 5: 26190–26200. https://doi.org/10.1109/ACCESS.2017.2766844 .

Kasongo, S.M., and Y.X. Sun. 2020. A deep learning method with wrapper based feature extraction for wireless intrusion detection system. Computers & Security 92: 15. https://doi.org/10.1016/j.cose.2020.101752 .

Keserwani, P.K., M.C. Govil, E.S. Pilli, and P. Govil. 2021. A smart anomaly-based intrusion detection system for the Internet of Things (IoT) network using GWO–PSO–RF model. Journal of Reliable Intelligent Environments 7 (1): 3–21. https://doi.org/10.1007/s40860-020-00126-x .

Keshk, M., E. Sitnikova, N. Moustafa, J. Hu, and I. Khalil. 2021. An integrated framework for privacy-preserving based anomaly detection for cyber-physical systems. IEEE Transactions on Sustainable Computing 6 (1): 66–79. https://doi.org/10.1109/TSUSC.2019.2906657 .

Khan, I.A., D.C. Pi, A.K. Bhatia, N. Khan, W. Haider, and A. Wahab. 2020. Generating realistic IoT-based IDS dataset centred on fuzzy qualitative modelling for cyber-physical systems. Electronics Letters 56 (9): 441–443. https://doi.org/10.1049/el.2019.4158 .

Khraisat, A., I. Gondal, P. Vamplew, J. Kamruzzaman, and A. Alazab. 2020. Hybrid intrusion detection system based on the stacking ensemble of C5 decision tree classifier and one class support vector machine. Electronics 9 (1): 18. https://doi.org/10.3390/electronics9010173 .

Khraisat, A., I. Gondal, P. Vamplew, and J. Kamruzzaman. 2019. Survey of intrusion detection systems: Techniques, datasets and challenges. Cybersecurity 2 (1): 20. https://doi.org/10.1186/s42400-019-0038-7 .

Kilincer, I.F., F. Ertam, and A. Sengur. 2021. Machine learning methods for cyber security intrusion detection: Datasets and comparative study. Computer Networks 188: 16. https://doi.org/10.1016/j.comnet.2021.107840 .

Kim, D., and H.K. Kim. 2019. Automated dataset generation system for collaborative research of cyber threat analysis. Security and Communication Networks 2019: 10. https://doi.org/10.1155/2019/6268476 .

Kim, G., C. Lee, J. Jo, and H. Lim. 2020. Automatic extraction of named entities of cyber threats using a deep Bi-LSTM-CRF network. International Journal of Machine Learning and Cybernetics 11 (10): 2341–2355. https://doi.org/10.1007/s13042-020-01122-6 .

Kirubavathi, G., and R. Anitha. 2016. Botnet detection via mining of traffic flow characteristics. Computers & Electrical Engineering 50: 91–101. https://doi.org/10.1016/j.compeleceng.2016.01.012 .

Kiwia, D., A. Dehghantanha, K.K.R. Choo, and J. Slaughter. 2018. A cyber kill chain based taxonomy of banking Trojans for evolutionary computational intelligence. Journal of Computational Science 27: 394–409. https://doi.org/10.1016/j.jocs.2017.10.020 .

Koroniotis, N., N. Moustafa, and E. Sitnikova. 2020. A new network forensic framework based on deep learning for Internet of Things networks: A particle deep framework. Future Generation Computer Systems 110: 91–106. https://doi.org/10.1016/j.future.2020.03.042 .

Kruse, C.S., B. Frederick, T. Jacobson, and D. Kyle Monticone. 2017. Cybersecurity in healthcare: A systematic review of modern threats and trends. Technology and Health Care 25 (1): 1–10.

Kshetri, N. 2018. The economics of cyber-insurance. IT Professional 20 (6): 9–14. https://doi.org/10.1109/MITP.2018.2874210 .

Kumar, R., P. Kumar, R. Tripathi, G.P. Gupta, T.R. Gadekallu, and G. Srivastava. 2021. SP2F: A secured privacy-preserving framework for smart agricultural Unmanned Aerial Vehicles. Computer Networks . https://doi.org/10.1016/j.comnet.2021.107819 .

Kumar, R., and R. Tripathi. 2021. DBTP2SF: A deep blockchain-based trustworthy privacy-preserving secured framework in industrial internet of things systems. Transactions on Emerging Telecommunications Technologies 32 (4): 27. https://doi.org/10.1002/ett.4222 .

Laso, P.M., D. Brosset, and J. Puentes. 2017. Dataset of anomalies and malicious acts in a cyber-physical subsystem. Data in Brief 14: 186–191. https://doi.org/10.1016/j.dib.2017.07.038 .

Lee, J., J. Kim, I. Kim, and K. Han. 2019. Cyber threat detection based on artificial neural networks using event profiles. IEEE Access 7: 165607–165626. https://doi.org/10.1109/ACCESS.2019.2953095 .

Lee, S.J., P.D. Yoo, A.T. Asyhari, Y. Jhi, L. Chermak, C.Y. Yeun, and K. Taha. 2020. IMPACT: Impersonation attack detection via edge computing using deep Autoencoder and feature abstraction. IEEE Access 8: 65520–65529. https://doi.org/10.1109/ACCESS.2020.2985089 .


Open Access funding provided by the IReL Consortium.

Author information

Authors and Affiliations

University of Limerick, Limerick, Ireland

Frank Cremer, Barry Sheehan, Arash N. Kia, Martin Mullins & Finbarr Murphy

TH Köln University of Applied Sciences, Cologne, Germany

Michael Fortmann & Stefan Materne


Corresponding author

Correspondence to Barry Sheehan .

Ethics declarations

Conflict of interest.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 334 kb)

Supplementary file 1 (DOCX 418 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cremer, F., Sheehan, B., Fortmann, M. et al. Cyber risk and cybersecurity: a systematic review of data availability. Geneva Pap Risk Insur Issues Pract 47 , 698–736 (2022). https://doi.org/10.1057/s41288-022-00266-6

Download citation

Received : 15 June 2021

Accepted : 20 January 2022

Published : 17 February 2022

Issue Date : July 2022

DOI : https://doi.org/10.1057/s41288-022-00266-6


  • Cyber insurance
  • Systematic review
  • Cybersecurity


Sensors (Basel)

The Impact of Artificial Intelligence on Data System Security: A Literature Review

Ricardo Raimundo

1 ISEC Lisboa, Instituto Superior de Educação e Ciências, 1750-142 Lisbon, Portugal; [email protected]

Albérico Rosário

2 Research Unit on Governance, Competitiveness and Public Policies (GOVCOPP), University of Aveiro, 3810-193 Aveiro, Portugal

Associated Data

Not applicable.

Abstract

Diverse forms of artificial intelligence (AI) are at the forefront of triggering digital security innovations based on the threats that are arising in this post-COVID world. On the one hand, companies are experiencing difficulty in dealing with security challenges with regard to a variety of issues ranging from system openness, decision making, and quality control to web domain, to mention a few. On the other hand, in the last decade, research has focused on security capabilities based on tools such as platform complacency, intelligent trees, modeling methods, and outage management systems in an effort to understand the interplay between AI and those issues. The dependence of industries and of the education, transport, and health sectors on the emergence of AI is now well known in the literature. AI is increasingly employed in managing data security across economic sectors. Thus, a literature review of AI and system security within the current digital society is opportune. This paper aims at identifying research trends in the field through a systematic bibliometric literature review (LRSB) of research on AI and system security. The review entails 77 articles published in the Scopus® database, presenting up-to-date knowledge on the topic. The LRSB results were synthesized across current research subthemes. Findings are presented. The originality of the paper relies on its LRSB method, together with an extant review of articles that have not been categorized so far. Implications for future research are suggested.

1. Introduction

The assumption that the human brain may be deemed quite comparable to computers in some ways offers the spontaneous basis for artificial intelligence (AI), which is supported by psychology through the idea of humans and animals operating like machines that process information by means of associative memory [ 1 ]. Nowadays, researchers are working on the possibilities of AI to cope with varying issues of systems security across diverse sectors. Hence, AI is commonly considered an interdisciplinary research area that attracts considerable attention in both economic and social domains, as it offers a myriad of technological breakthroughs with regard to systems security [ 2 ]. There is a universal trend of investing in AI technology to face the security challenges of our daily lives, in areas such as statistical data, medicine, and transportation [ 3 ].

Some claim that specific data from key sectors have supported the development of AI, namely the availability of data from e-commerce [ 4 ], businesses [ 5 ], and government [ 6 ], which provided substantial input to improve diverse machine-learning solutions and algorithms, in particular with respect to systems security [ 7 ]. Additionally, China and Russia have acknowledged the importance of AI for systems security and competitiveness in general [ 8 , 9 ]. Similarly, China has recognized the importance of AI in terms of housing security, aiming at becoming an authority in the field [ 10 ]. Those efforts are already being carried out in some leading countries in order to profit the most from its substantial benefits [ 9 ]. In spite of the huge development of AI in the last few years, the discussion around the topic of systems security remains sparse [ 11 ]. It is therefore opportune to review the latest developments on the theme in order to map the advancements in the field and their ensuing outcomes [ 12 ]. In view of this, we intend to identify the principal issues currently discussed on the topic in order to answer the main research question: What is the impact of AI on data system security?

The article is organized as follows. In Section 2 , we put forward diverse theoretical concepts related to AI in systems security. In Section 3 , we present the methodological approach. In Section 4 , we discuss the main fields of use of AI with regard to systems security, which came out from the literature. Finally, we conclude this paper by suggesting implications and future research avenues.

2. Literature Trends: AI and Systems Security

The concept of AI was introduced following the creation of the notion of the digital computing machine, in an attempt to ascertain whether a machine is able to "think" [ 1 ] or whether a machine can carry out humans' tasks [ 13 ]. AI is a vast domain of information and computer technologies (ICT) that aims at designing systems which can operate autonomously, analogously to the individual decision-making process [ 14 ]. In terms of AI, a machine may learn from experience by processing an immeasurable quantity of data while distinguishing patterns in it, as in the case of Siri [ 15 ] and image recognition [ 16 ], technologies based on machine learning, a subtheme of AI defined as intelligent systems with the capacity to think and learn [ 1 ].

Furthermore, AI entails a myriad of related technologies, such as neural networks [ 17 ] and machine learning [ 18 ], to mention just a few, and we can identify the following research areas of AI:

  • (I) Machine learning comprises a myriad of technologies that allow computers to run algorithms based on gathered data and distinct orders, giving the machine the capability to learn without human instruction, adjusting its own algorithm to the situation while learning and recoding itself, as Google and Siri do when performing distinct tasks ordered by voice [ 19 ], or as video surveillance does when tracking unusual behavior [ 20 ];
  • (II) Deep learning constitutes the ensuing progress of machine learning, in which the machine carries out tasks directly from pictures, text, and sound, through a data architecture that entails numerous layers in order to learn and characterize data at several levels of abstraction, imitating how the natural brain processes information [ 21 ]. This is illustrated, for example, in forming a certificate database structure of university performance key indicators in order to fix issues such as identity authentication [ 21 ];
  • (III) Neural networks are composed of a pattern recognition system that machine/deep learning operates to learn from observational data, figuring out its own solutions, such as an auto-steering gear system with a fuzzy regulator, which makes it possible to select optimal neural network models of vessel paths and thereby obtain control activity [ 22 ];
  • (IV) Natural language processing machines analyze language and speech as it is spoken, resorting to machine learning and natural language processing, such as developing a swarm-intelligent active system while building friendly human-computer interface software for users, to be implemented in educational and e-learning organizations [ 23 ];
  • (V) Expert systems are composed of software arrangements that assist in obtaining answers to distinct inquiries posed either by a customer or by another software set, in which expert knowledge is stored for a particular application area and a reasoning component accesses answers in view of the environmental information and the subsequent decision making [ 24 ].
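The pattern-recognition idea underlying subthemes (I) to (III) can be illustrated with a deliberately minimal sketch: a toy nearest-neighbour classifier, in plain Python, that labels network-connection feature vectors as benign or malicious. The features, numeric values, and labels are invented for illustration only and are not drawn from any of the reviewed systems.

```python
from math import dist

# Toy training set: (packets_per_s, mean_payload_kb, failed_logins) -> label.
# All numbers are invented for illustration only.
TRAINING = [
    ((120.0, 1.2, 0.0), "benign"),
    ((95.0, 0.8, 1.0), "benign"),
    ((900.0, 0.1, 14.0), "malicious"),
    ((760.0, 0.2, 9.0), "malicious"),
]

def classify(sample, k=3):
    """Label a feature vector by majority vote among its k nearest neighbours."""
    neighbours = sorted(TRAINING, key=lambda pair: dist(sample, pair[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# A high-rate, low-payload connection with many failed logins sits close to
# the attack-like training examples and is flagged accordingly.
print(classify((850.0, 0.15, 11.0)))  # prints "malicious"
```

Real intrusion-detection systems in the reviewed literature use far richer models (deep networks, ensembles), but the core mechanism is the same: learn a decision rule from labeled observations rather than from hand-written instructions.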

Those subthemes of AI are applied in many sectors, such as health institutions, education, and management, through varying applications related to systems security. These processes have been widely deployed to address important security issues, grouped into the following application trends ( Figure 1 ):

  • (a) Cyber security, in terms of computer crime, behavior research, access control, and surveillance, as for example computer vision, in which an algorithm analyses images, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) techniques [ 6 , 7 , 12 , 19 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 ];
  • (b) Information management, namely in supporting decision making, business strategy, and expert systems, for example, by improving the quality of the relevant strategic decisions by analyzing big data, as well as in the management of the quality of complex objects [ 2 , 4 , 5 , 11 , 14 , 24 , 39 , 40 , 41 , 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 ];
  • (c) Societies and institutions, regarding computer networks, privacy, and digitalization, legal and clinical assistance, for example, in terms of legal support of cyber security, digital modernization, systems to support police investigations and the efficiency of technological processes in transport [ 8 , 9 , 10 , 15 , 17 , 18 , 20 , 21 , 23 , 28 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 ];
  • (d) Neural networks, for example, in terms of designing a model of human personality for use in robotic systems [ 1 , 13 , 16 , 22 , 74 , 75 ].

Figure 1. Subthemes/network of all keywords of AI. Source: own elaboration.

Through these streams of research, we will explain how the huge potential of AI can be deployed to enhance the systems security in use in both states and organizations, mitigating risks and increasing returns while identifying and averting cyber attacks and determining the best course of action [ 19 ]. AI may even prove more effective than humans in averting potential threats through various security solutions, such as redundant video surveillance systems, VOIP voice network technology security strategies [ 36 , 76 , 77 ], and dependence upon diverse platforms for protection (platform complacency) [ 30 ].

The design of the abovementioned conceptual and technological framework was not arbitrary: it draws on a preliminary Scopus search with the keywords "Artificial Intelligence" and "Security".

3. Materials and Methods

We carried out a systematic bibliometric literature review (LRSB) of the impact of AI on data system security. The LRSB is a study concept based on a detailed, thorough process of recognizing and synthesizing information. As an alternative to traditional literature reviews, it improves: (i) the validity of the review, by providing a set of steps that can be followed if the study is replicated; (ii) accuracy, by providing and demonstrating arguments strictly related to the research questions; and (iii) the generalization of the results, by allowing the synthesis and analysis of accumulated knowledge [ 78 , 79 , 80 ]. Thus, the LRSB is a "guiding instrument" that steers the review according to its objectives.

The study was performed following the suggestions of Raimundo and Rosário: (i) definition of the research question; (ii) location of the studies; (iii) selection and evaluation of studies; (iv) analysis and synthesis; (v) presentation of results; and (vi) discussion and conclusion of results. This methodology ensures a comprehensive, auditable, replicable review that answers the research questions.

The review was carried out in June 2021, with a bibliographic search in the Scopus database of scientific articles published up to June 2021. The search was carried out in three phases: (i) using the keyword "Artificial Intelligence", 382,586 documents were obtained; (ii) adding the keyword "Security", we obtained a set of 15,916 documents, and limiting the search to Business, Management, and Accounting, 401 documents were obtained; and finally (iii) filtering for the exact keywords "Data security" and "Systems security", a total of 77 documents were obtained ( Table 1 ).
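Expressed in Scopus advanced-search syntax, the three phases would look roughly as follows (illustrative only; the exact field codes used by the authors are not reported in the text):

```
(i)   TITLE-ABS-KEY ( "Artificial Intelligence" )
(ii)  TITLE-ABS-KEY ( "Artificial Intelligence" ) AND TITLE-ABS-KEY ( "Security" )
        AND ( LIMIT-TO ( SUBJAREA , "BUSI" ) )
(iii) ...phase (ii)... AND ( LIMIT-TO ( EXACTKEYWORD , "Data Security" )
        OR LIMIT-TO ( EXACTKEYWORD , "Systems Security" ) )
```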

Table 1. Screening methodology. Source: own elaboration.

The search strategy resulted in 77 academic documents. This set of eligible documents was assessed for academic and scientific relevance and quality: Conference Papers (43); Articles (29); Reviews (3); Letters (1); and Retracted (1).

Peer-reviewed academic documents on the impact of artificial intelligence on data system security were selected up to June 2021. In the period under review, 2020 was the year with the highest number of peer-reviewed academic documents on the subject, with 18 publications; 7 publications were already confirmed for 2021. Figure 2 shows the peer-reviewed publications published per year up to 2021.

Figure 2. Number of documents by year. Source: own elaboration.

The publications were sorted out as follows: 2011 2nd International Conference on Artificial Intelligence Management Science and Electronic Commerce Aimsec 2011 Proceedings (14); Proceedings of the 2020 IEEE International Conference Quality Management Transport and Information Security Information Technologies IT and Qm and Is 2020 (6); Proceedings of the 2019 IEEE International Conference Quality Management Transport and Information Security Information Technologies IT and Qm and Is 2019 (5); Computer Law and Security Review (4); Journal of Network and Systems Management (4); Decision Support Systems (3); Proceedings 2021 21st Acis International Semi Virtual Winter Conference on Software Engineering Artificial Intelligence Networking and Parallel Distributed Computing Snpd Winter 2021 (3); IEEE Transactions on Engineering Management (2); Ictc 2019 10th International Conference on ICT Convergence ICT Convergence Leading the Autonomous Future (2); Information and Computer Security (2); Knowledge Based Systems (2); with 1 publication (2013 3rd International Conference on Innovative Computing Technology Intech 2013; 2020 IEEE Technology and Engineering Management Conference Temscon 2020; 2020 International Conference on Technology and Entrepreneurship Virtual Icte V 2020; 2nd International Conference on Current Trends In Engineering and Technology Icctet 2014; ACM Transactions on Management Information Systems; AFE Facilities Engineering Journal; Electronic Design; Facct 2021 Proceedings of the 2021 ACM Conference on Fairness Accountability and Transparency; HAC; ICE B 2010 Proceedings of the International Conference on E Business; IEEE Engineering Management Review; Icaps 2008 Proceedings of the 18th International Conference on Automated Planning and Scheduling; Icaps 2009 Proceedings of the 19th International Conference on Automated Planning and Scheduling; Industrial Management and Data Systems; Information and Management; Information Management and Computer 
Security; Information Management Computer Security; Information Systems Research; International Journal of Networking and Virtual Organisations; International Journal of Production Economics; International Journal of Production Research; Journal of the Operational Research Society; Proceedings 2020 2nd International Conference on Machine Learning Big Data and Business Intelligence Mlbdbi 2020; Proceedings Annual Meeting of the Decision Sciences Institute; Proceedings of the 2014 Conference on IT In Business Industry and Government An International Conference By Csi on Big Data Csibig 2014; Proceedings of the European Conference on Innovation and Entrepreneurship Ecie; TQM Journal; Technology In Society; Towards the Digital World and Industry X 0 Proceedings of the 29th International Conference of the International Association for Management of Technology Iamot 2020; Wit Transactions on Information and Communication Technologies).

In recent years, then, there has been growing interest in research on the impact of artificial intelligence on data system security.

In Table 2 , we report, for each publication, the SCImago Journal & Country Rank (SJR), the best quartile, and the H index.

Table 2. SCImago Journal & Country Rank impact factor. Note: * data not available. Source: own elaboration.

Information Systems Research is the most cited publication, with 3510 (SJR), Q1, and H index 159.

There is a total of 11 journals in Q1, 3 journals in Q2, 2 journals in Q3, and 2 journals in Q4. Journals in the best quartile Q1 represent 27% of the 41 journal titles; Q2 represents 7%; and Q3 and Q4 represent 5% each. Finally, for 23 of the publications, representing 56%, the data are not available.

As evident from Table 2 , most of the ranked journals publishing on artificial intelligence and data system security fall in the Q1 best quartile.

The subject areas covered by the 77 scientific documents were: Business, Management and Accounting (77); Computer Science (57); Decision Sciences (36); Engineering (21); Economics, Econometrics, and Finance (15); Social Sciences (13); Arts and Humanities (3); Psychology (3); Mathematics (2); and Energy (1).

The most cited article was "CANN: An intrusion detection system based on combining cluster centers and nearest neighbors" by Lin, Ke, and Tsai, with 290 citations, published in Knowledge-Based Systems with 1590 (SJR), the best quartile (Q1), and H index (121). The article proposes a new feature representation approach combining the cluster center and nearest neighbor approaches.
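The cited CANN approach collapses each sample to a single distance-based feature, the distance to its nearest cluster center plus the distance to its nearest neighbor, before nearest-neighbor classification. A minimal sketch of that idea, using made-up toy data in place of intrusion records and class means in place of the paper's k-means centers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data standing in for intrusion records:
# class 0 = "normal" (tight cluster), class 1 = "attack" (spread out).
X = np.vstack([rng.normal(0.0, 0.3, (50, 4)), rng.normal(6.0, 1.0, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# One cluster center per class (the paper uses k-means; class means
# are the simplest stand-in for this sketch).
centers = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def cann_feature(x):
    """1-D CANN feature: distance to the nearest cluster center
    plus distance to the nearest (non-identical) training sample."""
    d_center = np.linalg.norm(centers - x, axis=1).min()
    d = np.sort(np.linalg.norm(X - x, axis=1))
    d_neighbor = d[1] if d[0] == 0.0 else d[0]
    return d_center + d_neighbor

f_train = np.array([cann_feature(x) for x in X])

def predict(x_new):
    """1-NN classification in the 1-D CANN feature space."""
    return y[np.argmin(np.abs(f_train - cann_feature(x_new)))]
```

Because classification happens in a one-dimensional space, detection becomes very cheap at query time, which is the efficiency gain the paper reports.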

In Figure 3 , we can analyze the evolution of citations of documents published between 2010 and 2021, showing a growing number of citations with an R² of 0.45.
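The trend fit behind such a figure is an ordinary least-squares line together with its coefficient of determination; a sketch with made-up yearly citation counts (the actual series appears in Figure 3):

```python
import numpy as np

# Made-up yearly citation counts for 2010-2021 (illustrative only).
years = np.arange(2010, 2022, dtype=float)
cites = np.array([0, 2, 1, 4, 3, 6, 8, 7, 12, 15, 14, 20], dtype=float)

# Ordinary least-squares linear trend.
slope, intercept = np.polyfit(years, cites, 1)
pred = slope * years + intercept

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
ss_res = ((cites - pred) ** 2).sum()
ss_tot = ((cites - cites.mean()) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
print(f"slope={slope:.2f} citations/year, R^2={r2:.2f}")
```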

Figure 3. Evolution and number of citations between 2010 and 2021. Source: own elaboration.

The h index was used to verify the productivity and impact of the documents; it is the largest number h of included documents that have each been cited at least h times. Of the documents considered, 11 have been cited at least 11 times.
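The h index described above is simple to compute from a list of citation counts; a sketch with hypothetical counts chosen so that eleven documents have at least eleven citations:

```python
def h_index(citations):
    """Largest h such that at least h documents have >= h citations."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i          # rank i still satisfies the condition
        else:
            break          # sorted descending, so no later rank can qualify
    return h

# Hypothetical counts: 11 documents cited at least 11 times -> h = 11.
counts = [290, 60, 41, 26, 19, 15, 13, 12, 11, 11, 11, 9, 4, 1, 0]
print(h_index(counts))
```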

In Appendix A , Table A1 , citations of all scientific articles until 2021 are analyzed; 35 documents were not cited until 2021.

Appendix A , Table A2 , examines the self-quotation of documents until 2021, in which self-quotation was identified for a total of 16 self-quotations.

In Figure 4 , a bibliometric analysis was performed to analyze and identify indicators of the dynamics and evolution of scientific information using the main keywords. The analysis, carried out with the scientific software VOSviewer, aims to identify the main research keywords around "Artificial Intelligence" and "Security".

Figure 4. Network of linked keywords. Source: own elaboration.

The linked keywords can be analyzed in Figure 4 , clarifying the network of keywords that appear together/linked in each scientific article, which allows us to know the topics analyzed by the research and to identify future research trends.
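A keyword co-occurrence network of the kind VOSviewer draws can be built by counting, per article, each unordered pair of author keywords. A minimal sketch with hypothetical keyword lists (not the reviewed corpus):

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per article.
articles = [
    ["artificial intelligence", "security", "machine learning"],
    ["artificial intelligence", "decision support"],
    ["security", "machine learning", "intrusion detection"],
    ["artificial intelligence", "security"],
]

edges = Counter()
for kws in articles:
    # Count each unordered keyword pair once per article.
    for a, b in combinations(sorted(set(kws)), 2):
        edges[(a, b)] += 1

# Edge weights are the link strengths VOSviewer visualizes.
print(edges.most_common(3))
```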

4. Discussion

By examining the selected literature, we identified four principal areas that have been underscored and deserve further investigation with regard to cyber security in general: business decision making, electronic commerce business, AI social applications, and neural networks ( Figure 4 ). There are myriad areas where AI cyber security can be applied throughout the social, private, and public domains of our daily lives, from Internet banking to digital signatures.

First, the literature discusses the possible reduction of unnecessary leakage of accounting information [ 27 ], mainly through addressing the security drawbacks of VOIP technology in IP network systems and the subsequent safety measures [ 77 ], including a secure dynamic password for Internet banking [ 29 ].

Second, researchers have examined computer users' cyber security behaviors, including a naïve lack of concern about the likelihood of facing security threats, dependence upon specific platforms for protection, and dependence on guidance from trusted social others [ 30 ]. These issues have been partly addressed through mobile agent (MA) management systems in distributed networks, operating a model of an open management framework that provides a broad range of processes to enforce security policies [ 31 ].

Third, AI cyber systems security aims at achieving stability of the programming and analysis procedures, clarifying in detail the relationship between fault-tolerant programming and code security in order to strengthen it [ 33 ], and offering an overview of existing cyber security tasks and a roadmap [ 32 ].

Fourth, in this vein, numerous AI tools have been developed to achieve a multi-stage security task approach for a full security life cycle [ 38 ]. New digital signature technology has been built on elliptic curve cryptography, of increasing reliability [ 28 ]; a new experimental CAPTCHA has been developed, with more interference characters and colorful backgrounds [ 8 ], to provide better protection against spambots, while allowing people with little knowledge of sign languages to recognize gestures on video relatively fast [ 70 ]; novel detection approaches beyond traditional firewall systems, such as the cluster center and nearest neighbor (CANN) approach, have been developed with higher efficiency in detecting attacks [ 71 ]; AI security solutions for the IoT (e.g., blockchain) address the security flaws of its centralized architecture [ 34 ]; and an integrated AI algorithm identifies malicious web domains for the security protection of Internet users [ 19 ].

In sum, AI has progressed lately through advances in machine learning, offering multilevel solutions to the security problems faced both in operating systems and networks, comprehending algorithms, methods, and tools extensively used by security experts for the betterment of systems [ 6 ]. Below, we present a detailed overview of the impacts of AI on each of these fields.

4.1. Business Decision Making

AI has an increasing impact on systems security aimed at supporting decision making at the management level. Increasingly, the literature discusses expert systems that, along with the evolution of computers, are able to integrate into corporate culture [ 24 ]. Such systems are expected to maximize benefits against costs in situations where a decision-making agent has to decide between a limited set of strategies with sparse information [ 14 ], while a quality strategic decision is demanded within a relatively short period of time, for example through intelligent analysis of big data [ 39 ].

Secondly, distributed decision models coordinated toward an overall solution have been adopted, relying on a decision support platform [ 40 ], whether as mathematical/modeling support for a situational approach to complex objects [ 41 ] or as a web-based multi-perspective decision support system (DSS) [ 42 ].

Thirdly, the problem of software support for management decisions has been addressed by combining a systematic approach with heuristic methods and game-theoretic modeling [ 43 ], which, in the case of industrial security, reduces the subsequent number of incidents [ 44 ].

Fourthly, in terms of industrial management and ISO information security controls, a semantic decision support system increases the automation level and supports the decision-maker in identifying the most appropriate strategy against a modeled environment [ 45 ], while providing understandable technology that underpins the decisions and interacts with the machine [ 46 ].

Finally, with respect to teamwork, AI validates a theoretical model of behavioral decision theory to assist organizational leaders in deciding on strategic initiatives [ 11 ], while allowing an understanding of who may have information that is valuable for solving a collaborative scheduling problem [ 47 ].

4.2. Electronic Commerce Business

This research stream focuses on e-commerce solutions that improve systems security, principally security measures for electronic commerce (e-commerce) that allow businesses to avoid cyber attacks, innovate, obtain information, and ultimately win clients [ 5 ].

First, intelligent models have been built around the factors that induce Internet users to make an online purchase, in order to devise effective strategies [ 48 ], while cyber security issues are discussed through diverse AI models for controlling unauthorized intrusion [ 49 ], particularly in countries such as China, to address drawbacks in firewall technology, data encryption [ 4 ], and qualification [ 2 ].

Second, the stream addresses adapting to today's increasingly demanding environment of a world pandemic, in terms of finding new revenue sources for business [ 3 ] and restructuring digital business processes to promote new products and services with sufficient privacy and with manpower qualified to deal with AI [ 50 ].

Third, it seeks to develop AI able to intelligently protect business, either through a distinct model of decision trees amidst the Internet of Things (IoT) [ 51 ] or by improving network management through active networks technology, with a multi-agent architecture able to imitate the reactive behavior and logical inference of a human expert [ 52 ].

Fourth, it reconceptualizes the role of AI within the spatial and non-spatial dimensions of proximity in a new digital industry framework, aiming to connect the physical and digital production spaces in both traditional and new technology-based approaches (e.g., Industry 4.0), thus promoting innovation partnerships and efficient technology and knowledge transfer [ 53 ]. In this vein, there are attempts to move management systems from a centralized to a distributed paradigm along the network, based on criteria such as the delegation degree [ 54 ], which even allows the transition from Industry 4.0 to Industry 5.0 through AI in the form of the Internet of Everything, multi-agent systems, emergent intelligence, and enterprise architecture [ 58 ].

Fifth, in manufacturing environments, following that networking paradigm, there are also attempts to manage agent communities in distributed and varied manufacturing environments through an AI multi-agent virtual manufacturing system (e.g., MetaMorph) that optimizes real-time planning and security [ 55 ]. In addition, smart factories have been built to mitigate the security vulnerabilities of intelligent manufacturing process automation through AI security measures and devices [ 56 ], for example in the design of a mine security monitoring configuration software platform with a real-time framework (e.g., the device management class diagram) [ 26 ]. Smart buildings have been adopted in manufacturing and nonmanufacturing environments, aiming to reduce costs and building height and to minimize the space required for users [ 57 ].

Finally, aiming to augment the cyber security of e-commerce and business in general, other projects have been put in place, such as computer-assisted audit tools (CAATs), able to carry out continuous auditing and allowing auditors to increase their productivity amidst real-time accounting and electronic data interchange [ 59 ], alongside a surge in the demand for high-tech/AI jobs [ 60 ].

4.3. AI Social Applications

As seen, AI systems security can be widely deployed across almost all domains of society, be it regulation, Internet security, computer networks, digitalization, health, or numerous other fields (see Figure 4 ).

First, there have been attempts to regulate cyber security, namely in terms of legal support for cyber security with regard to the application of artificial intelligence technology [ 61 ], in an innovative and economically/politically friendly way [ 9 ], and in fields such as infrastructure, by improving the efficiency of technological processes in transport, for example by reducing inter-train stops [ 63 ], and education, by improving the cyber security of university e-government, for example by forming a certificate database structure of university key performance indicators [ 21 ], securing e-learning organizations through swarm intelligence [ 23 ], assessing the risks a digital campus will face according to ISO series standards and risk-level criteria [ 25 ], and suggesting relevant solutions to key issues in its network information safety [ 12 ].

Second, some moral and legal issues have arisen, in particular in relation to privacy, sex, and childhood. Such is the case of the ethical/legal legitimacy of publishing open-source dual-purpose machine-learning algorithms [ 18 ]; the need for a legislated framework comprising regulatory agencies and representatives of all stakeholder groups gathered around AI [ 68 ]; the gendering of VPAs as female (e.g., Siri), which replicates normative assumptions about the potential role of women as secondary to men [ 15 ]; the need for communities to uphold their own codes [ 35 ]; and the need to improve the legal position of people, and children in particular, who are exposed to AI-mediated risk profiling practices [ 7 , 69 ].

Third, traditional industry also benefits from AI, which can improve, for example, coal mine safety, by analyzing the coal mine safety scheme storage structure and building a data warehouse and analysis [ 64 ]; the security of smart cities and their intelligent devices and networks, through AI frameworks (e.g., the Unified Theory of Acceptance and Use of Technology, UTAUT) [ 65 ]; housing [ 10 ] and building [ 66 ] security systems in terms of energy balance (e.g., Direct Digital Control systems), applying fuzzy logic as a non-precise programming tool that allows the systems to function well [ 66 ]; and even the detection and mitigation of data integrity attacks on outage management systems (OMSs) by AI means [ 67 ].

Fourth, citizens in general have reaped benefits from AI areas such as police investigation, through expert systems that offer support in profiling and tracking criminals based on machine-learning and neural network techniques [ 17 ]; video surveillance systems of real-time accuracy [ 76 ], resorting to models that detect moving objects while keeping up with environment changes [ 36 ] and to dynamic sensor selection for processing the image streams of all cameras simultaneously [ 37 ]; and ambient intelligence (AmI) spaces, where devices, sensors, and wireless networks combine data from diverse sources and monitor user preferences and their subsequent effects on users' privacy under a regulatory privacy framework [ 62 ].

Finally, AI has granted society noteworthy progress in clinical assistance, in terms of an electronic health record system integrated into existing risk management software to monitor sepsis in the intensive care unit (ICU) through a peer-to-peer VPN connection and with a fast and intuitive user interface [ 72 ]. It has also offered an innovative organizational housing model that combines remote surveillance, diagnostics, and the use of sensors and video to detect anomalies in the behavior and health of the elderly [ 20 ], together with a case-based decision support system for the automatic real-time surveillance and diagnosis of health care-associated infections, using diverse machine-learning techniques [ 73 ].

4.4. Neural Networks

Neural networks, the process through which machines learn from observational data and come up with their own solutions, have lately been discussed across several streams of issues.

First, it has been argued that it is opportune to develop a software library for creating artificial neural networks for machine learning to solve non-standard tasks [ 74 ], along with a decentralized and integrated AI environment that can accommodate video data storage and event-driven video processing gathered from varying sources, such as video surveillance systems [ 16 ], whose images could be improved through AI [ 75 ].

Second, such neural network architectures have progressed to huge numbers of neurons in the network, with associative memory devices designed with a number of neurons comparable to the human brain, within supercomputers [ 1 ]. Subsequently, such neural networks can be modeled on a switch-based architecture to interconnect neurons and store the training results in memory, and on genetic algorithms, to be exported to other robotic systems: a model of human personality for use in robotic systems in medicine and biology [ 13 ].

Finally, the neural network is quite representative of AI in the attempt to operate without human guidance once trained in human learning and self-learning, as in the case of current vessel positioning seaway systems, involving a fuzzy logic regulator and a neural network classifier that select optimal neural network models of the vessel paths to obtain control activity [ 22 ].

4.5. Data Security and Access Control Mechanisms

Access control can be deemed a classic security model that is pivotal to any security and privacy protection process, supporting data access from different environments and protecting against unauthorized access according to a given security policy [ 81 ]. In this vein, data security and access control mechanisms have been widely debated, particularly with regard to their distinct contextual conditions, for example the spatial and temporal environments that differ across diverse, decentralized networks. Those networks constitute a major challenge because they are dynamically located in "cloud" or "fog" environments rather than fixed desktop structures, thus demanding innovative approaches to access security, such as fog-based context-aware access control (FB-CAAC) [ 81 ]. Context-awareness is therefore an important characteristic of changing environments, where users access resources anywhere and anytime. As a result, it is paramount to highlight the interplay between the information, now based on fuzzy sets, and its situational context in order to implement context-sensitive access control policies, for example through subject- and action-specific attributes. In this way, different contextual conditions, such as user profile information and social relationship information, need to be added to the traditional spatial and temporal approaches to sustain these dynamic environments [ 81 ]. In the end, the corresponding policies should aim at defining the security and privacy requirements through a fog-based context-aware access control model that is respected across distributed cloud and fog networks.
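As a concrete illustration of such context-sensitive policies, the sketch below combines a role check with a fuzzy membership score over two contextual attributes: proximity of the requesting fog node (approximated here by latency) and time of day. All names, attributes, and thresholds are hypothetical simplifications; FB-CAAC itself is specified in [ 81 ]:

```python
def proximity_membership(latency_ms):
    """Fuzzy membership for 'request comes from a nearby fog node':
    1.0 below 20 ms, 0.0 above 100 ms, linear in between."""
    if latency_ms <= 20:
        return 1.0
    if latency_ms >= 100:
        return 0.0
    return (100 - latency_ms) / 80

def access_decision(role, latency_ms, hour, threshold=0.5):
    """Grant access only if the role is permitted AND the fuzzy
    context score (proximity x working hours) passes the policy
    threshold."""
    if role not in {"doctor", "nurse"}:          # crisp attribute check
        return False
    time_factor = 1.0 if 8 <= hour < 20 else 0.3  # off-hours are penalized
    score = proximity_membership(latency_ms) * time_factor
    return score >= threshold

print(access_decision("doctor", latency_ms=30, hour=10))  # nearby, daytime
print(access_decision("doctor", latency_ms=90, hour=23))  # distant, night
```

The fuzzy membership lets the policy degrade gracefully with context instead of flipping on a single hard boundary, which is the point of context-aware models for cloud and fog networks.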

5. Conclusion and Future Research Directions

This literature review has illustrated the impacts of AI on systems security, which influence our daily digital life, business decision making, e-commerce, diverse social and legal issues, and neural networks.

First, AI will increasingly shape our digital and Internet lives, as the major trend is the emergence of ever-new malicious threats from the Internet environment; accordingly, greater attention should be paid to cyber security. The progressively greater complexity of the business environment will likewise demand more and more AI-based decision support systems that enable management to adapt faster and more accurately, while requiring uniquely qualified digital e-manpower.

Second, with regard to e-commerce and manufacturing, principally amidst the world pandemic of COVID-19, activity tends to grow exponentially, as already observed, which demands corresponding progress in cyber security measures and strategies. The same applies to the social applications of AI which, following the increase in distance services, will also tend to adopt this model, applied to improved e-health, e-learning, and e-elderly monitoring systems.

Third, divisive issues are being brought to the academic arena, demanding progress on a legal framework able to encompass all the abovementioned issues, in order to assist political decisions and match the expectations of citizens.

Lastly, further progress in neural network platforms is inevitable, as they represent the cutting edge of AI in terms of imitating human thinking, the main goal of AI applications.

To summarize, we have presented useful insights into the impact of AI on systems security, illustrating its influence both on the delivery of services to people, particularly in the security domains of their daily matters and in health/education, and on the business sector, through systems capable of supporting decision making. In addition, we advanced the state of the art on AI innovations applied to varying fields.

Future Research Issues

Given the aforementioned scenario, we suggest further research avenues to reinforce existing theories and develop new ones, in particular on the deployment of AI technologies in small and medium enterprises (SMEs), which have sparse resources and come from traditional sectors, and which constitute the core of intermediate economies and of less developed and peripheral regions. In addition, building CAAC solutions constitutes a promising field for controlling data resources in the cloud and across changing contextual conditions.

Acknowledgments

We would like to express our gratitude to the Editor and the Referees, who offered extremely valuable suggestions and improvements. The authors were supported by the GOVCOPP Research Unit of Universidade de Aveiro and ISEC Lisboa, Higher Institute of Education and Sciences.

Overview of document citations period ≤ 2010 to 2021.

Overview of document self-citation period ≤ 2010 to 2020.

Author Contributions

Conceptualization, R.R. and A.R.; data curation, R.R. and A.R.; formal analysis, R.R. and A.R.; funding acquisition, R.R. and A.R.; investigation, R.R. and A.R.; methodology, R.R. and A.R.; project administration, R.R. and A.R.; software, R.R. and A.R.; validation, R.R. and A.R.; resources, R.R. and A.R.; writing—original draft preparation, R.R. and A.R.; writing—review and editing, R.R. and A.R.; visualization, R.R. and A.R.; supervision, R.R. and A.R. All authors have read and agreed to the published version of the manuscript.

This research received no external funding.

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Data Security: Recently Published Documents


Big Data Security Management Countermeasures in the Prevention and Control of Computer Network Crime

This paper aims to study countermeasures for big data security management in the prevention and control of computer network crime, in the absence of relevant legislation and judicial practice. Starting from the concepts and definitions of computer crime and network crime, the paper puts forward a comparison matrix, an investigation and statistics method, and a characteristic measure of computer crime. Through crime scene investigation, network investigation, and network tracking, it studies big data security management countermeasures for the prevention and control of computer network crime from the perspective of criminology. The experimental results show that the trend toward ever-younger offenders is serious, and the number of teenagers participating in network crime is on the rise; across all kinds of cases, criminals under the age of 35 account for more than 50%.

Fog Computing with IoT Device’s Data Security Management Using Density Control Weighted Election and Extensible Authentication Protocol

Integration of Blockchain With Connected and Autonomous Vehicles: Vision and Challenge

Connected and Autonomous Vehicles (CAVs) are introduced to improve individuals’ quality of life by offering a wide range of services. They collect a huge amount of data and exchange them with each other and the infrastructure. The collected data usually includes sensitive information about the users and the surrounding environment. Therefore, data security and privacy are among the main challenges in this industry. Blockchain, an emerging distributed ledger, has been considered by the research community as a potential solution for enhancing data security, integrity, and transparency in Intelligent Transportation Systems (ITS). However, despite the emphasis of governments on the transparency of personal data protection practices, CAV stakeholders have not been successful in communicating appropriate information with the end users regarding the procedure of collecting, storing, and processing their personal data, as well as the data ownership. This article provides a vision of the opportunities and challenges of adopting blockchain in ITS from the “data transparency” and “privacy” perspective. The main aim is to answer the following questions: (1) Considering the amount of personal data collected by the CAVs, such as location, how would the integration of blockchain technology affect transparency , fairness , and lawfulness of personal data processing concerning the data subjects (as this is one of the main principles in the existing data protection regulations)? (2) How can the trade-off between transparency and privacy be addressed in blockchain-based ITS use cases?

SecNVM: An Efficient and Write-Friendly Metadata Crash Consistency Scheme for Secure NVM

Data security is an indispensable part of non-volatile memory (NVM) systems. However, implementing data security efficiently on NVM is challenging, since we have to guarantee the consistency of user data and the related security metadata. Existing consistency schemes ignore the recoverability of the SGX style integrity tree (SIT) and the access correlation between metadata blocks, thereby generating unnecessary NVM write traffic. In this article, we propose SecNVM, an efficient and write-friendly metadata crash consistency scheme for secure NVM. SecNVM utilizes the observation that for a lazily updated SIT, the lost tree nodes after a crash can be recovered by the corresponding child nodes in NVM. It reduces the SIT persistency overhead through a restrained write-back metadata cache and exploits the SIT inter-layer dependency for recovery. Next, leveraging the strong access correlation between the counter and DMAC, SecNVM improves the efficiency of security metadata access through a novel collaborative counter-DMAC scheme. In addition, it adopts a lightweight address tracker to reduce the cost of address tracking for fast recovery. Experiments show that compared to the state-of-the-art schemes, SecNVM improves the performance and decreases write traffic a lot, and achieves an acceptable recovery time.

Review on Blockchain Technology

Abstract: Blockchain is a technology that has the potential to cause big changes in our corporate environment and will have a significant influence over the next few decades. It has the potential to alter our perception of business operations and revolutionise our economy. Blockchain is a decentralised and distributed ledger system that, since it cannot be tampered with or faked, attempts to assure transparency, data security and integrity. Only a few studies have looked at the usage of Blockchain Technology in other contexts or sectors, with the majority of current Blockchain Technology research focusing on its use for cryptocurrencies like Bitcoin. Blockchain technology is more than simply Bitcoin; it may be used in government, finance and banking, accounting, and business process management. As a result, the goal of this study is to examine and investigate the advantages and drawbacks of Blockchain Technology for current and future applications. As a consequence, a large number of published studies were thoroughly assessed and analysed based on their contributions to the Blockchain body of knowledge. Keywords: Blockchain Technology, Bitcoin, Cryptocurrency, Digital currency

China’s Data Security Policies Leading to the Cyber Security Law

A Novel Framework of an IoT-Blockchain-Based Intelligent System

With the growing need of technology into varied fields, dependency is getting directly proportional to ease of user-friendly smart systems. The advent of artificial intelligence in these smart systems has made our lives easier. Several Internet of Things- (IoT-) based smart refrigerator systems are emerging which support self-monitoring of contents, but the systems lack to achieve the optimized run time and data security. Therefore, in this research, a novel design is implemented with the hardware level of integration of equipment with a more sophisticated software design. It was attempted to design a new smart refrigerator system, which has the capability of automatic self-checking and self-purchasing, by integrating smart mobile device applications and IoT technology with minimal human intervention carried through Blynk application on a mobile phone. The proposed system automatically makes periodic checks and then waits for the owner’s decision to either allow the system to repurchase these products via Ethernet or reject the purchase option. The paper also discussed the machine level integration with artificial intelligence by considering several features and implemented state-of-the-art machine learning classifiers to give automatic decisions. The blockchain technology is cohesively combined to store and propagate data for the sake of data security and privacy concerns. In combination with IoT devices, machine learning, and blockchain technology, the proposed model of the paper can provide a more comprehensive and valuable feedback-driven system. The experiments have been performed and evaluated using several information retrieval metrics using visualization tools. Therefore, our proposed intelligent system will save effort, time, and money which helps us to have an easier, faster, and healthier lifestyle.

Barriers to the Adoption of New Safety Technologies in Construction: A Developing Country Context

The adoption rate of new technologies is still relatively low in the construction industry, particularly for mitigating occupational safety and health (OSH) risks, which is traditionally a largely labor-intensive activity in developing countries, occupying ill-afforded non-productive management resources. However, understanding why this is the case is a relatively unresearched area in developing countries such as Malaysia. In aiming to help redress this situation, this study explored the major barriers involved, firstly by a detailed literature review to identify the main barriers hampering the adoption of new technologies for safety science and management in construction. Then, a questionnaire survey of Malaysian construction practitioners was used to prioritize these barriers. A factor analysis further identified six major dimensions underlying the barriers, relating to the lack of OSH regulations and legislation, technological limitations, lack of genuine organizational commitment, prohibitive costs, poor safety culture within the construction industry, and privacy and data security concerns. Taken together, the findings provide a valuable reference to assist industry practitioners and researchers regarding the critical barriers to the adoption of new technologies for construction safety management in Malaysia and other similar developing countries, and bridge the identified knowledge gap concerning the dimensionality of the barriers.

Design and Development of Maritime Data Security Management Platform

Since the e-Navigation strategy was put forward, various countries and regions in the world have researched e-Navigation test platforms. However, the sources of navigation data are multi-source, and there are still difficulties in the unified acquisition, processing, analysis and application of multi-source data. Users often find it difficult to obtain the required comprehensive navigation information. The purpose of this paper is to use e-Navigation architecture to design and develop maritime data security management platform, strengthen navigation safety guarantee, strengthen Marine environment monitoring, share navigation and safety information, improve the ability of shipping transportation organizations in ports, and protect the marine environment. Therefore, this paper proposes a four-layer system architecture based on Java 2 Platform Enterprise Edition (J2EE) technology, and designs a unified maritime data storage, analysis and management platform, which realizes the intelligent, visualized and modular management of maritime data at shipside and the shore. This platform can provide comprehensive data resource services for ship navigation and support the analysis and mining of maritime big data. This paper expounds on the design, development scheme and demonstration operation scheme of the maritime data security management platform from the system structure and data exchange mode.

Mapping the quantity, quality and structural indicators of Asian (48 countries and 3 territories) research productivity on cloud computing

Purpose: The purpose of this study was to map the quantity (frequency), quality (impact) and structural indicators (correlations) of research produced on cloud computing in 48 countries and 3 territories on the Asian continent. Design/methodology/approach: To achieve the objectives of the study and scientifically map the indicators, data were extracted from the Scopus database. The extracted bibliographic data was first cleaned properly using Endnote and then analyzed using Biblioshiny and VosViewer application software. In the software, calculations include citations count; h, g and m indexes; Bradford's and Lotka's laws; and other scientific mappings. Findings: Results of the study indicate that China remained the most productive, impactful and collaborative country in Asia. All the top 20 impactful authors were also from China. The other most researched areas associated with cloud computing were revealed to be mobile cloud computing and data security in clouds. The most prominent journal currently publishing research studies on cloud computing was "Advances in Intelligent Systems and Computing." Originality/value: The study is the first of its kind to identify the quantity (frequencies), quality (impact) and structural indicators (correlations) of Asian (48 countries and 3 territories) research productivity on cloud computing. The results are of great importance for researchers and countries interested in further exploring, publishing and increasing cross-country collaborations related to the phenomenon of cloud computing.

Database security refers to the range of tools, controls and measures designed to establish and preserve database confidentiality, integrity and availability. Confidentiality is the element that’s compromised in most data breaches.

Database security must address and protect the following:

  • The data in the database.
  • The database management system (DBMS).
  • Any associated applications.
  • The physical database server or the virtual database server and the underlying hardware.
  • The computing or network infrastructure that is used to access the database.

Database security is a complex and challenging endeavor that involves all aspects of information security technologies and practices. It's also naturally at odds with database usability. The more accessible and usable the database, the more vulnerable it is to security threats; the more invulnerable the database is to threats, the more difficult it is to access and use. This paradox is sometimes referred to as Anderson's Rule.


By definition, a data breach is a failure to maintain the confidentiality of data in a database. How much harm a data breach inflicts on your enterprise depends on various consequences or factors:

  • Compromised intellectual property: Your intellectual property—trade secrets, inventions, proprietary practices—can be critical to your ability to maintain a competitive advantage in your market. If that intellectual property is stolen or exposed, your competitive advantage can be difficult or impossible to maintain or recover.
  • Damage to brand reputation: Customers or partners might be unwilling to buy your products or services (or do business with your company) if they don’t feel they can trust you to protect your data or theirs.
  • Business continuity (or lack thereof): Some businesses cannot continue to operate until a breach is resolved.
  • Fines or penalties for non-compliance: The financial impact of failing to comply with global regulations such as the Sarbanes-Oxley Act (SOX) or Payment Card Industry Data Security Standard (PCI DSS), industry-specific data privacy regulations such as HIPAA, or regional data privacy regulations, such as Europe's General Data Protection Regulation (GDPR), can be devastating, with fines in the worst cases exceeding several million dollars per violation.
  • Costs of repairing breaches and notifying customers: In addition to the cost of communicating a breach to customers, a breached organization must pay for forensic and investigative activities, crisis management, triage, repair of the affected systems and more.

Many software misconfigurations, vulnerabilities or patterns of carelessness or misuse can result in breaches. The following are among the most common types or causes of database security attacks.

Insider threats

An insider threat is a security threat from any one of three sources with privileged access to the database:

  • A malicious insider who intends to do harm.
  • A negligent insider who makes errors that make the database vulnerable to attack.
  • An infiltrator, an outsider who somehow obtains credentials via a scheme, such as phishing or by gaining access to the credential database itself.

Insider threats are among the most common causes of database security breaches and are often the result of allowing too many employees to hold privileged user access credentials.

Human error

Accidents, weak passwords, password sharing and other unwise or uninformed user behaviors continue to be the cause of nearly half (49%) of all reported data breaches.

Exploitation of database software vulnerabilities

Hackers make their living by finding and targeting vulnerabilities in all kinds of software, including database management software. All major commercial database software vendors and open source database management platforms issue regular security patches to address these vulnerabilities, but failure to apply these patches in a timely fashion can increase your exposure.

SQL or NoSQL injection attacks

A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack strings into database queries that are served by web applications or HTTP headers. Organizations that don’t follow secure web application coding practices and perform regular vulnerability testing are open to these attacks.
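The core defense is to keep user input out of the query text entirely. The sketch below, using Python's standard-library sqlite3 module with a hypothetical users table, shows a parameterized query treating a classic injection payload as inert data:

```python
import sqlite3

def find_user(conn, username):
    # The "?" placeholder binds `username` strictly as data, never as SQL,
    # so a payload like "alice' OR '1'='1" cannot rewrite the query.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

# Hypothetical in-memory database, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))             # the one matching row
print(find_user(conn, "alice' OR '1'='1"))  # injection attempt: no rows
```

Had the query been assembled by string concatenation instead, the second call would have matched every row in the table.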

Buffer overflow exploitation

Buffer overflow occurs when a process attempts to write more data to a fixed-length block of memory than it is allowed to hold. Attackers can use the excess data, which is stored in adjacent memory addresses, as a foundation from which to start attacks.
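Python itself is memory-safe, so the sketch below only illustrates the principle: ctypes provides a genuinely fixed-length buffer, and an explicit capacity check (the hypothetical safe_write helper) rejects writes that would spill past it, which is exactly the check vulnerable C code omits:

```python
import ctypes

buf = ctypes.create_string_buffer(8)  # fixed-length 8-byte buffer

def safe_write(buffer, data):
    # Reject writes that would exceed capacity instead of letting excess
    # bytes land in adjacent memory (the essence of a buffer overflow).
    if len(data) >= ctypes.sizeof(buffer):  # leave room for the NUL terminator
        raise ValueError("write exceeds buffer capacity")
    buffer.value = data

safe_write(buf, b"short")  # fits within the 8-byte buffer
print(buf.value)
```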

Malware

Malware is software that is written specifically to take advantage of vulnerabilities or otherwise cause damage to the database. Malware can arrive via any endpoint device connecting to the database's network.

Attacks on backups

Organizations that fail to protect backup data with the same stringent controls that are used to protect the database itself can be vulnerable to attacks on backups.

These threats are exacerbated by the following:

  • Growing data volumes: Data capture, storage and processing continues to grow exponentially across nearly all organizations. Any data security tools or practices need to be highly scalable to meet near and distant future needs.
  • Infrastructure sprawl: Network environments are becoming increasingly complex, particularly as businesses move workloads to multicloud or hybrid cloud architectures, making the choice, deployment and management of security solutions ever more challenging.
  • Increasingly stringent regulatory requirements: The worldwide regulatory compliance landscape continues to grow in complexity, making adhering to all mandates more difficult.
  • Cybersecurity skills shortage: Experts predict there might be as many as 8 million unfilled cybersecurity positions by 2022.

Denial of service (DoS and DDoS) attacks

In a denial of service (DoS) attack, the attacker deluges the target server—in this case the database server—with so many requests that the server can no longer fulfill legitimate requests from actual users, and, often, the server becomes unstable or crashes.

In a distributed denial of service attack (DDoS), the deluge comes from multiple servers, making it more difficult to stop the attack.
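One common mitigation is to throttle each client before requests reach the database server. The token-bucket sketch below is illustrative (the class and rates are hypothetical, not from any particular product): it admits a bounded burst, then rejects the flood until tokens refill.

```python
import time

class TokenBucket:
    """Per-client throttle: each request spends one token; tokens
    refill at `rate` per second, up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # request dropped: this client is flooding

bucket = TokenBucket(rate=5, capacity=10)
decisions = [bucket.allow() for _ in range(15)]
print(decisions.count(True))  # roughly the first 10 pass; the burst tail is rejected
```

In practice this logic usually runs in a load balancer or API gateway, keyed per client IP or credential, rather than in the database itself.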

Because databases are network-accessible, any security threat to any component within or portion of the network infrastructure is also a threat to the database, and any attack impacting a user’s device or workstation can threaten the database. Thus, database security must extend far beyond the confines of the database alone.

When evaluating database security in your environment to decide on your team’s top priorities, consider each of the following areas:

  • Physical security: Whether your database server is on-premises or in a cloud data center, it must be located within a secure, climate-controlled environment. If your database server is in a cloud data center, your cloud provider takes care of this for you.
  • Administrative and network access controls: The practical minimum number of users should have access to the database, and their permissions should be restricted to the minimum levels necessary for them to do their jobs. Likewise, network access should be limited to the minimum level of permissions necessary.
  • User account and device security: Always be aware of who is accessing the database and when and how the data is being used. Data monitoring solutions can alert you if data activities are unusual or appear risky. All user devices connecting to the network housing the database should be physically secure (in the hands of the right user only) and subject to security controls at all times.
  • Encryption: All data, including data in the database and credential data, should be protected with best-in-class encryption while at rest and in transit. All encryption keys should be handled in accordance with best practice guidelines.
  • Database software security: Always use the latest version of your database management software, and apply all patches when they are issued.
  • Application and web server security: Any application or web server that interacts with the database can be a channel for attack and should be subject to ongoing security testing and best practice management.
  • Backup security: All backups, copies or images of the database must be subject to the same (or equally stringent) security controls as the database itself.
  • Auditing: Record all logins to the database server and operating system, and log all operations that are performed on sensitive data as well. Database security standard audits should be performed regularly.
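For stored credentials in particular, "best-in-class" protection means a salted, deliberately slow key-derivation function rather than a plain hash. A minimal sketch with Python's standard hashlib (the iteration count here is illustrative; follow current guidance in production):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=200_000):
    # A unique random salt defeats rainbow tables; a high iteration
    # count makes offline brute force expensive.
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=200_000):
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)  # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("guess", salt, stored))                         # False
```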

In addition to implementing layered security controls across your entire network environment, database security requires you to establish the correct controls and policies for access to the database itself. These include:

  • Administrative controls to govern installation, change and configuration management for the database.
  • Preventive controls to govern access, encryption, tokenization and masking.
  • Detective controls to monitor database activity and deploy data loss prevention tools. These solutions make it possible to identify and alert on anomalous or suspicious activities.
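Masking and tokenization, mentioned above, can each be sketched in a few lines. The helpers below are hypothetical illustrations; real deployments typically use a token vault or format-preserving encryption rather than a bare keyed digest:

```python
import hashlib
import hmac

def mask_pan(pan):
    # Masking: reveal only the last four digits for display or logs.
    return "*" * (len(pan) - 4) + pan[-4:]

def tokenize(value, secret):
    # Deterministic tokenization sketch: a keyed digest stands in for
    # the raw value, so joins and lookups still work without exposing it.
    return hmac.new(secret.encode(), value.encode(), hashlib.sha256).hexdigest()[:16]

print(mask_pan("4111111111111111"))  # ************1111
```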

Database security policies should be integrated with, and support, your overall business goals, such as protection of critical intellectual property, as well as your cybersecurity policies and cloud security policies. Ensure that you have designated responsibility for maintaining and auditing security controls within your organization and that your policies complement those of your cloud provider in shared responsibility agreements. Security controls, security awareness training and education programs, and penetration testing and vulnerability assessment strategies should all be established in support of your formal security policies.

Today, a wide array of vendors offer data protection tools and platforms. A full-scale solution should include all of the following capabilities:

  • Discovery: Look for a tool that can scan for and classify vulnerabilities across all your databases—whether they’re hosted in the cloud or on-premises—and offer recommendations for remediating any vulnerabilities that are identified. Discovery capabilities are often required to conform to regulatory compliance mandates.
  • Data activity monitoring: The solution should be able to monitor and audit all data activities across all databases, regardless of whether your deployment is on-premises, in the cloud, or in a container. It should alert you to suspicious activities in real-time so that you can respond to threats more quickly. You'll also want a solution that can enforce rules, policies and separation of duties and that offers visibility into the status of your data through a comprehensive and unified user interface. Make sure that any solution you choose can generate the reports you need to meet compliance requirements.
  • Encryption and tokenization capabilities: Upon a breach, encryption offers a final line of defense against compromise. Any tool that you choose should include flexible encryption capabilities that can safeguard data in on-premises, cloud, hybrid or multicloud environments. Look for a tool with file, volume and application encryption capabilities that conform to your industry’s compliance requirements, which might demand tokenization (data masking) or advanced security key management capabilities.
  • Data security optimization and risk analysis: A tool that can generate contextual insights by combining data security information with advanced analytics will enable you to accomplish optimization, risk analysis and reporting with ease. Choose a solution that can retain and synthesize large quantities of historical and recent data about the status and security of your databases, and look for one that offers data exploration, auditing and reporting capabilities through a comprehensive but user-friendly self-service dashboard.
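The alerting side of data activity monitoring can be reduced to a toy baseline check. The function, data and threshold below are illustrative stand-ins for what commercial tools do with far richer context:

```python
import statistics

def flag_anomaly(history, current, threshold=3.0):
    # Alert when current activity sits more than `threshold` standard
    # deviations above this account's historical mean.
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid division by zero
    return (current - mean) / stdev > threshold

baseline = [40, 52, 47, 45, 50, 44, 48]  # hypothetical nightly query counts
print(flag_anomaly(baseline, 49))    # ordinary activity
print(flag_anomaly(baseline, 5000))  # possible bulk exfiltration
```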


Computer Science > Cryptography and Security

Title: Membership Inference Attacks and Privacy in Topic Modeling

Abstract: Recent research shows that large language models are susceptible to privacy attacks that infer aspects of the training data. However, it is unclear if simpler generative models, like topic models, share similar vulnerabilities. In this work, we propose an attack against topic models that can confidently identify members of the training data in Latent Dirichlet Allocation. Our results suggest that the privacy risks associated with generative modeling are not restricted to large neural models. Additionally, to mitigate these vulnerabilities, we explore differentially private (DP) topic modeling. We propose a framework for private topic modeling that incorporates DP vocabulary selection as a pre-processing step, and show that it improves privacy while having limited effects on practical utility.
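The differential-privacy machinery the abstract relies on can be illustrated with the classic Laplace mechanism (a generic sketch, not the authors' implementation): noise calibrated to sensitivity/ε hides any one individual's contribution to a released count.

```python
import math
import random

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=random):
    # Sample Laplace(0, sensitivity/epsilon) noise via the inverse CDF and
    # add it to the statistic; smaller epsilon means stronger privacy.
    scale = sensitivity / epsilon
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

rng = random.Random(42)
print(laplace_count(100, epsilon=0.1, rng=rng))   # heavily noised release
print(laplace_count(100, epsilon=50.0, rng=rng))  # close to the true count
```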



Published on 8.3.2024 in Vol 26 (2024)

Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges

Authors of this article:


  • Yan Chen*, PhD
  • Pouyan Esmaeilzadeh*, PhD

Department of Information Systems and Business Analytics, College of Business, Florida International University, Miami, FL, United States

*All authors contributed equally.

Corresponding Author:

Pouyan Esmaeilzadeh, PhD

Department of Information Systems and Business Analytics

College of Business

Florida International University

Modesto A Maidique Campus

11200 SW 8th St, RB 261 B

Miami, FL, 33199

United States

Phone: 1 3053483302

Email: [email protected]

As advances in artificial intelligence (AI) continue to transform and revolutionize the field of medicine, understanding the potential uses of generative AI in health care becomes increasingly important. Generative AI, including models such as generative adversarial networks and large language models, shows promise in transforming medical diagnostics, research, treatment planning, and patient care. However, these data-intensive systems pose new threats to protected health information. This Viewpoint paper aims to explore various categories of generative AI in health care, including medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support, while identifying security and privacy threats within each phase of the life cycle of such systems (ie, data collection, model development, and implementation phases). The objectives of this study were to analyze the current state of generative AI in health care, identify opportunities and privacy and security challenges posed by integrating these technologies into existing health care infrastructure, and propose strategies for mitigating security and privacy risks. This study highlights the importance of addressing the security and privacy threats associated with generative AI in health care to ensure the safe and effective use of these systems. The findings of this study can inform the development of future generative AI systems in health care and help health care organizations better understand the potential benefits and risks associated with these systems. By examining the use cases and benefits of generative AI across diverse domains within health care, this paper contributes to theoretical discussions surrounding AI ethics, security vulnerabilities, and data privacy regulations. In addition, this study provides practical insights for stakeholders looking to adopt generative AI solutions within their organizations.

Introduction

Artificial intelligence (AI) is transforming many industries, including health care. AI has the potential to revolutionize health care by enabling the detection of signs, patterns, diseases, anomalies, and risks. From administrative automation to clinical decision support, AI holds immense potential to improve patient outcomes, lower costs, and accelerate medical discoveries [ 1 ]. An especially promising subset of AI is generative models, which are algorithms that can synthesize new data, imagery, text, and other content with humanlike creativity and nuance based on patterns learned from existing data [ 2 ]. Generative AI could power clinical practices in health care, from generating synthetic patient data to augmenting rare disease research to creating AI-assisted drug discovery systems [ 3 ]. Generative AI has the potential to detect signs, patterns, diseases, anomalies, and risks and assist in screening patients for various chronic diseases, making more accurate and data-driven diagnoses and improving clinical decision-making [ 4 ]. Generative AI also has the potential to transform patient care with generative AI virtual health assistants [ 5 ].

However, generative AI systems pose acute privacy and security risks along with their transformative potential because of their vast data requirements and opacity [ 6 ]. Generative AI models can be trained on sensitive, multimodal patient data, which could be exploited by malicious actors. Therefore, the collection and processing of sensitive patient data, along with tasks such as model training, model building, and implementing generative AI systems, present potential security and privacy risks. Given the sensitive nature of medical data, any compromise can have dire consequences, not just in data breaches but also in patients’ trust and the perceived reliability of medical institutions. As these AI systems move from laboratory to clinical deployment, a measured approach is required to map and mitigate their vulnerabilities. Another challenge of using generative AI models is that they can be biased, which could lead to inaccurate diagnoses and treatments [ 7 ].

Despite the growing interest in generative AI in health care, there is a gap in the literature regarding a comprehensive examination of the unique security and privacy threats associated with generative AI systems. Our study attempts to provide insights into the different categories of generative AI in health care, including medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support. This study also aims to address the gap by identifying security and privacy threats and mapping them to the life cycle of various generative AI systems in health care, from data collection through model building to clinical implementation. By identifying and analyzing these threats, we can gain insights into the vulnerabilities and risks associated with the use of generative AI in health care. We also seek to contribute to theory and practice by highlighting the importance of addressing these threats and proposing mitigation strategies.

The findings of this study can inform the development of future generative AI systems in health care and help health care organizations better understand the potential benefits and risks of using these systems. The significance of this study lies in its potential to inform policy makers, health care organizations, and AI developers about the security and privacy challenges associated with generative AI in health care. The findings of this study can guide the development of robust data governance frameworks, secure infrastructure, and ethical guidelines to ensure the safe and responsible use of generative AI in health care. With careful governance, the benefits of generative models can be realized while safeguarding patient data and public trust. Ultimately, this study contributes to the advancement of knowledge in the field of AI in health care and supports the development of secure and privacy-preserving generative AI systems for improved patient care and outcomes.

Generative AI Applications in Health Care

Generative AI models use neural networks to identify patterns and structures within existing data to generate new and original content. Generative AI refers to techniques such as generative adversarial networks (GANs) and large language models (LLMs) that synthesize novel outputs such as images, text, and molecular structures [ 8 ]. GANs use 2 neural networks, a generator and a discriminator, that compete against each other to become better at generating synthetic data [ 9 ]. LLMs such as GPT-4 (OpenAI) are trained on massive text data and can generate synthetic natural language text, code, and so on [ 10 ].
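
To make the adversarial dynamic concrete, here is a deliberately stylized sketch (all names and the 1-D Gaussian toy setup are our own illustration, not any specific GAN implementation): a "generator" with a single mean parameter is nudged toward the real data distribution until a threshold "discriminator" can no longer separate real from fake. Real GANs replace both players with neural networks trained by backpropagation.

```python
import random

random.seed(0)
REAL_MEAN, NOISE = 5.0, 0.5

def sample_real(n):
    return [random.gauss(REAL_MEAN, NOISE) for _ in range(n)]

def sample_fake(g_mean, n):
    return [random.gauss(g_mean, NOISE) for _ in range(n)]

def disc_accuracy(g_mean, n=2000):
    # Threshold "discriminator": label a sample real if it lies on the side
    # of the midpoint between the two batch means where the real mean falls.
    real, fake = sample_real(n), sample_fake(g_mean, n)
    t = (sum(real) / n + sum(fake) / n) / 2
    real_above = sum(real) / n > t
    hits = sum((x > t) == real_above for x in real)
    hits += sum((x > t) != real_above for x in fake)
    return hits / (2 * n)

def train(steps=200, batch=64, lr=0.1):
    g_mean = 0.0  # the generator's only parameter
    for _ in range(steps):
        real, fake = sample_real(batch), sample_fake(g_mean, batch)
        # Generator update: shift output toward the region the
        # discriminator currently labels as real.
        g_mean += lr * (sum(real) / batch - sum(fake) / batch)
    return g_mean

g = train()
print(round(g, 1))       # close to the real mean of 5.0
print(disc_accuracy(g))  # near 0.5: discriminator is reduced to chance level
```

At convergence the discriminator's accuracy drops toward chance, which is the training signal that the generated distribution matches the real one.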

Generative AI has spurred a wide range of applications in health care. This subset of AI has the potential to make a breakthrough in medical diagnostic applications, given its capability to build models using multimodal medical data [ 5 ]. Generative AI also promises to accelerate drug discovery by inventing optimized molecular candidates [ 11 ]. In research settings, these generative AI techniques can hypothesize promising new directions by creatively combining concepts [ 12 ]. Generative AI also has applications in engaging patients through natural conversation powered by LLMs [ 2 ]. When integrated into clinical workflows, it may also provide physicians with patient-specific treatment suggestions [ 13 ].

The classification of generative AI systems presented in Table 1 was developed based on a careful analysis of the various factors that differentiate these technologies.

Differentiating Factors

The goal was to provide a framework for better understanding the diversity of generative AI across health care settings. We leverage several key factors to differentiate the applications and provide insights into this emerging field, described in the following sections.

Clinical Setting

The clinical setting categorizes where in the health care workflow the generative AI system is applied, such as diagnostics, treatment planning, drug discovery, clinical decision support, and patient education [ 14 ]. This provides insights into the breadth of health care contexts leveraging these technologies.

Intended Users

Generative AI tools are tailored to different types of users in health care, from clinicians to researchers to patients [ 15 ]. Categorization by intended user groups reveals how generative AI penetrates various stakeholder groups and which user groups may adopt and interact with generative AI applications.

Input Data

The data sources powering generative AI systems vary significantly, from electronic health records (EHRs) and medical imaging to biomedical literature, laboratory tests, and patient-provided data [ 16 ]. Categorization by data inputs illustrates how different data fuel different categories of applications.

Output Data

The outputs produced by the system, such as images, care planning, prescription advice, treatment options, drug molecules, text, risk scores, and education materials [ 17 ], demonstrate the wide range of generative AI capabilities in health care.

Personalization Level

The level of personalization to individual patients reveals the precision of the outputs, from generalized to fully patient specific. This provides a perspective on the customizability of the generative AI system.

Workflow Integration

Some generative AI systems are designed as stand-alone applications, whereas others are integrated into clinical workflows via EHRs, order sets, and so on. Categorization by workflow integration sheds light on the level of adoption, implementation practices, and integration of these tools.

Validation Needs

The extent of validation required, from noncritical outputs to those needing rigorous US Food and Drug Administration approval [ 18 ], highlights differences in oversight and impact levels.

Impact

Profiling the benefits and use cases served by the generative AI technology, such as improving diagnostics, reducing medication errors, or accelerating drug discovery, provides insights into the varied impacts.

Risks and Limitations

Discussing risks and limitations provides a balanced view of concerns such as algorithmic bias, privacy concerns, security issues, system vulnerability, and clinical integration challenges.

Human-AI Collaboration

Generative AI systems differ in the level of human involvement required, from fully automated to human-in-the-loop (human engagement in overseeing and interacting with the AI’s operational process) [ 19 ]. Categorization by human-AI partnership provides insights into the changing dynamics between humans and AI across health care.

This study aims to reveal crucial differences, use cases, adoption levels, various risks, and implementation practices by developing categories based on these key attributes of generative AI systems. The proposed framework clarifies the heterogeneous landscape of generative AI in health care and enables a trend analysis across categories. These factors provide a perspective on how generative AI manifests distinctly for various users, data types, workflows, risk factors, and human-AI partnerships within health care. By systematically analyzing the diverse range of generative AI systems across health care settings using the key factors discussed previously, we can classify the heterogeneous landscape of generative AI in health care into 5 overarching categories: medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support.

Medical Diagnostics

Generative AI techniques can analyze data from wearables, EHRs, and medical images (eg, x-rays, magnetic resonance imaging, and computed tomography scans) to detect signs, patterns, diseases, anomalies, and risks and generate descriptive findings to improve diagnoses. Systems such as AI-Rad Companion leverage natural language generation models to compose radiology reports automatically, highlighting potential abnormalities and issues for clinician review [ 20 ]. This assists radiologists by providing initial draft findings more rapidly. However, clinicians must thoroughly validate any generative AI outputs before clinical use. Ongoing challenges include reducing false positives and negatives [ 21 ].

Drug Discovery

Generative AI shows promise for expediting and enhancing drug discovery through inventing optimized molecular structures de novo. Techniques such as GANs combined with reinforcement learning allow the intelligent generation of molecular graph representations [ 22 ]. Companies such as Insilico Medicine are using these generative chemistry techniques to propose novel target-specific drug candidates with desired properties. This accelerates preclinical pharmaceutical research. However, validating toxicity and efficacy remains critical before human trials.

Virtual Health Assistants

Generative models such as LLMs can power conversational agents that understand and respond to patient questions and concerns [ 23 ]. Companies such as Sensely and Woebot Health leverage these techniques to create virtual assistants that explain symptoms, provide health information, and offer screening triage advice through natural dialogue [ 24 ]. This increases access and engagement for patients. However, challenges remain around privacy, information accuracy, and integration into provider workflows [ 25 ].

Medical Research

In research settings, generative AI can formulate novel hypotheses by making unexpected combinations of concepts, mimicking human creativity and intuition. Claude from Anthropic can read research papers and propose unexplored directions worth investigating [ 26 ]. This unique generative capacity could accelerate scientific advancement. However, corroboration by human researchers is crucial to prevent the blind acceptance of AI-generated findings [ 27 ].

Clinical Decision Support

Integrating generative AI into clinical workflows could provide patient-specific suggestions to assist physicians in decision-making. Glass AI leverages LLMs such as GPT-3 to generate tailored treatment options based on patient data for physicians to review [ 15 ]. This could improve outcomes and reduce errors. However, bias mitigation and high validation thresholds are critical before real-world adoption [ 28 ].

By holistically examining all the key factors, we can see how each one contributes to delineating these 5 high-level categories that provide a comprehensive snapshot of the generative AI landscape in health care. Analyzing these 5 categories through the lens of the proposed factors enables our study to reveal crucial differences, use cases, benefits, limitations, and implementation practices of generative AI technologies across major health care domains.

Literature Review

The adoption of AI (powered by various models) is accelerating across health care for applications ranging from medical imaging to virtual assistants. However, the data-intensive nature and complexity of these systems introduce acute privacy and security vulnerabilities that must be addressed to ensure safe and ethical deployment in clinical settings. This literature review covers 2 topics. First, we highlight the dual nature of technological advancements in generative AI within health care, its benefits, and its risks, particularly in terms of privacy and security that it entails. Second, we explain AI regulation and compare the key aspects of the European Union (EU) AI Act and the US AI Bill of Rights.

Generative AI: Balancing Benefits and Risks

The use of generative AI systems in medicine holds promise for improvements in areas such as patient education and diagnosis support. However, recent studies highlight that privacy and security concerns may slow user adoption. A survey explores the application of GANs toward ensuring privacy and security [ 29 ]. It highlights how GANs can be used to address increasing privacy concerns and strengthen privacy regulations in various applications, including medical image analysis. The unique feature of GANs in this context is their adversarial training characteristic, which allows them to investigate privacy and security issues without predetermined assumptions about opponents’ capabilities. This is crucial because these capabilities are often complex to determine with traditional attack and defense mechanisms. In the privacy and security models using GANs, the generator can be modeled in two ways: (1) as an attacker aiming to fool a defender (the discriminator) to simulate an attack scenario and (2) as a defender resisting a powerful attacker (the discriminator) to simulate a defense scenario.

Examples of defense models include generative adversarial privacy [ 30 ], privacy-preserving adversarial networks [ 31 ], compressive adversarial privacy [ 32 ], and reconstructive adversarial network [ 33 ]. These GAN-based mechanisms offer innovative ways to enhance privacy and security in various machine learning and data processing scenarios. The examples are described in the subsequent sections.

Protection of Preimage Privacy

The compressive privacy GAN is designed to preprocess private data before the training stage in machine learning as a service scenarios [ 34 ]. It includes 3 modules: a generator module (G) as a privatization mechanism for generating privacy-preserving data, a service module (S) providing prediction services, and an attacker module (A) that mimics an attacker aiming to reconstruct the data. The objective is to ensure optimal performance of the prediction service, even in the face of strong attackers, by intentionally increasing the reconstruction error. This method defends against preimage privacy attacks in machine learning as a service by ensuring that the input data of a service module contains no sensitive information.

Privacy in Distributed Learning Systems

In decentralized learning systems, such as distributed selective stochastic gradient descent [ 35 ] and federated learning (FL) [ 36 ], data are trained locally by different participants without data sharing. This setup protects data privacy to some extent, but it is not perfect: GAN-based models can mimic the data distribution and thereby threaten privacy. The risks of applying GAN-based models in decentralized learning systems are multifaceted, highlighting the need for robust privacy protection measures. For example, an attacker might use GANs to recover sensitive information from the distributed training system, and a malicious server can reveal user-level privacy by training a multitask GAN with auxiliary identification.

Protection mechanisms include embedding a “buried point layer” in local models to detect abnormal changes and block attackers and integrating GAN with FL to produce realistic data without privacy leakage.

Differential Privacy in GANs

To address the problem of privacy leakage in these models, two solutions have been proposed: (1) adding a regularization term to the loss function to avoid overfitting and improve robustness, which can be applied, for example, to defend against membership inference attacks [ 37 ]; and (2) adding acceptable noise to the model parameters to hinder privacy inference attacks. Such methods have been used for privacy protection, particularly the combination of differential privacy and neural networks [ 38 ].
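
The second solution can be illustrated with the classic Laplace mechanism from differential privacy. The sketch below is a minimal illustration (function names are ours, not from the cited studies), releasing a count whose noise scale is calibrated to sensitivity divided by epsilon:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5               # u in [-0.5, 0.5)
    u = min(max(u, -0.4999999), 0.4999999)  # guard against log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, sensitivity=1.0):
    # Laplace mechanism: noise with scale sensitivity/epsilon yields
    # epsilon-differential privacy for a counting query (sensitivity 1:
    # adding or removing one patient changes the count by at most 1).
    return true_count + laplace_noise(sensitivity / epsilon)

random.seed(42)
releases = [dp_count(120, epsilon=1.0) for _ in range(1000)]
avg = sum(releases) / len(releases)
print(round(avg, 1))  # noise is zero-mean, so this stays close to 120
```

A smaller epsilon (stronger privacy) means a larger noise scale, making any single release less informative about any individual.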

In medical research, the widespread use of medical data, particularly in image analysis, raises significant concerns about the potential exposure of individual identities. An innovative adversarial training method focused on identity-obfuscated segmentation has been proposed to address this challenge [ 39 ]. This method is underpinned by a deep convolutional GAN-based framework comprising three key components: (1) a deep encoder network, functioning as the generator, efficiently obscuring identity markers in medical images by incorporating additional noise; (2) a binary classifier, serving as the discriminator, ensuring that the transformed images retain a resemblance to their original counterparts; and (3) a convolutional neural network–based network dedicated to medical image analysis, acting as an alternate discriminator responsible for analyzing the segmentation details of the images. Together, the encoder, binary classifier, and segmentation analysis network form a robust approach to safeguarding medical data privacy while preserving the integrity and efficacy of medical image segmentation.

The use of EHR medical records has significantly advanced medical research while simultaneously amplifying concerns regarding the privacy of this sensitive information. In response, Choi et al [ 40 ] devised the medical GAN (medGAN), an innovative adaptation of the standard GAN framework, aimed at producing synthetic patient records that respect privacy. The medGAN excels at generating high-dimensional discrete variables. Its architecture uses an autoencoder as the generator, which creates synthetic medical data augmented with noise. A binary classifier functions as the discriminator, ensuring the resemblance of these data to real records. The outcome is synthetic medical data suitable for various uses, such as distribution analysis, predictive modeling, and medical expert evaluations, minimizing the privacy risks associated with both identity and attributes. Furthering these advancements, Yale et al [ 41 ] conducted an in-depth evaluation of medGAN’s ability to protect privacy in medical records. In a parallel development, Torfi and Fox [ 42 ] introduced Correlation-Capturing Convolutional Generative Adversarial Networks (CorGAN), which focuses on the correlations within medical records. Unlike medGAN, CorGAN uses a dual autoencoder in its generator, enabling the creation of sequential EHRs rather than discrete entries. This approach enhances predictive accuracy, providing more effective assistance to medical professionals [ 43 ].
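
As a deliberately simplified stand-in for models such as medGAN, the sketch below (all names and the toy records are illustrative) generates synthetic categorical records by sampling each field from its empirical marginal distribution. Note that independent marginal sampling discards cross-field correlations, which is precisely the limitation that architectures such as CorGAN address:

```python
import random
from collections import Counter

def fit_marginals(records):
    # Empirical per-field category counts from the real records.
    return {f: Counter(r[f] for r in records) for f in records[0]}

def sample_synthetic(marginals, n, rng):
    # Draw each field independently from its marginal distribution.
    out = []
    for _ in range(n):
        rec = {}
        for field, counts in marginals.items():
            cats, weights = zip(*counts.items())
            rec[field] = rng.choices(cats, weights=weights)[0]
        out.append(rec)
    return out

real = [
    {"sex": "F", "dx": "diabetes"},
    {"sex": "M", "dx": "asthma"},
    {"sex": "F", "dx": "asthma"},
    {"sex": "M", "dx": "diabetes"},
]
rng = random.Random(7)
synthetic = sample_synthetic(fit_marginals(real), 100, rng)
print(len(synthetic))  # → 100
```

The synthetic rows preserve per-field frequencies but never need to copy a real record verbatim, which is the privacy intuition behind synthetic EHR generation.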

Similarly, Nova [ 14 ] discusses the transformative impact of generative AI on EHRs and medical language processing, underlining the accompanying privacy concerns. It examines the balance between the utility of GANs in generating health care data and the preservation of privacy. Rane [ 44 ] explores the wider privacy and security implications of using generative AI models, such as ChatGPT, in health care within the context of Industry 4.0 and Industry 5.0 transformation. The impact of generative content on individual privacy is further explored by Bale et al [ 45 ], emphasizing the ethical considerations in health care.

Ghosheh et al [ 46 ] suggest that the use of GANs to create synthetic EHRs creates many privacy challenges (eg, reidentification and membership attacks). Hernandez et al [ 47 ] discuss privacy concerns related to synthetic tabular data generation in health care. Various methods and evaluation metrics are used to assess the privacy dimension of the synthetic tabular data generation approaches. These methods include identity disclosure, attribute disclosure, distance to the closest record, membership attack, maximum real-to-synthetic similarity, differential privacy cost, and GANs. For instance, differential privacy is an approach that adds noise to the data to prevent the identification of individuals. GANs can create new and nonreal data points. Other advanced statistical and machine learning techniques attempt to balance data utility and privacy. Each method has its strengths and limitations, and the choice depends on the specific requirements of the health care application and the sensitivity of the data involved.
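
One of the listed metrics, distance to the closest record, is easy to sketch: for each synthetic row, compute the distance to its nearest real row; a distance of 0 flags a real record that was memorized and leaked verbatim. The toy rows below are our own illustration:

```python
def dcr(synthetic, real):
    # Distance to closest record: for each synthetic row, the Hamming
    # distance to its nearest real row. A value of 0 means an exact
    # copy of a real record leaked into the synthetic data.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return [min(hamming(s, r) for r in real) for s in synthetic]

real = [("F", "45-50", "diabetes"), ("M", "30-35", "asthma")]
synth_ok = [("F", "30-35", "asthma")]   # differs from every real row
synth_bad = [("M", "30-35", "asthma")]  # verbatim copy of a real row
print(dcr(synth_ok, real), dcr(synth_bad, real))  # → [1] [0]
```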

The applications and challenges of generative AI in health care, including privacy issues and AI-human collaboration, are explored by Fui-Hoon et al [ 48 ]. They discuss several privacy issues related to generative AI, such as the potential disclosure of sensitive or private information by generative AI systems, the widening of the digital divide, and the collection of personal and organizational data by these systems, which raises concerns about security and confidentiality. In addition, they highlight regulatory and policy challenges, such as issues with copyright for AI-generated content, the lack of human control over AI behavior, data fragmentation, and information asymmetries between technology giants and regulatory authorities.

A study discusses the potential of FL as a privacy-preserving approach in health care AI applications [ 49 ]. FL is a distributed AI paradigm that offers privacy preservation in smart health care systems by allowing models to be trained without accessing the local data of participants. It provides privacy to end users by only sharing gradients during training. The target of FL in health care AI applications is to preserve the privacy of sensitive patient information communicated between hospitals and end users, particularly through Internet of Medical Things (IoMT) devices. The approach incorporates advanced techniques such as reinforcement learning, digital twin, and GANs to detect and prevent privacy threats in IoMT networks. The potential beneficiaries of FL in health care include patients, health care providers, and organizations involved in collaborative health care research and analysis. However, implementing FL in IoMT networks presents challenges, such as the need for robust FL for diffused health data sets, the integration of FL with next-generation IoMT networks, and the use of blockchain for decentralized and secure data storage. Furthermore, incentive mechanisms are being explored to encourage the participation of IoMT devices in FL, and digital twin technology is being leveraged to create secure web-based environments for remote patient monitoring and health care research. Overall, FL in health care AI applications aims to address privacy and security concerns while enabling collaborative and efficient health care systems.
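
The core mechanic of FL, federated averaging, can be sketched in a few lines: each client takes gradient steps on its own private data, and only the resulting model weights are sent to the server for averaging. The single-weight least-squares setup and all names below are our own toy illustration, not a production FL framework:

```python
def local_update(w, data, lr=0.1):
    # One pass of gradient steps fitting y = w * x on a client's private
    # data; only the updated weight ever leaves the client, never the data.
    for x, y in data:
        w -= lr * (w * x - y) * x
    return w

def federated_round(global_w, clients):
    # FedAvg: the server averages the clients' locally updated weights.
    local = [local_update(global_w, data) for data in clients]
    return sum(local) / len(local)

clients = [
    [(1.0, 2.0), (2.0, 4.0)],  # hospital A's private (x, y) pairs
    [(1.0, 2.1), (3.0, 6.3)],  # hospital B's private (x, y) pairs
]
w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(round(w, 2))  # converges near the shared slope of about 2
```

The server learns a model close to what centralized training would produce, yet the raw (x, y) pairs never leave the hospitals; note, however, that the shared weights themselves can still leak information, which is why the gradient-level attacks discussed previously matter.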

Another study emphasizes the need for secure and robust machine learning techniques in health care, particularly focusing on privacy and security [ 50 ]. Finally, a study addresses the vulnerabilities of generative models to adversarial attacks (eg, evasion attacks and membership inference attacks), highlighting a significant area of concern in health care data security [ 51 ]. These studies collectively underscore the need for a balanced approach to leveraging the benefits of AI-driven health care innovations while ensuring robust privacy and security measures.
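
A membership inference attack exploits the gap in a model's behavior between training and unseen inputs. The toy sketch below (class, names, and data invented for illustration) uses an intentionally overfit model whose confidence betrays whether a record was in its training set:

```python
class OverfitModel:
    # A deliberately overfit "model" that memorizes its training set.
    def __init__(self, training):
        self.memory = dict(training)

    def predict_with_confidence(self, x):
        if x in self.memory:
            return self.memory[x], 1.0  # memorized: maximal confidence
        return "unknown", 0.5           # unseen input: low confidence

def membership_attack(model, x, threshold=0.9):
    # The attacker sees only the model's confidence, never the data.
    _, conf = model.predict_with_confidence(x)
    return conf >= threshold

model = OverfitModel([("patient_a", "diabetes"), ("patient_b", "asthma")])
print(membership_attack(model, "patient_a"))  # → True (was in training data)
print(membership_attack(model, "patient_z"))  # → False (was not)
```

Learning that a record was in a hospital's training set is itself a privacy breach (it reveals the person was a patient), which is why the regularization and differential-privacy defenses discussed earlier target exactly this confidence gap.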

AI, Legal Challenges, and Regulation

AI, especially generative AI, has presented many legal challenges, raising profound questions about how AI can be legally, securely, and safely used by businesses and individuals [ 52 ]. The EU AI Act, passed in 2023, is the first comprehensive legal framework to specifically regulate AI systems [ 53 ]. It categorizes systems by risk level and introduces mandatory requirements for high-risk AI related to data and documentation, transparency, human oversight, accuracy, cybersecurity, and so on. As stated in the act, national authorities will oversee compliance.

The US AI Bill of Rights, unveiled in 2022, takes a different approach as a nonbinding set of principles to guide AI development and use focused on concepts such as algorithmic discrimination awareness, data privacy, notice and explanation of AI, and human alternatives and oversight [ 54 ]. Rather than authoritative regulation, it promotes voluntary adoption by organizations.

Although the EU law institutes enforceable accountability around risky AI, the US bill espouses aspirational AI ethics principles. Both identify important issues such as potential bias, privacy risks, and the need for human control but tackle them differently—the EU through compliance requirements and the United States through voluntary principles. Each seeks more responsible AI but via divergent methods that fit their governance models. Despite differences in methods, there is a consensus on fundamental issues such as ensuring transparency, maintaining accuracy, minimizing adverse effects, and providing mechanisms for redressal.

Specifically, for generative AI such as ChatGPT, the EU AI Act mandates transparency requirements, such as disclosing AI-generated content, designing models to prevent illegal content generation, and publishing training data summaries. Although the principles mentioned in the US AI Bill of Rights do not specifically address generative AI, they provide a framework for the ethical and responsible use of all AI technologies, including generative AI. The principles emphasize safety, nondiscrimination, privacy, transparency, and human oversight, all of which are relevant to developing and deploying generative AI systems.

Ultimately, the EU legislates binding rules that companies must follow, whereas the United States issues guidance that organizations may freely adopt. Despite this schism, both highlight growing policy makers’ concern over AI’s societal impacts and the emergence of either compulsory or optional frameworks aimed at accountability. As leading AI powers craft different but related policy solutions, ongoing collaboration around shared values while allowing varied implementations will be important for setting global AI standards.

Security and Privacy Threats in the Life Cycle of a Generative AI in Health Care System

Although generative AI in health care holds great promise, substantial validation is required before real-world deployment. Ethical risks around reliability, accountability, algorithmic bias, and data privacy as well as security risks related to confidentiality, integrity, and availability must be addressed through a human-centric approach [ 55 ]. Liu et al [ 56 ] surveyed the security and privacy attacks related to machine learning and developed a taxonomy. The taxonomy classifies those attacks into three categories: (1) attacks targeting classifiers; (2) attacks violating integrity, availability, and privacy (ie, part of confidentiality); and (3) attacks with or without specificity. They also summarize the defense techniques in the training phase and the testing and inferring phase of the life cycle of machine learning, for example, data sanitization techniques against data poisoning attacks in the training phase and privacy-preserving techniques against privacy attacks in the testing or inferring phase. 
Similarly, Hu et al [ 57 ] present an overall framework of attacks and defense strategies based on the following five phases of the AI life cycle:

1. Data collection phase: the main security threats include databases, fake data, data breaches, and sensor attacks; defense strategies include data sanitization and data governance.
2. Data processing phase: image scaling is the main threat; recommended defense strategies include image reconstruction and data randomization.
3. Training phase: data poisoning is the main threat; defense strategies focus on techniques that can identify and remove poisoned data (eg, the certified defense technique proposed by Tang et al [ 58 ]) and provide robust and reliable AI models.
4. Inference phase: this phase mainly faces adversarial example attacks, such as white-box, gray-box, and black-box attacks, depending on how much the attacker knows about the target model. A variety of defense strategies can be implemented, such as adopting the strategies of phases 1 to 3 to modify data (eg, data reconstruction and randomization) or enhancing models with newer construction methods resistant to adversarial examples (eg, deep neural networks and GAN-based networks [ 58 , 59 ]).
5. Integration phase: AI models face AI biases, confidentiality attacks (eg, model inversion, model extraction, and various privacy attacks), and code vulnerability exploitation. Defense strategies in this phase should be comprehensive, integrating solutions such as fuzz testing and blockchain-based privacy protection.

Generative AI is built upon machine learning and AI techniques and hence faces similar security and privacy threats, as summarized in the studies by Liu et al [ 56 ] and Hu et al [ 57 ]. Nevertheless, because generative AI, such as LLMs, often requires large volumes of data (eg, patient records) to train, it faces both existing and new security and privacy threats. If deployed carelessly, generative models increase the avenues for protected health information (PHI) to be leaked, stolen, or exposed in a breach. For example, deidentifying data for LLMs is challenging [ 60 ], and even anonymized patterns in data could reidentify individuals if models are improperly handled after training. Medical image analysis is one example: deidentified medical images can be reidentified because of the massive amount of image data used in training [ 39 ]. LLMs in health care also face data quality and bias issues, similar to any machine learning model, leading to erroneous medical conclusions or recommendations [ 61 ].
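
The reidentification risk in "anonymized" data can be made concrete with a k-anonymity check over quasi-identifiers, as in the illustrative sketch below (records and field names invented): even with names removed, a combination such as ZIP code, age, and sex can single out one individual.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    # Smallest equivalence-class size over the quasi-identifier columns;
    # k = 1 means at least one person is uniquely identifiable.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())

deidentified = [  # names removed, but quasi-identifiers remain
    {"zip": "94301", "age": 34, "sex": "F", "dx": "HIV"},
    {"zip": "94301", "age": 34, "sex": "M", "dx": "flu"},
    {"zip": "94305", "age": 67, "sex": "F", "dx": "flu"},
]
print(k_anonymity(deidentified, ["zip", "age", "sex"]))  # → 1: re-identifiable
```

An attacker who knows a neighbor's ZIP code, age, and sex can link that neighbor to a unique row and learn the diagnosis, which is why removing direct identifiers alone is insufficient for training data.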

Furthermore, hackers could also exploit vulnerabilities in systems hosting generative models to access the sensitive health data used for training. Skilled hackers may be able to feed prompts to models to obtain outputs of specific patient details that allow reidentification even from anonymized data. For example, improperly secured LLMs could enable bad actors to generate fake patient data or insurance claims [ 62 ]. In general, generative AI in health care encounters many of the same security and privacy threats as general AI and machine learning systems, along with new threats stemming from its unique context. On the basis of the life cycle in the studies by Liu et al [ 56 ] and Hu et al [ 57 ], our study presents a 3-phase life cycle for generative AI. It also identifies security and privacy threats and maps them to the life cycle of various generative AI systems in health care ( Figure 1 ). It should be noted that although this study primarily discusses various security and privacy threats associated with generative AI in health care (such as AI hallucination in health care), many of these threats are not unique to generative AI systems and are also prevalent in broader AI systems and machine learning models in health care and other fields.


Data Collection and Processing Phase

Similar to AI systems in other fields, almost all types of generative AI in health care face integrity threats. The main integrity threats in this phase traditionally stem from errors and biases. Unintentionally, the increased data volume and complexity of generative AI threaten data integrity because errors and biases are prone to occur [ 63 ]. Errors and biases also depend on the data sources for different types of generative AI in health care. For example, assembling genomic databases and chemical compound or protein structure databases for drug discovery is extremely challenging and could be error ridden because many genomic and protein databases lack necessary annotations, are inconsistent in formats, and may be poor in data quality [ 64 ].

Intentionally, data poisoning can occur when data are tampered with as they are collected from various software packages. For example, malicious insiders can deliberately submit mislabeled genomic sequences and chemical compound or protein structures to corrupt genomic and chemical compound or protein structure databases, leading to faulty trained models and AI hallucination.
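
A minimal sketch of label poisoning and a sanitization defense (all numbers and names invented as a toy stand-in for the data sanitization techniques mentioned previously, not a method from the cited studies): mislabeled extreme points shift a mean-based decision threshold, and dropping points far from their class median restores it.

```python
def threshold(data):
    # Mean-based classifier: boundary halfway between the two class means.
    m0 = [x for x, y in data if y == 0]
    m1 = [x for x, y in data if y == 1]
    return (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2

def sanitize(data, cutoff=2.0):
    # Data sanitization: drop points far from their own class median.
    out = []
    for label in (0, 1):
        xs = sorted(x for x, y in data if y == label)
        med = xs[len(xs) // 2]
        out += [(x, label) for x in xs if abs(x - med) <= cutoff]
    return out

clean = [(0.8, 0), (1.0, 0), (1.2, 0), (4.8, 1), (5.0, 1), (5.2, 1)]
poison = [(9.0, 0), (9.5, 0)]  # mislabeled points planted by an attacker

print(round(threshold(clean), 1))                     # → 3.0
print(round(threshold(clean + poison), 1))            # shifted upward
print(round(threshold(sanitize(clean + poison)), 1))  # → 3.0 after sanitization
```

The poisoned threshold misclassifies genuine class-1 points near 5, illustrating how a small number of tampered records can degrade a model unless the pipeline filters implausible data before training.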

In addition to data poisoning from software, in health care, data may be gathered from sensors embedded in medical devices and equipment. Sensor data can be spoofed [ 65 , 66 ], tampered with, and thus poisoned. Furthermore, medical data contain a large number of images. Adversaries can exploit the difference in cognitive processes between AI and humans and tamper with images during the data collection and processing phase. Image-scaling attacks, in which an adversary manipulates images so that changes are imperceptible to the human eye but recognizable by AI after downscaling, represent one such form of attack [ 67 , 68 ]. Other attacks on data sources of medical images include, but are not limited to, copy-move tampering (ie, copying an area and moving it to another area), classical inpainting tampering (ie, patching a missing area with tampered image slices), deep inpainting tampering (ie, similar to classical inpainting tampering but using highly realistic image slices generated by GANs), sharpening, blurring, and resampling [ 69 ]. In scenarios where AI in imaging diagnostics is targeted by such attacks, the image data can be poisoned with malicious information. Furthermore, generative AI, such as GANs, has empowered hackers to generate or change the attributes or content of medical images with high visual realism, making the detection of tampered images extremely difficult [ 69 ].
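
An image-scaling attack can be demonstrated in a few lines of pure Python (the 8x8 "image" of brightness values is an invented toy): because nearest-neighbor downscaling keeps only a handful of source pixels, an attacker who alters just those pixels leaves the full-resolution image looking benign while fully controlling what the model sees after scaling.

```python
def downscale_nearest(img, out_size):
    # Nearest-neighbor scaling keeps only one source pixel per output pixel.
    n = len(img)
    step = n // out_size
    return [[img[r * step][c * step] for c in range(out_size)]
            for r in range(out_size)]

N, OUT = 8, 2
benign = [[10] * N for _ in range(N)]  # what a human reviewer sees
attack = [row[:] for row in benign]
for r in range(OUT):                   # poison only the sampled pixels
    for c in range(OUT):
        attack[r * (N // OUT)][c * (N // OUT)] = 99

changed = sum(attack[r][c] != benign[r][c] for r in range(N) for c in range(N))
print(changed, "of", N * N, "pixels altered")  # → 4 of 64 pixels altered
print(downscale_nearest(attack, OUT))          # → [[99, 99], [99, 99]]
```

Only 4 of 64 pixels differ, so the full-resolution scan looks untouched to a clinician, yet every pixel the model receives after downscaling is attacker controlled; this is why defenses such as image reconstruction before scaling are recommended.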

Moreover, many generative AI applications in health care rely on LLMs and are trained on large amounts of internet data without being properly screened and filtered [ 70 ]. Adversaries can use AI technologies to automatically generate large quantities of fake data to poison data to be fed into LLMs, resulting in deteriorated performance of the models (eg, accuracy and fairness) and eventually AI hallucination, misinformation or disinformation, and deepfakes. Although some of these threats are not unique to generative AI in health care, they can be particularly risky if false information is used for medical decision-making. Generative AI also carries unique integrity risks. As mentioned before, its capability to create synthetic data leads to a unique integrity risk—AI hallucination. In the health care context, generative AI in health care could be used to create fake medical records or alter existing ones. Fabricated medical data can be fed again into LLMs, further threatening the integrity of medical information. For instance, the malicious use of deepfakes generated by deep generative models could fabricate a patient’s medical history to falsely claim insurance or lead to incorrect treatments. Another example is that a generative AI model may create synthetic radiology reports to diagnose nonexistent medical conditions, leading to misdiagnosis or unnecessary treatment.

By contrast, research has used synthetic data in AI for medicine and health care to address the scarcity of annotated medical data in the real world [ 71 ]. For instance, deep generative models are used to create synthetic images such as skin lesions, pathology slides, colon mucosa, and chest x-rays, thereby greatly improving the reproducibility of medical data [ 71 ]. With the development of generative AI, researchers have increasingly used GANs to synthesize realistic training data for data imputation when real data lack distributional coverage. Noise-to-image and image-to-image GANs have been used to synthesize realistic magnetic resonance imaging training images to boost the performance of convolutional neural networks for image diagnostic AI [ 39 , 72 ]. CorGAN [ 42 ] synthesizes discrete and continuous health care records for model training. From a broader perspective, generative AI is projected to build and use next-generation synthetic gene networks for various AI applications in health care, including medical diagnostics, drug discovery, and medical research [ 73 ]. The growth in the use of synthetic data by generative AI also creates new concerns about data integrity and AI hallucination. Nevertheless, given that health care is a heavily regulated field in terms of patient privacy and safety, researchers even claim that synthetic medical data might be promising for overcoming data sharing obstacles for health care AI and freeing developers from sensitive patient information [ 74 ]. These applications indicate that there is a fine line between harmful AI hallucinations or deepfakes and beneficial synthetic data use by generative AI in health care. Nevertheless, even the benevolent use of synthetic medical data faces privacy and security challenges as well as integrity challenges. Deep-faked patient face images could violate patient privacy and lead to the leakage or exploitation of PHI [ 75 ]. How to navigate this fine line is both a policy and research blind spot.
Currently, there are simply too few use cases, especially rare ones, to establish clinical reference standards such as clinical quality measures and evaluation metrics for assessing risks and benefits.

Similar to generative AI applications in other fields, almost all types of generative AI in health care face confidentiality threats. Deidentified data may become identifiable during the data collection and processing phase, and confidential proprietary medical information, such as drug development and treatment plans, may be inferred during this phase [ 76 ], leading to data and privacy breaches. Research has found that genomic databases are prone to privacy violations. For example, legitimate researchers can obtain or recover the whole or partial genomic sequence of a target individual (privacy violation through inference), link the sequence to a target individual (ie, reidentification), and identify the group of interest of a target individual (privacy violation through membership inference) when processing data from multiple sources. In addition, the growth of synthetic medical data in health AI systems raises concerns about the vulnerabilities of such systems and the challenges they pose to current regulations and policies.
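As a concrete illustration of how deidentified data can become identifiable, the following Python sketch links a "deidentified" clinical release to a public table on quasi-identifiers (ZIP code, birth year, and sex). All records, names, and field values are fabricated for illustration.

```python
# All records below are fabricated for illustration.
deidentified = [  # released without names but with quasi-identifiers
    {"zip": "44101", "birth_year": 1984, "sex": "F", "diagnosis": "BRCA1 variant"},
    {"zip": "44105", "birth_year": 1990, "sex": "M", "diagnosis": "type 2 diabetes"},
]
public = [        # eg, a voter roll or scraped social media profiles
    {"name": "A. Smith", "zip": "44101", "birth_year": 1984, "sex": "F"},
    {"name": "B. Jones", "zip": "44120", "birth_year": 1975, "sex": "M"},
]

def link(deid_rows, public_rows, keys=("zip", "birth_year", "sex")):
    """Join the two tables on quasi-identifiers; a unique match reattaches
    an identity to a supposedly deidentified record."""
    reidentified = []
    for row in deid_rows:
        hits = [p for p in public_rows if all(p[k] == row[k] for k in keys)]
        if len(hits) == 1:  # unique match -> reidentification
            reidentified.append({**hits[0], **row})
    return reidentified

matches = link(deidentified, public)  # reattaches "A. Smith" to the BRCA1 record
```

The attack needs no model access at all, which is why deidentification alone is a weak safeguard once quasi-identifiers survive in the released data.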

Table 2 summarizes the data sources and security or privacy threats for each type of generative AI in health care in the data collection and processing phase.

a AI: artificial intelligence.

b CT: computed tomography.

c MRI: magnetic resonance imaging.

d EHR: electronic health record.

e NIH: National Institutes of Health.

Again, it should be noted that although all AI and machine learning systems face many similar threats, as listed in Table 2 , generative AI amplifies them because of its generating nature and data source volume and complexity. For example, generative medical research AI may update knowledge and literature databases with “wrong inputs” based on wrong findings in these databases or with synthesized but hallucinated findings. Similarly, generative virtual health assistants may put dangerous advice into knowledge databases based on erroneous data from sources or again put synthesized but hallucinated advice into such databases.

Model Training and Building Phase

Generative AI also encounters integrity issues, leading to phenomena such as AI hallucinations during model training and development phases. This is especially true for generative AI in health care. Prior research found that generative AI created nonfactual or unfaithful data and outputs [ 72 , 77 ]. The growing use of highly synthetic data or images by generative AI, such as CorGAN, exacerbates the situation as it becomes increasingly challenging for human professionals to detect unfaithful data and outputs [ 69 ]. This can be a serious integrity and authenticity issue, as both patients and clinicians expect factual, scientific answers or outputs with consistency from such models. Technically speaking, similar to all other AI models, generative AI models in health care, particularly those based on deep learning, are often seen as “black boxes” [ 78 ]. The lack of interpretability and explainability can be a significant challenge in health care, where understanding the reasoning behind a diagnosis or treatment recommendation is crucial for integrity and accountability.

Adversarial training is a method for probing the integrity and robustness of AI models. It uses carefully crafted adversarial examples to attack the training model and verify the integrity and robustness of its outputs [ 57 , 79 ]. It is an active AI research area in the health care field. Adversarial training is used to check whether features in synthetic medical images created by GANs are fake or realistic, to avoid fabrication and misleading outputs in the model training process. By contrast, malicious parties also intensively explore this method and use adversarial examples to attack training models to generate incorrect outcomes [ 57 ]. Technically, all types of generative AI using GANs and LLMs, particularly those in health care, can be attacked with adversarial examples that compromise the integrity of the training model. For example, adversaries can use image-scaling attacks to feed human-invisible data into an AI model to force it to make a mistake [ 67 , 68 ].

Another example is to feed an AI model with carefully crafted relabeled data to create the wrong classification [ 80 ]. When being trained with adversarial examples, a diagnostic AI could make an incorrect diagnosis, a conversational virtual assistant could offer harmful advice to patients, and a clinical decision support AI could make the wrong recommendations, to list a few. Moreover, feeding an AI model with adversarial training examples and other poisonous data can also deteriorate the performance of AI, eventually making the AI model useless and thus unavailable. In general, adversarial attacks can pose long-term risks, such as thwarting AI innovation in health care because of concerns about misdiagnosis, mistreatment, and patient safety.
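One standard way to craft such adversarial examples is the fast gradient sign method (FGSM), which perturbs each input feature by a small step in the sign of the loss gradient. The sketch below applies it to a toy logistic "classifier"; the weights, features, and perturbation budget are illustrative, not a real clinical model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "diagnostic" model: logistic regression over 4 numeric features.
w = np.array([2.0, -1.5, 0.5, 1.0])
b = -0.25

def predict(x):
    return sigmoid(w @ x + b)  # probability of the "disease" class

def fgsm(x, eps):
    """Fast gradient sign method: step in the sign of the input gradient
    of the loss to push the prediction across the decision boundary."""
    y = 1.0 if predict(x) >= 0.5 else 0.0
    grad = (predict(x) - y) * w  # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad)

x = np.array([0.8, 0.2, 0.1, 0.3])  # classified as "disease" (p ~ 0.80)
x_adv = fgsm(x, eps=0.6)            # each feature moves by at most eps
```

The perturbation is bounded per feature, yet it is aligned with the model's gradient rather than random, which is why it flips the prediction far more efficiently than noise of the same magnitude would.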

Implementation Phase

In practice, generative AI systems in health care have been found to experience integrity threats, such as generating disinformation and misinformation and making biased decisions [ 81 ]. AI hallucination is a newly coined term describing the phenomenon wherein generative AI generates fake information that appears authentic [ 82 ]. If generative AI in health care is used for diagnostics, personalized medicine, or clinical assistance, AI hallucination can be extremely dangerous and may even harm patients’ lives [ 83 ]. As discussed before, because GANs and LLMs need large amounts of annotated medical data for training, the difficulty of acquiring such data (eg, unwillingness to share because of legal compliance requirements and data paucity resulting from rare medical conditions) leads to the proliferation of synthetic medical data creation. The relationship between AI hallucination by GANs and LLMs and synthetic data use is unknown territory in research and practice, leading to unknown vulnerabilities such as adversarial attacks.

Privacy attacks are a grave concern at this stage. The use of GANs for creating synthetic EHRs and its associated privacy challenges are analyzed by Ghosheh et al [ 46 ]. Such privacy challenges are as follows: (1) risk of reidentification—although the data are synthetic, there might be a risk of reidentifying individuals if the synthetic data closely resemble real patient data; (2) data leakage—ensuring that the synthetic data do not leak sensitive information from the original data set; (3) model inversion attacks—potential for attackers to use the GAN model to infer sensitive information about the original data set. In this attack, attackers aim to reconstruct the training data using their ability to constantly query the model [ 84 ]; (4) membership inference attacks—an attacker gains access to a set of real patient records and tries to determine whether any of the real patients are included in the training set of the GAN model [ 85 ]; and (5) attribute disclosure attacks—an attacker can infer additional attributes about a patient by learning a subset of other attributes about the same patient [ 86 ].
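The membership inference attack in item 4 above can be illustrated with a minimal loss-threshold sketch: an overfit model assigns suspiciously low loss to its own training records, and the attacker exploits exactly that gap. Here, a memorizing 1-nearest-neighbor "model" and synthetic feature vectors stand in for a real generative model and patient records.

```python
import numpy as np

rng = np.random.default_rng(0)
records = rng.normal(size=(40, 5))  # toy patient feature vectors
train, holdout = records[:20], records[20:]

def loss(x):
    """Stand-in for a model's loss on a query record: a memorizing
    (overfit) model scores its own training records near zero."""
    return np.min(np.linalg.norm(train - x, axis=1))

def infer_membership(x, threshold=1e-6):
    """Loss-threshold membership inference: suspiciously low loss suggests
    the record was in the training set."""
    return loss(x) < threshold

member_hits = sum(infer_membership(x) for x in train)       # all 20 flagged
nonmember_hits = sum(infer_membership(x) for x in holdout)  # none flagged
```

The same principle applies to GAN-generated EHRs: the more closely synthetic outputs track individual training records, the wider the loss gap an attacker can threshold on.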

Generative medical diagnosis and drug discovery AI involving genomic databases and chemical compound or protein structure databases are extremely susceptible to privacy attacks. Fernandes et al [ 87 ] pointed out that genomic data such as DNA data are susceptible to inference attacks, reidentification attacks, membership attacks, and recovery attacks. It is extremely concerning when such attacks target high-profile individuals. Moreover, generative AI enhances the ability to profile patients, thereby increasing the risk of privacy violations and attacks, although this capability is not unique to AI.

In addition to AI-specific security and privacy threats, AI systems interfacing with other hardware and software may face new security and privacy threats that have never existed before [ 57 ]. Malicious use and exploitation may also threaten the integrity of AI systems. Similar to other AI systems, health care AI systems, especially generative AI systems, are susceptible to code extraction and information extraction (eg, black-box, gray-box, and white-box attacks), leading to security and privacy breaches [ 57 ]. The excessive use of prompts may reveal copyright-protected data, proprietary research findings (eg, chemical compounds of a new drug), and training models or algorithms.

Table 3 summarizes the previously discussed security and privacy threats associated with each category of generative AI systems throughout their life cycle in health care.

Again, it should be noted that some of these threats are unique to generative AI systems, but many of the threats are prevalent in broader AI systems in health care and other fields.

Recommendations

As security and privacy threats exist in the life cycle of various generative AI systems in health care, from data collection through model building to clinical implementation, a systematic approach to safeguard them is critical. This section provides some recommendations on safeguards. In doing so, we rely on the National Institute of Standards and Technology Privacy Framework and the National Institute of Standards and Technology AI Risk Management Framework as well as the regulatory guidance discussed in the Literature Review section. It should be noted that although the security and privacy threats discussed in this study are significant and some are unique in the context of generative AI in health care, many are also common in other types of AI models and other AI application contexts. Hence, many of the recommendations we propose in the subsequent section can be applied to AI in non–health care contexts.

Development Protocols of Risk Assessment for Generative AI in Health Care

AI risks, including those of generative AI in health care, can emerge in a variety of ways at any phase of an AI project. Health care organizations need to learn from managing risks for other technologies to develop risk assessment protocols for generative AI in health care, along with risk assessment metrics.

AI Risk Assessment Protocols

To systematically manage AI risks, health care organizations must develop risk assessment protocols that include risk assessment procedures and methodologies by following industrial standards and frameworks as well as best practices [ 63 ]. A total of 3 main risk assessment activities are involved in the protocol development: risk identification, risk prioritization, and risk controls. All 3 activities must be conducted throughout the life cycle of a generative AI system in health care.
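The three activities above can be sketched as a simple risk register: risks are identified per life cycle phase, prioritized by a likelihood × impact score, and mapped to controls. The entries, scales, and scores below are hypothetical placeholders, not a recommended scoring scheme.

```python
from dataclasses import dataclass, field

@dataclass
class Risk:
    description: str
    phase: str       # "data", "training", or "implementation"
    likelihood: int  # 1 (rare) to 5 (frequent)
    impact: int      # 1 (minor) to 5 (severe)
    controls: list = field(default_factory=list)

    @property
    def score(self) -> int:
        # Simple likelihood x impact prioritization
        return self.likelihood * self.impact

# Risk identification: one illustrative entry per life cycle phase.
register = [
    Risk("Poisoned genomic submissions", "data", 3, 5, ["provenance checks"]),
    Risk("Adversarial image-scaling inputs", "training", 2, 4, ["input reconstruction"]),
    Risk("Membership inference on synthetic EHRs", "implementation", 3, 4, ["differential privacy"]),
]

# Risk prioritization: highest-score risks are mitigated first.
prioritized = sorted(register, key=lambda r: r.score, reverse=True)
```

In practice, the scoring scheme and control catalog would come from the organization's adopted framework rather than ad hoc integers, but the structure of identify, score, and control carries over.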

In the data collection and processing phase, health care organizations can use several methods to identify, prioritize, and control AI risks. As discussed before, health care data are messy and tend to have organic biases (eg, a hospital specializes in serving a particular patient demographic, attending to gender-specific health requirements or offering dedicated care for rare diseases). When collecting data or using GANs to generate synthetic data, the health care field needs to be extremely diligent. One recommendation is to establish data collection or generation policies and procedures. The separation of clinical and nonclinical data is necessary, given the significantly different risks in these 2 types of data. Similarly, the establishment of the metrics and methods to check training data on biases for clinical and nonclinical data is also important. Data provenance and authentication metrics can be used to prevent collecting data from untrustworthy sources; detecting and filtering methods can be used to identify and filter poisoned data; and data standardization improves the quality of data collection [ 57 ]. As the frontline defense, these prevention mechanisms can prevent integrity and availability attacks during this phase. Nevertheless, regardless of the mechanisms, data collected from medical sources or generated by GANs should reflect the comprehensive overview of a medical domain and the complexity of the physical and digital dimensions in such a domain to prevent biases and test for risks.
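As one simple instance of the "detecting and filtering" mechanisms mentioned above, the sketch below flags implausible records with a robust z-score before training. The threshold, the MAD scaling constant, and the blood pressure example are illustrative; production filters would combine several such checks with provenance metadata.

```python
import numpy as np

def robust_z_filter(X, z_max=4.0):
    """Flag rows deviating from the median by more than z_max robust
    z-scores; median/MAD (not mean/SD) is used so that poisoned points
    cannot skew the very statistics they are tested against."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-9
    z = 0.6745 * np.abs(X - med) / mad  # 0.6745 scales MAD to ~SD for normal data
    keep = (z < z_max).all(axis=1)
    return X[keep], np.where(~keep)[0]

rng = np.random.default_rng(1)
clean = rng.normal(loc=120, scale=10, size=(200, 1))  # eg, plausible systolic BP
poison = np.array([[900.0], [-300.0]])                # implausible injected values
X = np.vstack([clean, poison])
filtered, flagged = robust_z_filter(X)                # flags rows 200 and 201
```

A filter like this catches only gross poisoning; subtle, in-distribution poisoned points require the provenance and authentication checks described above.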

In the model training and building phase, detecting and filtering are also important for identifying and removing adversary training examples. Robustness, generalizability, and other vulnerability tests (eg, black-box and white-box tests) can further prevent integrity and availability attacks and data breaches [ 88 ]. Input reconstruction is another mechanism to pinpoint sources of adversary training [ 89 ]. Modifying training processes and models as well as training methods may also help to control AI risks in this phase [ 57 ]. Given the complexity and variety of AI models in reasoning and learning, we suggest a taxonomy approach. For example, a deep learning model can carry significantly different risks than a probabilistic learning model. By building a taxonomy of AI models and their risks, researchers can systematically identify and control security and privacy risks based on the AI model.

In the model implementation phase, routine verification and validation are key to identifying and controlling AI risks [ 63 ]. The implementation contexts of generative AI also matter. In some cases, verification and validation are about not only factual accuracy but also communications and perceptions as well as cultures. A medical chatbot that was thoroughly tested in adult populations may not be very useful in teenage populations. Gesture and face recognition AI for medical diagnosis may need to be culturally sensitive to be useful. When generative AI is integrated and interacts with other systems, for example, to create multiagent systems or medical robotics (eg, companion robots), security tests along with social, philosophical, and ethical tests are a must.

AI Risk Assessment Metrics

Given the complexity of AI security and privacy risks, health care organizations should develop risk assessment metrics for each of the 3 phases of the life cycle of a generative AI project. The following subsections highlight some measures for AI risk assessment metrics.

Security Objectives

AI risk assessment metrics should include well-established security and privacy objectives such as confidentiality, integrity, availability, nonrepudiation, authentication, and privacy protection. In the data collection and processing phase, collection technologies, whether software or hardware based, should be evaluated to ensure that they meet the security and privacy objectives. The use of synthetic medical data should follow the same security and privacy objectives to ensure that such data capture the factual and scientific truth. In the model training and building phase, vulnerability tests should be conducted to identify known and unknown threats based on security objectives. For example, availability attacks such as denial of service can be used to flood conversational health AI applications to assess their resilience and availability before deployment, and integrity attacks with poisoned data can be used to test the stability of model performance and generalizability [ 57 ]. In the implementation phase, all security objectives should be routinely assessed.

Generative AI–Specific Metrics

AI Inscrutability

AI inscrutability refers to the lack of understandability of an AI model and its outcomes [ 63 ]. Although AI inscrutability is not directly related to security and privacy, it adds obfuscations to AI risk assessment to identify threats and vulnerabilities as well as biases owing to the lack of transparency and explainability in AI, especially in generative AI based on deep learning. Although we have identified AI inscrutability as a key metric for generative AI assessment, we acknowledge that the challenge of inscrutability is not unique to generative AI and has been a long-standing issue in the broader field of AI, particularly in health care. Various algorithms used in patient matching, diagnosis, and other proprietary applications often lack transparency because of their closed nature or intellectual property constraints. Therefore, many of them, even those that are not based on generative techniques, face similar scrutiny regarding their lack of transparency. Hence, the call for greater openness and explainability applies broadly across AI applications in health care, reflecting a growing demand for accountable and interpretable AI systems.

Nevertheless, the problem of inscrutability becomes pronounced in the context of generative AI because of its complex and often opaque decision-making processes, which can amplify the challenges already faced in health care AI. Generative AI models, especially when based on deep learning, can operate as “black boxes,” making it even more difficult for practitioners to understand how conclusions or recommendations are derived. This opacity is a critical concern in health care, where explainability and trust as well as accountability are paramount for clinical acceptance and ethical practice.

To address these concerns, there is a need for concerted efforts toward developing more interpretable AI models and regulatory frameworks that mandate transparency in AI applications, including those used in patient care. These efforts should be complemented by initiatives to educate health care professionals about the workings and limitations of AI tools, enabling them to make informed decisions while using these technologies in clinical settings. Therefore, although the inscrutability of generative AI presents specific challenges owing to the complexity and novelty of these models, it is a continuation of the broader issue of transparency in health care AI. Recognizing this, our discussion of AI inscrutability not only highlights the unique aspects of generative AI but also situates it within the ongoing discourse on the need for greater transparency and accountability in all AI applications in health care.

AI Trustworthiness

AI trustworthiness is defined as the degree to which stakeholders of an AI system have confidence in its various attributes [ 63 , 90 ]. Trust has been a significant factor in IT adoption. The fundamental argument is that if an IT system automatically runs behind the scenes to assist the work and decisions of human users, a trusting relationship must be established for users to interact with and rely on the system [ 91 ]. Nevertheless, trust is a complex concept and is built upon human users’ interaction and consequent assessment of the system from cognitive, emotional, and social dimensions [ 91 - 93 ]. Since the emergence of AI, AI trustworthiness has caught significant attention in research, given the foreseeable complexity of human-AI interaction. The rise of generative AI has stimulated more discussions on this topic. The current consensus is that AI trustworthiness itself is a complex measurement with multiple dimensions, such as reliability, resilience, accuracy, and completeness [ 63 , 90 ]. Many other AI metrics or factors, such as transparency, explainability, robustness, fairness, and user interactions or perceptions, can be the antecedents of AI trustworthiness. AI trustworthiness can also be context dependent. For example, explainability and interaction experience can be the determinants of the AI trustworthiness of a chatbot application on the patient portal, whereas reliability, accuracy, and completeness are significant factors in the AI trustworthiness of a radiology diagnosis AI for radiologists. Given the complexity of measuring AI trustworthiness, we recommend developing context-specific AI trustworthiness metrics. Similar to AI inscrutability, although AI trustworthiness is not a direct measure of security and privacy risks, it helps reduce the probability and magnitude of such risks throughout the life cycle of generative AI in health care. For instance, accuracy and reliability help to improve the integrity of an AI system.

AI Responsibility

AI responsibility is another key measure in AI risk assessment. Again, although this measure does not directly evaluate security and privacy risks, it endorses responsible AI practices that facilitate the discovery of the negative consequences and risks of AI, including the security and privacy risks of generative AI. Moreover, this measure is centered on the uniqueness of AI, especially generative AI, in “human centricity, social responsibility, and sustainability” [ 63 ]. In other words, AI responsibility is a multifaceted measure depending on many other metrics and factors such as the ethical framework (eg, biases, fairness, and transparency) and legal perspective (eg, accountability and traceability). This is also an emerging concept that is under development. The development and deployment of generative AI add complexity to this measure owing to its possible, unintended, but profound negative consequences and risks to human society. In health care, there is a legal ambiguity related to AI responsibility. Hospitals are still unclear about their legal liability when facing an AI incident. Despite such legal uncertainty, responsible AI use should be the baseline. We recommend that health care organizations use AI for consultation and assistance instead of replacement, given legal ambiguity and uncertainty, while intensively exploring generative AI from the perspectives of patient centricity and social responsibility and asking serious questions. For example, a generative drug discovery AI may find a new molecular formula for a biochemical weapon. How can we responsibly use such AI without crossing the line of no harm to human beings? Such a question leads to another key measure for AI risk assessment—AI harm.

AI harm can occur to individuals, organizations, and societies. For example, AI may cause physical harm to individual patients, damage a hospital’s reputation owing to AI incidents, and even endanger society if it is weaponized (eg, being used to disrupt the global drug manufacturing and supply chain). Hence, AI harm is a risk measure highly related to AI responsibility and trustworthiness. Developing trustworthy AI and following responsible AI practices can reduce or avoid AI harm.

It is worth mentioning that some of the metrics we proposed here pass some human characteristics into AI. A crucial philosophical distinction must be made regarding the attribution of human characteristics such as trustworthiness and responsibility to generative AI systems versus the health care organizations and technology partners developing these algorithms. Although metrics aim to make models appear more trustworthy and responsible in reality, trust emerges from human-centered institutional processes, and responsibility stems from human accountability. It may be challenging to humanize AI systems and transfer attributes such as trustworthiness to the algorithms themselves. Indicators of model transparency, reliability, or accuracy may engender confidence among stakeholders, but public trust fundamentally arises from the ethical data governance, risk communication, and oversight procedures instantiated by organizations. Without robust governance and review processes overseeing development, data practices, and risk monitoring, claims of AI trustworthiness lack substantiation. Similarly, although algorithmic outputs highlighting potential issues such as biases or errors increase awareness, this does not intrinsically amount to AI responsibility. True accountability involves diligent human investigation of problems that surface, enacting appropriate recourse, and continuous authority oversight. Metrics may aim for AI to appear more responsible, but responsibility mainly manifests in organizational commitment to discovering issues, working with experts to properly assess AI harms, and instituting robust redress processes with stakeholder input. Thus, trustworthiness and responsibility are contingent on extensive institutional support structures rather than innate model capabilities. 
Although progress indicators may serve as signals for these desired attributes, establishing genuine public trust and accountability in health care ultimately falls on the shoulders of health care administrators, innovators, and engaged communities, rather than solely on the algorithms themselves. Clarifying this distinction enables us to properly set expectations and delineate responsibilities as generative AI becomes increasingly prevalent in critical medical settings.

Conclusions

Integrating generative AI systems into health care offers immense potential to transform medical diagnostics, research, treatment planning, and patient care. However, deploying these data-intensive technologies also introduces complex privacy and security challenges that must be proactively addressed to ensure the safe and effective use of these systems. Examining diverse applications of generative AI across medical domains (ie, medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support) helps this study uncover vulnerabilities and threats across the life cycle of these systems, from data collection to model development to clinical implementation. Although generative AI enables innovative use cases, adequate safeguards are needed to prevent breaches of PHI and to maintain public trust. Strategies such as developing AI risk assessment protocols; formulating specific metrics for generative AI such as inscrutability, trustworthiness, responsibility, and harm; and ongoing model monitoring can help mitigate risks. However, developing robust governance frameworks and updates to data privacy regulations are also required to oversee these rapidly evolving technologies. By analyzing the use cases, impacts, and risks of generative AI across diverse domains within health care, this study contributes to theoretical discussions surrounding AI ethics, security vulnerabilities, and data privacy regulations. Future research and development in generative AI systems should emphasize security and privacy to ensure the responsible and trustworthy use of these AI models in health care. Moreover, the security and privacy concerns highlighted in this analysis should serve as a call to action for both the AI community and health care organizations looking to integrate generative AI. 
Collaborative efforts between AI developers, health care providers, policy makers, and domain experts will be critical to unlocking the benefits of generative AI while also prioritizing ethics, accountability, and safety. By laying the groundwork to make security and privacy the central pillars of generative AI in medicine, stakeholders can work to ensure that these transformative technologies are harnessed responsibly for patients worldwide.

Conflicts of Interest

None declared.

Edited by T de Azevedo Cardoso, G Eysenbach; submitted 22.09.23; peer-reviewed by P Williams, M Noman; comments to author 27.11.23; revised version received 12.12.23; accepted 31.01.24; published 08.03.24.

©Yan Chen, Pouyan Esmaeilzadeh. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 08.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.


Evolving Microsoft Security Development Lifecycle (SDL): How continuous SDL can help you build more secure software

  • By David Ornstein, Principal Software Engineering Manager
  • By Tony Rice, Principal Security PM Manager, Customer Security and Trust

Microsoft's software developers and systems engineers work on large-scale, complex systems that require collaboration among diverse, global teams, all while navigating the demands of rapid technological advancement. Today we're sharing how they're tackling security challenges in the white paper "Building the next generation of the Microsoft Security Development Lifecycle (SDL)", created by pioneers of future software development practices.

Two decades of evolution

It’s been 20 years since we introduced the Microsoft Security Development Lifecycle (SDL) —a set of practices and tools that help developers build more secure software, now used industry-wide. Mirroring the culture of Microsoft to uphold security and born out of the Trustworthy Computing initiative, the aim of SDL was—and still is—to embed security and privacy principles into technology from the start and prevent vulnerabilities from reaching customers’ environments.

In 20 years, the goal of SDL hasn’t changed. But the software development and cybersecurity landscape has—a lot.

With cloud computing, Agile methodologies, and continuous integration/continuous delivery (CI/CD) pipeline automation, software is shipped faster and more frequently. The software supply chain has become more complex and vulnerable to cyberattacks. And new technologies like AI and quantum computing pose new challenges and opportunities for security.

SDL is now a critical pillar of the Microsoft Secure Future Initiative , a multi-year commitment that advances the way we design, build, test, and operate our Microsoft Cloud technology to ensure that we deliver solutions meeting the highest possible standard of security.



Continuous evaluation

Microsoft has been evolving the SDL into what we call "continuous SDL". In short, Microsoft now measures security state more frequently and throughout the development lifecycle. Why? Because times have changed and products are no longer shipped on an annual or biannual basis. With the cloud and CI/CD practices, services ship daily, sometimes multiple times a day.

Data-driven methodology

To achieve scale across Microsoft, we automate measurement with a data-driven methodology when possible. Data is collected from various sources, including code analysis tools like CodeQL. Our compliance engine uses this data to trigger actions when needed.

CodeQL: A static analysis engine used by developers to perform security analysis on code outside of a live environment.

While some SDL controls may never be fully automated, the data-driven methodology helps deliver better security outcomes. In pilot deployments of CodeQL, 92% of action items were addressed and resolved in a timely fashion. We also saw a 77% increase in CodeQL onboarding amongst pilot services.
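The compliance loop described above can be sketched in miniature: a hypothetical engine consumes static-analysis findings (of the kind a tool like CodeQL emits) and opens action items wherever a blocking severity appears. The service names, rule IDs, and policy are invented for illustration; Microsoft's actual compliance engine is far richer than this.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    service: str   # service or repo the finding belongs to
    rule_id: str   # analysis rule that fired, e.g. "js/sql-injection"
    severity: str  # "error", "warning", or "note"

def evaluate(findings, blocking_severities=frozenset({"error"})):
    """Group blocking findings by service; each entry becomes an action item."""
    actions = {}
    for f in findings:
        if f.severity in blocking_severities:
            actions.setdefault(f.service, []).append(f.rule_id)
    return actions

findings = [
    Finding("payments", "js/sql-injection", "error"),
    Finding("payments", "js/unused-variable", "note"),
    Finding("profile", "js/xss", "error"),
    Finding("search", "js/log-injection", "warning"),
]

for service, rules in evaluate(findings).items():
    print(f"action item: {service} must resolve {rules}")
```

The value of automating this step is that the policy (which severities block, which rules matter) lives in one place and is applied uniformly across every service on every run.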

Transparent, traceable evidence

Software supply chain security has become a top priority due to the rise of high-profile attacks and the increase in dependencies on open-source software. Transparency is particularly important, and Microsoft has pioneered traceability and transparency in the SDL for years. Just as one example, in response to Executive Order 14028 , we added a requirement to the SDL to generate software bills of material (SBOMs) for greater transparency.
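To make the SBOM idea concrete, here is a minimal SPDX-style fragment assembled in Python. The field names follow SPDX conventions, but the package data is invented for the example, and real SBOM generators emit far richer documents (licenses, checksums, relationships).

```python
import json

# Minimal, illustrative SBOM fragment in SPDX-style JSON: each dependency
# is listed with a name, version, and supplier so that downstream consumers
# can audit the software supply chain. Package data is invented.
sbom = {
    "spdxVersion": "SPDX-2.3",
    "name": "example-service",
    "packages": [
        {"name": "openssl", "versionInfo": "3.0.13", "supplier": "Organization: OpenSSL"},
        {"name": "zlib", "versionInfo": "1.3.1", "supplier": "Organization: zlib"},
    ],
}

print(json.dumps(sbom, indent=2))
```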

But we didn’t stop there.

To provide transparency into how fixes happen, we now architect the storage of evidence into our tooling and platforms. Our compliance engine collects and stores data and telemetry as evidence. By doing so, when the engine determines that a compliance requirement has been met, we can point to the data used to make that determination. The output is available through an interconnected “graph”, which links together various signals from developer activity and tooling outputs to create high-fidelity insights. This helps us give customers stronger assurances of our security end-to-end.
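The evidence "graph" can be pictured as a store that links each compliance verdict to the raw signals behind it. This sketch is purely illustrative: the class, verdict strings, and evidence records are hypothetical stand-ins for the interconnected graph of developer activity and tooling outputs described above.

```python
from collections import defaultdict

class EvidenceGraph:
    """Links compliance verdicts to the evidence that justified them."""

    def __init__(self):
        self.edges = defaultdict(list)   # verdict -> supporting evidence

    def record(self, verdict, evidence):
        self.edges[verdict].append(evidence)

    def trace(self, verdict):
        """Return the evidence behind a compliance determination."""
        return self.edges[verdict]

g = EvidenceGraph()
g.record("SDL-req-42: static analysis passed",
         {"tool": "CodeQL", "run_id": "run-0193", "errors": 0})
g.record("SDL-req-42: static analysis passed",
         {"tool": "pipeline-telemetry", "build": "2024.03.08.1"})

print(g.trace("SDL-req-42: static analysis passed"))
```

The key property is traceability: when the engine says a requirement was met, `trace` can surface exactly which tool runs and telemetry supported that claim.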


Modernized practices

Beyond making the SDL automated, data-driven, and transparent, Microsoft is also focused on modernizing the practices that the SDL is built on to keep up with changing technologies and ensure our products and services are secure by design and by default. In 2023, six new requirements were introduced, six were retired, and 19 received major updates. We’re investing in new threat modeling capabilities, accelerating the adoption of new memory-safe languages, and focusing on securing open-source software and the software supply chain.

We’re committed to providing continued assurance of open-source software security, measuring and monitoring open-source code repositories to ensure vulnerabilities are identified and remediated on a continuous basis. Microsoft is also dedicated to bringing responsible AI into the SDL, incorporating AI into our security tooling to help developers identify and fix vulnerabilities faster. We’ve built new capabilities like the AI Red Team to find and fix vulnerabilities in AI systems.

By introducing modernized practices into the SDL, we can stay ahead of attacker innovation, designing faster defenses that protect against new classes of vulnerabilities.

How can continuous SDL benefit you?

Continuous SDL can help you in several ways:

  • Peace of mind : You can continue to trust that Microsoft products and services are secure by design, by default, and in deployment. Microsoft follows the continuous SDL for software development to continuously evaluate and improve its security posture.
  • Best practices : You can learn from Microsoft’s best practices and tools to apply them to your own software development. Microsoft shares its SDL guidance and resources with the developer community and contributes to open-source security initiatives.
  • Empowerment : You can prepare for the future of security. Microsoft invests in new technologies and capabilities that address emerging threats and opportunities, such as post-quantum cryptography, AI security, and memory-safe languages.

Where can you learn more?

For more details and visual demonstrations on continuous SDL, read the full white paper by SDL pioneers Tony Rice and David Ornstein.

Learn more about the Secure Future Initiative and how Microsoft builds security into everything we design, develop, and deploy.


COMMENTS

  1. Data Security and Privacy: Concepts, Approaches, and Research

    Data are today an asset more critical than ever for all organizations we may think of. Recent advances and trends, such as sensor systems, IoT, cloud computing, and data analytics, are making possible to pervasively, efficiently, and effectively collect data. However for data to be used to their full power, data security and privacy are critical. Even though data security and privacy have been ...

  2. Database Security Threats and Challenges

    Most database security features have to be developed to secure the database environment. The aim of the paper is to underline the types of threats and challenges and their impact on sensitive data and to present different safety models. The assumption underpinning this study is that it understands the weaknesses, threats and challenges faced by ...

  3. 2425 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on DATABASE SECURITY. Find methods information, sources, references or conduct a literature review on ...

  4. Database Security: An Overview and Analysis of Current Trend

    This paper talks about the basics of database including its meaning, characteristics, role etc. with special focus on different security challenges in the database. Moreover, this paper highlights the basics of security management, tools in this regard. Hence different areas of database security have mentioned in this paper in a simple sense.

  5. Advancing database security: a comprehensive systematic ...

This systematic mapping study (SMS) aimed to identify the most up-to-date research in database security and the different challenges faced by users/clients using various databases from a software engineering perspective. In total, 20 challenges were identified related to database security. ... 1.2 Motivation for the paper. Several studies in the literature seek ...

  6. Security Analysis, Threats, & Challenges in Database

    This paper addresses the relational database threats and security techniques considerations in relation to situations: threats, countermeasures (computer-based controls) and database security methods.

  7. Database Security: Attacks and Solutions

    This research paper coheres databases and its security in any organization. Issues of unauthorized access, deception, vulnerability, authentication and fabrication has been discussed along with the solutions to these attacks. ... key management system and comprehensive protection will positively impact and would tell the importance and delicacy ...

  8. Privacy Prevention of Big Data Applications: A Systematic Literature

    Information security is a Big Data issue. The research community is focusing on the development in the period of Big Data, computer science, and increasing business applications of quick and efficient algorithms for Big Data security intelligence, with the primary aim of ensuring a safe environment free of unlawful access (Cheng et al., 2017).

  9. (PDF) Database Security

    In this paper, we first survey the most relevant concepts underlying the notion of database security and summarize the most well-known techniques. We focus on access control systems, on which a ...

  10. Data security governance in the era of big data: status, challenges

    Global status of data security governance. Countries and economic communities across the globe have devised countermeasures to cope with emerging big data security issues, and prepare for upcoming problems through enhancing data security governance. 1.1. Stepping up legislative efforts in protecting personal data.

  11. Database Security Research Papers

    Hybrid Encryption Technique Using RSA with SHA-1 Algorithm in Data-At-Rest and Data-In-Motion Level. Database security comprises the mechanisms that protect the database against intentional or accidental threats. It is also a specialty within the broader discipline of computer security, using encryption techniques.
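    The SHA-1 side of the hybrid scheme above — hashing data at rest for integrity checking — can be sketched with Python's standard library. Note that SHA-1 is no longer collision-resistant and modern designs prefer SHA-256; this is only an illustration of the integrity-check idea, not the paper's exact construction, and the record values are invented for the example.

    ```python
    import hashlib

    def sha1_digest(data: bytes) -> str:
        """Hex SHA-1 digest of a record, e.g. for an integrity check on data at rest."""
        return hashlib.sha1(data).hexdigest()

    record = b"account=42;balance=100"
    stored_digest = sha1_digest(record)        # stored alongside the record

    # Later, verify the record has not been tampered with:
    assert sha1_digest(record) == stored_digest
    assert sha1_digest(b"account=42;balance=999") != stored_digest
    ```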

  12. database security Latest Research Papers

    One way to maintain the security of the database is to use encryption techniques. The method used to secure the database is encryption using the ROT13 and Caesar cipher methods. Both of these methods have advantages in processing speed. For this reason, the author will compare the use of the two algorithms above in terms of the encryption and ...
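    As context for the comparison above: ROT13 is simply a Caesar cipher with a fixed shift of 13, which makes it its own inverse. A minimal Python sketch (illustrative only — classical substitution ciphers offer no real protection for database contents):

    ```python
    import codecs

    def caesar(text: str, shift: int) -> str:
        """Shift each ASCII letter by `shift` positions, preserving case."""
        result = []
        for ch in text:
            if ch.isalpha():
                base = ord('A') if ch.isupper() else ord('a')
                result.append(chr((ord(ch) - base + shift) % 26 + base))
            else:
                result.append(ch)
        return ''.join(result)

    plaintext = "Database"
    # ROT13 is the Caesar cipher with shift 13, and applying it twice is the identity.
    assert caesar(plaintext, 13) == codecs.encode(plaintext, "rot13")
    assert caesar(caesar(plaintext, 13), 13) == plaintext
    # A general Caesar shift k is undone by shifting the remaining 26 - k positions.
    assert caesar(caesar(plaintext, 3), 23) == plaintext
    ```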

  13. Cyber risk and cybersecurity: a systematic review of data availability

    Finally, this research paper highlights the need for open access to cyber-specific data, without price or permission barriers. ... Moreno et al. developed a database of 300 security-related accidents from European and American sources. The database contained cybersecurity-related events in the chemical and process industry.

  14. The Impact of Artificial Intelligence on Data System Security: A

    This paper aims at identifying research trends in the field through a systematic bibliometric literature review (LRSB) of research on AI and system security. The review entails 77 articles published in the Scopus® database, presenting up-to-date knowledge on the topic. The LRSB results were synthesized across current research subthemes ...

  15. Security in database systems: A research perspective

    Computers & Security, 11 (1992) 41-56: "Security in Database Systems: A Research Perspective", Teresa F. Lunt, Computer Science Laboratory, SRI International, Menlo Park, CA 94025, USA. Database security has been the subject of active research for the past several years. In the last five years, rapid progress has been made in defining what security ...

  16. Security and privacy protection in cloud computing ...

    Research on access control technology based on trust relationships. With the development of research on trust models, the trust relationships among the data provider, cloud platform and user in a cloud computing system differ. (6) Research and implement a cross-domain, cross-group, hierarchical, dynamic, fine-grained access control system.

  17. Journal of Cybersecurity

    Call for Papers. Journal of Cybersecurity is soliciting papers for a special collection on the philosophy of information security. This collection will explore research at the intersection of philosophy, information security, and philosophy of science. Find out more.

  18. Big Data Security and Privacy Protection

    In view of the wide application and popularization of big data, a growing number of data security and privacy issues have posed great challenges to its development. Starting from the characteristics of big data, this paper analyses various information security risks and puts forward a corresponding development strategy for big data security. The results show that the ...

  19. data security Latest Research Papers

    This paper aims to study countermeasures for big data security management in the prevention and control of computer network crime, in the absence of relevant legislation and judicial practice. Starting from the concepts and definitions of computer crime and network crime, this paper puts forward the ...

  20. Database Security: An Essential Guide

    Database security refers to the range of tools, controls and measures designed to establish and preserve database confidentiality, integrity and availability. Confidentiality is the element that's compromised in most data breaches. Database security covers the data in the database, the database management system (DBMS) and any associated applications.

  21. Full article: SQL queries over encrypted databases: a survey

    To address the current gaps in encrypted database research, we need a clear classification of existing SQL query schemes. Additionally, we need to establish a comprehensive evaluation model to fully assess the SQL query schemes from the aspects of SQL functionality, performance, and security level. ... This paper primarily focuses on the semi ...
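    One common building block in the SQL-over-encrypted-data schemes surveyed above is deterministic tokenisation: equal plaintexts map to equal tokens, so the server can answer equality predicates (WHERE col = ?) without seeing plaintext. Below is a hedged Python sketch using an HMAC as the deterministic function; the key and field values are illustrative assumptions, real systems layer several encryption schemes, and deterministic tokens inherently leak equality patterns to the server.

    ```python
    import hashlib
    import hmac

    KEY = b"server-side secret key"  # illustrative only, not a real deployment key

    def eq_token(value: str) -> str:
        """Deterministic token: equal plaintexts yield equal tokens, enabling
        equality queries over an encrypted column without revealing the value."""
        return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

    # Store eq_token(v) in an indexed column; to run WHERE email = ?, the client
    # sends eq_token(query_value) and the server matches tokens, not plaintext.
    assert eq_token("alice@example.com") == eq_token("alice@example.com")
    assert eq_token("alice@example.com") != eq_token("bob@example.com")
    ```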

  22. Membership Inference Attacks and Privacy in Topic Modeling

    Recent research shows that large language models are susceptible to privacy attacks that infer aspects of the training data. However, it is unclear if simpler generative models, like topic models, share similar vulnerabilities. In this work, we propose an attack against topic models that can confidently identify members of the training data in Latent Dirichlet Allocation. Our results suggest ...

  23. Electronics

    Federated learning (FL) is increasingly challenged by security and privacy concerns, particularly vulnerabilities exposed by malicious participants. There remains a gap in effectively countering threats such as model inversion and poisoning attacks in existing research. To address these challenges, this paper proposes the Effective Private-Protected Federated Learning Aggregation Algorithm ...

  24. Experts Week: Research Data Security


  25. Journal of Medical Internet Research

    This Viewpoint paper aims to explore various categories of generative AI in health care, including medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support, while identifying security and privacy threats within each phase of the life cycle of such systems (ie, data collection, model ...

  26. (PDF) Database Security

    The database administrator (DBA) management framework is made to watch out for the security of information/data. This paper presents various security aspects of the database administrator (DB ...

  27. Beyond the Eye of the Storm

    This paper makes the case that there is much to learn about how climate change may affect U.S. national security. The authors highlight areas where new research is needed and illustrate how climate change may affect the Department of the Air Force.

  28. RESEARCH PAPER: Fortinet: Optimizing Business Outcomes Through Network

    Will Townsend manages the networking and security practices for Moor Insights & Strategy focused on carrier infrastructure providers, carrier services, enterprise networking and security. He brings over 30 years of technology industry experience in a variety of product, marketing, channel, business development and sales roles to his advisory ...

  29. Evolving Microsoft Security Development Lifecycle

    The software developers and systems engineers at Microsoft work with large-scale, complex systems, requiring collaboration among diverse and global teams while navigating the demands of rapid technological advancement. Today we're sharing how they're tackling security challenges in the white paper: "Building the next generation of the Microsoft Security Development Lifecycle ...

  30. Research paper A comprehensive review study of cyber-attacks and cyber

    Cyber-security includes practical measures to protect information, networks and data against internal or external threats. Cyber-security professionals protect networks, servers, intranets, and computer systems. Cyber-security ensures that only authorized individuals have access to that information (Ahmed Jamal et al., 2021). For better ...