• Survey paper
  • Open access
  • Published: 06 June 2019

Intelligent video surveillance: a review through deep learning techniques for crowd analysis

  • G. Sreenu   ORCID: orcid.org/0000-0002-2298-9177 1 &
  • M. A. Saleem Durai 1  

Journal of Big Data, volume 6, Article number: 48 (2019)


Big data applications occupy much of the space in industry and research. Among the widespread sources of big data, video streams from CCTV cameras are as important as other sources such as social media data, sensor data, agricultural data, medical data and data from space research. Surveillance videos make a major contribution to unstructured big data. CCTV cameras are installed wherever security is of high importance. Manual surveillance is tedious and time consuming. Security can be defined differently in different contexts: theft identification, violence detection, the likelihood of an explosion, and so on. In crowded public places the term security covers almost all types of abnormal events. Among them, violence detection is difficult to handle because it involves group activity. Analysis of anomalous or abnormal activity in a crowded video scene is very difficult due to several real-world constraints. This paper presents a deep-rooted survey that starts from object recognition and action recognition and proceeds to crowd analysis and, finally, violence detection in crowd environments. The majority of the papers reviewed in this survey are based on deep learning techniques, and the various deep learning methods are compared in terms of their algorithms and models. The main focus of the survey is the application of deep learning techniques to detecting the exact count, the persons involved and the activity taking place in a large crowd under all climatic conditions. The paper discusses the underlying deep learning implementation technology involved in various crowd video analysis methods. Real-time processing, an important issue that is yet to be fully explored in this field, is also considered. Few existing methods handle all these issues simultaneously. The issues in existing methods are identified and summarized, and future directions are given to reduce the obstacles identified.
The survey provides a bibliographic summary of papers from ScienceDirect, IEEE Xplore and ACM digital library.

Bibliographic Summary of papers in different digital repositories

A bibliographic summary of papers published in the area of “surveillance video analysis through deep learning” in digital repositories such as ScienceDirect, IEEE Xplore and the ACM Digital Library is presented graphically.

ScienceDirect

ScienceDirect lists around 1851 papers in this area. Figure  1 shows the year-wise statistics.

figure 1

Year wise paper statistics of “surveillance video analysis by deep learning”, in ScienceDirect

Table  1 lists the titles of 25 papers published in the same area.

Table  2 lists the ScienceDirect journals in which the above-mentioned papers were published.

Keywords indicate the main disciplines of a paper, so an analysis was conducted on the keywords used in the published papers. Table  3 lists the most frequently used keywords and their frequencies.

ACM digital library

The ACM Digital Library includes 20,975 papers in the given area. Table  4 lists the most recently published surveillance video analysis papers in the deep learning field.

IEEE Xplore

Table  5 shows details of papers published in the given area in the IEEE Xplore digital library.

Violence detection among crowd

The survey above treats surveillance video analysis as a general topic. Going deeper into the area, more focus is given to violence detection in crowd behavior analysis.

Table  6 lists papers specific to “violence detection in crowd behavior” from the three repositories mentioned above.

Introduction

Artificial intelligence paves the way for computers to think like humans. Machine learning smooths that path further by adding training and learning components. The availability of huge datasets and high-performance computers has led to deep learning, which automatically extracts features, or the factors of variation, that distinguish objects from one another. Among the various data sources that contribute terabytes of big data, video surveillance data has great social relevance in today’s world. Surveillance data from cameras installed in residential areas, industrial plants, educational institutions and commercial firms contributes to private data, while cameras placed in public places such as city centers, public conveyances and religious sites contribute to public data.

Analysis of surveillance videos involves a series of modules such as object recognition, action recognition and classification of identified actions into categories such as anomalous or normal. This survey gives specific focus to solutions based on deep learning architectures. Among the various deep learning architectures, the models commonly used for surveillance analysis are CNNs, auto-encoders and their combinations. The paper Video surveillance systems-current status and future trends [ 14 ] compares 20 recently published papers in the area of surveillance video analysis. That paper begins by identifying the main outcomes of video analysis and discusses the application areas where surveillance cameras are unavoidable. The current status and trends in video analysis are revealed through a literature review, and finally the vital points that need more consideration in the near future are explicitly stated.

Surveillance video analysis: relevance in present world

The main objectives identified, which illustrate the relevance of the topic, are listed below.

Continuous monitoring of videos is difficult and tiresome for humans.

Intelligent surveillance video analysis is a solution to laborious human task.

Intelligence should be visible in all real world scenarios.

Maximum accuracy is needed in object identification and action recognition.

Tasks like crowd analysis still need a lot of improvement.

Time taken for response generation is highly important in real-world situations.

Prediction of certain movements, actions or violence is highly useful in emergency situations like stampedes.

Huge amounts of data are available in video form.

The majority of papers covered in this survey give importance to object recognition and action detection. Some papers use procedures similar to binary classification, deciding whether an action is anomalous or not. Methods for crowd analysis and violence detection are also included. The application areas identified are covered in the next section.

Application areas identified

The contexts identified are listed as application areas. Most existing work provides solutions specific to one of these contexts.

Traffic signals and main junctions

Residential areas

Crowd pulling meetings

Festivals as part of religious institutions

Inside office buildings

Among the listed contexts, crowd analysis is the most difficult. All types of actions, behaviors and movements need to be identified.

Surveillance video data as Big Data

Big video data has evolved in the form of an increasing number of cameras positioned in public places. A huge number of networked public cameras are deployed worldwide, generating a heavy data stream that can be creatively exploited for capturing behaviors. Considering the huge amount of data that can be documented over time, facilities for data warehousing and data analysis are a vital concern. A single high-definition video camera can produce around 10 GB of data per day [ 87 ].

Allotting the space needed to store large amounts of surveillance video for a long time is difficult. Instead of retaining the raw data, it is more useful to retain the analysis results, which reduces the required storage space. Deep learning techniques involve two main components, training and learning, and both achieve the highest accuracy with huge amounts of data.

The main advantages of training with huge amounts of data are listed below. It is possible to accommodate variety in data representation, and the data can be divided equally into training and testing sets. The datasets available for analysis include not only video sequences but also frames, since the analysis mainly operates on frames extracted from videos; datasets of images are therefore also useful.

The datasets widely used for implementing various kinds of applications are listed in Table  7 below. The list is not specific to a particular application, even though each dataset is listed against one.

Methods identified/reviewed other than deep learning

The methods identified are mainly classified into two categories: those based on deep learning and those not based on deep learning. This section reviews methods other than deep learning.

SVAS deals with the automatic recognition and deduction of complex events. The event detection procedure consists of two levels: low level and high level. Low-level analysis detects people and objects, and its results are used for high-level analysis, that is, event detection. The architecture proposed in the model includes five main modules:

Event model learning

Action model learning

Action detection

Complex event model learning

Complex event detection

The proposed model is the interval-based spatio-temporal model (IBSTM), a hybrid event model. In addition, methods such as threshold models, Bayesian networks, bags of actions, highly cohesive intervals and Markov logic networks are used.

The SVAS method can be improved to deal with moving cameras and multi-camera datasets. Further enhancements are needed in dealing with complex events, specifically in areas like calibration and noise elimination.

Multiple anomalous activity detection in videos [ 88 ] is a rule-based system. The features are identified as motion patterns. Anomalous events are detected either by training the system or by following the dominant set property.

Under the concept of dominant sets, events are classified as normal based on dominant behavior and as anomalous based on less dominant behavior. The advantage of a rule-based system is that new events are easy to recognize by modifying some rules. The main steps involved in a recognition system are

Pre processing

Feature extraction

Object tracking

Behavior understanding

Video segmentation is used as a preprocessing step. Background modeling is implemented through a Gaussian Mixture Model (GMM). External rules are required for object recognition. The system is implemented in Matlab 2014. The areas where more concentration is needed are doubtful activities and situations where multiple objects overlap.
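The background-modeling step can be sketched with a single running Gaussian per pixel, a deliberate simplification of the full GMM; the frame size, learning rate and 3-sigma threshold below are illustrative assumptions, not the system's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Static background around intensity 100 with sensor noise.
frames = [100.0 + rng.normal(0, 2, (4, 4)) for _ in range(50)]

# One running Gaussian per pixel: a simplified stand-in for the full GMM.
mean = frames[0].copy()
var = np.full((4, 4), 25.0)
alpha = 0.05                                   # learning rate
for f in frames[1:]:
    mean = (1 - alpha) * mean + alpha * f
    var = (1 - alpha) * var + alpha * (f - mean) ** 2

# A new frame with a bright "object": pixels far from the model are foreground.
test = 100.0 + rng.normal(0, 2, (4, 4))
test[1:3, 1:3] = 200.0                         # the object region
foreground = np.abs(test - mean) > 3 * np.sqrt(var)
print(foreground[1:3, 1:3].all())              # object pixels are flagged
```

A full GMM keeps several weighted Gaussians per pixel so that multi-modal backgrounds (e.g. waving trees) are also absorbed into the model.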

Mining anomalous events against frequent sequences in surveillance videos from commercial environments [ 89 ] focuses on abnormal events linked with frequent chains of events. The main benefit of identifying such events is the early deployment of resources to particular areas. The implementation is done in Matlab; the inputs are already-observed events and identified frequent series of events. The main investigation under this method is to recognize events that are implausible to follow a given sequential pattern while fulfilling the user-specified parameters.

The method gives more focus to event-level analysis, and it would be interesting to pay attention to the entity and action levels as well. At the same time, going to such a granular level makes the process costly.

Video feature descriptor combining motion and appearance cues with length-invariant characteristics [ 90 ] is a feature descriptor. Trajectory-based methods have been used in abundant installations, but they face problems related to occlusion. As a solution, this work proposes a feature descriptor based on optical flow.

As per the algorithm, the training set is divided into snippets. Images are extracted from each snippet and optical flow is calculated, from which the covariance is computed. A one-class SVM is used to learn the samples, and the same procedure is performed for testing.
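The snippet-level pipeline can be sketched as follows; the toy flow fields, the covariance feature layout and the SVM hyper-parameters are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def covariance_feature(flow):
    """Summarize a (H, W, 2) optical-flow field by the 2x2 covariance of
    its (u, v) components, vectorized as its upper triangle."""
    vecs = flow.reshape(-1, 2)            # one (u, v) sample per pixel
    cov = np.cov(vecs, rowvar=False)      # 2x2 covariance of u and v
    return cov[np.triu_indices(2)]        # [var_u, cov_uv, var_v]

rng = np.random.default_rng(0)
# "Normal" snippets: small, coherent motion; one anomalous snippet: large motion.
normal = [rng.normal(0, 0.5, size=(8, 8, 2)) for _ in range(40)]
anomalous = rng.normal(0, 5.0, size=(8, 8, 2))

X_train = np.array([covariance_feature(f) for f in normal])
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_train)

print(clf.predict(covariance_feature(anomalous)[None, :])[0])  # -1 means outlier
```

The one-class SVM is trained only on normal samples, so at test time anything far from the learned support of the normal covariance features is scored as anomalous.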

The model can be extended in the future to handle local abnormal event detection through the proposed feature, related to the objectness method.

Multiple Hierarchical Dirichlet processes for anomaly detection in traffic [ 91 ] is mainly for understanding real-world traffic situations. The anomalies are mainly due to global patterns, which involve the entire frame, rather than local patterns. The concept of superpixels is included: superpixels are grouped into regions of interest, and an optical-flow-based method is used to calculate the motion in each superpixel. Points of interest are then extracted in active superpixels and tracked by the Kanade–Lucas–Tomasi (KLT) tracker.

The method is better at handling videos involving complex patterns, at less cost, but it does not mention videos taken in the rainy season or in bad weather conditions.

Intelligent video surveillance beyond robust background modeling [ 92 ] handles complex environments with sudden illumination changes, and also reduces false alerts. There are mainly two components, IDS and PSD.

In the first stage, the intruder detection system (IDS) detects objects; a classifier verifies the result and identifies scenes causing problems. In the second stage, the problematic scene descriptor (PSD) handles the positives generated by the IDS. Global features are used to avoid false positives from the IDS.

Though the method deals with complex scenes, it does not mention bad weather conditions.

Towards abnormal trajectory and event detection in video surveillance [ 93 ] works as an integrated pipeline. Existing methods use either trajectory-based or pixel-based approaches; this proposal incorporates both. The proposal includes components such as

Object and group tracking

Grid based analysis

Trajectory filtering

Abnormal behavior detection using actions descriptors

The method can identify abnormal behavior in both individuals and groups, and can be enhanced by adapting it to work in real-time environments.

RIMOC: a feature to discriminate unstructured motions: application to violence detection for video surveillance [ 94 ]. There is no unique definition of violent behavior; such behaviors show large variance in body poses. The method works by taking the eigenvalues of histograms of optical flow.

The input video undergoes dense sampling, and local spatio-temporal volumes (STVs) are created around each sampled point. The frames of each STV are coded as histograms of optical flow, from which eigenvalues are computed. The papers already published in the surveillance area span a large set; among them, methods that are unique in either implementation or target application are listed in Table  8 below.
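The eigenvalue idea can be sketched as follows: stack per-frame histograms of optical flow (HOF) and inspect the eigenvalues of their covariance; structured, coherent motion keeps the histograms stable from frame to frame, while unstructured motion spreads the variance. The toy flow fields and bin count are illustrative assumptions:

```python
import numpy as np

def hof(flow, bins=8):
    """Histogram of optical-flow orientations, weighted by magnitude
    and normalized to sum to one."""
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    ang = np.arctan2(v, u) % (2 * np.pi)
    mag = np.hypot(u, v)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def rimoc_like(flows):
    """Sorted eigenvalues of the covariance of per-frame HOF vectors.
    Coherent motion keeps the histograms stable, so total variance is small."""
    H = np.array([hof(f) for f in flows])
    return np.sort(np.linalg.eigvalsh(np.cov(H, rowvar=False)))[::-1]

rng = np.random.default_rng(1)
direction = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])
coherent = [direction + rng.normal(0, 0.05, (8, 8, 2)) for _ in range(20)]
chaotic = [rng.normal(0, 1, (8, 8, 2)) for _ in range(20)]

ev_coh, ev_cha = rimoc_like(coherent), rimoc_like(chaotic)
print(ev_coh.sum() < ev_cha.sum())  # coherent motion varies far less
```

This captures only the spirit of RIMOC; the actual descriptor operates on densely sampled spatio-temporal volumes rather than whole frames.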

The methods described and listed above are able to perform the following steps:

Object detection

Object discrimination

Action recognition

But these methods are not very efficient at selecting good features in general. The lag identified in these methods is the absence of automatic feature identification, an issue that can be solved by applying deep learning concepts.

The evolution of artificial intelligence from rule-based systems to automatic feature identification passes through machine learning, representation learning and finally deep learning.

Real-time processing in video analysis

Real-time violence detection framework for football stadiums comprising big data analysis and deep learning through bidirectional LSTM [ 103 ] predicts violent behavior of a crowd in real time. Real-time processing speed is achieved through the Spark framework. The model architecture includes the Apache Spark framework, Spark Streaming, a histogram of oriented gradients (HOG) function and a bidirectional LSTM (BDLSTM). The model takes streams of video from diverse sources as input; the videos are converted into non-overlapping frames, and features are extracted from each group of frames through the HOG function. The images are manually grouped into different classes, and the BDLSTM is trained on all these groups. The Spark framework handles the streaming data in micro-batch mode; two kinds of processing, stream processing and batch processing, are used.
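Two pieces of this pipeline, splitting the stream into non-overlapping frame groups and computing an orientation-gradient descriptor per frame, can be sketched as below. This is a minimal HOG-like histogram, not a full HOG implementation, and the frame sizes, group size and bin count are illustrative assumptions:

```python
import numpy as np

def hog_like(frame, bins=9):
    """A minimal histogram-of-oriented-gradients feature for one frame:
    a gradient-orientation histogram weighted by gradient magnitude.
    (Real HOG adds cells, blocks and block normalization.)"""
    gy, gx = np.gradient(frame.astype(float))
    ang = np.arctan2(gy, gx) % np.pi            # unsigned orientation
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

def frame_groups(frames, size):
    """Split a frame stream into non-overlapping groups of `size`,
    dropping any incomplete tail group."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, size)]

frames = [np.random.default_rng(i).random((16, 16)) for i in range(10)]
groups = frame_groups(frames, 4)
features = np.array([[hog_like(f) for f in g] for g in groups])
print(features.shape)  # (2, 4, 9): 2 groups, 4 frames each, 9-bin descriptor
```

In the actual framework each such per-group feature sequence would be fed to the BDLSTM inside a Spark Streaming micro-batch.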

Intelligent video surveillance for real-time detection of suicide attempts [ 104 ] is an effort to prevent suicide by hanging in prisons. The method uses the depth streams offered by an RGB-D camera. The body joint points are analyzed to recognize suicidal behavior.

Spatio-temporal texture modeling for real-time crowd anomaly detection [ 105 ]. Spatio-temporal texture is a combination of spatio-temporal slices and spatio-temporal volumes. The information present in these slices is abstracted through wavelet transforms, and a Gaussian approximation model is applied to the texture patterns to distinguish normal from abnormal behaviors.

Deep learning models in surveillance

Deep convolutional framework for abnormal behavior detection in a smart surveillance system [ 106 ] includes three sections.

Human subject detection and discrimination

A posture classification module

An abnormal behavior detection module

The models used for the above sections are, correspondingly:

You only look once (YOLO) network

Long short-term memory (LSTM)

For object discrimination, a Kalman-filter-based object entity discrimination algorithm is used. The posture classification stage recognizes 10 types of poses. The RNN uses backpropagation through time (BPTT) to update its weights.
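The Kalman-filter step can be sketched with a constant-velocity model that smooths noisy per-frame detections into a track; the motion model, noise levels and trajectory below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Constant-velocity Kalman filter for one 2-D track: state [x, y, vx, vy].
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1.0]])          # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0.0]])          # only position is observed
Q = np.eye(4) * 1e-3                    # process noise
R = np.eye(2) * 0.25                    # measurement noise

def kf_step(x, P, z):
    x = F @ x                           # predict state
    P = F @ P @ F.T + Q                 # predict covariance
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)             # correct with detection z
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
rng = np.random.default_rng(0)
for t in range(1, 30):                  # object moves 1 px/frame along x
    z = np.array([float(t), 0.0]) + rng.normal(0, 0.5, 2)
    x, P = kf_step(x, P, z)

print(x[:2], x[2])                      # position near (29, 0), x-velocity near 1
```

In a tracker the predicted position is also used to associate each new detection with an existing track before the correction step.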

The main issue identified in the method is that similar activities, such as pointing and punching, are difficult to distinguish.

Detecting anomalous events in videos by learning deep representations of appearance and motion [ 107 ] proposes a new model named AMDN. The model automatically learns feature representations, using stacked de-noising auto-encoders to learn appearance and motion features both separately and jointly. After learning, multiple one-class SVMs are trained to predict the anomaly score of each input; these scores are then combined through a double fusion framework to detect abnormal events. The computational overhead at testing time is too high for real-time processing.

A study of deep convolutional auto-encoders for anomaly detection in videos [ 12 ] proposes a structure that is a mixture of auto-encoders and CNNs. An auto-encoder includes an encoder part and a decoder part: the encoder consists of convolutional and pooling layers, and the decoder consists of deconvolutional and unpooling layers. The architecture allows a combination of low-level frames with high-level appearance and motion features. Anomaly scores are represented through reconstruction errors.
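The reconstruction-error scoring can be sketched without the convolutional machinery by using a linear auto-encoder, which is equivalent to PCA: frames that fit the normal low-dimensional pattern reconstruct almost perfectly, while off-pattern frames get a high score. The data and latent dimension are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# "Normal" flattened frames lie in a 3-dimensional pattern.
basis = rng.normal(size=(3, 64))                 # 3 latent directions
normal = rng.normal(size=(200, 3)) @ basis       # training frames
anomaly = rng.normal(size=(1, 64)) * 3           # an off-pattern frame

# Linear auto-encoder via PCA: encode = project onto the top principal
# components, decode = project back.
mean = normal.mean(0)
U, s, Vt = np.linalg.svd(normal - mean, full_matrices=False)
components = Vt[:3]                              # encoder/decoder weights

def reconstruction_error(x):
    code = (x - mean) @ components.T             # encode
    recon = code @ components + mean             # decode
    return np.mean((x - recon) ** 2, axis=-1)    # per-frame anomaly score

print(reconstruction_error(normal[:1]) < reconstruction_error(anomaly))
```

A convolutional auto-encoder plays the same role but learns a nonlinear, spatially aware code, so it can also score localized anomalies within a frame.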

Going deeper with convolutions [ 108 ] suggests improvements over the traditional neural network. Fully connected layers are replaced by sparse ones by adding sparsity into the architecture. The paper suggests dimensionality reduction, which helps to reduce the increasing demand for computational resources: the computational reduction is achieved by applying 1 × 1 convolutions before the 5 × 5 convolutions. The method does not mention execution time, and no conclusion can be drawn about the crowd size the method can handle successfully.
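The saving from the 1 × 1 bottleneck can be checked with a quick parameter count; the channel sizes below are illustrative assumptions, not the paper's exact configuration:

```python
# Weight count of a 5x5 convolution over c_in input channels producing
# c_out feature maps, with and without a 1x1 "bottleneck" that first
# reduces 256 channels down to 32 (biases ignored).
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

direct = conv_params(5, 256, 64)                               # 5x5 straight away
bottleneck = conv_params(1, 256, 32) + conv_params(5, 32, 64)  # 1x1 then 5x5

print(direct, bottleneck)        # 409600 vs 59392
print(direct / bottleneck)       # roughly a 6.9x reduction in weights
```

The same ratio applies to multiply-accumulate operations per spatial position, which is why the bottleneck eases the demand for computational resources.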

Deep learning for visual understanding: a review [ 109 ] reviews the fundamental models in deep learning. The models and techniques described are CNNs, RBMs, auto-encoders and sparse coding. The paper also mentions drawbacks of deep learning models, such as the underlying theory not yet being well understood.

Deep learning methods other than the ones discussed above are listed in the following Table  9 .

The methods reviewed in the above sections are good at automatic feature generation and at handling individual entities and groups of limited size.

The majority of real-world problems arise in crowds, and the above-mentioned methods are not effective in handling crowd scenes. The next section reviews intelligent methods for analyzing crowd video scenes.

Review in the field of crowd analysis

The review includes methods with and without a deep learning background.

Spatial temporal convolutional neural networks for anomaly detection and localization in crowded scenes [ 114 ] shows that crowd analysis is challenging for the following reasons:

Large number of pedestrians

Close proximity

Volatility of individual appearance

Frequent partial occlusions

Irregular motion pattern in crowd

Dangerous activities like crowd panic

Frame level and pixel level detection

The paper suggests an optical-flow-based solution. The CNN has eight layers, and training is based on BVLC Caffe. Parameters are randomly initialized and the system is trained through stochastic gradient descent based backpropagation. The implementation considers four different datasets: UCSD, UMN, Subway and U-turn. The implementation details for UCSD include frame-level and pixel-level criteria: the frame-level criterion concentrates on the temporal domain, and the pixel-level criterion considers both the spatial and temporal domains. The metrics used to evaluate performance include the equal error rate (EER) and the detection rate (DR).
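The EER used above can be computed by sweeping a detection threshold until the false-positive rate equals the miss rate; a minimal sketch (the threshold sweep over unique scores is one simple convention among several):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: sweep the threshold until the false-positive rate and the
    miss rate are as close as possible, then average the two.
    scores: higher = more anomalous; labels: 1 = anomalous frame."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    best_gap, eer = 1.0, 0.0
    for t in np.sort(np.unique(scores)):
        pred = scores >= t
        fpr = (pred & (labels == 0)).sum() / max((labels == 0).sum(), 1)
        fnr = (~pred & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer

# Perfectly separable scores give an EER of 0.
print(equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # 0.0
```

A lower EER means a better detector; on a finely sampled ROC curve the EER is simply the point where the curve crosses the anti-diagonal.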

Online real-time crowd behavior detection in video sequences [ 115 ] suggests FSCB, behavior detection through feature tracking and image segmentation. The procedure involves the following steps:

Feature detection and temporal filtering

Image segmentation and blob extraction

Activity detection

Activity map

Activity analysis

The main advantage is that no training stage is needed for this method. The method is quantitatively analyzed through ROC curve generation, and its computational speed is evaluated through the frame rate. The datasets considered for the experiments include UMN, PETS2009, AGORASET and Rome Marathon.

Deep learning for scene independent crowd analysis [ 82 ] proposes a scene-independent method that includes the following procedures:

Crowd segmentation and detection

Crowd tracking

Crowd counting

Pedestrian travelling time estimation

Crowd attribute recognition

Crowd behavior analysis

Abnormality detection in a crowd

Attribute recognition is done through a slicing CNN. A 2D CNN model learns appearance features and represents them as a cuboid, in which three temporal filters are identified; a classifier is then applied to the concatenated feature vector extracted from the cuboid. Crowd counting and crowd density estimation are treated as a regression problem. Crowd attribute recognition is applied on the WWW Crowd dataset, with AUC and AP as the evaluation metrics.
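Treating counting as regression can be sketched with a least-squares fit from a single frame feature to the head count; the foreground-fraction feature and the synthetic data below are illustrative assumptions standing in for the CNN features used in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: the fraction of "foreground" pixels in a frame
# grows roughly linearly with the number of people in it.
counts = rng.integers(0, 100, size=200)                   # true head counts
foreground = 0.01 * counts + rng.normal(0, 0.02, size=200)

# Fit count ~ a * foreground + b by least squares.
A = np.column_stack([foreground, np.ones_like(foreground)])
(a, b), *_ = np.linalg.lstsq(A, counts, rcond=None)

pred = a * foreground + b
mae = np.abs(pred - counts).mean()
print(a, mae)   # slope near 100, small mean absolute error
```

Deep counting regressors follow the same recipe but replace the hand-picked scalar feature with learned convolutional features (or a predicted density map that is summed to get the count).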

The analysis of high density crowds in videos [ 80 ] describes methods like data-driven crowd analysis and density-aware tracking. Data-driven analysis learns crowd motion patterns from a large collection of crowd videos in an offline manner; the learned patterns can then be transferred to applications. The solution is a two-step procedure: global crowded scene matching and local crowd patch matching. Figure  2 illustrates the two-step procedure.

figure 2

a Test video, b results of global matching, c a query crowd patch, d matching crowd patches [ 80 ]

The database selected for experimental evaluation includes 520 unique videos at 720 × 480 resolution. The main evaluation is tracking the unusual and unexpected actions of individuals in a crowd. Experiments prove that data-driven tracking is better than batch-mode tracking. Density-based person detection and tracking includes steps like a baseline detector, geometric filtering and tracking using a density-aware detector.

A review on classifying abnormal behavior in crowd scenes [ 77 ] mainly demonstrates four key approaches: Hidden Markov Models (HMM), GMM, optical flow and STT. GMM itself is enhanced with different techniques to capture abnormal behaviors. The enhanced versions of GMM are

GMM and Markov random field

Gaussian poisson mixture model and

GMM and support vector machine

The GMM architecture includes components like a local descriptor, a global descriptor, classifiers and finally a fusion strategy. The distinction between normal and abnormal behavior is evaluated based on the Mahalanobis distance. The GMM–MRF model is divided into two sections: the first identifies motion patterns through GMM, and crowd context modeling is done through MRF. GPMM adds one extra feature, the count of occurrences of the observed behavior; EM is also used for training at a later stage of GPMM. GMM–SVM incorporates features such as crowd collectiveness, crowd density and crowd conflict for abnormality detection.
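Mahalanobis-distance scoring against a model of normal behavior can be sketched as follows; the 3-dimensional feature vectors and the 3-sigma cut-off are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 3))   # feature vectors of normal behavior

# Fit the "normal" model: mean and inverse covariance of the features.
mu = normal.mean(0)
cov_inv = np.linalg.inv(np.cov(normal, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

typical = mahalanobis(np.zeros(3))                 # near the center
outlier = mahalanobis(np.array([6.0, -6.0, 6.0]))  # far from the model
print(typical < 3 < outlier)                       # outlier is beyond 3 sigma
```

Unlike plain Euclidean distance, the Mahalanobis distance accounts for the spread and correlation of the normal features, so the same threshold works across differently scaled feature dimensions.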

HMM also has variants such as

HM and OSVMs

The Hidden Markov Model is a density-aware detection method used to detect motion-based abnormality. The method generates foreground masks and perspective masks through an ORB detector. GM-HMM involves four major steps. In the first step, GMBM identifies foreground pixels, leading to the generation of blobs. In the second stage, PCA–HOG and motion HOG are used for feature extraction. The third stage applies k-means clustering to separately cluster the features generated through PCA–HOG and motion HOG. In the final stage, the HMM processes the continuous information of the moving target through the application of GM. In SLT-HMM, short local trajectories are used along with HMM to achieve better localization of moving objects. MOHMM uses KLT in its first phase to generate trajectories, on which clustering is applied; the second phase uses MOHMM to represent the trajectories and define usual and unusual frames. OSVM solves the nonlinearity problem by mapping high-dimensional features into a linear space using kernel functions.

In optical-flow-based methods, the enhancements are categorized into techniques such as HOFH, HOFME, HMOFP and MOFE.

In HOFH, video frames are divided into several same-size patches, and the optical flow is extracted and divided into eight directions; expectation and variance features are then used to characterize the optical flow between frames. The HOFME descriptor is used at the final stage of abnormal behavior detection: first the frame difference is calculated, then the optical flow pattern is extracted, and finally the spatio-temporal description using HOFME is completed. HMOFP extracts the optical flow from each frame, divides it into patches and segments the optical flows into a number of bins; the maximum-amplitude flows are concatenated to form the global HMOFP. The MOFE method converts frames into blobs and extracts the optical flow in all the blobs; these optical flows are then clustered into different groups. In STT, crowd tracking and abnormal behavior detection are done by combining the spatial and temporal dimensions of the features.
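The patch-and-bin scheme shared by these descriptors can be sketched as follows: divide the flow field into same-size patches and build an eight-direction, magnitude-weighted histogram per patch. The patch size and the random flow field are illustrative assumptions:

```python
import numpy as np

def patch_flow_histograms(flow, patch=4, bins=8):
    """Divide a (H, W, 2) flow field into same-size patches and build an
    eight-direction histogram per patch, concatenated into one descriptor."""
    H, W, _ = flow.shape
    feats = []
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            block = flow[i:i + patch, j:j + patch].reshape(-1, 2)
            ang = np.arctan2(block[:, 1], block[:, 0]) % (2 * np.pi)
            mag = np.hypot(block[:, 0], block[:, 1])
            hist, _ = np.histogram(ang, bins=bins,
                                   range=(0, 2 * np.pi), weights=mag)
            feats.append(hist)
    return np.concatenate(feats)

flow = np.random.default_rng(0).normal(size=(8, 8, 2))
desc = patch_flow_histograms(flow)
print(desc.shape)   # (32,): 4 patches x 8 direction bins
```

The variants then diverge in what they keep per patch: HOFH adds expectation and variance statistics, while HMOFP keeps only the maximum-amplitude flows before concatenation.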

Crowd behaviour analysis from fixed and moving cameras [ 78 ] covers topics like microscopic and macroscopic crowd modeling, crowd behavior and crowd density analysis, and datasets for crowd behavior analysis. Large crowds are handled through macroscopic approaches, which treat the agents as a whole, whereas microscopic approaches handle agents individually. Motion information representing the crowd can be collected through fixed and moving cameras. CNN-based methods such as end-to-end deep CNNs, the Hydra-CNN architecture, switching CNNs, the cascade CNN architecture, 3D CNNs and spatio-temporal CNNs are discussed for crowd behaviour analysis, and datasets useful specifically for crowd behaviour analysis are also described in the chapter. The metrics used are MOTA (multiple person tracker accuracy) and MOTP (multiple person tracker precision), which consider the multi-target scenarios usually present in crowd scenes. The datasets used for experimental evaluation are UCSD, Violent-Flows, CUHK, UCF50, Rodriguez’s, The Mall and finally the WorldExpo dataset.

Zero-shot crowd behavior recognition [ 79 ] suggests recognizers that need little or no training data. The basic idea behind the approach is attribute-context co-occurrence: a behavioral attribute is predicted based on its relationship with known attributes. The method encompasses steps like probabilistic zero-shot prediction, in which the conditional probability of the known-to-target attribute relation is calculated; the second step involves learning attribute relatedness from text corpora and context learning from visual co-occurrence. Figure  3 illustrates the results.

figure 3

Demonstration of crowd videos ranked in accordance with prediction values [ 79 ]

Computer vision based crowd disaster avoidance system: a survey [ 81 ] covers different perspectives of crowd scene analysis, such as the number of cameras employed and the target of interest, along with crowd behavior analysis, people counting, crowd density estimation, person re-identification, crowd evacuation, forensic analysis of crowd disasters and computations on crowd analysis. A brief summary of the benchmark datasets is also given.

Fast face detection in violent video scenes [ 83 ] suggests an architecture with three steps: a violent scene detector, a normalization algorithm and finally a face detector. The ViF descriptor, with Horn–Schunck as the optical flow algorithm, is used for violent scene detection. The normalization procedure includes gamma intensity correction, difference of Gaussians, local histogram coincidence and local normal distribution. Face detection involves two stages: first, segmenting skin regions, and second, checking each component of the face.

Rejecting motion outliers for efficient crowd anomaly detection [ 54 ] provides a solution consisting of two phases: feature extraction and anomaly classification. Feature extraction is based on flow. The pipeline divides the input video into frames, divides the frames into superpixels, extracts a histogram for each superpixel, aggregates the histograms spatially and finally concatenates the combined histograms from consecutive frames to produce the final feature. Anomalies can be detected through existing classification algorithms. The implementation uses the UCSD dataset, which has two subsets with resolutions of 158 × 238 and 240 × 360. Normal behavior was used to train k-means and KUGDA, and both normal and abnormal behavior were used to train a linear SVM. The hardware includes an Artix-7 xc7a200t FPGA from Xilinx, Xilinx ISE and the XPower Analyzer.

Deep metric learning for crowdedness regression [ 84 ] includes a deep network model in which feature learning and distance measurement are done concurrently; metric learning is used to learn a fine-grained distance measurement. The proposed model is implemented through the TensorFlow package, with the rectified linear unit as the activation function and gradient descent as the training method. Performance is evaluated through mean squared error and mean absolute error, using the WorldExpo dataset and the ShanghaiTech dataset for experimental evaluation.

A deep spatiotemporal perspective for understanding crowd behavior [ 61 ] is a combination of convolution layers and long short-term memory. Spatial information is captured through the convolution layers, and temporal motion dynamics are captured through the LSTM. The method forecasts the pedestrian path, estimates the destination and finally categorizes the behavior of individuals according to motion pattern. The path forecasting technique includes two stacked ConvLSTM layers with 128 hidden states; the ConvLSTM kernel size is 3 × 3, with a stride of 1 and zero padding, and the model takes up a single convolution layer with a 1 × 1 kernel size. Crowd behavior classification is achieved through a combination of three layers: an average spatial pooling layer, a fully connected layer and a softmax layer.

Crowded Scene Understanding by Deeply Learned Volumetric Slices [ 85 ] suggests a deep model and different fusion approaches. The architecture involves convolution layers, a global sum pooling layer and fully connected layers, and relies on slice fusion and weight sharing schemes. A new multitask learning deep model is proposed to jointly learn motion features and appearance features and to combine them effectively. As input to the model, the authors design crowd motion channels, which are inspired by temporal slices and capture the temporal evolution of contents in crowd videos. The authors also conduct wide-ranging evaluations of multiple deep structures with various data fusion and weight sharing schemes to learn temporal features. The network is configured with convolutional, pooling and fully connected layers, with activation functions such as the rectified linear unit and the sigmoid function. Three different slice fusion techniques are applied to measure the efficiency of the proposed input channels.
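Global sum pooling, one of the layers named above, simply collapses each channel's activation map to a single scalar by summation. A minimal sketch (the tiny feature map below is illustrative, not from [ 85 ]):

```python
import numpy as np

def global_sum_pool(feature_maps):
    """Sum each channel's activation map to one value.

    feature_maps: array of shape (channels, H, W);
    returns a vector of shape (channels,).
    """
    return feature_maps.reshape(feature_maps.shape[0], -1).sum(axis=1)

fm = np.arange(2 * 2 * 2).reshape(2, 2, 2)  # 2 channels of 2x2 activations
pooled = global_sum_pool(fm)
```

Unlike max or average pooling, sum pooling preserves the total activation mass per channel, which suits count-like quantities in crowd videos.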

Crowd Scene Understanding from Video: A Survey [ 86 ] mainly deals with crowd counting. Approaches to crowd counting are categorized into six groups: pixel level analysis, texture level analysis, object level analysis, line counting, density mapping, and joint detection and counting. Pixel level analysis examines edge features, while texture level analysis examines image patches. Object level analysis, which identifies individual subjects in a scene, is more accurate than pixel and texture level analysis. Line counting takes the count of people who have crossed a particular line.
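The line counting category can be illustrated with a toy example: given tracked positions, a subject is counted when its trajectory changes sides of the counting line. The tracks and line position below are hypothetical; a real system would obtain them from a pedestrian detector and tracker.

```python
def line_crossings(tracks, line_y=100.0):
    """Count tracked subjects that cross a horizontal counting line.

    tracks: list of trajectories, each a list of (x, y) positions over time.
    """
    count = 0
    for track in tracks:
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            if (y0 - line_y) * (y1 - line_y) < 0:  # sign change => crossing
                count += 1
                break  # count each subject at most once
    return count

tracks = [
    [(5, 90), (6, 98), (7, 105)],   # crosses the line downward
    [(50, 120), (51, 130)],         # stays on one side
    [(20, 110), (21, 95)],          # crosses the line upward
]
```

Tracking the sign of consecutive crossings would additionally give directional (in/out) counts.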

Table  10 summarizes some additional crowd analysis methods.

Results observed from the survey and future directions

An accuracy analysis of some of the methods discussed above, based on evaluation criteria such as AUC, precision and recall, is presented below.

Rejecting Motion Outliers for Efficient Crowd Anomaly Detection [ 54 ] compares different methods as shown in Fig.  4 . KUGDA is the classifier proposed in that work.

Fig. 4: Comparing KUGDA with K-means [ 54 ]

Fast Face Detection in Violent Video Scenes [ 83 ] uses the ViF descriptor for violent scene detection. Figure 5 shows the ROC-curve evaluation of an SVM classifier built on this descriptor.

Fig. 5: Receiver operating characteristics of a classifier with ViF descriptor [ 83 ]
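The AUC summarizing such an ROC curve can be computed directly from classifier scores: it equals the probability that a randomly chosen positive example outscores a randomly chosen negative one (ties counting half). A sketch with hypothetical SVM decision values, not the actual scores from [ 83 ]:

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC via the rank statistic: P(random positive scores higher
    than random negative), with ties counted as one half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# hypothetical decision values from a violence/non-violence SVM
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
```

This pairwise formulation avoids choosing a threshold, which is why AUC is the usual summary for detectors compared across operating points.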

Figure  6 compares the detection performance of different methods [ 80 ]. The comparison shows the improvement of the density-aware detector over the other methods.

Fig. 6: Comparing detection performance of the density-aware detector with different methods [ 80 ]

The following shortcomings were identified in the existing methods. Real-world problems impose objectives such as:

Time complexity

Bad weather conditions

Real world dynamics

Overlapping of objects

Existing methods handle these problems separately; no single method addresses all of these objectives within one proposal.

To perform effective intelligent crowd video analysis in real time, a method should be able to provide solutions to all of these problems. Traditional methods are not able to generate an efficient, economical solution in a time-bounded manner.

The availability of high-performance computational resources such as GPUs allows deep learning based solutions to be implemented for fast processing of big data. Existing deep learning architectures or models can be combined, retaining their good features and removing unwanted ones.

This paper reviewed intelligent surveillance video analysis techniques. The reviewed papers cover a wide variety of applications, and the techniques, tools and datasets identified are listed in tables. The survey began with video surveillance analysis from a general perspective and then moved toward crowd analysis. Crowd analysis is difficult because crowd size is large and dynamic in real-world scenarios, making it hard to identify each entity and its behavior. Methods analyzing crowd behavior were discussed, and the issues identified in existing methods were listed as future directions toward an efficient solution.

Abbreviations

SVAS: Surveillance Video Analysis System

IBSTM: Interval-Based Spatio-Temporal Model

KLT: Kanade–Lucas–Tomasi

GMM: Gaussian Mixture Model

SVM: Support Vector Machine

DAAL: Deep activation-based attribute learning

HMM: Hidden Markov Model

YOLO: You only look once

LSTM: Long short-term memory

AUC: Area under the curve

ViF: Violent flow descriptor

Kardas K, Cicekli NK. SVAS: surveillance video analysis system. Expert Syst Appl. 2017;89:343–61.


Wang Y, Shuai Y, Zhu Y, Zhang J, An P. Jointly learning perceptually heterogeneous features for blind 3D video quality assessment. Neurocomputing. 2019;332:298–304 (ISSN 0925-2312).

Tzelepis C, Galanopoulos D, Mezaris V, Patras I. Learning to detect video events from zero or very few video examples. Image Vis Comput. 2016;53:35–44 (ISSN 0262-8856) .

Fakhar B, Kanan HR, Behrad A. Learning an event-oriented and discriminative dictionary based on an adaptive label-consistent K-SVD method for event detection in soccer videos. J Vis Commun Image Represent. 2018;55:489–503 (ISSN 1047-3203) .

Luo X, Li H, Cao D, Yu Y, Yang X, Huang T. Towards efficient and objective work sampling: recognizing workers’ activities in site surveillance videos with two-stream convolutional networks. Autom Constr. 2018;94:360–70 (ISSN 0926-5805) .

Wang D, Tang J, Zhu W, Li H, Xin J, He D. Dairy goat detection based on Faster R-CNN from surveillance video. Comput Electron Agric. 2018;154:443–9 (ISSN 0168-1699) .

Shao L, Cai Z, Liu L, Lu K. Performance evaluation of deep feature learning for RGB-D image/video classification. Inf Sci. 2017;385:266–83 (ISSN 0020-0255) .

Ahmed SA, Dogra DP, Kar S, Roy PP. Surveillance scene representation and trajectory abnormality detection using aggregation of multiple concepts. Expert Syst Appl. 2018;101:43–55 (ISSN 0957-4174) .

Arunnehru J, Chamundeeswari G, Prasanna Bharathi S. Human action recognition using 3D convolutional neural networks with 3D motion cuboids in surveillance videos. Procedia Comput Sci. 2018;133:471–7 (ISSN 1877-0509) .

Guraya FF, Cheikh FA. Neural networks based visual attention model for surveillance videos. Neurocomputing. 2015;149(Part C):1348–59 (ISSN 0925-2312) .

Pathak AR, Pandey M, Rautaray S. Application of deep learning for object detection. Procedia Comput Sci. 2018;132:1706–17 (ISSN 1877-0509) .

Ribeiro M, Lazzaretti AE, Lopes HS. A study of deep convolutional auto-encoders for anomaly detection in videos. Pattern Recogn Lett. 2018;105:13–22.

Huang W, Ding H, Chen G. A novel deep multi-channel residual networks-based metric learning method for moving human localization in video surveillance. Signal Process. 2018;142:104–13 (ISSN 0165-1684) .

Tsakanikas V, Dagiuklas T. Video surveillance systems—current status and future trends. Comput Electr Eng. 2017 (in press, corrected proof; available online 14 November 2017).

Wang Y, Zhang D, Liu Y, Dai B, Lee LH. Enhancing transportation systems via deep learning: a survey. Transport Res Part C Emerg Technol. 2018. https://doi.org/10.1016/j.trc.2018.12.004 (ISSN 0968-090X) .

Huang H, Xu Y, Huang Y, Yang Q, Zhou Z. Pedestrian tracking by learning deep features. J Vis Commun Image Represent. 2018;57:172–5 (ISSN 1047-3203) .

Yuan Y, Zhao Y, Wang Q. Action recognition using spatial-optical data organization and sequential learning framework. Neurocomputing. 2018;315:221–33 (ISSN 0925-2312) .

Perez M, Avila S, Moreira D, Moraes D, Testoni V, Valle E, Goldenstein S, Rocha A. Video pornography detection through deep learning techniques and motion information. Neurocomputing. 2017;230:279–93 (ISSN 0925-2312) .

Pang S, del Coz JJ, Yu Z, Luaces O, Díez J. Deep learning to frame objects for visual target tracking. Eng Appl Artif Intell. 2017;65:406–20 (ISSN 0952-1976) .

Wei X, Du J, Liang M, Ye L. Boosting deep attribute learning via support vector regression for fast moving crowd counting. Pattern Recogn Lett. 2017. https://doi.org/10.1016/j.patrec.2017.12.002 .

Xu M, Fang H, Lv P, Cui L, Zhang S, Zhou B. D-stc: deep learning with spatio-temporal constraints for train drivers detection from videos. Pattern Recogn Lett. 2017. https://doi.org/10.1016/j.patrec.2017.09.040 (ISSN 0167-8655) .

Hassan MM, Uddin MZ, Mohamed A, Almogren A. A robust human activity recognition system using smartphone sensors and deep learning. Future Gener Comput Syst. 2018;81:307–13 (ISSN 0167-739X) .

Wu G, Lu W, Gao G, Zhao C, Liu J. Regional deep learning model for visual tracking. Neurocomputing. 2016;175:310–23 (ISSN 0925-2312) .

Nasir M, Muhammad K, Lloret J, Sangaiah AK, Sajjad M. Fog computing enabled cost-effective distributed summarization of surveillance videos for smart cities. J Parallel Comput. 2018. https://doi.org/10.1016/j.jpdc.2018.11.004 (ISSN 0743-7315) .

Najva N, Bijoy KE. SIFT and tensor based object detection and classification in videos using deep neural networks. Procedia Comput Sci. 2016;93:351–8 (ISSN 1877-0509) .

Yu Z, Li T, Yu N, Pan Y, Chen H, Liu B. Reconstruction of hidden representation for Robust feature extraction. ACM Trans Intell Syst Technol. 2019;10(2):18.

Mammadli R, Wolf F, Jannesari A. The art of getting deep neural networks in shape. ACM Trans Archit Code Optim. 2019;15:62.

Zhou T, Tucker R, Flynn J, Fyffe G, Snavely N. Stereo magnification: learning view synthesis using multiplane images. ACM Trans Graph. 2018;37:65


Fan Z, Song X, Xia T, Jiang R, Shibasaki R, Sakuramachi R. Online Deep Ensemble Learning for Predicting Citywide Human Mobility. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:105.

Hanocka R, Fish N, Wang Z, Giryes R, Fleishman S, Cohen-Or D. ALIGNet: partial-shape agnostic alignment via unsupervised learning. ACM Trans Graph. 2018;38:1.

Xu M, Qian F, Mei Q, Huang K, Liu X. DeepType: on-device deep learning for input personalization service with minimal privacy concern. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:197.

Potok TE, Schuman C, Young S, Patton R, Spedalieri F, Liu J, Yao KT, Rose G, Chakma G. A study of complex deep learning networks on high-performance, neuromorphic, and quantum computers. J Emerg Technol Comput Syst. 2018;14:19.

Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu ML, Chen SC, Iyengar SS. A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv. 2018;51:92.

Tian Y, Lee GH, He H, Hsu CY, Katabi D. RF-based fall monitoring using convolutional neural networks. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:137.

Roy P, Song SL, Krishnamoorthy S, Vishnu A, Sengupta D, Liu X. NUMA-Caffe: NUMA-aware deep learning neural networks. ACM Trans Archit Code Optim. 2018;15:24.

Lovering C, Lu A, Nguyen C, Nguyen H, Hurley D, Agu E. Fact or fiction. Proc ACM Hum-Comput Interact. 2018;2:111.

Ben-Hamu H, Maron H, Kezurer I, Avineri G, Lipman Y. Multi-chart generative surface modeling. ACM Trans Graph. 2018;37:215

Ge W, Gong B, Yu Y. Image super-resolution via deterministic-stochastic synthesis and local statistical rectification. ACM Trans Graph. 2018;37:260

Hedman P, Philip J, Price T, Frahm JM, Drettakis G, Brostow G. Deep blending for free-viewpoint image-based rendering. ACM Trans Graph. 2018;37:257

Sundararajan K, Woodard DL. Deep learning for biometrics: a survey. ACM Comput Surv. 2018;51:65.

Kim H, Kim T, Kim J, Kim JJ. Deep neural network optimized to resistive memory with nonlinear current–voltage characteristics. J Emerg Technol Comput Syst. 2018;14:15.

Wang C, Yang H, Bartz C, Meinel C. Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans Multimedia Comput Commun Appl. 2018;14:40.

Yao S, Zhao Y, Shao H, Zhang A, Zhang C, Li S, Abdelzaher T. RDeepSense: Reliable Deep Mobile Computing Models with Uncertainty Estimations. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;1:173.

Liu D, Cui W, Jin K, Guo Y, Qu H. DeepTracker: visualizing the training process of convolutional neural networks. ACM Trans Intell Syst Technol. 2018;10:6.

Yi L, Huang H, Liu D, Kalogerakis E, Su H, Guibas L. Deep part induction from articulated object pairs. ACM Trans Graph. 2018. https://doi.org/10.1145/3272127.3275027 .

Zhao N, Cao Y, Lau RW. What characterizes personalities of graphic designs? ACM Trans Graph. 2018;37:116.

Tan J, Wan X, Liu H, Xiao J. QuoteRec: toward quote recommendation for writing. ACM Trans Inf Syst. 2018;36:34.

Qu Y, Fang B, Zhang W, Tang R, Niu M, Guo H, Yu Y, He X. Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans Inf Syst. 2018;37:5.

Yin K, Huang H, Cohen-Or D, Zhang H. P2P-NET: bidirectional point displacement net for shape transform. ACM Trans Graph. 2018;37:152.

Yao S, Zhao Y, Shao H, Zhang C, Zhang A, Hu S, Liu D, Liu S, Su L, Abdelzaher T. SenseGAN: enabling deep learning for internet of things with a semi-supervised framework. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:144.

Saito S, Hu L, Ma C, Ibayashi H, Luo L, Li H. 3D hair synthesis using volumetric variational autoencoders. ACM Trans Graph. 2018. https://doi.org/10.1145/3272127.3275019 .

Chen A, Wu M, Zhang Y, Li N, Lu J, Gao S, Yu J. Deep surface light fields. Proc ACM Comput Graph Interact Tech. 2018;1:14.

Chu W, Xue H, Yao C, Cai D. Sparse coding guided spatiotemporal feature learning for abnormal event detection in large videos. IEEE Trans Multimedia. 2019;21(1):246–55.

Khan MUK, Park H, Kyung C. Rejecting motion outliers for efficient crowd anomaly detection. IEEE Trans Inf Forensics Secur. 2019;14(2):541–56.

Tao D, Guo Y, Yu B, Pang J, Yu Z. Deep multi-view feature learning for person re-identification. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2657–66.

Zhang D, Wu W, Cheng H, Zhang R, Dong Z, Cai Z. Image-to-video person re-identification with temporally memorized similarity learning. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2622–32.

Serrano I, Deniz O, Espinosa-Aranda JL, Bueno G. Fight recognition in video using hough forests and 2D convolutional neural network. IEEE Trans Image Process. 2018;27(10):4787–97. https://doi.org/10.1109/tip.2018.2845742 .


Li Y, Li X, Zhang Y, Liu M, Wang W. Anomalous sound detection using deep audio representation and a blstm network for audio surveillance of roads. IEEE Access. 2018;6:58043–55.

Muhammad K, Ahmad J, Mehmood I, Rho S, Baik SW. Convolutional neural networks based fire detection in surveillance videos. IEEE Access. 2018;6:18174–83.

Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW. Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access. 2018;6:1155–66.

Li Y. A deep spatiotemporal perspective for understanding crowd behavior. IEEE Trans Multimedia. 2018;20(12):3289–97.

Pamula T. Road traffic conditions classification based on multilevel filtering of image content using convolutional neural networks. IEEE Intell Transp Syst Mag. 2018;10(3):11–21.

Vandersmissen B, et al. Indoor person identification using a low-power FMCW radar. IEEE Trans Geosci Remote Sens. 2018;56(7):3941–52.

Min W, Yao L, Lin Z, Liu L. Support vector machine approach to fall recognition based on simplified expression of human skeleton action and fast detection of start key frame using torso angle. IET Comput Vision. 2018;12(8):1133–40.

Perwaiz N, Fraz MM, Shahzad M. Person re-identification using hybrid representation reinforced by metric learning. IEEE Access. 2018;6:77334–49.

Olague G, Hernández DE, Clemente E, Chan-Ley M. Evolving head tracking routines with brain programming. IEEE Access. 2018;6:26254–70.

Dilawari A, Khan MUG, Farooq A, Rehman Z, Rho S, Mehmood I. Natural language description of video streams using task-specific feature encoding. IEEE Access. 2018;6:16639–45.

Zeng D, Zhu M. Background subtraction using multiscale fully convolutional network. IEEE Access. 2018;6:16010–21.

Goswami G, Vatsa M, Singh R. Face verification via learned representation on feature-rich video frames. IEEE Trans Inf Forensics Secur. 2017;12(7):1686–98.

Keçeli AS, Kaya A. Violent activity detection with transfer learning method. Electron Lett. 2017;53(15):1047–8.

Lu W, et al. Unsupervised sequential outlier detection with deep architectures. IEEE Trans Image Process. 2017;26(9):4321–30.

Feizi A. High-level feature extraction for classification and person re-identification. IEEE Sens J. 2017;17(21):7064–73.

Lee Y, Chen S, Hwang J, Hung Y. An ensemble of invariant features for person reidentification. IEEE Trans Circuits Syst Video Technol. 2017;27(3):470–83.

Uddin MZ, Khaksar W, Torresen J. Facial expression recognition using salient features and convolutional neural network. IEEE Access. 2017;5:26146–61.

Mukherjee SS, Robertson NM. Deep head pose: Gaze-direction estimation in multimodal video. IEEE Trans Multimedia. 2015;17(11):2094–107.

Hayat M, Bennamoun M, An S. Deep reconstruction models for image set classification. IEEE Trans Pattern Anal Mach Intell. 2015;37(4):713–27.

Afiq AA, Zakariya MA, Saad MN, Nurfarzana AA, Khir MHM, Fadzil AF, Jale A, Gunawan W, Izuddin ZAA, Faizari M. A review on classifying abnormal behavior in crowd scene. J Vis Commun Image Represent. 2019;58:285–303.

Bour P, Cribelier E, Argyriou V. Chapter 14—Crowd behavior analysis from fixed and moving cameras. In: Computer vision and pattern recognition, multimodal behavior analysis in the wild. Cambridge: Academic Press; 2019. pp. 289–322.


Xu X, Gong S, Hospedales TM. Chapter 15—Zero-shot crowd behavior recognition. In: Group and crowd behavior for computer vision. Cambridge: Academic Press; 2017. pp. 341–69.

Rodriguez M, Sivic J, Laptev I. Chapter 5—The analysis of high density crowds in videos. In: Group and crowd behavior for computer vision. Cambridge: Academic Press. 2017. pp. 89–113.

Yogameena B, Nagananthini C. Computer vision based crowd disaster avoidance system: a survey. Int J Disaster Risk Reduct. 2017;22:95–129.

Wang X, Loy CC. Chapter 10—Deep learning for scene-independent crowd analysis. In: Group and crowd behavior for computer vision. Cambridge: Academic Press; 2017. pp. 209–52.

Arceda VM, Fabián KF, Laura PL, Tito JR, Cáceres JG. Fast face detection in violent video scenes. Electron Notes Theor Comput Sci. 2016;329:5–26.

Wang Q, Wan J, Yuan Y. Deep metric learning for crowdedness regression. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2633–43.

Shao J, Loy CC, Kang K, Wang X. Crowded scene understanding by deeply learned volumetric slices. IEEE Trans Circuits Syst Video Technol. 2017;27(3):613–23.

Grant JM, Flynn PJ. Crowd scene understanding from video: a survey. ACM Trans Multimedia Comput Commun Appl. 2017;13(2):19.

Tay L, Jebb AT, Woo SE. Video capture of human behaviors: toward a Big Data approach. Curr Opin Behav Sci. 2017;18:17–22 (ISSN 2352-1546) .

Chaudhary S, Khan MA, Bhatnagar C. Multiple anomalous activity detection in videos. Procedia Comput Sci. 2018;125:336–45.

Anwar F, Petrounias I, Morris T, Kodogiannis V. Mining anomalous events against frequent sequences in surveillance videos from commercial environments. Expert Syst Appl. 2012;39(4):4511–31.

Wang T, Qiao M, Chen Y, Chen J, Snoussi H. Video feature descriptor combining motion and appearance cues with length-invariant characteristics. Optik. 2018;157:1143–54.

Kaltsa V, Briassouli A, Kompatsiaris I, Strintzis MG. Multiple Hierarchical Dirichlet Processes for anomaly detection in traffic. Comput Vis Image Underst. 2018;169:28–39.

Cermeño E, Pérez A, Sigüenza JA. Intelligent video surveillance beyond robust background modeling. Expert Syst Appl. 2018;91:138–49.

Coşar S, Donatiello G, Bogorny V, Garate C, Alvares LO, Brémond F. Toward abnormal trajectory and event detection in video surveillance. IEEE Trans Circuits Syst Video Technol. 2017;27(3):683–95.

Ribeiro PC, Audigier R, Pham QC. RIMOC, a feature to discriminate unstructured motions: application to violence detection for video-surveillance. Comput Vis Image Underst. 2016;144:121–43.

Şaykol E, Güdükbay U, Ulusoy Ö. Scenario-based query processing for video-surveillance archives. Eng Appl Artif Intell. 2010;23(3):331–45.

Castanon G, Jodoin PM, Saligrama V, Caron A. Activity retrieval in large surveillance videos. In: Academic Press library in signal processing. Vol. 4. London: Elsevier; 2014.

Cheng HY, Hwang JN. Integrated video object tracking with applications in trajectory-based event detection. J Vis Commun Image Represent. 2011;22(7):673–85.

Hong X, Huang Y, Ma W, Varadarajan S, Miller P, Liu W, Romero MJ, del Rincon JM, Zhou H. Evidential event inference in transport video surveillance. Comput Vis Image Underst. 2016;144:276–97.

Wang T, Qiao M, Deng Y, Zhou Y, Wang H, Lyu Q, Snoussi H. Abnormal event detection based on analysis of movement information of video sequence. Optik. 2018;152:50–60.

Ullah H, Altamimi AB, Uzair M, Ullah M. Anomalous entities detection and localization in pedestrian flows. Neurocomputing. 2018;290:74–86.

Roy D, Mohan CK. Snatch theft detection in unconstrained surveillance videos using action attribute modelling. Pattern Recogn Lett. 2018;108:56–61.

Lee WK, Leong CF, Lai WK, Leow LK, Yap TH. ArchCam: real time expert system for suspicious behaviour detection in ATM site. Expert Syst Appl. 2018;109:12–24.

Dinesh Jackson Samuel R, Fenil E, Manogaran G, Vivekananda GN, Thanjaivadivel T, Jeeva S, Ahilan A. Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM. Comput Netw. 2019;151:191–200 (ISSN 1389-1286) .

Bouachir W, Gouiaa R, Li B, Noumeir R. Intelligent video surveillance for real-time detection of suicide attempts. Pattern Recogn Lett. 2018;110:1–7 (ISSN 0167-8655) .

Wang J, Xu Z. Spatio-temporal texture modelling for real-time crowd anomaly detection. Comput Vis Image Underst. 2016;144:177–87 (ISSN 1077-3142) .

Ko KE, Sim KB. Deep convolutional framework for abnormal behavior detection in a smart surveillance system. Eng Appl Artif Intell. 2018;67:226–34.

Dan X, Yan Y, Ricci E, Sebe N. Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput Vis Image Underst. 2017;156:117–27.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). 2015.

Guo Y, Liu Y, Oerlemans A, Lao S, Lew MS. Deep learning for visual understanding: a review. Neurocomputing. 2016;187(26):27–48.

Babaee M, Dinh DT, Rigoll G. A deep convolutional neural network for video sequence background subtraction. Pattern Recogn. 2018;76:635–49.

Xue H, Liu Y, Cai D, He X. Tracking people in RGBD videos using deep learning and motion clues. Neurocomputing. 2016;204:70–6.

Dong Z, Jing C, Pei M, Jia Y. Deep CNN based binary hash video representations for face retrieval. Pattern Recogn. 2018;81:357–69.

Zhang C, Tian Y, Guo X, Liu J. DAAL: deep activation-based attribute learning for action recognition in depth videos. Comput Vis Image Underst. 2018;167:37–49.

Zhou S, Shen W, Zeng D, Fang M, Zhang Z. Spatial–temporal convolutional neural networks for anomaly detection and localization in crowded scenes. Signal Process Image Commun. 2016;47:358–68.

Pennisi A, Bloisi DD, Iocchi L. Online real-time crowd behavior detection in video sequences. Comput Vis Image Underst. 2016;144:166–76.

Feliciani C, Nishinari K. Measurement of congestion and intrinsic risk in pedestrian crowds. Transp Res Part C Emerg Technol. 2018;91:124–55.

Wang X, He X, Wu X, Xie C, Li Y. A classification method based on streak flow for abnormal crowd behaviors. Optik Int J Light Electron Optics. 2016;127(4):2386–92.

Kumar S, Datta D, Singh SK, Sangaiah AK. An intelligent decision computing paradigm for crowd monitoring in the smart city. J Parallel Distrib Comput. 2018;118(2):344–58.

Feng Y, Yuan Y, Lu X. Learning deep event models for crowd anomaly detection. Neurocomputing. 2017;219:548–56.


Acknowledgements

Not applicable.

Author information

Authors and affiliations

VIT, Vellore, 632014, Tamil Nadu, India

G. Sreenu & M. A. Saleem Durai


Contributions

GS and MASD selected and analyzed different papers for getting more in depth view about current scenarios of the problem and its solutions. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to G. Sreenu .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Sreenu, G., Saleem Durai, M.A. Intelligent video surveillance: a review through deep learning techniques for crowd analysis. J Big Data 6 , 48 (2019). https://doi.org/10.1186/s40537-019-0212-5


Received : 07 December 2018

Accepted : 28 May 2019

Published : 06 June 2019

DOI : https://doi.org/10.1186/s40537-019-0212-5


Keywords

  • Video surveillance
  • Deep learning
  • Crowd analysis

literature review on video analysis

  • DOI: 10.1007/978-3-642-41512-8_2
  • Corpus ID: 54706173

A Literature Review on Video Analytics of Crowded Scenes

  • Myo Thida , Y. Yong , +2 authors P. Remagnino
  • Published in Intelligent Multimedia… 2013
  • Computer Science, Environmental Science

57 Citations

Crowded scene analysis: a survey, integrating computer vision algorithms and ontologies for spectator crowd behavior analysis, smart video surveillance of pedestrians: fixed, aerial, and multi-camera methods, recent trends in crowd analysis: a review, computer vision based crowd disaster avoidance system: a survey, the utility of datasets in crowd modelling and analysis: a survey, abnormal event detection in crowded video scenes, physics inspired methods for crowd video surveillance and analysis: a survey, low latency tracking and anomaly detection in pedestrian crowds from video data.

  • Highly Influenced

Crowd Gathering Detection Based on the Foreground Stillness Model

95 references, a review of physics-based methods for group and crowd analysis in computer vision, activity representation in crowd, a multiview approach to tracking people in crowded scenes using a planar homography constraint, tracking in unstructured crowded scenes, new features and insights for pedestrian detection, learning video manifold for segmenting crowd events and abnormality detection, improved anomaly detection in crowded scenes via cell-based analysis of foreground speed, size and texture, a survey on visual surveillance of object motion and behaviors, spatio-temporal motion pattern modeling of extremely crowded scenes, multi person tracking within crowded scenes, related papers.

Showing 1 through 3 of 0 Related Papers

University of Houston Libraries

What is a literature review.

Brought to you by the University of Houston Libraries.

As part of your dissertation, thesis, or research paper, you may be asked to include a “review of the literature” or “literature review.” You may even be asked to write a literature review as a standalone assignment. But what exactly does that mean? To answer that question, let’s first talk about what we mean by “The literature.”

“The literature” refers to a published collection of written knowledge on or related to a particular subject. This may include things like scholarly articles, books, reports, or other types of written works. Their format all depends on the topic of your paper, dissertation, or thesis.

A literature review is not just a summary of these writings; it’s also a critical analysis of the state of research on your chosen topic. A good literature review provides context for your own research. It summarizes the state of existing research on your topic; helps identify gaps in the literature; provides a theoretical foundation for your research; and situates your own work within the existing body of written knowledge. It helps readers and other scholars understand why your research matters. To create a good literature review, you’ll need to determine its scope. Consider the concepts, theories, and studies you’ll want to include. You can search different library and non-library resources to find written works to include in your literature review, focusing on the key concepts, theories, and authors important to your research.

As you research, you’ll eventually reach a point where you start seeing the same articles, books, or other sources showing up again and again in your search results or in the works cited sections of sources you read. This is usually a good sign! It means you’ve likely reached a point where you can stop actively searching for new material and start constructing your review.

You’ll need to read and evaluate all of your sources to determine whether or not to include them in your literature review. When including them, think beyond just summarizing to synthesizing what you’ve learned. How do these sources fit together? Do any contradict one another? Do certain studies build on others? How are they setting a path forward for your own research?

Considering all of these questions will help you create a literature review that situates your own research within the scholarly conversation of your subject area.

Remember that help is always available for you as you work on your literature review. You can contact the UH Libraries for research help or the UH Writing Center for writing help

  • What is a Literature Review transcript

A Literature Review of Video-Sharing Platform Research in HCI

literature review on video analysis

New Citation Alert added!

This alert has been successfully added and will be sent to:

You will be notified whenever a record that you have chosen has been cited.

To manage your alert preferences, click on the button below.

New Citation Alert!

Please log in to your account

Information & Contributors

Bibliometrics & citations, supplementary material.




Published in an ACM conference proceedings by the Association for Computing Machinery, New York, NY, United States (SIGCHI: ACM Special Interest Group on Computer-Human Interaction).


  • Open access
  • Published: 06 August 2024

An evaluation of education videos for women experiencing domestic and family violence in healthcare settings: protocol for a mixed methods systematic review

  • Kerri Gillespie 1 ,
  • Sam Adhikary 2 ,
  • Hayley Kimball 1 &
  • Grace Branjerdporn   ORCID: orcid.org/0000-0001-6578-2718 2  

Systematic Reviews volume  13 , Article number:  213 ( 2024 ) Cite this article


Domestic and family violence (DFV) is a significant public health issue that poses a high risk to women, globally. Women experiencing DFV have higher rates of healthcare utilisation than women not experiencing DFV. Healthcare services are therefore well placed to address DFV and deliver education and awareness interventions to women. Video interventions are a strategy to deliver education to women, while overcoming barriers such as language, literacy, lack of rapport with clinician, or unwillingness to disclose. The current review will aim to further understand the characteristics, methods of evaluation, and outcomes of DFV video education interventions for perinatal women.

The review will be reported in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) statement. A systematic search will be conducted of the following databases: Medline, Embase, PsycINFO, PsycArticles, Scopus, and Web of Science Core Collection. Two independent reviewers will screen titles and abstracts against the inclusion criteria, followed by a full text screening of eligible articles. A third reviewer will resolve discrepancies. All study types will be included. Only studies published in English will be included. Risk of bias will be assessed using the Quality Assessment with Diverse Studies (QuADS) tool. Data will undergo an aggregate mixed method synthesis informed by The Joanna Briggs Institute, before being analysed using a thematic approach.

This systematic review will provide evidence on best practice for the creation, delivery, and evaluation of DFV video interventions for women in the peripartum.

Systematic review registration

PROSPERO registration number CRD42023475338.


Domestic and family violence (DFV) against women is considered a significant public health concern that affects around one in three women globally [ 1 ] and is the leading cause of hospitalisations for women and girls aged 15–54 years in Australia [ 2 ]. The risk of DFV is disproportionately higher in the perinatal period, with 25% of women who experience DFV reporting it to have started during pregnancy [ 3 , 4 ]. The impacts on women and children experiencing or witnessing DFV can be long-lasting and substantial. DFV can increase the risk of pregnancy complications (including miscarriage, stillbirth, pre-eclampsia, premature birth or low birth weight infants), chronic pain, gastrointestinal disorders, cognitive impairment, and mobility issues [ 5 , 6 , 7 ]. It has also been associated with an increase in long-term mental health issues in both women and children (such as anxiety and depression, post-traumatic stress disorder, eating disorders, suicide attempts, and substance abuse) [ 1 , 6 , 8 ].

Healthcare services, such as emergency departments, mental health services, specialty services, and outpatient care, are used more frequently by women suffering from DFV than those who are not [ 9 ]. These services are frequently encouraged to screen for and manage DFV in patients and are well placed to identify, and provide assistance to, women experiencing DFV. Maternity services have also been identified as having an important role in addressing DFV, as women in the perinatal period have regular appointments with these services and there is an increased likelihood of continuity of care [ 10 ]. Many women choose not to disclose DFV for a number of reasons, including shame, guilt, denial, fear of the perpetrator, lack of trust in clinicians, and fear of child service involvement [ 11 , 12 , 13 ]. Other barriers to disclosure or identification prevalent within healthcare settings include clinician time limitations, language barriers, lack of training, and lack of continuity of care [ 13 , 14 , 15 ]. While a number of strategies, such as routine enquiry, have been embedded within many healthcare facilities to better identify and support women experiencing DFV [ 16 ], a large proportion of cases still go unaddressed. Previous research has revealed that even when women disclose, many clinicians are unsure how to support or refer women appropriately [ 15 , 17 ].

One strategy to target women without relying on clinician screening or disclosure by women is to deliver information to women via alternative methods such as videos. Education and awareness videos can be on display in numerous areas that are frequented by women who may be experiencing DFV. These videos can be presented in numerous languages, overcoming the barrier of requiring interpreters, and with captions for the hearing impaired. Video and audio presentations allow for engagement with women with low literacy skills. They can educate the public using easy-to-understand examples and situations with animations, real-life actors, or images, and display information regarding available support services. This form of education can reach a large audience without making women feel targeted or put on the spot by clinicians during the screening process.

Videos can be included in interventions, shown directly to women on smart devices in the clinic, or emailed to women who attend maternity services. Videos have the benefit of overcoming a number of barriers, such as language, clinician time, training, and knowledge, and may be preferable for women who have not developed a rapport with their healthcare provider. If women do not wish to approach clinicians for support, videos can be used to provide useful education to women regarding what constitutes DFV, their rights, and their options for support or further information. It is anticipated that repeated exposure to awareness and education campaigns will impact women’s attitudes and knowledge of DFV, assisting them to move from the precontemplation or contemplation phase of behaviour change to contemplation or action [ 18 ]. Videos can increase women’s knowledge of the laws around DFV and the services available, and may lead to increased help-seeking in the future.

Research aims

The aim of this systematic review is to synthesise all evidence relating to video education for DFV used in healthcare settings. This will be done in order to better understand the characteristics of education videos, how they are being disseminated, how they are being evaluated, and the outcomes of these interventions.

Research questions

The primary research questions for this review are as follows:

  • What literature exists on the creation, delivery, and evaluation of video education interventions for women experiencing DFV?
  • What are the characteristics of these interventions?
  • How do these studies evaluate the benefits or acceptability of these interventions?
  • What are the outcomes for women who have been exposed to video education interventions?

Study design

This protocol was registered in the International prospective register of systematic reviews (PROSPERO) database with the registration number CRD42023475338. The review will be reported in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) statement (see Additional file 1) [ 19 ].

Eligibility criteria

Types of studies

This review will include all peer-reviewed publications that report primary data, covering quantitative designs (such as case–control, cross-sectional, cohort, randomised controlled trials, and quasi-experimental studies) and qualitative designs (such as focus groups or individual interviews). Studies that do not include primary data (reviews, opinion and commentary papers, dissertations, posters, and conference abstracts) will be excluded. No date or location restrictions will be placed on the search. Articles published in languages other than English will be excluded.

Participants

Studies must include women attending a hospital or community health service who may be experiencing DFV or may be at risk of experiencing DFV. No age restriction will apply.

Intervention

Included studies must utilise a video intervention or recording that aims to increase women’s knowledge, awareness, or help-seeking relating to DFV. Studies must include a measure of impact of the intervention on participants, or feedback from women regarding the usefulness, benefits, and/or acceptability of the intervention.

Studies may compare the intervention to usual care, placebo, or an alternative intervention. Studies conducted with no control or comparator group will also be eligible.

The review will report on the prevalence and characteristics of video interventions for women experiencing DFV, and the characteristics of tools or measures used to evaluate these video interventions. The review will evaluate included studies for impacts of video interventions on women’s knowledge and awareness of DFV, available DFV services, and women’s help-seeking behaviours. The review will also collate and report on participant opinions, feedback, and suggestions regarding video interventions.

Information sources

Databases to be searched will include Medline (PubMed), Embase (Elsevier), PsycINFO (EBSCOhost), PsycArticles (EBSCOhost), Scopus (Elsevier), and Web of Science Core Collection (Clarivate). The reference lists of all included papers will also be searched, as will the reference lists of other similar, completed systematic reviews, to ensure that no existing papers are overlooked.

Search strategy

The primary search strategy, using title, abstract, and keywords, will be [(Video* OR Video OR recording OR videotape OR “Videotape recording”) AND (“Domestic violence” OR “intimate partner violence” OR “family violence” OR DFV)]. Medical Subject Headings (MeSH terms) will be used where appropriate, and the primary search strategy will be modified to meet the specific requirements of the search syntax in each database (see Additional file 2 for the full search criteria for individual databases).
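The Boolean strategy above can be expanded programmatically for each database. A hypothetical Python sketch — the field tag shown is an assumption for illustration; the real per-database syntax is given in Additional file 2:

```python
# Term groups taken from the primary search strategy above.
VIDEO_TERMS = ['Video*', 'Video', 'recording', 'videotape',
               '"Videotape recording"']
DFV_TERMS = ['"Domestic violence"', '"intimate partner violence"',
             '"family violence"', 'DFV']

def build_query(field_tag=""):
    """OR the terms within each group, then AND the groups together."""
    video = " OR ".join(term + field_tag for term in VIDEO_TERMS)
    dfv = " OR ".join(term + field_tag for term in DFV_TERMS)
    return f"({video}) AND ({dfv})"

# A PubMed-style title/abstract restriction (tag assumed for illustration):
pubmed_query = build_query("[tiab]")
```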

Study selection

The screening process will be conducted in two stages. In the first stage, all retrieved studies will be imported into the Covidence [ 20 ] online web application for screening and removal of duplicates. Two independent reviewers will screen all papers by title and abstract against the pre-selected inclusion and exclusion criteria. Studies that meet all criteria will proceed to the second stage, in which two independent reviewers will perform full-text screening to decide whether studies will be included in the final review. Any discrepancies between the two reviewers at either screening stage will be resolved by a third reviewer. No prioritisation techniques will be used in the screening of articles.
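The two-reviewer rule with third-reviewer arbitration amounts to a simple decision function. A minimal sketch; the labels are illustrative:

```python
def screen_decision(reviewer_a, reviewer_b, reviewer_c=None):
    """Apply the protocol's screening rule: agreement between the two
    independent reviewers stands; a disagreement is resolved by the
    third reviewer, or flagged as a conflict until that vote exists."""
    if reviewer_a == reviewer_b:
        return reviewer_a
    return reviewer_c if reviewer_c is not None else "conflict"
```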

Risk of bias assessment

All studies that are included in the final review will be assessed for quality using the Quality Assessment with Diverse Studies (QuADS) tool. This tool was chosen for its demonstrated inter-rater reliability ( k  = 0.66) and its ability to assess both qualitative and quantitative studies [ 21 ]. In the event that only qualitative studies are identified in the final review, the Joanna Briggs Institute (JBI) critical appraisal tool for qualitative research [ 22 ] will be used to assess risk of bias.
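The inter-rater reliability quoted for QuADS (k = 0.66) is a chance-corrected agreement statistic of the Cohen's-kappa family. A plain (unweighted) two-rater kappa can be sketched as follows:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: observed agreement between two raters,
    corrected for the agreement expected by chance from their label
    frequencies. Assumes chance agreement is below 1."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[label] * counts2[label]
                   for label in counts1) / (n * n)
    return (observed - expected) / (1 - expected)
```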

Data extraction

Two independent reviewers will extract data from each study included in the final review. Once extraction has concluded, all data will be compared and contrasted, with a third reviewer resolving any conflict should the reviewers disagree on any extracted findings. Data will be extracted based on pre-defined criteria recorded in a working spreadsheet. Where data are missing from an evidence source, the authors of the article will be contacted with a request for the missing data. Data to be extracted from the identified papers will include general characteristics of the study (year, location, sample size, follow-up, and duration); characteristics of participants (age, DFV status, ethnicity, number of children, gestational age); setting (inpatient, residential, or community); characteristics of the intervention (video subject matter and objectives, length, location, and format of screening); evaluation (method of evaluation and tools used); and outcomes (changes in participant knowledge, awareness, or help-seeking behaviours, participant feedback, participant acceptability or satisfaction, follow-up duration, and attrition).
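The pre-defined extraction criteria can be mirrored in a simple record structure. The field names below are illustrative assumptions, not the authors' actual spreadsheet columns:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExtractionRecord:
    """One row of the extraction spreadsheet (illustrative fields)."""
    study_id: str
    year: Optional[int] = None
    location: str = ""
    sample_size: Optional[int] = None
    setting: str = ""                  # inpatient, residential, or community
    video_subject: str = ""
    evaluation_tools: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)

    def missing_fields(self):
        """List empty fields so study authors can be contacted for the
        missing data, as the protocol specifies."""
        return [name for name, value in self.__dict__.items()
                if value in (None, "", [])]
```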

Data synthesis

Data synthesis for this review will be informed by The Joanna Briggs Institute ‘aggregate mixed method synthesis’, which is based upon the Bayesian approach for translating quantitative data into qualitative [ 23 ]. A convergent segregated method will be utilised, as we anticipate that the qualitative and quantitative data will address different, but related, dimensions of the phenomenon of interest [ 24 ]. This approach will ensure a simplified method of combining data without distorting the findings of the individual studies. Qualitative data will be analysed using the thematic approach of Braun and Clarke [ 25 ], which organises data according to themes and is comparatively successful in revealing commonality across a body of literature. Common themes will be identified and highlighted in the results and discussion, along with existing gaps in the literature. We do not anticipate that the review will identify a large number of high-quality or homogeneous studies, and we therefore do not plan to conduct any meta-analyses.

To our knowledge, this will be the first systematic review to synthesise all available data relating to video education interventions for DFV. In a world of rapidly improving technologies, video education interventions have become cheaper and simpler to create and disseminate. Whether these are delivered on screens in clinic waiting rooms, via email, smart-devices, or social media, it is inevitable that technology-based education will predominate. Understanding the outcomes of these forms of education interventions will add to the current knowledge regarding the most appropriate tools and techniques to implement for supporting women experiencing DFV. Video interventions may play an important role alongside other existing strategies, such as routine screening, pamphlets and posters, and DFV liaison specialists. It is important to understand how these interventions may benefit women and how best to evaluate these tools. Our review will deliver important knowledge regarding the evaluation of these interventions, barriers and enablers to delivery, optimal characteristics, and women’s opinions and feedback to ensure that they are appropriate and acceptable. The review will also compare the findings with relevant studies to form a comprehensive overview of video interventions to support screening and response to DFV.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files.

Abbreviations

  • DFV: Domestic and family violence
  • IPV: Intimate partner violence
  • JBI: The Joanna Briggs Institute
  • QuADS: Quality Assessment with Diverse Studies tool
  • PRISMA-P: Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols

References

World Health Organisation [WHO]. Violence against women. Geneva: WHO; 2021. Available from: https://www.who.int/news-room/fact-sheets/detail/violence-against-women .

Emergency medicine foundation. Domestic and family violence screening in the emergency department: Emergency medicine foundation. 2020. Available from: https://emergencyfoundation.org.au/projects/domestic-and-family-violence-screening-in-the-emergency-department/ .

Campo M. Domestic and family violence in pregnancy and early parenthood. Canberra: Australian Government; 2015.


O’Reilly R, Peters K. Opportunistic domestic violence screening for pregnant and post-partum women by community based health care providers. BMC Womens Health. 2018;18:1.


Karakurt G, Smith D, Whiting J. Impact of intimate partner violence on women’s mental health. J Fam Violence. 2014;29:7.

Shah PS, Shah J. Maternal exposure to domestic violence and pregnancy and birth outcomes: a systematic review and meta-analyses. J Womens Health. 2010;19:11.

Branjerdporn G, Clonan T, Boddy J, Gillespie K, O’Malley R, Baird K. Australian women’s perspectives of routine enquiry into domestic violence before and after birth. BMC Pregnancy Childbirth. 2023;23:1.

Mueller I, Tronick E. Early life exposure to violence: developmental consequences on brain and behavior. Front Behav Neurosc. 2019;13:156.

Bonomi AE, Anderson ML, Rivara FP, Thompson RS. Health care utilization and costs associated with physical and nonphysical-only intimate partner violence. Health Serv Res. 2009;44:3.

Australian Institute of Health and Welfare [AIHW]. Screening for domestic violence during pregnancy: options for future reporting in the National Perinatal Data Collection. Canberra: AIHW; 2015. Contract No.: PER 71. Available from: https://www.aihw.gov.au/getmedia/62dfd6f0-a69a-4806-bf13-bf86a3c99583/19298.pdf?v=20230605182623&inline=true .

Prosman GJ, Lo Fo Wong SH, Lagro-Janssen AL. Why abused women do not seek professional help: a qualitative study. Scand J Caring Sci. 2014;28:1.

Evans MA, Feder GS. Help-seeking amongst women survivors of domestic violence: a qualitative study of pathways towards formal and informal support. Health Expect. 2016;19:1.

Heron RL, Eisma MC. Barriers and facilitators of disclosing domestic violence to the healthcare service: a systematic review of qualitative research. Health Soc Care Community. 2021;29:3.

Creedy DK, Baird K, Gillespie K, Branjerdporn G. Australian hospital staff perceptions of barriers and enablers of domestic and family violence screening and response. BMC Health Serv Res. 2021;21:1.

Fisher CA, Rudkin N, Withiel TD, May A, Barson E, Allen B, et al. Assisting patients experiencing family violence: a survey of training levels, perceived knowledge, and confidence of clinical staff in a large metropolitan hospital. Womens Health (Lond). 2020;16:1.

Phillips J, Muller D, Lorimer C. Domestic violence: issues and policy changes. Canberra: Parliamentary Library (Australia); 2015.

Gillespie K, Branjerdporn G, Tighe K, Carrasco A, Baird K. Domestic violence screening in a public mental health service: a qualitative examination of mental health clinician responses to DFV. J Psychiatr Ment Health Nurs. 2023;30:3.

Chang JC, Dado D, Ashton S, Hawker L, Cluss PA, Buranosky R, et al. Understanding behavior change for women experiencing intimate partner violence: mapping the ups and downs using the stages of change. Patient Educ Couns. 2006;62:3.

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350:g7647.

Veritas Health Innovation. Covidence systematic review software. Melbourne, Australia; Veritas Health Innovation: 2019. Available from: http://www.covidence.org/ .

Harrison R, Jones B, Gardner P, Lawton R. Quality assessment with diverse studies (QuADS): an appraisal tool for methodological and reporting quality in systematic reviews of mixed- or multi-method studies. BMC Health Serv Res. 2021;21:1.

The Joanna Briggs Institute [JBI]. Critical appraisal checklist for qualitative research: JBI; 2017. Available from: https://jbi.global/sites/default/files/2019-05/JBI_Critical_Appraisal-Checklist_for_Qualitative_Research2017_0.pdf .

Lizarondo L, Stern C, Carrier J, Godfrey C, Rieger K, Salmond S, et al. Chapter 8: Mixed methods systematic reviews. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020. Available from: https://synthesismanual.jbi.global .

Sandelowski M, Voils CI, Barroso J. Defining and designing mixed research synthesis studies. Res Sch. 2006;13:1.

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3:2.


Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations

Mater Research Institute – University of Queensland, Level 3, Aubigny Place, Raymond Terrace, South Brisbane, QLD, 4101, Australia

Kerri Gillespie & Hayley Kimball

Mater Health, Catherine’s House for Mothers, Babies and Families, South Brisbane, QLD, 4101, Australia

Sam Adhikary & Grace Branjerdporn


Contributions

The study was conceptualised by G.B. G.B. and S.A. contributed to the development of the systematic review plan and design. All authors contributed to refining the search strategy, eligibility criteria, data synthesis plan, and risk of bias assessment. K.G. wrote the draft manuscript. G.B. and S.A. reviewed and edited the final manuscript.

Corresponding author

Correspondence to Grace Branjerdporn .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information


Additional file 1: PRISMA-P (Preferred Reporting Items for Systematic review and Meta-Analysis Protocols) 2015 checklist: recommended items to address in a systematic review protocol*.

Additional file 2: Search strategy.

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Gillespie, K., Adhikary, S., Kimball, H. et al. An evaluation of education videos for women experiencing domestic and family violence in healthcare settings: protocol for a mixed methods systematic review. Syst Rev 13 , 213 (2024). https://doi.org/10.1186/s13643-024-02625-x


Received : 01 January 2024

Accepted : 20 July 2024

Published : 06 August 2024

DOI : https://doi.org/10.1186/s13643-024-02625-x


Systematic Reviews

ISSN: 2046-4053



SYSTEMATIC REVIEW article

A Systematic Literature Review of Analytics for Adaptivity Within Educational Video Games

Manuel Ninaus

  • 1 Department of Clinical Psychology, Institute of Psychology, University of Innsbruck, Innsbruck, Austria
  • 2 LEAD Graduate School and Research Network, University of Tübingen, Tübingen, Germany
  • 3 Leibniz-Institut für Wissensmedien, Tübingen, Germany
  • 4 Chair of Psychology of Learning with Digital Media, Faculty of Humanities, Institute of Media Research, Chemnitz University of Technology, Chemnitz, Germany

Research has shown that serious games, digital game-based learning, and educational video games can be powerful learning instruments. However, experimental and meta-research have revealed that several moderators and variables influence the resulting learning outcomes. Advances in the areas of learning and game analytics potentially allow for controlling and improving the underlying learning processes of games by adapting their mechanics to the individual needs of the learner, to properties of the learning material, and/or to environmental factors. However, the field is young and no clear-cut guidelines are yet available. To shed more light on this topic and to identify common ground for further research, we conducted a systematic and pre-registered analysis of the literature. Particular attention was paid to different modes of adaptivity, different adaptive mechanisms in various learning domains and populations, differing theoretical frameworks, research methods, and measured concepts, as well as divergent underlying measures and analytics. Only 10 relevant papers were identified through the systematic literature search, which confirms that the field is still in its very early phases. The studies on which these papers were based, however, show promise in terms of the efficacy of adaptive educational games. Moreover, we identified an increased interest in the field of adaptive educational games and in the use of analytics. Nevertheless, we also identified a clear lack of common theoretical foundations as well as the application of rather heterogeneous methods for investigating the effects of adaptivity. Most problematic was the lack of sufficient information (e.g., descriptions of used games and adaptive mechanisms), which often made it difficult to draw clear conclusions. Future studies should therefore focus on strong theory building and adhere to reporting standards across disciplines. Researchers from different disciplines must act in concert to advance the current state of the field in order to maximize its potential.

Introduction

Digital game-based learning is becoming a powerful tool in education (e.g., Boyle et al., 2016 ). However, several open issues remain that require further research in order to optimize the use of game-based learning and educational video games. One unique characteristic of digital learning games is the wealth of data they produce, which can be acquired and used for (learning) analytics and adaptive systems. Adaptive learning environments are part of a new generation of computer-supported learning systems that aim to provide personalized learning experiences by capitalizing on the generation and acquisition of knowledge and other types of data regarding learner’s cognitive capabilities, knowledge levels, and preferences, among other factors (e.g., Mangaroska and Giannakos, 2019 ).

Adaptive learning is characterized by responsiveness to learners’ individual needs and preferences in order to optimize learning outcomes and other learning-related aspects, such as motivation. While the idea of adaptive learning is not new (e.g., mastery learning as discussed by Bloom, 1968 ) and has received strong support from researchers in educational psychology ( Alexander, 2018 ), it is surprising how few systematic studies are available on adaptive learning with digital technologies and game-based learning in particular. For instance, a recent review of adaptive learning in digital environments in general found evidence of the effectiveness of adaptivity ( Aleven et al., 2016 ). However, in this review, only one study was identified as having used an educational video game, thus demonstrating the lack of research currently being performed on adaptive learning in the domain of game-based learning.

Most entertainment games are pre-scripted and therefore have static (game) elements such as content, rules, and narratives ( Lopes and Bidarra, 2011 ). While “fun” is the locus of attention in entertainment games and has been investigated in learning games as well ( Nebel et al., 2017c ), educational games serve additional purposes, as they need to convey learning content appropriately to learners. According to Schrader et al. (2017) , adaptivity in educational games can be defined as “a player-centred approach by adjusting game’s mechanics and representational modes to suit game’s responsiveness to player characteristics with the purpose of improving in-game behavior, learning processes, and performance” (p. 5). Hence, finding the right balance between the learner’s skills and the challenge level of a game is a critical issue, especially as the perceived difficulty and inferred feedback after facing a task could influence learning outcomes (e.g., Nebel et al., 2017b ). Researchers agree that educational video games could utilize adaptivity to optimize knowledge and skill acquisition (e.g., Lopes and Bidarra, 2011 ; Streicher and Smeddinck, 2016 ). Potentially, all elements of a game can become adaptive elements ( Lopes and Bidarra, 2011 ). For instance, gameplay mechanics, narrative and scenarios, and game content and its objectives can all contribute to offering a personalized and individualized gaming and learning experience.

It seems natural for learning material to be adapted to individual needs and preferences. In analogue learning settings, this can be achieved through individualized support from educators, teachers, etc.; in multiplayer games, social dynamics can trigger similar processes ( Nebel et al., 2017a ). For single-player games, however, there are several different ways to acquire the data needed to identify users’ needs or preferences (for a review see Nebel and Ninaus, 2019 ) and to change the learning environment accordingly (for a review see Aleven et al., 2016 ). Numerous studies have demonstrated that data or analytics gathered during play can be used to successfully detect various cognitive (e.g., Witte et al., 2015 ; Appel et al., 2019 ), motivational (e.g., Klasen et al., 2012 ; Berta et al., 2013 ), and emotional (e.g., Brom et al., 2016 ; Ninaus et al., 2019a ) states of users (for a review see Nebel and Ninaus, 2019 ). The analytics used in such studies range from simple pre-test measures and self-reports to more complex process measures utilizing (neuro-)physiological sensors (for a review see Ninaus et al., 2014 ; Nebel and Ninaus, 2019 ). Consequently, the current systematic review aims to identify if and how such analytics have been used to realize adaptive learning in games. In particular, we wanted to investigate the use of adaptive learning in educational games by utilizing analytics to adapt learning content to the skill level or cognitive capability of the player/learner.

While games theoretically offer many opportunities for adapting their content (e.g., visual presentation, narrative, difficulty), many factors usually need to be considered when implementing adaptivity. These factors include which analytics are used and what as well as how content is actually being adapted by the game or its underlying algorithms. Accordingly, frameworks for adaptive educational games are often guided by two questions (cf. Shute and Zapata-Rivera, 2012 ): First, what to adapt: Which analytics and data are utilized to implement adaptivity and which elements are adapted (e.g., feedback, scaffolding, etc.)? Second, how to adapt: Which general methods are used to implement adaptivity? While different frameworks of adaptive (educational) games differ in their granularity and their specific design, they share the common goal of providing a generic approach on how to realize adaptivity (e.g., Yannakakis and Togelius, 2011 ; Shute and Zapata-Rivera, 2012 ; Schrader et al., 2017). For instance, Shute and Zapata-Rivera (2012) suggest a four-process adaptive cycle that connects the learner to appropriate educational material through the application of a user model. These generic frameworks are helpful for building adaptive systems; however, they reveal very little about their effectiveness. Evaluating adaptive systems with empirical studies is therefore not only informative but absolutely necessary for advancing the field of adaptive educational games. Moreover, as the field of educational psychology aims to improve the theoretical understanding of learning ( Mayer, 2018 ), particular attention should be paid to the theoretical foundations of the adaptation mechanisms being used. This seems to be in contrast to recent trends in learning analytics (e.g., Greller and Drachsler, 2012 ) and game-learning analytics (e.g., Freire et al., 2016 ), which employ rather strong data-driven approaches.
Thus, in our research, we focused especially on what is adapted, how adaptivity is implemented, and which analytics are utilized to realize adaptivity in game-based learning. In this latter respect, by identifying the analytics used for adaptivity, we sought to produce a precise overview of successful and less successful approaches in adaptive game-based learning in the interest of identifying practical needs and recommendations. That is, we aimed to provide a systematic overview of the current state of the art of adaptivity in game-based learning by analyzing the ways in which empirical research is currently being conducted in this field of research, which theoretical foundations are being used to realize adaptivity, and what is being targeted by adaptivity.

Description of Research Problem

That learning environments have the potential to act dynamically by gathering user data or pre-test values and responding by altering the learning tasks within a digital environment has been an established fact for decades (e.g., Skinner, 1958 ; Hartley and Sleeman, 1973 ; Anderson et al., 1990 ; Aleven et al., 2009 ). This approach might be particularly relevant in the field of game-based learning, as games are usually considered to involve highly dynamic environments and adaptive learning seems to be a promising avenue by which learning outcomes in digital learning can be enhanced (for a review see Aleven et al., 2016 ). However, the extent to which adaptivity has been implemented in empirical studies using game-based learning has thus far not been systematically documented. Consequently, in the current pre-registered systematic literature review (see Ninaus and Nebel, 2020 ), we pay particular attention to game-based learning environments to uncover the current state of research in this field. An increasing number of studies have demonstrated the use of various analytics to identify different mental states of the users that might be useful for adapting educational games in real time (for a review see Nebel and Ninaus, 2019 ). However, it remains unknown whether these suggested adaptive approaches have actually been implemented and evaluated in game-based learning. Thus, with this systematic literature review, we aim to address this open question by analyzing the current state of the literature. Instead of motivation- or personality-based adaptations (e.g., Orji et al., 2017 ), we focused on cognition- or performance-based adaptations, as learning theories heavily focus on this perspective (e.g., Sweller, 1994 ; Sweller et al., 1998 ; Mayer, 2005 ) and it allows for a more focused analysis of the current literature.
Accordingly, we sought to identify successful (learning) analytics for adapting the game-based learning environment to, for instance, the skill level or cognitive capability of the learner. That is, which data about learners or their context can be utilized for understanding and optimizing learning by adapting the learning environment. Doing so might shed light on which approaches are most successful in adaptive game-based learning and thereby advance the field and provide practical recommendations for researchers and educators alike.

Study Objectives

Taken together, this paper systematically reviews the ways in which adaptivity in game-based learning is realized. For this pre-registered systematic literature review, we broadly searched for empirical studies that utilized analytics to realize adaptivity in game-based learning scenarios and educational video games. Based on the previously described research gaps, we were specifically interested in the following three research questions (RQ):

(RQ1) How is research in the field of analytics for adaptation in educational video games currently conducted? For instance, which learning domains and analytics are most popular as well as most successful in this research field, and which empirical study designs are currently being employed to study the effects of adaptive elements?

(RQ2) What cognitive/theoretical frameworks within analytics for adaptation in educational video games are currently used? That is, which theoretical underpinnings are currently being used to integrate, justify, and evaluate analytics in adaptive educational video games?

(RQ3) What types of outcomes are influenced by an adaptive approach? For instance, are the analytics and adaptive mechanisms used for adapting the difficulty of a quiz in a game or is the overall pace of the gameplay altered?

To contribute a precise overview as well as to identify areas for future development in the analytics for adaptation research field, this paper follows the meta-analysis article-reporting standards proposed by the American Psychological Association (2020a) , and the PRISMA checklist ( Moher et al., 2009 ) has been used to ensure the inclusion of all relevant information.

Materials and Methods

Research design overview.

This review can be considered systematic ( Grant and Booth, 2009 ) as it includes clarifications of the research questions and a mapping of the literature. Furthermore, the information generated by the review was systematically appraised and synthesized, and a discussion on the types of conclusions that could be drawn within the limits of the review was included ( Gough et al., 2017 ). A systematic approach was chosen to more fully investigate the range of available research in the field and to produce more reliable conclusions with regard to the research questions. Although all literature reviews should be question-led ( Booth et al., 2016 ), rather specific research questions and an overarching synthesis of the findings demarcate this review from similar approaches, such as scoping reviews ( Arksey and O’Malley, 2005 ; Munn et al., 2018 ). To further enhance methodological integrity, a pre-registration was filed prior to data collection ( Ninaus and Nebel, 2020 ). Finally, the review was conceptualized with a focus on open material by, for instance, providing the developed coding table ( Moreau and Gamble, 2020 ). This, combined with an in-depth description of the approach, should enable future replications of this review, in turn facilitating the systematic identification of additional developments in the field.

Study Data Sources

Researcher description.

Both lead researchers for this study are experts on experimental investigations of learning technologies, such as educational video games, and actively seek to investigate new approaches for enhancing learning processes. Both researchers have published experimental research in this field and were therefore capable of analyzing and systematizing the sample collected for the current study. However, it should be noted that some papers eligible for inclusion in the analysis were authored by the researchers themselves. Additionally, research assistants with experience working with experimental research and publications supported the coding procedure. The team’s affiliation with the field of psychology provided the necessary skills to interpret and evaluate the quality and potential of the measures and frameworks employed as needed for addressing RQ2 and RQ3. That said, given that the overall perspective taken for this work was decidedly psychological in nature, other approaches from relevant fields, such as computer science, received less emphasis.

Study Selection

The following search strategy ( Figure 1 ) was pre-registered ( Ninaus and Nebel, 2020 ) and employed for this review. Using the Frontiers research topic “Adaptivity in Serious Games through Cognition-based Analytics” as a basic foundation ( Van Oostendorp et al., 2020 ), articles that addressed some sort of adaptivity, use of a digital educational game, or use of a variation of analytics, and were based on a learning or cognitive framework, were collected. To identify this research, the Population, Intervention, Comparison, and Outcome (PICO; Schardt et al., 2007 ) approach was used, as it results in the largest number of hits compared to other search strategies ( Methley et al., 2014 ). To address population , labels describing the desired medium (e.g., educational video games) were used. If necessary, quotation marks were used to indicate the search for a specific term instead of a term’s components (e.g., “Serious Games” to prevent a misleading hit for serious ). Regarding intervention , search terms were used that addressed the topics of our research questions, such as adaptivity and analytics. The outcome segment was represented by keywords suitable for capturing the overall cognitive and learning focus of this review (e.g., cognition, learning). Finally, the comparison component could not be applied within this review, as a specific empirical procedure was not pre-defined; instead, this component served as a subject of interest for this review. These considerations led to the following search query:


FIGURE 1. Study collection flowchart; based on Moher et al. (2009).

(Adaptivity OR Adaptive OR Adjustment) AND (“Serious Games” OR DGBL OR GBL OR “Educational Videogames” OR “Game Based Learning” OR Simulations) AND (Analytics OR Analytic) AND (Cognitive OR Cognition OR Memory OR Brain OR Learning).
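As a sketch of how such a PICO-style query can be kept reproducible, the string above can be assembled programmatically from its components. Only the term lists are taken from the pre-registered query; the helper function and its quoting rule are our own illustration, not part of the published protocol:

```python
# Illustrative sketch: assembling the pre-registered PICO-style search query.
# Only the term lists come from the review; the helper is our own.

def or_group(terms):
    """Join terms with OR, quoting multi-word terms so engines treat them as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

intervention = ["Adaptivity", "Adaptive", "Adjustment"]
population = ["Serious Games", "DGBL", "GBL", "Educational Videogames",
              "Game Based Learning", "Simulations"]
analytics = ["Analytics", "Analytic"]
outcome = ["Cognitive", "Cognition", "Memory", "Brain", "Learning"]

query = " AND ".join(or_group(g) for g in (intervention, population, analytics, outcome))
print(query)
```

Generating the query string from explicit term lists makes later replications or extensions of the search (e.g., adding a synonym) a one-line change.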

Furthermore, the search engines were adjusted to search within the title, abstract, or keywords of the articles. A quick analysis of the field was conducted to identify the most useful bibliographic databases in line with the psychological and empirical focus of the review. As a result, the following databases were used: Association for Computing Machinery—Special Interest Group on Computer-Human Interaction and Special Interest Group on Computer-Human Interaction Play ( dl.acm.org/sig/sigchi ), Elton Bryson Stephens Company Information Services ( search.ebscohost.com ), Web of Science ( webofknowledge.com ), Scopus ( scopus.com ), and Education Resources Information Center ( eric.ed.gov ). Other databases with a different focus, such as the technology-focused IEEE Xplore, were not used, as they would likely yield few results meeting the inclusion or exclusion criteria presented below (e.g., empirical methodology, inclusion of cognitive aspects, outcomes measured on human participants). In addition, an invitation to recommend articles suitable for the review was sent to colleagues and spread via social media. Although the database search was carried out during February 2020, further additions through these additional sources were collected until the end of July 2020.

Altogether, the search returned 496 articles ( Figure 1 ; for the full list of coded articles, see: Ninaus and Nebel, 2020 ). These articles were given an identifier consisting of their database origin and a sequential number (e.g., SCOPUS121). Entries gathered via recommendations and other channels were labeled with OTHER. This ID is used throughout this paper when works within the coding table are referenced. The collection process was followed by the first coding of the articles (see Figure 1 ; Coding I) using the pre-registered coding table columns A1 to A18.2 ( Ninaus and Nebel, 2020 ). During this phase, each entry was coded by one coder and verified by a second coder. When disagreement or uncertainty occurred between these two coders, a third coder was consulted and the issue was discussed until the conflict was resolved. For any remaining uncertainty, the rule of thumb was to include rather than exclude the articles in question. The use of at least two independent coders not only increased data quality but also ensured that none of the authors could code their own papers solely by themselves. The focus of this first coding phase was to ensure the eligibility of the search results. More specifically, only papers that presented outcome measures, were published in English, were appropriate in the context of the research questions, were peer-reviewed, and could be classified as an original research study were included in the review. Furthermore, studies that did not involve digital games, were published prior to 2000, did not document the measures that were used, could be classified as a review or a perspective article, only applied a theoretical or technical framework, or were duplicates from other research results were excluded from the review. In addition, papers for which the full text could not be acquired were excluded as well. 
The coding procedure was stopped if any of the pre-registered ( Ninaus and Nebel, 2020 ) exclusion criteria or inclusion criteria were met or not met, respectively. For example, the paper WOS32 was published prior to the year 2000 and thus had to be excluded. As a consequence, the columns following the publication year (A6) were not completed.

The rationale for some of these exclusion and inclusion criteria is evident, such as the exclusion of duplicates. However, six criteria should be clarified further: 1) inclusion of outcome measures. For the analysis of RQ3, the papers had to provide detailed insights into the measured outcomes as influenced through adaptive elements. If no outcomes were included (e.g., CHIPLAY1) or could not be interpreted with respect to RQ3 (e.g., WOS87), then this work was excluded; 2) publication language in English. In order for the resulting coding table to be interpretable by both the coders and the potentially broad readership, only those works published in the English language were included in the review. Papers whose abstracts were translated into English but whose main text was not were also excluded (e.g., SCOPUS128); 3) appropriateness for the research question. As complex research questions were abbreviated to short keywords during the database research, the validity of the search results had to be verified. Doing so was crucial, as some keywords generated for the current study have also been used in different, unrelated fields or have ambiguous meanings when not in context. For example, search items such as “Learning” or “Adaptivity” are also used in the field of algorithm research (e.g., EBSCO80); 4) original research study. To avoid redundancies and overrepresentations of specific approaches, only original research studies were included. Additionally, other meta- or review-like publications (e.g., EBSCO40) were excluded, as the current work sought to reach independent conclusions. Because of this, editorial pieces for journals (e.g., EBSCO86) or conferences (e.g., SCOPUS19) were also excluded. As RQ3 required experimentation and/or data collection, theoretical frameworks (e.g., SCOPUS199) or similar publications were omitted as well.
Additionally, the review was specified to focus on data stemming from human participants, thereby ruling out research using simulations (e.g., WOS42); 5) digital game use. As specifically stated within RQ1 and RQ2, this review addresses educational video games. The focus on digital technology was used to shed light on new approaches to adaptivity and assessment not feasible using other games, such as educational board or card games. Clearly, then, papers in which no game at all was included (e.g., OTHER7) were excluded from the review as well. Similar but not identical approaches to video games, such as simulations (e.g., CHI1), were also omitted to retain the focus solely on games; 6) publication year. Technology continues to rapidly change and evolve, which makes comparing research on specific properties of technology especially challenging. To address this challenge, and to remain focused on new developments within the field of adaptivity and assessment, papers published before the year 2000 were excluded. Moreover, although the assessment of study quality is part of many systematic review frameworks (e.g., Khan, 2003 ; Jesson et al., 2011 ), it is also a much debated issue within review research ( Newman and Gough, 2020 ) and was therefore not used as a selection criterion. Study quality was, however, investigated during the full-text analysis.

In cases where certain criteria could not be conclusively determined based on the information presented in article abstracts and/or titles alone, these works were not excluded in the initial step. Overall, 33 articles were deemed appropriate for further full-text review (see Figure 1 ; Coding II) or could not be excluded based solely on title or abstract information. Concerning the remaining, excluded articles, 12 were duplicates of other table entries, 70 did not investigate digital games, 49 did not constitute original research, 318 contained no information relevant to the research questions, and 14 were published prior to the year 2000, resulting in an exclusion rate of 93.35% after the initial coding phase.
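The reported screening numbers are internally consistent, as a quick check using only the counts stated above shows:

```python
# Quick consistency check of the screening counts reported in the text.
total = 496
excluded = {
    "duplicates": 12,
    "no digital game": 70,
    "not original research": 49,
    "no relevant information": 318,
    "published before 2000": 14,
}
n_excluded = sum(excluded.values())                  # 463 articles excluded
remaining = total - n_excluded                       # 33 proceed to full-text review
exclusion_rate = round(100 * n_excluded / total, 2)  # 93.35%
print(remaining, exclusion_rate)
```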

Subsequent to the initial coding phase, the authors and assistants coded the remaining articles and completed the pre-registered columns B to G, finalizing the coding table for publication alongside this paper. Similar to the initial phase, the second coding procedure was directed by the same criteria described above, excluding or including published works based on a full-text analysis. Consequently, two papers were identified as duplicates, four did not investigate digital games, 10 did not include original research, and six contained no information relevant to the research questions—thus, these 22 articles were also excluded. Ultimately, then, the second coding procedure resulted in the inclusion and coding of 10 papers (2.01% of the complete sample) for subsequent analysis in this systematic literature review ( Table 1 ). One paper (OTHER1 and OTHER2) included multiple experiments and was therefore coded into separate rows in order to investigate the experiments individually. As noted above, three of the sample papers (30.00% of the final sample) were authored or co-authored by the authors of the present review.


TABLE 1. Reviewed studies.

Papers Reviewed

In addition to the overview table ( Table 1 ), a short summary is presented below, as the final sample was small enough to permit a brief discussion of each paper. The papers will be discussed in no particular order.

Operation ARA: A Computerized Learning Game That Teaches Critical Thinking and Scientific Reasoning (OTHER1 and OTHER2)

The paper by Halpern et al. (2012) includes two separate experiments analyzing the impact of a serious game on scientific reasoning. The authors assessed students’ level of knowledge via scores on multiple-choice tests and, as a form of adaptivity, assigned students to three different tutoring conditions based on this classification. In the first experiment, this adaptive approach was accompanied by other adjustments thought to support learning and was compared to a control group. The second experiment addressed the tutoring component in more detail: three variations (including the adaptive version) were compared to a control group. Overall, the authors used a pre-post design and a sample size of over 300. The authors concluded that their game was useful for learning as intended; however, conclusions specifically regarding adaptivity can only be drawn cautiously, as the results were either confounded with other variables or only reached significance for very specific comparisons.

Implementation of an Adaptive Training and Tracking Game in Statistics Teaching (SCOPUS2)

Groeneveld (2014) used a popular approach to difficulty assessment and adaptation (i.e., the Elo algorithm, e.g., Klinkenberg et al., 2011 ; Nyamsuren et al., 2017 ) to match students’ skills and item difficulty in a statistics learning tool, aiming for a 75% task-solving success rate among students. The tool proved useful in a real-life application that included over 400 students. However, no simultaneous control group was implemented, and no specific process data on how the adaptive algorithm influenced learning processes could be gathered.
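The Elo-style mechanism described here can be sketched as follows. The logistic model matches the Elo/Rasch formulation common in this literature (cf. Klinkenberg et al., 2011), but the learning rate `K` and the function names are illustrative assumptions, not Groeneveld's implementation:

```python
import math

# Sketch of Elo-style adaptation: student ability and item difficulty are
# updated after each response, and the next item is chosen so that the
# expected success rate is about 75%. K is an illustrative assumption.

K = 0.3  # learning rate of the rating update

def expected_success(ability, difficulty):
    """Probability of a correct answer under the logistic (Elo/Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def update(ability, difficulty, correct):
    """Move ability and difficulty in opposite directions after a response."""
    e = expected_success(ability, difficulty)
    s = 1.0 if correct else 0.0
    return ability + K * (s - e), difficulty - K * (s - e)

def target_difficulty(ability, p_target=0.75):
    """Difficulty at which expected success equals p_target (75% by default)."""
    return ability - math.log(p_target / (1.0 - p_target))
```

Under this model, a student at ability 0 would be served items of difficulty about −1.10, for which the expected success probability is exactly 0.75.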

A Pilot Study on the Feasibility of Dynamic Difficulty Adjustment in Game-Based Learning Using Heart-Rate (OTHER11)

Ninaus et al. (2019b) used physiological measurements (i.e., heart rate) to assess player arousal and defined thresholds to adapt game difficulty according to the Yerkes-Dodson Law ( Yerkes and Dodson, 1908 ). Overall, the authors clearly framed their research as a pilot study, justifying the low count of 15 participants for their main experiment. This also explains the lack of dedicated learning measurements. Nonetheless, their results indicated that the adaptive approach resulted in a more difficult, challenging, and fascinating game experience.
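The control loop behind such heart-rate-based difficulty adjustment can be illustrated with a minimal threshold rule. The band limits and step size below are hypothetical, not the thresholds used in the study:

```python
# Sketch of threshold-based dynamic difficulty adjustment: if measured
# arousal (heart rate) exceeds an upper bound, difficulty is lowered; if it
# falls below a lower bound, difficulty is raised. All values are hypothetical.

LOWER_BPM, UPPER_BPM = 70, 110  # hypothetical arousal band
STEP = 1

def adjust_difficulty(difficulty, heart_rate):
    """Return the new difficulty level given the current heart rate."""
    if heart_rate > UPPER_BPM:
        return max(1, difficulty - STEP)   # over-aroused: make the game easier
    if heart_rate < LOWER_BPM:
        return difficulty + STEP           # under-aroused: make it harder
    return difficulty                      # within band: keep difficulty
```

A real implementation would calibrate the band per person (e.g., relative to a resting baseline) rather than use fixed values.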

Gamification and Smart Feedback: Experiences With a Primary School Level Math App (SCOPUS1)

Based on the theoretical framework of competence-based knowledge space theory ( Doignon, 1994 ; Albert and Lukas, 1999 ), Kickmeier-Rust et al. (2014) built a digital agent that provided feedback in a gamified math-learning environment. When pre-defined thresholds of user skill levels were reached, the agent provided adaptive information. In an experiment that included 40 second-grade students, Kickmeier-Rust and colleagues were not able to determine any statistically significant benefits of their method.

Competitive Agents and Adaptive Difficulty Within Educational Video Games (OTHER12)

In the experiment by Nebel and colleagues (2020), two game versions that adaptively regulated social competition were compared to a non-adaptive game scenario. The authors based their assumptions on, among other theories, cognitive load theory ( Sweller, 1994 ; Sweller et al., 2011 ), yet they did not specify how the adaptive variations might interact with the implications of their theoretical foundation. Overall, the experiment demonstrated empirical support for each adaptive game version, with the version including an artificial and adaptive opponent exhibiting significant advantages. However, as the three game versions differed in more than one feature, the results were confounded to a certain degree; as a consequence, attributing specific outcomes to specific properties would be challenging.

Logs Analysis of Adapted Pedagogical Scenarios Generated by a Simulation Serious Game Architecture (WOS57)

Callies et al. (2020) used a Bayesian network to estimate user knowledge and included a planning algorithm to adjust the learning sequence in a real-estate learning simulation with respect to each user. In particular, feedback, challenge, and learning context were adjusted. An evaluation study was conducted, and qualitative considerations supported the feasibility of the chosen approach, although no quantitative information on learning could be evaluated using inferential statistics.
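While Callies et al. used a full Bayesian network over several skills, the core updating step can be illustrated with a single-skill posterior in the style of Bayesian knowledge tracing. The slip and guess probabilities below are hypothetical:

```python
# Sketch of Bayesian estimation of user knowledge: a single-skill posterior
# update with slip and guess parameters. The actual study used a Bayesian
# network over several skills; all parameter values here are hypothetical.

P_SLIP, P_GUESS = 0.1, 0.2  # hypothetical error probabilities

def posterior_knows(prior, correct):
    """P(student knows the skill | observed response), via Bayes' rule."""
    if correct:
        num = prior * (1 - P_SLIP)
        den = num + (1 - prior) * P_GUESS
    else:
        num = prior * P_SLIP
        den = num + (1 - prior) * (1 - P_GUESS)
    return num / den
```

Starting from a prior of 0.5, for example, one correct answer raises the estimate to about 0.82, while one error lowers it to about 0.11; such estimates can then drive which feedback or challenge the learner receives next.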

Sensor Based Adaptive Learning—Lessons Learned (SCOPUS11)

This paper reports the findings of ongoing research from the previously discussed paper OTHER11 ( Ninaus et al., 2019b ). However, more details on potential assessment (e.g., body temperature, CO 2 data) and adaptive mechanisms (e.g., alerts, recommendations) were provided alongside preliminary results, which indicated that participants became aware of the adaptation.

Effect of Personalized Gameful Design on Student Engagement (OTHER15)

For this research, Mora et al. (2018) used the SPARC model, previously reported by the lead author ( Mora et al., 2016 ), to gamify software that teaches statistical computing. Within the experiment, users were categorized according to self-evaluations based on the Hexad User Type scale ( Tondello et al., 2016 ) and assigned to four different implementations of rules and rewards. The inferential statistical analysis could not identify significant deviations induced through this approach. Following our rule of thumb (see Study Selection ), we retained this study as part of our final sample even though it was not completely clear whether the focal instrument could be considered a game.

Predicting Learning in a Multi-Component Serious Game (SCOPUS8)

This research by Forsyth et al. (2020) used the same software as that employed in previous experiments (OTHER1 and OTHER2; Halpern et al., 2012 ) with the aim of gaining insights into knowledge formation divided into deep and shallow learning ( Marton and Säljö, 1976 ). For this, students were assigned to different tutoring conditions based on an assessment of prior knowledge with multiple choice tests. The results suggested that some principles, such as generation, might be suitable predictors for learning within the learning environment under study.

Improving Student Problem Solving in Narrative-Centered Learning Environments: A Modular Reinforcement Learning Framework (OTHER13)

Based on theories such as seductive details ( Harp and Mayer, 1998 ) or modular reinforcement learning ( Sutton and Barto, 2018 ), Rowe and Lester (2015) scaffolded adaptive events in microbiology learning and compared them to a non-adaptive version. In summary, the researchers could not identify significant learning improvements with regard to microbiology learning outcomes.
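The reinforcement-learning machinery behind such adaptive event scaffolding can be illustrated with a single tabular Q-learning step; modular RL, as used by Rowe and Lester, runs several such learners over decomposed sub-problems. The states, actions, and parameters below are hypothetical:

```python
# Sketch: the tabular Q-learning update underlying RL-driven adaptation.
# States, actions, and parameter values are hypothetical.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9      # learning rate and discount factor
Q = defaultdict(float)       # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q toward reward + discounted best next value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# e.g., reward an adaptive event (a hint) that preceded a solved problem:
q_update("stuck", "give_hint", reward=1.0, next_state="solved",
         actions=["give_hint", "do_nothing"])
```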

Data-Analytic Strategies

During the second phase of full article screening, the full text of the articles was reviewed to search for the information needed to address RQ1 to RQ3 as well as to generate an overview that could be systematized and presented within this paper. For this, a content analysis approach ( Lamnek and Krell, 2016 ) was followed. More specifically, to answer RQ1 to RQ3, frequency analysis ( Lamnek and Krell, 2016 ) was used, and the necessary categories, such as type of research or significance of research, were created. Through this, quantifiable information, such as whether a specific model was used more frequently than others, could be inferred. The codes for these categories were built using the researchers’ prior knowledge of the field of experimental research (e.g., study types) or were derived verbatim from the articles (e.g., names of specific theories). Where applicable, existing codes within the literature were used. For example, game genre and subject discipline were coded using labels from previous meta-analytic and review-like work ( Herz, 1997 ; Connolly et al., 2012 ), so that frequencies could potentially be compared to other reviews in the field. Additionally, other relevant information was narratively systematized and is discussed within the Narrative Content Analysis section. For this, no specific qualitative or quantitative approach was used; instead, the authors relied on in-depth discussion and interpretation aligned with the overall aim of this review. In line with the systematic approach, differences were discussed and resolved through consensus.
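In practice, the frequency analysis reduces to counting code occurrences across the coding table. A minimal sketch, with illustrative stand-in codes rather than the actual table entries:

```python
# Sketch of the frequency analysis: counting how often each code (e.g., a
# theoretical framework) appears across the coded studies. The labels below
# are illustrative stand-ins for entries in the coding table.
from collections import Counter

coded_frameworks = ["cognitive load theory", "none reported",
                    "knowledge space theory", "none reported",
                    "Yerkes-Dodson Law", "none reported"]

freq = Counter(coded_frameworks)
most_common_code, count = freq.most_common(1)[0]
print(most_common_code, count)
```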

Methodological Integrity

Several aspects of methodological integrity that needed to be discussed have already been addressed within previous sections (e.g., researcher’s perspective). Following the Journal Article Reporting Standards by the American Psychological Association (2020b) , additional, complementary information was needed. First, to validate the utility of the findings and the general approach to addressing the study problem, a section specifically devoted to this issue is included in the discussion. Second, to firmly base the findings within the evidence (i.e., the papers), the codes should be closely aligned with the sampled literature and sufficient, supportive excerpts should be provided. However, copyright protection of the original articles prevented the inclusion of exhaustive direct quotations. Third, consistency within the coding process was supported by pre-defined entry options that were prepared for several columns within the coding table. This was especially useful during the first coding phase when the general inclusion or exclusion criteria were verified. For example, game type (column A11) was coded as either digital , non-digital , or unknown based on abstract . Similarly, appropriateness for the review (A15.1) was coded as 1 = should be considered, 2 = should NOT be considered, or 3 = not sure. When a paper was coded as 3 , the entry was reviewed by a second coder. For the full-text review, such pre-defined entries were less applicable in certain cases, as the codes themselves were of interest in regard to the initial research questions. For example, column B2, What are the used theoretical frameworks? needed to be completed during the review process, as the answer to this question was naturally unavailable prior to the review. Overall, the process of pre-registering the coding table, research method, and research questions ensured a high level of methodological integrity throughout the review.
Any deviations from or extensions of the a priori formulated research plan (which is not unusual for qualitative or mixed method research; Lamnek and Krell, 2016 ) are clearly indicated throughout the paper. For example, analyses beyond those covering RQ1–3 were included within the Narrative subsections of the results section. Finally, the integrity of research like systematic literature reviews is limited by the integrity of the reviewed material. To ensure basic scientific quality, papers that have not yet undergone peer-review were omitted during the first coding phase. In addition, concerns or potential critical issues were identified within the Papers Reviewed section.

The findings described below were based on the final sample. Unless stated otherwise, phrases such as “20% of the sample” refer to the final studies reviewed, not to the initial sample after the literature search. In addition, percentages are reported in relation to the final coded sample (11) and not in relation to the ultimately included papers or manuscripts (10).

RQ1—How is Research in the Field of Analytics for Adaptation in Educational Video Games Currently Conducted?

The current systematic literature review identified 10 relevant papers in total. One paper included two studies and was therefore treated as two separate entries in the coding table. To better understand how research in the field of analytics for adaptation in educational video games is currently conducted, we provide a comprehensive overview of the games and methods currently being used before we report the specifics of the actual implementation of adaptivity.

In the identified 11 studies, eight different games were used. That is, the same games or at least the same game environments were used in multiple studies or papers. The game genres used in the studies did not vary considerably, with simulation games (4) and role-playing games (4) being the most popular. The three other studies employed games that did not fit the predefined genres (cf. Connolly et al., 2012 ) and were thus classified as “other.” For instance, in OTHER15, a gameful learning experience was designed using Trello boards and the SPARC model ( Mora et al., 2016 ). OTHER12 utilized a game-like quiz, while SCOPUS1 used a game-like calculation app. However, we must note that even for those games that were classified into a predefined genre, the decision to do so was not always clear cut and was at least debatable, as these games were not always sufficiently described.

The studies and games covered different subject disciplines , with games covering Science being the most popular (4). Mathematics was the subject discipline for two games. Business and Technology were covered by one study each. Moreover, three studies did not clearly fit into any of the predefined subject disciplines: OTHER12 covered general factual knowledge on animals, whereas OTHER11 and SCOPUS11, which used the same game, covered procedural knowledge for emergency personnel. In this context, the studies mostly targeted higher education content (6) and continuing education (3). Primary and secondary school content was targeted by one study each.

In the identified sample of papers, the majority utilized quantitative data (9). Only one study used qualitative data, while two studies used a combination of both. The overall mean sample size of 120.73 could be sufficient to detect differences of medium effect size between two independent samples. For instance, a two-tailed t -test with alpha = 0.05, power = 0.8, and d = 0.5 requires a sample size of 128 according to G*Power ( Faul et al., 2007 ). However, the large standard deviation of 120.46 highlights major differences between the individual studies. Furthermore, the studies utilized very different designs, requiring more or less statistical power. Notably, an a priori power analysis could not be identified as a standard procedure within the sample.
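The a priori figure cited above can be reproduced independently of G*Power. The sketch below approximates the required per-group sample size for a two-tailed, two-sample t-test via the normal approximation plus a common small-sample correction; for d = 0.5, alpha = .05, and power = .80 it yields the same total of 128:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed, two-sample t-test.

    Normal approximation plus a common small-sample correction
    (z_{1-alpha/2}^2 / 4), which reproduces the exact t-test result
    for typical parameter values.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z
    z_b = NormalDist().inv_cdf(power)           # z for the desired power
    n = 2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4
    return math.ceil(n)

# d = 0.5, alpha = .05, power = .80  ->  64 per group, 128 in total
total_n = 2 * n_per_group(0.5)
```

Comparing a study's actual sample size against such a target is exactly the a priori power analysis that could not be identified as standard procedure in the sample.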

Most of the identified studies (8) evaluated their adaptive approach with participants from university/higher education (i.e., university students). Although WOS57 did not report specific information on participants, they were recruited with advertisements on a university campus, which suggests a high likelihood that most of the sample consisted of university students as well. Corresponding with the reported target content (see Games ), one study was performed in primary school and one in secondary school. The majority of studies were conducted in a real-world setting (7). Four studies were performed in laboratory settings. The substantial proportion of studies performed in the field indicates the dominance of an applied approach to the field of adaptive educational video games.

In six of the 11 studies, the authors utilized a control group or control condition to evaluate the effects of adaptivity. However, we should note that in SCOPUS2, the authors also made an attempt to descriptively compare their investigated student sample with data they had on students from previous years using the same system but without adaptive components. This comparison was made only descriptively and, overall, was not sufficiently described, which led to it being classified as not having used a dedicated control group. In any case, the number of studies utilizing a control group was roughly equivalent to the number of studies that did not, underscoring the urgency of increasing empirical standards in this field of research. However, almost all studies used an experimental or quasi-experimental study design (9). One study was mostly correlational, while another employed a qualitative study design. Furthermore, seven studies used only one measurement point (i.e., post-test measure), while three studies utilized pre- as well as post-test measurements to evaluate their adaptive approach. In SCOPUS1, neither a classic pre- nor post-test was reported, but instead the authors observed primary class children for two learning sessions with or without the adaptive component (i.e., feedback) of the game in question. Only two studies solely used descriptive statistical analyses. One study used a combination of descriptive statistics and correlation but without inferential testing. Hence, the large majority (7) ran descriptive as well as inferential statistical analyses to support their conclusions.

Overall, the general goal of the adaptive mechanism integrated into the games included in the sample was, in most studies, to optimize learning (7). The rest of the studies focused on instigating a change in either a behavioral, motivational, cognitive, or social variable. Hence, there is a clear focus on directly affecting learning in and of itself in the investigated sample of studies. The strategies by which the authors of the studies sought to achieve adaptivity also varied considerably (see also RQ2 on which theoretical frameworks were used). Hence, providing frequencies on the different approaches was not possible. Instead, a few examples are included here to demonstrate the types of approaches used (for more detailed information, see Ninaus and Nebel, 2020 ). For instance, the adaptive mode used in OTHER11 was aimed at keeping the players in the game loop for as long as possible. In contrast, SCOPUS2 sought to maintain the chance of being correct in the game at 75% by adapting the game’s difficulty. Others tried to adapt the feedback provided by the game (e.g., SCOPUS1) or used natural language processing to develop questions that were posed to the players on the basis of their prior knowledge (e.g., SCOPUS8).
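SCOPUS2’s strategy of holding the chance of a correct answer at 75% can, in principle, be realized with a weighted up-down staircase familiar from adaptive psychophysics. The sketch below is our own generic illustration of such a rule under assumed parameter values, not the algorithm actually used in SCOPUS2:

```python
def adapt_difficulty(difficulty, correct, target=0.75, step=1.0,
                     lo=0.0, hi=10.0):
    """Weighted up-down staircase converging on `target` success rate.

    After a correct answer, difficulty rises by `step`; after an error it
    falls by step * target / (1 - target). The process is in equilibrium
    exactly when the probability of success equals `target`.
    """
    if correct:
        difficulty += step
    else:
        difficulty -= step * target / (1 - target)  # 3 steps down for p = .75
    return min(hi, max(lo, difficulty))

# A correct answer raises difficulty by 1; a wrong one lowers it by 3,
# so on average the learner succeeds on three out of four attempts.
d = adapt_difficulty(5.0, correct=True)    # -> 6.0
d = adapt_difficulty(d, correct=False)     # -> 3.0
```

The asymmetric step sizes are the whole trick: at the equilibrium difficulty, expected upward movement (p x step) equals expected downward movement ((1 - p) x step x p/(1 - p)).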

To realize adaptivity within a learning environment, different sources of data can be used. Although assessment and adaptivity could be potentially realized using only the game system itself, the majority of the sample (80%) utilized additional surveys and questionnaires. Only two papers could be identified with a potentially less intrusive approach of exclusively utilizing in-game measures. In this vein, only two papers used physiological measures instead of behavioral indicators or survey data to realize adaptivity. Using these data, different adaptive elements were realized in the sample in either between-subject (8) or within-subject (3) designs. The vast majority of studies were aimed at adapting the difficulty of the game (8). Two other studies used adaptive scaffolding to optimize the learning outcomes. One other study (WOS57) investigated pedagogical scenario adaptation (i.e., automatically generated vs. scripted). These elements were, in most of the studies, adapted in real time (8) or between learning sessions (3). However, this differentiation was not always clear as relevant information was in some instances missing. In seven studies, processing of data was done using a user model. Two studies processed the data without a user model, while another two used the raw data only.

RQ2—What Cognitive/Theoretical Frameworks Within Analytics for Adaptation in Educational Video Games are Currently Used?

Apart from OTHER1, OTHER2, OTHER11, and SCOPUS11, which were either two experiments within the same paper or articles authored by almost identical authors, each experiment used a unique theoretical approach. Thus, a frequency analysis would have been ineffectual. Instead, a few examples can be used to illustrate the encountered theoretical approaches. For instance, cognitive load theory (OTHER12), learner models with Bayesian networks (WOS57), competence-based knowledge space theory (SCOPUS1), or modular reinforcement learning (OTHER13) were applied. This indicates an emphasis on data-driven methodology or institutional preferences rather than on slowly evolving and unifying theoretical frameworks.

RQ3—What Kind of Outcomes are Influenced Through the Adaptive Approach?

Every game within our final sample was originally intended as a learning game. Additionally, every application within the sample was intended to increase learning outcomes. Other possible combinations—for example, entertainment-focused commercial games used within an educational context or educational games aimed at improving metacognition or motivation—were not observed. Instead, 60% of the papers reported that the adaptive mechanism was mainly intended to increase learning outcomes. One study aimed to improve user experience (SCOPUS11), while another sought to explore motivational aspects (WOS57); the remaining two studies, on the other hand, indicated mixed goals (OTHER11, OTHER15). In sum, learning improvements can be identified as the main target of adaptive approaches. A different distribution can be observed within the report of statistically significant findings: 50% of the final sample revealed significant findings, 10% reported mixed results, and 40% did not generate statistically significant outcomes. For this frequency analysis, however, it should be noted that statistical significance alone neither necessarily indicates a relevant effect size nor confirms a sufficient methodological approach. In addition, the potential threat of publication bias cannot be ruled out. No study, at least, revealed negative outcomes, and even if not statistically significant or mixed, the majority of outcomes were indicated to be positive.

Narrative Content Analysis

Disciplines and publication.

Although the inclusion criteria allowed for the inclusion of research ranging back to the year 2000, the oldest eligible article was published in 2012, with 50% of the final sample being published from 2018 to 2020. This indicates a substantial increase in research interest in the field. Psychology or computer science were identified as the disciplinary affiliation of the primary author for 80% of the articles, with only 20% originating from the field of education. This might indicate a lack of sufficient support through educational research. The final sample contained only research from North America or Europe, raising questions regarding the availability or visibility of research from other productive regions, such as Asia. Similarly, only 50% of the final sample could be gathered through database searches, raising concerns about insufficient visibility or a lack of standardized keywords in the research field. This could potentially be explained by the fact that apart from two papers published in the International Journal of Game-Based Learning ( Felicia, 2020 ), all eligible papers were published in different outlets and conferences. This suggests the lack of an overarching community or publication strategy but also indicates that the field addressed by the systematic review remains in its infancy.

Research Strategy and Limitations

Most frequently, the collected articles comprised exploratory research, as only research questions like “Does an online personalized gameful learning experience have a greater impact on student’s engagement than a generic gameful learning experience?” ( Mora et al., 2018 , p. 1926) or open-ended questions were included. For example, formulations such as “The primary research question that the current paper addresses is …[…]” ( Forsyth et al., 2020 , p. 254; emphasis added) entailed unspecified additional observations. Rarely, hypothesis testing research was identified: “Hypothesis 4: Learners playing against adaptive competitive elements demonstrate higher retention scores than players competing against human opponents” ( Nebel et al., 2020 , p. 5). Although exploratory studies are very valuable in early research, their outcomes are subject to more methodological limitations than theoretically and empirically supported hypothesis-testing research. Additional limitations are imposed through frequent applications of quasi-experimental factors using split groups—e.g., “we split the participants into three roughly equal groups based on pretest scores” ( Forsyth et al., 2020 , p. 268)—or with separations based on other sample properties—e.g., “Students belonged to the CAS or CAT group according to the native language recorded in their academic profile” ( Mora et al., 2018 , p. 1928).

Occasionally, the authors used limited statistical methods or reported disputable findings if no significant result could be produced: “[…] the descriptive statistics suggest that personalization of gameful design for student engagement in the learning process seems to work better than generic approaches, since the metrics related to behavioral and emotional engagement were higher for the personalized condition in average” ( Mora et al., 2018 , p. 1932). This was observed even in cases in which the authors were aware of their shortcomings—e.g., “[…] there is a danger of mistaking a correlation for causality […]” ( Groeneveld, 2014 , p. 57)— or if critical methodological difficulties, such as alpha-error inflation, were not considered—“We then computed correlations between all of the measures for the cognitive processes and behaviors (i.e., time-on-task, generation, discrimination, and scaffolding) and the proportional learning gains for the two topics (experimental, sampling) for each of the four groupings” ( Forsyth et al., 2020 , p. 265). However, rarely, the necessary corrections were applied: “[…] Sidak corrections will be applied to the pairwise comparisons between these groups” ( Nebel et al., 2020 , p. 8). In contrast, even for process data for which a myriad of potential comparisons could be made, statistical criteria, such as significance levels, were handled incautiously: “A two-tailed t -test indicated that students in the Induced Planner condition (M = 13.7, SD = 10.9) conducted marginally fewer tests than students in the Control Planner condition (M = 19.5, SD = 14.4), t (59) = 1.80, p < 0.08” ( Rowe and Lester, 2015 , p. 8). Justifications for such methodological issues relied on the goal justifies the means approach: “Although the assumption of independence was violated, the goal of the correlations was to simply serve as a criterion for selecting predictor variables to include in follow-up analyses” ( Forsyth et al., 2020 , p. 265). 
As a consequence, the validity of the gathered insights is questionable in light of the methods by which they were generated.
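The alpha-error inflation and Sidak corrections discussed above follow from a simple formula: for m comparisons at a family-wise level alpha, the Sidak-adjusted per-comparison level is 1 - (1 - alpha)^(1/m). A minimal sketch:

```python
def sidak_alpha(alpha, m):
    """Sidak-adjusted per-comparison significance level for m tests."""
    return 1 - (1 - alpha) ** (1 / m)

def sidak_p(p, m):
    """Sidak-adjusted p-value for one of m comparisons (capped at 1)."""
    return min(1.0, 1 - (1 - p) ** m)

# With six pairwise comparisons at a family-wise alpha of .05, each
# individual test must reach p < .00851 to be declared significant.
threshold = sidak_alpha(0.05, 6)
```

Against this backdrop, a reported p < 0.08 from a single uncorrected t-test, as in the quoted example, offers very weak evidence once the number of comparisons actually run is taken into account.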

As in other emerging fields, the sampled studies rarely employed standardized measurements or comparable indicators: “Second, the high success rate in the final exams is reassuring, but can hardly be considered evidence” ( Groeneveld, 2014 , p. 57). However, some authors tried to overcome such methodological issues by pre-testing the measures themselves: “There were two versions of our measure of learning […] Reliability was established using over 200 participants recruited through Amazon Mechanical Turk” ( Forsyth et al., 2020 , p. 263). Nevertheless, such pre-testing regarding the effectiveness of the learning mechanism or the suitability of the measures themselves was scarce. Some authors discussed this issue a posteriori: “In hindsight, the lack of a condition effect on learning is unsurprizing. A majority of the AESs provided scaffolding for student’s inquiry behaviors, rather than microbiology content exposure, which was the focus of the pre- and post-tests” ( Rowe and Lester, 2015 , p. 7).

Potential for Improvements

Despite severe methodological challenges, the sampled researchers highlighted various areas of improvement within the field. For instance, in cases where the game was not created by the researchers themselves or not specifically for the addressed research questions, limited insights into the different processes were acknowledged: “[…] we cannot disentangle one theoretical process from each other without restructuring the entire game. For these reasons, we acknowledge that our findings may not be as generalizable as we would hope in regards to the literature of the learning sciences” ( Forsyth et al., 2020 , p. 274). Another potential area for improvement was the method of adaptation itself. Often, predefined and global thresholds are used to adjust adaptivity, neglecting potential differences between the users: “Feedback is triggered when a certain pre-defined probability threshold is reached for a skill/skill state” ( Kickmeier-Rust et al., 2014 , p. 40). Rarely, these thresholds are based on other research or pilot studies: “The 5 bpm threshold was defined based on previous pilot tests with the same game” ( Ninaus et al., 2019b , p. 123). In contrast, the thresholds are frequently based on assumptions made by the authors themselves: “Students have, regardless their ability level, a 75% chance of correctly solving a problem, which is motivating and stimulating […]” ( Groeneveld, 2014 , p. 54). In addition to the potential for technical improvements, theoretical work could be enhanced as well, especially as exhaustive motivation regarding the use of adaptive features is not presented but their usefulness is rather assumed or briefly mentioned: “This feature also guarantees learner engagement throughout the duration of the game session” ( Callies et al., 2020 , p. 1196) or “Diverse psychological viewpoints agree that people are not equal, therefore, they cannot be motivated effectively in the same way” ( Mora et al., 2018 , p. 1925). 
Rarely, full chapters discussing which processes might be influenced through adaptive elements are included: “[…] adaptive mechanisms […] offer several benefits in educational settings. For example, […]” ( Nebel et al., 2020 , p. 3/4); alternatively, references to methodological approaches are included: “[…] Evidence-Centered Design […] requires that each hypothetical cognitive process and behavior to be carefully aligned with the measures. For this reason, we needed to identify general processes or actual behaviors with theoretical underpinnings […]” ( Forsyth et al., 2020 , p. 259).
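Threshold-triggered feedback of the kind quoted from Kickmeier-Rust et al. (2014) can be illustrated with a deliberately simplified sketch: a per-skill mastery estimate is updated from task outcomes, and formative feedback fires once it falls below a pre-defined, global threshold. The update rule and every parameter value below are our own assumptions for illustration, not the published system:

```python
def update_mastery(mastery, correct, rate=0.2):
    """Exponentially weighted estimate of the probability of mastery
    for one skill, updated after each observed task outcome."""
    return (1 - rate) * mastery + rate * (1.0 if correct else 0.0)

def needs_feedback(mastery, threshold=0.4):
    """Trigger formative feedback below a pre-defined, global threshold.
    Note the critique reviewed above: such fixed thresholds neglect
    potential differences between users."""
    return mastery < threshold

# Start from an uninformed prior and process three observed outcomes.
m = 0.5
for correct in [False, False, True]:
    m = update_mastery(m, correct)
feedback = needs_feedback(m)
```

Replacing the hard-coded threshold with one calibrated per user, or derived from pilot data as in Ninaus et al. (2019b), is precisely the kind of improvement the reviewed authors point toward.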

Recommendations

In addition to potential improvements that were more or less explicitly stated or can only be inferred with sufficient knowledge of empirical research, some authors provided clear recommendations. For example, some authors claimed that data extraction and investigation should be intensified: “Thus, we suggest, as demonstrated in this study, that tools [should] be designed to facilitate data extraction and detect learning patterns” ( Callies et al., 2020 , p. 1196). Furthermore, the complexity of the research field was emphasized: “From our research we learned that quick and easy results are often neither realistic nor meaningful. […] Using user interaction data for learning analytics is also complex and becomes even more challenging when physiological data are used […]” ( Fortenbacher et al., 2019 , p. 197). Within several papers, it was reported that adaptive systems need more time or cases in order to function sufficiently: “[…] for example, students with a very low error rates, with highly unsystematic errors, or students who performed a very small number of tasks, did not received formative feedback because in those cases the system is unable to identify potential problems […]” ( Kickmeier-Rust et al., 2014 , p. 45). As a consequence, their full potential could not be assessed within the corresponding studies. As a potential solution, other researchers used pre-test samples to train their algorithms: “[…] we conducted a pair of classroom studies to collect training data for inducing a tutorial planner” ( Rowe and Lester, 2015 , p. 5). Finally, some authors were aware of methodological limitations and the need for better studies in the future: “Although the results are encouraging, we recognize that they are not the sort of well-controlled studies that are needed to make strong claims […]” ( Halpern et al., 2012 , p. 99) or “However, these results need to be treated with great caution. 
Future studies with larger sample sizes and more dedicated study designs need to investigate this in more detail” ( Ninaus et al., 2019b , p. 126).

Situatedness

Overall, the studies were overwhelmingly conceptualized within an academic context. The researchers worked within various universities, and their samples frequently included students from such institutions. This, consequently, limited their research perspective (e.g., lack of pedagogical input) as well as related applications (e.g., learning impairments, limited technical equipment). In addition, the authors were frequently affiliated with faculty from the same fields, resulting in scarce interdisciplinary discussions within the final sample.

Despite the numerous research studies demonstrating the potential of determining user states via various analytics relevant for learning (e.g., Klasen et al., 2012 ; Berta et al., 2013 ; Brom et al., 2016 ; Appel et al., 2019 ; Ninaus et al., 2019a ; for a review see Witte et al., 2015 ; Nebel and Ninaus, 2019 ) and supporting their use for adaptation, few studies have actually followed through with this approach as indicated by the current literature review. Consequently, our aim to identify how such analytics have been used to realize adaptive learning in games was compromised by the low number of existing studies in this area. Nevertheless, the current systematic literature review provided valuable insights into the nascent field of adaptive educational video games and its use of analytics.

Overall, the existing research on adaptive educational games appears to be somewhat heterogeneous in terms of the conceptual approaches applied. However, we did identify clear patterns concerning game genres and subject disciplines. First, there seems to be a clear focus on simulations or simulation games, as well as on role-playing games. Although we can only speculate as to the reasons for this, it would seem that this pattern is completely in line with the overall field of educational games. In a recent review on the effects of serious games and educational games by Boyle et al. (2016) , simulation games and role-playing games were also the most popular game genres. Likewise, we would argue that it might be easier to design simulations than to model aspects of learning via various game features in such a way that they respond appropriately to the implemented adaptive mechanisms. Moreover, more research is needed to better understand how individual game features affect performance and learning outcomes in general. Importantly, the games in question were not always sufficiently described. That is, it was sometimes unclear which core game loop drove the game and to which genre the game best fit. However, a lack of sufficient details on actual gameplay or of overall information on the games within empirical studies is not unique to the field of adaptive games. As there is no consensus about how to report and describe educational games in the scientific literature, the whole field of serious games or educational games is impacted. The game attributes taxonomy suggested by Bedwell et al. (2012) might serve as one way to achieve consistent reporting standards.

Second, the majority of studies focused mostly on natural sciences, such as math. So-called “softer” disciplines, such as social sciences, were not found in the final sample. In our opinion, these disciplines might be more difficult to operationalize and evaluate. Consequently, implementing adaptive mechanisms within these disciplines is at once more complex and less reliable.

The target audience of the games varied in the investigated sample, from primary and secondary school pupils to university students and vocational training students. Interestingly, however, most of the studies were performed in real-world settings, thus allowing for high ecological validity. Nevertheless, their overall research designs varied tremendously. While most studies employed experimental or quasi-experimental research designs, only three studies evaluated their adaptive mechanisms with pre- as well as post-tests. It was unfortunate that two studies, which constituted 18% of our total sample, used descriptive statistical analyses alone, further emphasizing how the field remains in its early phases. Moreover, the lack of control groups or specific manipulations of individual elements, as well as the absence of process data, in several studies made clear inferences about the impact of the adaptive systems used especially difficult, if not impossible, to generate. However, it should be noted that creating an appropriate control group for studies of adaptive mechanisms is no trivial endeavor, as learning content between adaptive and non-adaptive learning differs by nature.

In most studies, the general goal of adaptive mechanisms was to optimize or improve learning. While a few studies also investigated the effects of adaptive mechanisms on behavioral, motivational, cognitive, or social variables, there was a clear focus on learning or knowledge acquisition. This pattern might have originated from the search query we used, as we intentionally focused on cognitive or learning outcomes. More varied was the pattern in which adaptivity was realized. We could not identify a clear trend with regard to the different mechanisms targeted by the implemented adaptivity. The realization of the adaptive approaches, however, was mostly based on surveys or questionnaires. Only four papers used in-game metrics (SCOPUS8 and WOS57) or physiological signals (SCOPUS11 and OTHER11) directly to adapt the games. Thus, there seems to be room for future improvements, especially considering the various methods of assessing process data within games (for a review see Nebel and Ninaus, 2019 ).

Overall, there was no clear or coherent pattern of theoretical or cognitive frameworks used within analytics for adaptation in educational video games. That is, almost all studies used unique theoretical approaches to justify their adaptive mechanisms. It seems that the use of general learning theories was mostly neglected in the identified sample. Only OTHER12 and OTHER13 shared similar ideas based on cognitive load theories (e.g., Harp and Mayer, 1998 ; Mayer, 2005 ). Some of the presented ideas were technologically impressive but lacked a clear theoretical background. We would suggest that interdisciplinary collaborations might overcome this lack of theory-driven research and help to advance the field of adaptive educational games, which in turn might also increase the effectiveness of adaptive mechanisms. Researchers from different fields should act in concert to fully utilize current possibilities in adaptive game-based learning from a technological as well as theoretical perspective. Besides new sensor technologies to make data acquisition easier and learning analytics algorithms that permit deeper insights into the learning process and that can potentially identify misconceptions among learners, a strong theoretical foundation is also required—not only of general learning principles (e.g., Mayer, 2005 ) but also learning domain-specific processes (e.g., embodied learning approaches in mathematics; see Fischer et al., 2011 ).

As mentioned above, all games or studies were aimed at increasing learning outcomes. However, at the same time, not all of these studies actually evaluated learning outcomes alongside user experiences or motivational outcomes. Only about half of the studies found positive effects of adaptation within their evaluated games. At the same time, no negative effects due to adaptivity were reported. That is, many results did not reach statistical significance, which might be attributed to the varying sample sizes used in individual studies. Hence, this literature review cannot make clear conclusions as to the efficacy of analytics for adaptation in educational video games. However, a recent and more general review on adaptive learning technologies in general reached a more positive verdict on the effectiveness of adaptation ( Aleven et al., 2016 ). We therefore remain cautiously optimistic as to the effectiveness of analytics for adaptation in educational video games.

As the current systematic literature review only identified a rather low number of empirical studies, its results need to be treated with caution as they might not be representative. However, we need to note that identifying only a rather small number of eligible studies is not completely unusual for the field of serious and educational games, in particular when reviewing a subdiscipline of serious games or focusing on specific constructs [c.f. Lau et al., 2017 (9 studies); Eichenberg and Schott, 2017 (15 studies); Perttula et al., 2017 (19 studies)]. Moreover, other recent and more general systematic literature reviews on adaptive learning systems also identified only a very small number of empirical studies utilizing games or game-like environments (e.g., Aleven et al., 2016 ; Martin et al., 2020 ). That is, our current results seem to be in line with other systematic reviews on adaptive learning systems, suggesting that the field of adaptive educational games and its use of analytics is indeed in its very early phases.

Overall Conclusion

Overall, the presented review contributes a previously lacking overview of and deep exploration into the extant research in the field of analytics for adaptation within educational videogames. Increasing attention to this research area was evident, whereas the overall quantity of relevant experimental research was rather low. In this vein, narrative and frequency analysis could confirm existing opinions about the lack of theory-driven approaches ( Van Oostendorp et al., 2020 ) on a systematic level, although heterogeneous approaches and methodological limitations prevented further systematization of the dimensions of adaptive systems. This finding, however, is not entirely surprising, not only because of the different disciplines involved but also because some of the contributing research areas struggle with similar challenges themselves. For instance, Human-Computer-Interaction research that investigates video games often encounters substantial methodological and statistical challenges ( Vornhagen et al., 2020 ), whereas educational psychology faces related issues, such as the replication crisis ( Maxwell et al., 2015 ) or infrequent improvements to theoretical frameworks ( Alexander, 2018 ; Mayer, 2018 ). Nonetheless, the results of this review can be used to address these critical issues and improve future empirical research in the field. In addition, the empirical evidence, albeit limited, is promising and could encourage future investigations and practical applications of the resulting adaptive systems. Although positive effects were achieved even though researchers often worked without pedagogical, instructional, or educational theories or conducted only limited exploratory investigations, the systematic review allows us to conclude with reasonable certainty that the highly needed quantity of research simply does not exist yet.
However, the sample supported only a somewhat superficial systematization and assessment of the resulting effects. Therefore, it remains to be seen how such conclusions might change as the field matures and improves. This argument holds true for subsequent systematic work in the field. Using this article as a starting point, the pre-registered and open-data information can be used to improve the process and gather more fine-grained insights or even yield different conclusions. Furthermore, future systematic reviews following an identical methodical approach but a different theoretical focus could systematize and contrast important literature in adjacent fields (e.g., Bellotti et al., 2009 ). In addition, different research methods, such as further quantitative or qualitative investigations of the main sample, might enrich the gathered insights. However, in light of the methodological heterogeneity of the current investigation and its small sample size, pursuing such an approach is unlikely to be fruitful at present.

Utility of the Findings and Approach in Responding to the Initial RQs

The approach and its findings can be considered to be successful and insightful with regard to major aspects of the initial research questions. A clear picture of current research was gathered (RQ1), and crucial gaps and heterogenous approaches were clearly identified (RQ2). In this latter respect, the systematic approaches also increased the validity of the conclusions, thereby supporting previous considerations within the field. However, some aspects could not be completely assessed by the pre-registered categories (RQ3), consequently reducing the systematized information collected by this review and its subsequent conclusions. In order to compensate for this limitation, which was discovered after pre-registration, an additional, non-systematic narrative analysis was conducted. Taken together, the approach can be considered fruitful, even though the coding table could be further optimized. In addition, the findings are capable of addressing the initial research questions, even though the answers sometimes contained less information than first assumed during the conceptualization of the review.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: The datasets (i.e., complete coding tables) for this study can be found in the OSF: https://osf.io/dvzsa/ .

Author Contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

The publication of this article was funded by Chemnitz University of Technology.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We want to sincerely thank our student assistants Stefanie Arnold, Gina Becker, Tina Heinrich, Felix Krieglstein, Selina Meyer, and Fangjia Zhai for supporting the initial screening of articles for this review.

D. Albert, and J. Lukas (Editors) (1999). Knowledge spaces: theories, empirical research, and applications . Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Aleven, V., Mclaren, B. M., Sewall, J., and Koedinger, K. R. (2009). A new paradigm for intelligent tutoring systems: example-tracing tutors. Int. J. Artif. Intell. Educ. 19 (2), 105–154.

Aleven, V., McLaughlin, E. A., Glenn, R. A., and Koedinger, K. R. (2016). “Instruction based on adaptive learning technologies,” in Handbook of research on learning and instruction . 2nd Edn, Editors R. E. Mayer, and P. Alexander (New York, NY: Routledge ), 552–560.

Alexander, P. A. (2018). Past as prologue: educational psychology’s legacy and progeny. J. Educ. Psychol. 110 (2), 147–162. doi:10.1037/edu0000200

American Psychological Association (2020a). Jars—QUAL table 2 qualitative meta-analysis article reporting standards. Available at: https://apastyle.apa.org/jars/qual-table-2.pdf (Accessed September 15, 2020).

American Psychological Association (2020b). JARS–Qual | table 1 information recommended for inclusion in manuscripts that report primary qualitative research. Available at: https://apastyle.apa.org/jars/qual-table-1.pdf (Accessed September 15, 2020).

Anderson, J. R., Boyle, C. F., Corbett, A. T., and Lewis, M. W. (1990). Cognitive modeling and intelligent tutoring. Artif. Intell. 42 (1), 7–49. doi:10.1016/0004-3702(90)90093-f

Appel, T., Sevcenko, N., Wortha, F., Tsarava, K., Moeller, K., Ninaus, M., et al. (2019). Predicting cognitive load in an emergency simulation based on behavioral and physiological measures. 2019 International Conference on Multimodal Interaction (ICMI '19), Tübingen, Germany . ( Tübingen, Germany: University of Tübingen ), 154–163. doi:10.1145/3340555.3353735

Arksey, H., and O’Malley, L. (2005). Scoping studies: towards a methodological framework. Int. J. Soc. Res. Methodol. 8 (1), 19–32. doi:10.1080/1364557032000119616

Bedwell, W. L., Pavlas, D., Heyne, K., Lazzara, E. H., and Salas, E. (2012). Toward a taxonomy linking game attributes to learning. Simulat. Gaming 43 (6), 729–760. doi:10.1177/1046878112439444

Bellotti, F., Berta, R., De Gloria, A., and Primavera, L. (2009). Adaptive experience engine for serious games. IEEE Trans. Comput. Intell. AI Games 1 (4), 264–280. doi:10.1109/tciaig.2009.2035923

Berta, R., Bellotti, F., De Gloria, A., Pranantha, D., and Schatten, C. (2013). Electroencephalogram and physiological signal analysis for assessing flow in games. IEEE Trans. Comput. Intell. AI Games 5 (2), 164–175. doi:10.1109/tciaig.2013.2260340

Bloom, B. S. (1968). Learning for mastery. Instruction and Curriculum. Regional Education Laboratory for the Carolinas and Virginia, Topical Papers and Reprints, Number 1. Eval. Comment 1 (2).

Booth, A., Sutton, A., and Papaioannou, D. (2016). Systematic approaches to a successful literature review . 2nd Edn. Thousand Oaks, CA: SAGE .

Boyle, E. A., Hainey, T., Connolly, T. M., Gray, G., Earp, J., Ott, M., et al. (2016). An update to the systematic literature review of empirical evidence of the impacts and outcomes of computer games and serious games. Comput. Educ. 94, 178–192. doi:10.1016/j.compedu.2015.11.003

Brom, C., Šisler, V., Slussareff, M., Selmbacherová, T., and Hlávka, Z. (2016). You like it, you learn it: affectivity and learning in competitive social role play gaming. Intern. J. Comput.-Support. Collab. Learn 11 (3), 313–348. doi:10.1007/s11412-016-9237-3

Callies, S., Gravel, M., Beaudry, E., and Basque, J. (2020). “Logs analysis of adapted pedagogical scenarios generated by a simulation serious game architecture,” in Natural language processing: concepts, methodologies, tools, and Applications (Hershey, PA: IGI Global 2019 ), 1178–1198.

Connolly, T. M., Boyle, E. A., MacArthur, E., Hainey, T., and Boyle, J. M. (2012). A systematic literature review of empirical evidence on computer games and serious games. Comput. Educ. 59 (2), 661–686. doi:10.1016/j.compedu.2012.03.004

Doignon, J.-P. (1994). “Knowledge spaces and skill assignments,” in Contributions to mathematical psychology, psychometrics, and methodology . Editors G. H. Fischer, and D. Laming (New York, NY: Springer ), 111–121.

Eichenberg, C., and Schott, M. (2017). Serious games for psychotherapy: a systematic review. Game. Health J. 6 (3), 127–135. doi:10.1089/g4h.2016.0068

Faul, F., Erdfelder, E., Lang, A. G., and Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191. doi:10.3758/bf03193146

Felicia, P. (2020). International journal of game-based learning (IJGBL). Available at: www.igi-global.com/journal/international-journal-game-based-learning/41019 (Accessed September 15, 2020).

Fischer, U., Moeller, K., Bientzle, M., Cress, U., and Nuerk, H. C. (2011). Sensori-motor spatial training of number magnitude representation. Psychon. Bull. Rev. 18, 177–183. doi:10.3758/s13423-010-0031-3

Forsyth, C. M., Graesser, A., and Millis, K. (2020). Predicting learning in a multi-component serious game. Technol. Knowl. Learn. 25 (2), 251–277. doi:10.1007/s10758-019-09421-w

Fortenbacher, A., Ninaus, M., Yun, H., Helbig, R., and Moeller, K. (2019). “Sensor based adaptive learning—lessons learned,” in Die 17. Fachtagung bildungstechnologien, lecture notes in informatics (LNI) . Editors N. Pinkwart, and J. Konert (Bonn, Germany: Gesellschaft für Informatik ), 193–198.

Freire, M., Serrano-Laguna, Á., Iglesias, B. M., Martínez-Ortiz, I., Moreno-Ger, P., and Fernández-Manjón, B. (2016). “Game learning analytics: learning analytics for serious games,” in Learning, design, and technology: an international compendium of theory, research, practice, and policy . Editors M. J. Spector, B. B. Lockee, and M. D. Childress (Cham, Switzerland: Springer International Publishing ), 1–29.

Gough, D., Oliver, S., and Thomas, J. (2017). “Introducing systematic reviews,” in An introduction to systematic reviews . 2nd Edn. Editors D. Gough, S. Oliver, and J. Thomas (Thousand Oaks, CA: SAGE Publications ), 1–17.

Grant, M. J., and Booth, A. (2009). A typology of reviews: an analysis of 14 review types and associated methodologies. Health Inf. Libr. J. 26 (2), 91–108. doi:10.1111/j.1471-1842.2009.00848.x

Greller, W., and Drachsler, H. (2012). Translating learning into numbers: a generic framework for learning analytics. J. Educ. Technol. Soc. 15 (3), 42–57.

Groeneveld, C. M. (2014). “Implementation of an adaptive training and tracking game in statistics teaching,” in Computer assisted assessment. Research into E-assessment . Editors M. Kalz, and E. Ras (Cham, Switzerland: Springer International Publishing ), 53–58.

Halpern, D. F., Millis, K., Graesser, A. C., Butler, H., Forsyth, C., and Cai, Z. (2012). Operation ARA: a computerized learning game that teaches critical thinking and scientific reasoning. Think. Skills Creativ. 7 (2), 93–100. doi:10.1016/j.tsc.2012.03.006

Harp, S. F., and Mayer, R. E. (1998). How seductive details do their damage: a theory of cognitive interest in science learning. J. Educ. Psychol. 90, 414–434. doi:10.1037/0022-0663.90.3.414

Hartley, J. R., and Sleeman, D. H. (1973). Towards more intelligent teaching systems. Int. J. Man Mach. Stud. 5 (2), 215–236. doi:10.1016/s0020-7373(73)80033-1

Herz, J. C. (1997). Joystick nation . Boston, MA: Little Brown and Company .

Jesson, J., Matheson, L., and Lacey, F. M. (2011). Doing your literature review: traditional and systematic techniques . Thousand Oaks, CA: SAGE Publications .

Khan, K. S., Kunz, R., Kleijnen, J., and Antes, G. (2003). Five steps to conducting a systematic review. J. R. Soc. Med. 96, 118. doi:10.1177/014107680309600304

Kickmeier-Rust, M. D., Hillemann, E.-C., and Albert, D. (2014). Gamification and smart feedback. Int. J. Game Base. Learn. 4 (3), 35–46. doi:10.4018/ijgbl.2014070104

Klasen, M., Weber, R., Kircher, T. T., Mathiak, K. A., and Mathiak, K. (2012). Neural contributions to flow experience during video game playing. Soc. Cognit. Affect Neurosci. 7 (4), 485–495. doi:10.1093/scan/nsr021

Klinkenberg, S., Straatemeier, M., and van der Maas, H. L. J. (2011). Computer adaptive practice of Maths ability using a new item response model for on the fly ability and difficulty estimation. Comput. Educ. 57 (2), 1813–1824. doi:10.1016/j.compedu.2011.02.003

Lamnek, S., and Krell, C. (2016). Qualitative sozialforschung . Weinheim, Germany: Beltz Verlagsgruppe .

Lau, H. M., Smit, J. H., Fleming, T. M., and Riper, H. (2017). Serious games for mental health: are they accessible, feasible, and effective? A systematic review and meta-analysis. Front. Psychiatr. 7, 209. doi:10.3389/fpsyt.2016.00209

Lopes, R., and Bidarra, R. (2011). Adaptivity challenges in games and simulations: a survey. IEEE Trans. Comput. Intell. AI Games 3 (2), 85–99. doi:10.1109/tciaig.2011.2152841

Mangaroska, K., and Giannakos, M. (2019). Learning analytics for learning design: a systematic literature review of analytics-driven design to enhance learning. IEEE Trans. Learn. Technol. 12 (4), 516–534. doi:10.1109/tlt.2018.2868673

Martin, F., Chen, Y., Moore, R. L., and Westine, C. D. (2020). Systematic review of adaptive learning research designs, context, strategies, and technologies from 2009 to 2018. Educ. Technol. Res. Dev. 68 (4), 1903–1929. doi:10.1007/s11423-020-09793-2

Marton, F., and Säljö, R. (1976). On qualitative differences in learning: I-outcome and process*. Br. J. Educ. Psychol. 46 (1), 4–11. doi:10.1111/j.2044-8279.1976.tb02980.x

Maxwell, S. E., Lau, M. Y., and Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? Am. Psychol. 70 (6), 487–498. doi:10.1037/a0039400

Mayer, R. E. (2005). “Cognitive theory of multimedia learning,” in The Cambridge handbook of multimedia learning . Editor R. E. Mayer (Cambridge, United kingdom: Cambridge University Press ), 31–48.

Mayer, R. E. (2018). Educational psychology’s past and future contributions to the science of learning, science of instruction, and science of assessment. J. Educ. Psychol. 110 (2), 174–179. doi:10.1037/edu0000195

Methley, A. M., Campbell, S., Chew-Graham, C., McNally, R., and Cheraghi-Sohi, S. (2014). PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv. Res. 14, 579. doi:10.1186/s12913-014-0579-0

Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., and Group, T. P. (2009). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6 (7), e1000097. doi:10.1371/journal.pmed.1000097

Mora, A., Planas, E., and Arnedo-Moreno, J. (2016). “Designing game-like activities to engage adult learners in higher education,” in Proceedings of the fourth international conference on technological ecosystems for enhancing multiculturality—TEEM’16 , Salamanca, Spain , November 2–4, 2016 ( ACM ), 755–762.

Mora, A., Tondello, G. F., Nacke, L. E., and Arnedo-Moreno, J. (2018). Effect of personalized gameful design on student engagement. IEEE Global Eng. Educ. Conf. (EDUCON) , 1925–1933. doi:10.1109/EDUCON.2018.8363471

Moreau, D., and Gamble, B. (Forthcoming 2020). Conducting a meta-analysis in the age of open science: tools, tips, and practical recommendations. Psychol. Methods doi:10.31234/osf.io/t5dwg

Munn, Z., Peters, M. D. J., Stern, C., Tufanaru, C., McArthur, A., and Aromataris, E. (2018). Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med. Res. Methodol. 18 (1), 143. doi:10.1186/s12874-018-0611-x

Nebel, S., Beege, M., Schneider, S., and Rey, G. D. (2020). Competitive agents and adaptive difficulty within educational video games. Front. Educ. 5, 129. doi:10.3389/feduc.2020.00129

Nebel, S., and Ninaus, M. (2019). “New perspectives on game-based assessment with process data and physiological signals,” in Game-based assessment revisited . Editors D. Ifenthaler, and Y. Jeon (Cham, Switzerland: Springer ), 141–161.

Nebel, S., Schneider, S., Beege, M., Kolda, F., Mackiewicz, V., and Rey, G. D. (2017a). You cannot do this alone! increasing task interdependence in cooperative educational videogames to encourage collaboration. Educ. Technol. Res. Dev. 65 (4), 993–1014. doi:10.1007/s11423-017-9511-8

Nebel, S., Schneider, S., Beege, M., and Rey, G. D. (2017b). Leaderboards within educational videogames: the impact of difficulty, effort and gameplay. Comput. Educ. 113, 28–41. doi:10.1016/j.compedu.2017.05.011

Nebel, S., Schneider, S., Schledjewski, J., and Rey, G. D. (2017c). Goal-setting in educational video games. Simulat. Gaming 48 (1), 98–130. doi:10.1177/1046878116680869

Newman, M., and Gough, D. (2020). “Systematic reviews in educational research,” in Methodology, perspectives and application . Editors O. Zawacki-Richter, M. Kerres, S. Bedenlier, M. Bond, and K. Buntins (Wiesbaden, Germany: Springer VS ), 3–22.

Ninaus, M., Greipl, S., Kiili, K., Lindstedt, A., Huber, S., Klein, E., et al. (2019a). Increased emotional engagement in game-based learning—a machine learning approach on facial emotion detection data. Comput. Educ. 142, 103641. doi:10.1016/j.compedu.2019.103641

Ninaus, M., Tsarava, K., and Moeller, K. (2019b). “A pilot study on the feasibility of dynamic difficulty adjustment in game-based learning using heart-rate,” in Games and learning alliance . Editors A. Liapis, G. N. Yannakakis, M. Gentile, and M. Ninaus (Cham, Switzerland: Springer International Publishing ), 117–128.

Ninaus, M., Kober, S. E., Friedrich, E. V. C., Dunwell, I., Freitas, S. D., Arnab, S., et al. (2014). Neurophysiological methods for monitoring brain activity in serious games and virtual environments: a review. Ijtel 6 (1), 78–103. doi:10.1504/ijtel.2014.060022

Ninaus, M., and Nebel, S. (2020). A systematic literature review of analytics for adaptation within educational videogames. Available at: https://osf.io/dvzsa/ (Accessed September 15, 2020).

Nyamsuren, E., van der Vegt, W., and Westera, W. (2017). “Automated adaptation and assessment in serious games: a portable tool for supporting learning,” in Advances in computer games . Editors M. H. M. Winands, H. J. van den Herik, and W. A. Kosters (Cham, Switzerland: Springer International Publishing ), Vol. 10664, 201–212.

Orji, R., Nacke, L. E., and Di Marco, C. (2017). “Towards personality-driven persuasive health games and gamified systems,” in Proceedings of the 2017 CHI conference on human factors in computing systems , Denver,CO , May 6–11, 2017 ( ACM ), 1015–1027.

Perttula, A., Kiili, K., Lindstedt, A., and Tuomi, P. (2017). Flow experience in game based learning—a systematic literature review. Int. J. Serious Games 4 (1), 57–72. doi:10.17083/ijsg.v4i1.151

Rowe, J. P., and Lester, J. C. (2015). “Improving student problem solving in narrative-centered learning environments: a modular reinforcement learning framework,” in Artificial intelligence in education . Editors C. Conati, N. Heffernan, A. Mitrovic, and M. F. Verdejo (Cham, Switzerland: Springer International Publishing ), Vol. 9112, 419–428.

Schardt, C., Adams, M. B., Owens, T., Keitz, S., and Fontelo, P. (2007). Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med. Inform. Decis. Mak. 7 (1), 16. doi:10.1186/1472-6947-7-16

Schrader, C., Brich, J., Frommel, J., Riemer, V., and Rogers, K. (2017). “Rising to the challenge: an emotion-driven approach toward adaptive serious games,” in Serious games and edutainment applications. Cham, Switzerland: Springer , 3–28. doi:10.1007/978-3-319-51645-5_1

Shute, V. J., and Zapata-Rivera, D. (2012). “Adaptive educational systems,” in Adaptive technologies for training and education . Editors P. J. Durlach, and A. M. Lesgold (New York, NY: Cambridge University Press ), 7–27.

Skinner, B. F. (1958). Teaching machines; from the experimental study of learning come devices which arrange optimal conditions for self instruction. Science 128, 969–977. doi:10.1126/science.128.3330.969

Streicher, A., and Smeddinck, J. D. (2016). “Personalized and adaptive serious games,” in Entertainment computing and serious games . Editors R. Dörner, S. Göbel, M. Kickmeier-Rust, M. Masuch, and K. Zweig (Cham, Switzerland: Springer International Publishing ), Vol. 9970, 332–377.

Sutton, R. S., and Barto, A. G. (2018). Reinforcement learning an introduction . 2nd Edn. Cambridge, MA: MIT Press .

Sweller, J. (1994). Cognitive load theory, learning difficulty, and instructional design. Learn. Instruct. 4 (4), 295–312. doi:10.1016/0959-4752(94)90003-5

Sweller, J., Ayres, P., and Kalyuga, S. (2011). Cognitive load theory . New York, NY: Springer .

Sweller, J., van Merrienboer, J. J. G., and Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educ. Psychol. Rev. 10, 251–296. doi:10.1023/a:1022193728205

Tondello, G. F., Wehbe, R. R., Diamond, L., Busch, M., Marczewski, A., and Nacke, L. E. (2016). “The gamification user types hexad scale,” in Proceedings of the 2016 annual symposium on computer-human interaction in play , Austin, TX , October 18, 2016 ( ACM ), 229–243.

Van Oostendorp, H., Bakkes, S. C. J., Jay, T., Habgood, J., Brom, C., and Kickmeier-Rust, M. (2020). Adaptivity in serious games through cognition-based analytics | Frontiers research topic. Available at: https://www.frontiersin.org/research-topics/11739/adaptivity-in-serious-games-through-cognition-based-analytics (Accessed September 15, 2020).

Vornhagen, J. B., Tyack, A., and Mekler, E. D. (2020). Statistical significance testing at CHI PLAY: challenges and opportunities for more transparency. CHI PLAY '20. Canada: ACM.

Witte, M., Ninaus, M., Kober, S. E., Neuper, C., and Wood, G. (2015). Neuronal correlates of cognitive control during gaming revealed by near-infrared spectroscopy. PLoS One 10 (8), e0134816. doi:10.1371/journal.pone.0134816

Yannakakis, G. N., and Togelius, J. (2011). Experience-driven procedural content generation. IEEE Trans. Affective Comput. 2 (3), 147–161. doi:10.1109/t-affc.2011.6

Yerkes, R. M., and Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. J. Comp. Neurol. Psychol. 18, 459–482. doi:10.1002/cne.920180503

Keywords: analytics, educational video games, serious games, game-based learning, adaptivity, learning, systematic review

Citation: Ninaus M and Nebel S (2021) A Systematic Literature Review of Analytics for Adaptivity Within Educational Video Games. Front. Educ. 5:611072. doi: 10.3389/feduc.2020.611072

Received: 28 September 2020; Accepted: 21 December 2020; Published: 29 January 2021.

Copyright © 2021 Ninaus and Nebel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Steve Nebel, [email protected]

† These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Video analytics using deep learning for crowd analysis: a review

  • Open access
  • Published: 29 March 2022
  • Volume 81 , pages 27895–27922, ( 2022 )

  • Md Roman Bhuiyan   ORCID: orcid.org/0000-0002-8919-4459 1 ,
  • Junaidi Abdullah 1 ,
  • Noramiza Hashim 1 &
  • Fahmid Al Farid 1  

7954 Accesses

16 Citations

Gathering a large number of people in a shared physical area is very common in urban culture. Although there are countless examples of mega crowds, the Islamic religious ritual, the Hajj, is considered one of the greatest crowd scenarios in the world. The Hajj is carried out once a year, with a congregation of millions of Muslims visiting the holy city of Makkah at a given time and date. Such a big crowd is always prone to public safety issues and therefore requires proper measures to ensure a safe and comfortable arrangement. Through advances in computer vision based scene understanding, automatic analysis of crowd scenes is gaining popularity. However, existing crowd analysis algorithms might not correctly interpret video content in the context of the Hajj. This is because the Hajj is a unique congregation of millions of people crowded into a small area, which can overwhelm existing sophisticated video and computer vision algorithms. Through our studies on crowd analysis, crowd counting, density estimation, and Hajj crowd behavior, we recognized the need for a review to chart a research direction for abnormal behavior analysis of Hajj pilgrims. Therefore, this review aims to summarize the research relevant to the broader field of video analytics using deep learning, with a special focus on visual surveillance in the Hajj. The review identifies the challenges and leading-edge techniques of visual surveillance in general, which may be gracefully adapted to the applications of Hajj and Umrah. The paper presents detailed reviews of existing techniques and approaches employed for crowd analysis from crowd videos, specifically the techniques that use deep learning to detect abnormal behavior. These observations give us the impetus to undertake a painstaking yet exhilarating journey into crowd analysis, classification and detection of any abnormal movement of the Hajj pilgrims.
Furthermore, because the Hajj pilgrimage is among the most crowded settings for extensive video-based research, this study motivates us to critically analyze crowds on a large scale.

1 Introduction

1.1 Background

For the last 14 centuries, the Hajj has been a sacred religious ritual for Muslims worldwide, who gather to visit Mecca and the Kaaba at a given date and time. Due to the ever-increasing number of pilgrims, the management of large crowds has become a major issue. Several research works have revealed the massive devastating effect on pilgrims due to a lack of crowd management by the authorities. In the last three years, the number of casualties has increased by at least 1426 [ 1 , 2 ]. The use of efficient crowd analysis can potentially help the stakeholders to reduce the large number of casualties at the scene. Traditional approaches to crowd analysis that are based merely on a standard CNN are inefficient, mainly because they cannot address the complex requirements associated with highly dense crowds.

Based on the above considerations, the use of a modified CNN for crowd analysis and monitoring techniques in video surveillance has become important. Apart from the modified CNN, other approaches to crowd analysis have enabled the use of enhanced crowd analysis systems [ 43 ]. This is to ensure a significant reduction in the number of unexpected incidents during pilgrimages.

1.2 Motivation

Over the years, crowd analysis has shown steady improvement due to the emergence of novel approaches. Deep learning techniques have been increasingly used in many applications due to their discriminative power and efficient feature extraction. Many approaches used in traditional crowd analysis are unsuitable for modern surveillance due to certain limitations. Modern surveillance settings are characterized by intense uncertainty and dynamicity in crowd motion trends and in the operating circumstances of surveillance tools. These diverse characteristics complicate the use of current techniques in the monitoring and analysis of dense crowds. Crowd analysis researchers should develop novel techniques to respond to this concern in the new environment, where computer vision is increasingly needed to monitor and analyze many people from the video feeds of surveillance cameras in real time. This includes estimating the size of the crowd as well as the density distribution across the entire assembly region. Identifying areas whose density exceeds safe limits can help in issuing early alerts and could prevent crowd crushes. Crowd count estimates also help to quantify the importance of an event and to plan its logistics and infrastructure.

1.3 Challenges and gaps

In the last several years, the use of Fully Convolutional Neural Networks (FCNNs) has steadily gained prominence for crowd analysis and monitoring. Nowadays, it is very important to perform video analysis, monitor the crowd density of the pilgrims, and detect any abnormal movement. To achieve this, there is a need for state-of-the-art technologies such as deep learning. Analyzing images or videos that involve the movement of large numbers of pilgrims, with densities ranging from 7 to 8 people per square meter, poses a big challenge. Besides, it is very difficult to determine density and spot suspicious activity due to the need to recognize effective monitoring features in the extremely concentrated conditions of the Hajj. The issue of using a non-stationary tracking camera as the feed for the crowd video also needs to be addressed.
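One reason fully convolutional architectures suit this task is that they contain no fixed-size fully connected layers: the same learned filters slide over a frame of any resolution and emit a proportionally sized output map. A minimal numpy sketch (hypothetical placeholder weights, not a trained model) illustrates this size-agnostic property with a single convolutional layer:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation; output size follows input size."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0  # placeholder weights, not learned

# The same layer handles frames of different sizes:
small = conv2d(np.random.rand(32, 32), kernel)    # 30 x 30 map
large = conv2d(np.random.rand(240, 320), kernel)  # 238 x 318 map
```

In a real FCNN, stacks of such layers (implemented efficiently on GPU) regress a density map rather than applying one fixed averaging filter; the point here is only that no layer constrains the input resolution.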

Most current works analyze crowd density based on face detection to count individual people in the crowd. However, face-detection-based analysis in highly dense circumstances (> 2000 people) presents several difficulties. The videos suffer from intense occlusion, making conventional face/person detectors ineffective. Furthermore, the variety of camera angles introduces perspective distortion, which complicates the capture of crowd videos. These issues require the estimation model to be scale-invariant over a wide range, since the crowd is not uniformly scaled. Besides, annotating a dense crowd is an exhausting task, and it is not easy to obtain or build a good dataset that represents huge crowds in which terrible incidents might occur.

1.4 Research questions

This research will attempt to answer the following key questions:

What are the main difficulties faced in crowd monitoring?

What are some challenges faced during Hajj?

What are the impacts of crowd monitoring on the pilgrims?

What are the major algorithms involved in the crowd analysis domain?

What are the most important datasets in this field of research?

1.5 Contributions

This article reviews the latest crowd video analysis technology for current video surveillance systems. The latest approaches to crowd analysis rely on deeply learned features extracted with the Fully Convolutional Neural Network (FCNN) architecture. We have categorized the related works into two main branches: network-based and image-based. We reviewed the Convolutional Neural Network (CNN) strategies to illustrate the shortcomings and core characteristics of each branch. In addition, we provide detailed analyses of the different approaches in each branch, in terms of normalized Mean Absolute Error (nMAE) and full outputs on separate datasets such as UCF, World Expo (WE), ShanghaiTech Part A (STA), and ShanghaiTech Part B (STB).
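As a concrete reading of these per-image count metrics, the following minimal sketch computes MAE and a normalized MAE over crowd counts. The exact nMAE formula is not fixed by the text, so the per-image normalization below is an illustrative assumption:

```python
import numpy as np

def mae(pred, true):
    """Mean Absolute Error between predicted and ground-truth crowd counts."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean(np.abs(pred - true)))

def nmae(pred, true):
    """MAE normalized by each image's ground-truth count (assumed definition)."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean(np.abs(pred - true) / true))

# Hypothetical counts for three images of increasing density.
counts_true = [120, 450, 2000]
counts_pred = [110, 480, 1900]
```

Normalization matters for benchmarks like the ones above: an error of 100 people is severe in a sparse scene but modest in a 2000-person scene, and nMAE reflects that.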

1.6 Comparison of the proposed work with existing works

Existing works.

Bendali-Braham, M. et al. [ 9 ] analyzed numerous crowd analysis publications. Crowd analysis has two main branches: statistics and behavior. Crowd behavior analysis often discusses anomaly detection, and anomalies can occur in any of its subtopics. The aim of their study is to find unexplored or understudied crowd analysis sub-areas that might benefit from deep learning.

Kumar and Arunnehru [ 42 ] reviewed the literature on crowds, including methodologies for crowd surveillance and behavior analysis. The authors also described the datasets and methodologies used. Both traditional methodologies and current deep learning ideas have been evaluated. This work explains the many modern methodologies for crowd monitoring and analysis.

Albattah, W. et al. [ 3 ] proposed an image classification, crowd management, and warning system for the Hajj. Images are classified using a CNN, a deep learning technology that has recently gained interest in numerous image classification and speech recognition applications for the scientific and industrial communities. The goal is to train the CNN model on labeled image data to classify crowds as severely packed, crowded, semi-crowded, lightly congested, and normal.

Proposed work

This study examines current methods and approaches for crowd analysis from crowd videos, with an emphasis on deep learning techniques for detecting anomalous behavior. These findings motivate us to go on a time-consuming yet fascinating journey of crowd analysis, classification, and detection of any abnormal Hajj pilgrim activity. This study also pushes us to critically evaluate the crowd on a huge scale since the Hajj pilgrimage is the most crowded arena for video-related intensive research activities.

1.7 Paper organization

The rest of the paper is organized as follows: Section 2 presents background research on the crowd analysis domain; Section 3 describes selected studies on crowd analysis, and Section 3.1 highlights the unsolved problems that still exist in the domain together with possible future research directions; Section 4 presents different categories of CNN techniques; Section 5 concludes the review. An overview is shown in Fig.  1 .

figure 1

A Roadmap Showing Key Aspects of the Reviewed Works

2 Background studies on crowd analysis

In the previous section, the focus was on the introductory aspects of this work. This section focuses on existing research related to crowd analysis. We consider crowd analysis using global regression, deep learning, scene-labelling data-driven approaches, detection-based methods, CNN-based methods, optical flow detection, object tracking, 2D Convolutional Neural Networks, 3D Convolutional Neural Networks, crowd anomaly detection, abnormal event detection with deep models, feature learning based on PCANet, and representation of normal event patterns with deep GMM.

2.1 Crowd analysis by global regression

Many approaches have been developed for monitoring pedestrian crowds by detecting individuals or clustering trajectories [ 11 , 77 ]. However, these techniques are limited by serious occlusions among people. Methods for global count prediction were therefore introduced, using regression trained on low-level features [ 13 , 14 ]. These methods are more suitable for crowded situations and are more computationally efficient.

Lempitsky et al. [ 43 ] proposed crowd analysis based on regression of a pixel-level object density map. Fiaschi et al. [ 27 ] subsequently employed a random forest to regress the object density and improve training efficiency. Besides incorporating spatial information, a further advantage of regression-based methods is their ability to estimate the number of objects in any region of a video. Taking advantage of this, an interactive object counting system was unveiled that can visualize regions to gather relevant feedback efficiently [ 14 ].

Regression-based crowd counting techniques were developed to address the problem of occlusion. The core concept behind regression approaches is to learn a mapping from low-level image patches to counts [ 17 , 61 ]. The extracted features include foreground segments, edge components, textures, and gradient features such as the Local Binary Pattern (LBP) and the Histogram of Oriented Gradients (HOG). Regression models include linear regression [ 14 ] and piecewise linear regression [ 17 ]. These methods improve on earlier detection-based methods but neglect the spatial distribution of the crowd. To exploit spatial distribution information, Lempitsky et al. [ 43 ] suggested regressing a density map instead. A linear mapping from local patch features to density maps is learned and, by integrating over the whole density map, the total count in an image can be determined. Pham et al. [ 59 ] learned a non-linear mapping from local patches to density using random forests. Most recent regression approaches are based on the density map.
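The density-map formulation can be illustrated with a minimal sketch: each dot-annotated head position is replaced by a normalized Gaussian, so the map integrates to the crowd count. The kernel width `sigma` and the annotation format are arbitrary choices here, not taken from any cited method:

```python
import numpy as np

def density_map(shape, points, sigma=2.0):
    """Build a crowd density map: place a unit-mass Gaussian at each
    annotated head position so the whole map sums to the crowd count."""
    h, w = shape
    dmap = np.zeros((h, w))
    yy, xx = np.mgrid[0:h, 0:w]
    for (px, py) in points:
        g = np.exp(-((xx - px) ** 2 + (yy - py) ** 2) / (2 * sigma ** 2))
        g /= g.sum()          # normalize so each person contributes mass 1
        dmap += g
    return dmap

heads = [(10, 12), (30, 25), (31, 26)]   # hypothetical (x, y) dot annotations
dm = density_map((48, 64), heads)
```

Integrating the map recovers the count (`dm.sum()` equals the number of annotated heads), which is exactly the property the regression target exploits.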

2.2 Scene labelling data-driven approaches

Following other well-known large-scale crowd applications, data-driven methods in non-parametric form have been recommended [ 48 , 60 , 70 ]. Such methods can be enhanced quickly since they require no training. Data-driven methods transfer labels from training images to a test image by retrieving the most similar training images and matching them to the test image. The non-parametric image parsing technique suggested by Liu et al. [ 48 ] searches for a dense deformation field between images. Inspired by these data-driven scene-labelling methods, similar scenes and crowd patches can be retrieved from the training scenes for an unknown target scene.

2.3 Detection-based methods

Detection-based crowd counting techniques were proposed to identify and count pedestrians [ 23 ]. Some authors suggested extracting appearance-based features to count crowds [ 71 ]. However, these methods performed poorly on large crowds. To deal with this problem, researchers used part-based methods that detect parts of the body, such as the head or shoulders, to count pedestrians [ 28 ].

2.4 Optical flow detection

Optical flow is a vector-based approach that estimates the motion of crowd objects by matching points across image frames [ 12 ]. Optical-flow-based approaches can locate moving crowd objects independently, even while a subject is turning. It is a highly dynamic but computationally complex estimation approach. The related space-time filtering approach uses multiple adjacent frames to extract objects based on the time series of images. Such approaches cannot detect objects that are not moving at all, and are difficult to use in real-time implementations.
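The core idea can be sketched with the classical brightness-constancy linearization (Lucas–Kanade style), assuming a single global translation between two frames. This is an illustrative sketch, not the implementation of any method cited above:

```python
import numpy as np

def lucas_kanade(prev, curr):
    """Estimate one global (u, v) translation between two frames by solving
    the least-squares optical-flow system  [Ix Iy] [u v]^T = -It."""
    Ix = np.gradient(prev, axis=1)   # spatial gradient in x
    Iy = np.gradient(prev, axis=0)   # spatial gradient in y
    It = curr - prev                 # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic check: a smooth blob shifted one pixel to the right.
y, x = np.mgrid[0:32, 0:32]
blob = np.exp(-((x - 15) ** 2 + (y - 15) ** 2) / 20.0)
blob_shift = np.exp(-((x - 16) ** 2 + (y - 15) ** 2) / 20.0)
u, v = lucas_kanade(blob, blob_shift)
```

On this synthetic pair the estimate recovers approximately one pixel of rightward motion; as noted above, regions with no motion (and no texture) carry no flow signal.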

2.5 Deep learning

To date, many previous studies have applied deep learning to various monitoring tasks, including person re-identification [ 44 ] and pedestrian detection [ 58 ]. Their popularity stems from the power of deep models. Sermanet et al. [ 63 ] found that the features extracted by deep models are more effective for many applications than handcrafted features. The counting algorithm of [ 37 ] is largely based on the observation that no single feature is reliable when a vast number of individuals is involved. To overcome this, they use a blend of engineered features: HOG-based head detections, Fourier analysis, and interest-point counts, combined in a multi-scale Markov Random Field. However, the method suffers from accuracy decline under weather changes, illumination distortion, extreme occlusion, etc. Zhang et al. use deep networks to estimate people counts [ 83 ]. Their model is supervised by crowd density maps, whose creation is a complicated procedure. Wang et al. train a deep model for crowd estimation [ 74 ]. The network estimates both the crowd count and the crowd density distribution. A sample block of a CNN model is shown in Fig.  2 [ 10 ].

figure 2

Overview of the crowd counting

2.5.1 2D convolutional neural network

Convolutional neural networks (ConvNets) are also known as shared-weight neural networks. ConvNets are multi-layered deep neural networks that work with real-world data. They apply small receptive fields (more commonly called kernels) with the same parameters, a scheme known as weight sharing, across all possible input positions. The idea is that each node draws from a small kernel window over the previous layer. Sharing weights across the computing units of the CNN reduces the number of free parameters, improving overall generalization. Because the weights are replicated over the input, the network is intrinsically insensitive to translations within the data.

Figure 3 displays a standard convolutional architecture. Several planes (referred to as Feature Maps, FM) are usually used in each layer to detect more than one feature. These layers are known as convolutional layers. The network is trained with the traditional gradient-descent backpropagation method. To extract spatial features, 2D CNNs are applied to the video dataset frame by frame.

figure 3

2D convolution
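The weight-sharing idea behind Fig. 3 can be sketched as a plain 2D convolution (cross-correlation form, as CNNs actually compute it): one small kernel slides over every spatial position and produces one feature map. A minimal sketch, not any specific layer from the reviewed works:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: the same kernel (shared weights) is applied
    at every spatial location of the input feature map."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, -1.0]])   # horizontal difference kernel
fm = conv2d(img, edge)           # one feature map of shape (5, 4)
```

Because the gradient image increases by 1 per column, the single shared kernel produces a constant response everywhere, illustrating translation insensitivity.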

2.5.2 3D convolutional neural network

The analysis uses a 3D Convolutional Neural Network for anomaly detection in crowds. However, a few fundamentals need to be understood before addressing anomaly detection. Convolution takes two functions, m and n, and produces a third function [ 40 ]. This third function is generally interpreted as a modified version of one of the original functions, expressing the overlap between the two functions as one is shifted over the other. The convolution of m and n is written m*n, denoted with an asterisk. It is defined as the integral of the product of the two functions after one is reversed and shifted, and thus represents a specific kind of integral transform: \( (m*n)(t) = {\int}_{-\infty}^{\infty} m(\tau)\, n\left(t-\tau \right)\, d\tau \), where the variable t need not represent the time domain. Figure 4 illustrates 3D convolution.

figure 4

3D convolution [ 40 ]

In the temporal dimension, the size of the convolution kernel is 3. The shared weights are colored to match the connection settings. 3D convolution processes overlapping 3D cubes of the input video to capture motion information.

Note that only one kind of feature can be extracted from the frame cube using a single 3D convolution kernel, since the kernel weights are replicated across the whole cube. CNNs follow a general design philosophy of producing as many feature maps as possible by extracting several types of features from the same set of low-level feature maps. This is done by applying numerous 3D convolution operations with distinct kernels at the same location in the preceding layer, as depicted in Fig.  5 .

figure 5

Feature extraction from numerous consecutive frames

Consecutive frames may be convolved with numerous 3D convolution kernels to extract numerous features. Figure 5 color-codes the connection sets to show shared weights, so shared connections appear in the same color. All six connection sets have unique weights, and thus two feature maps are obtained on the right, even though they all link to the same subset of the data.
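The temporal behavior described above can be sketched by extending the convolution to a (time, height, width) cube. The temporal-difference kernel below is an illustrative choice to show how a 3D kernel responds to motion and ignores static content; it is not taken from any reviewed model:

```python
import numpy as np

def conv3d(video, kernel):
    """Valid 3D convolution over a (time, height, width) cube: the kernel
    also spans the temporal axis, so each output captures motion cues."""
    kt, kh, kw = kernel.shape
    T, H, W = video.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(
                    video[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# Temporal-difference kernel of size 3 in time (taps -1, 0, +1).
k = np.zeros((3, 1, 1))
k[0, 0, 0], k[2, 0, 0] = -1.0, 1.0

static = np.ones((5, 4, 4))                                   # nothing moves
resp = conv3d(static, k)                                      # zero response
ramp = np.arange(5, dtype=float)[:, None, None] * np.ones((5, 4, 4))
motion = conv3d(ramp, k)                                      # brightness rises
```

The static cube yields zero everywhere, while the frame-to-frame change yields a constant positive response, which is precisely the motion information a 3D kernel adds over a 2D one.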

2.6 Crowd anomaly detection

Current algorithms usually treat anomalies as events that deviate from the patterns observed in normal videos, since abnormal events are too rare to enumerate for training [ 78 ]. The proposed algorithms for identifying abnormal events fall into two forms: (1) trajectory-based techniques, although abnormal trajectories are very rare compared with daily trajectories; and (2) local pattern-based algorithms, in which anomalies are viewed as patterns that deviate markedly from typical case patterns. By mining the rules obeyed by ordinary paths, odd behaviors are defined as those disobeying such rules. Trajectory-based methods exploit several features, including trajectory shape, speed, and acceleration [ 5 ]. Clustering is performed on each feature, and the final cluster results are obtained by combining the clusters over all features. Anomalies are viewed as clusters with few members and samples far from the cluster centers; occlusion and segmentation problems can be handled via adaptive particle sampling and Kalman filtering [ 18 ]. In other works, tracking was also considered at the particle and feature-point level. An approach based on particle dynamics was proposed by [ 78 ], which derives chaotic invariant properties from particle trajectories. Cui et al. [ 20 ] capture the interactions of interest and reflect the dynamics of the crowd through the use of potential energy measurements.

2.6.1 Abnormal event detection for deep model

At this stage, the method proposed by [ 26 ] is summarized. The 3D gradient is first calculated for each video frame. Second, high-level features are automatically extracted by PCANet for the video events. Finally, a deep GMM is used to model the normal patterns [ 26 ].

2.6.2 Feature learning based on the PCANet

Most current techniques manually select spatial and temporal characteristics such as intensity, color, gradient, and optical flow. In this work, 3D gradient features are calculated for video events; prior research has shown the power and effectiveness of 3D gradient features for abnormal event detection [ 52 ]. The 3D gradient encodes both appearance and motion cues. A deep neural network is then used to abstract high-level features on top of the 3D gradients. Deep learning has achieved significant performance in many computer vision applications in recent years [ 15 ], benefiting from non-linear multi-layer transformations that can adaptively extract meaningful and discriminative characteristics. In anomaly detection there are no labeled abnormal activities for training; the training dataset contains only normal videos. Therefore, PCANet features are learned from the video events in an unsupervised manner, which is simple and effective [ 15 ].

2.6.3 Representation of Normal event patterns with deep GMM

GMMs are used in many works to model normal event patterns. However, simulating complex video events requires a large number of single Gaussian components [ 8 , 24 , 53 ], so the complexity of these approaches eventually grows dramatically. A deep GMM is therefore adopted here as the model of normal video patterns [ 72 ]. Figure 6 shows the deep GMM structure.

figure 6

Visualizations of single Gaussian, GMM, and deep GMM distribution [ 88 ]
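A toy illustration of the mixture idea (a flat two-component GMM, not the deep GMM of [ 72 ]): after a small EM fit on normal features, samples that fall between the learned modes receive low likelihood and can be flagged as anomalous. The 1-D features and EM details below are illustrative assumptions:

```python
import numpy as np

def fit_gmm_1d(x, iters=50):
    """Tiny EM for a two-component 1-D Gaussian mixture.
    Means start at the data extremes to keep the demo deterministic."""
    mu = np.array([x.min(), x.max()], dtype=float)
    sig = np.full(2, x.std())
    w = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        p = w * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and stds from soft assignments
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n) + 1e-6
    return w, mu, sig

def loglik(samples, w, mu, sig):
    """Per-sample log-likelihood under the mixture (low = anomalous)."""
    s = np.atleast_1d(samples)[:, None]
    p = w * np.exp(-0.5 * ((s - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    return np.log(p.sum(axis=1))

rng = np.random.default_rng(1)
normal = np.concatenate([rng.normal(0, 1, 500), rng.normal(8, 1, 500)])
w, mu, sig = fit_gmm_1d(normal)
```

A single Gaussian would smear over both modes; the mixture separates them, so a feature value between the modes scores lower than a typical normal one, which is the thresholding logic of GMM-based anomaly detection.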

3 Selected studies on crowd analysis

This part gives a thorough overview of recently selected and examined crowd analysis investigations published within the previous five years. The works have been carefully selected for their relevance to the demanding requirement of dense crowd analysis in current and future surveillance systems.

Bendali-Braham, M. et al. [ 9 ] reviewed several papers related to crowd analysis. Crowd analysis is often divided into two branches: crowd statistics and crowd behavior analysis. Anomaly detection is one of the most discussed topics in crowd behavior analysis. Although there is no universal definition of an anomaly, each of the crowd behavior analysis subtopics may be prone to abnormality. The goal of the study is to identify crowd analysis sub-areas that have yet to be investigated, or that seem to be seldom addressed, through the prism of deep learning.

Kumar, A. and Arunnehru, J. [ 42 ] presented literature studies on organized and unorganized crowds, as well as approaches for crowd monitoring and behavior analysis. The authors also described the datasets and the techniques applied to them [ 6 ]. Different methods based on traditional techniques and on modern deep learning principles have been reviewed. This publication helps researchers comprehend the many state-of-the-art approaches utilized for monitoring and analysis of crowds.

Albattah, W. et al. [ 3 ] presented an image-classification crowd control system and an alerting system to manage millions of Hajj pilgrims. The image classification system depends heavily on a proper dataset for training the CNN, a deep learning methodology that has lately gained attention in many image classification and voice recognition applications for the scientific community and industry. The objective is to train the CNN model on labeled image data to classify crowds as heavily crowded, crowded, semi-crowded, lightly crowded, and normal.

Dargan, S. et al. [ 21 ] concentrated on the basic and advanced architectures, methodologies, motivating factors, characteristics, and limitations of deep learning concepts. The paper also marked the considerable disparities between deep learning, traditional machine learning, and conventional training. It is centered on chronologically studying the numerous applications of deep learning, as well as the methodologies and architectures used in a variety of fields.

Gupta, Kumar and Garg [ 32 ] proposed to identify objects using hand-crafted features based on Oriented FAST and Rotated BRIEF (ORB) and the Scale Invariant Feature Transform (SIFT). SIFT is very effective in analyzing images of varying orientation and size. A strategy for reducing the size of the image feature vector is also researched. K-NN, decision tree, and random forest classifiers are utilized to test the realization of the work.

In Wang et al. [ 76 ], a large congested-crowd counting dataset called NWPU-Crowd was built. This was intended to tackle the problem of small datasets, which cannot meet the needs of supervised CNN algorithms. The built dataset includes various lighting scenes and has the widest range of densities (0 to 20,033). Besides, they developed a benchmark website, which allows researchers to submit findings on the test set impartially. The data characteristics are further explained based on the proposed dataset, along with the performance of certain mainstream state-of-the-art (SOTA) methods.

In Zeng et al. [ 82 ], DSPNet, a modern deep learning network that encodes multi-scale features for large crowd counting, was proposed. It especially addresses the challenge of counting in highly congested scenes caused by scale variation. The DSPNet model consists of a frontend and a backend: the frontend is a deep neural network, while the backend integrates information at different levels. The SCA module can effectively integrate the multi-scale features and improve the image representations.

Singh, K. et al. [ 68 ] proposed detecting visual anomalies in crowded scenes using ConvNets and pooled classification capabilities under a new principle, Aggregation of Ensembles (AOE). The scheme used a collection of variously fine-tuned Convolutional Neural Networks, based on the idea that different CNN architectures yield complementary feature sets. The proposed AOE used the fine-tuned ConvNets as fixed feature extractors for building SVM models, and then combined their probabilities to detect deviations in crowd frame sequences. Experimental findings suggested that the proposed aggregation of fine-tuned CNNs from various architectures is more efficient than the other existing approaches on benchmarks.

Tian et al. [ 69 ] proposed a modern formulation called pan-density crowd counting, to count crowds at varying densities. Their PaDNet consists of several sub-networks pre-trained on different densities; this Density-Aware Network module first collects pan-density information. Second, the Feature Enhancement Layer (FEL) captures global and local contextual characteristics and produces a weight for each density-specific feature. Then, the Feature Fusion Network (FFN) integrates the spatial context of these density-specific features. To help measure the quality of global and local predictions, the Patch MAE (PMAE) and Patch RMSE (PRMSE) metrics were also introduced. Extensive testing on four crowd counting datasets, ShanghaiTech, UCF_CC_50, UCSD, and UCF-QNRF, showed that PaDNet achieved state-of-the-art accuracy and robustness.
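A hypothetical reading of such a patch-level metric (the exact PMAE definition in [ 69 ] may differ in grid size and normalization): split both density maps into a grid of patches, take the absolute count error per patch, and average. This penalizes spatially misplaced density even when the global count is exact:

```python
import numpy as np

def patch_mae(pred_map, true_map, grid=4):
    """Patch-level MAE over a grid x grid partition of the density maps
    (illustrative sketch of a PMAE-style metric)."""
    ph, pw = pred_map.shape[0] // grid, pred_map.shape[1] // grid
    errs = []
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * ph, (i + 1) * ph), slice(j * pw, (j + 1) * pw))
            errs.append(abs(pred_map[sl].sum() - true_map[sl].sum()))
    return float(np.mean(errs))

true_map = np.zeros((8, 8)); true_map[0, 0] = 5.0    # 5 people in one corner
pred_map = np.zeros((8, 8)); pred_map[7, 7] = 5.0    # same count, wrong place
```

Here the global count error is zero, yet the patch metric is positive, which is why patch-level metrics better reflect local prediction quality.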

Liu et al. [ 51 ] introduced a deeper, end-to-end trainable architecture that blends features obtained with receptive fields of several sizes. The methodology thus adapts the amount of contextual information to what is required to forecast crowd density accurately. The resulting algorithm surpasses the latest crowd counting methods, particularly in the case of strong perspective distortion.

Hossain et al. [ 34 ] proposed to tackle crowd analysis problems with a scale-aware attention network. Using the attention mechanisms common in recent deep learning architectures, their model automatically concentrates on the global and local scales appropriate to the image. Combining these global and local attentions, the model achieves state-of-the-art results on a range of crowd datasets.

Gao, Wang, and Li [ 31 ] suggested a Perspective Crowd Counting Network (PCC Net) that consists of three parts: 1) Density Map Estimation (DME), which focuses on very local features for estimating density maps; 2) Random High-Level Density Classification (R-HDC), which extracts global features for predicting coarse density labels for random image patches; and 3) the DULR module, which encodes perspective variations in four directions (Down, Up, Left, and Right). PCC Net is evaluated on five standard datasets, producing state-of-the-art performance, with successful analyses on four additional datasets.

Liu et al. [ 49 ] proposed DecideNet (DEteCtIon and Density Estimation Network) as a novel end-to-end crowd counting system. It switches to the appropriate counting mode for different image positions depending on the true density conditions. DecideNet starts by estimating the crowd density, generating separate detection-based and regression-based maps. An attention module then efficiently weighs the reliability of the two forms of estimates to handle unavoidable variations in density. The final crowd counts are obtained from both forms of density maps with the help of the attention module. Experimental findings showed that the proposed approach achieves state-of-the-art results on three challenging crowd counting datasets.

Li, Y., Zhang, X. and Chen, D. [ 45 ] proposed CSRNet, which consists of two major components: a front-end Convolutional Neural Network (CNN) for 2D feature extraction, and a back-end dilated CNN whose dilated kernels deliver wider receptive fields and replace pooling operations. Owing to its pure convolutional structure, CSRNet is an easily trained model. They applied CSRNet to four datasets (ShanghaiTech, UCF_CC_50, WorldExpo'10, and UCSD), delivering comparable results [ 76 ]. CSRNet achieves a Mean Absolute Error (MAE) that is 47.3% lower than the previous state-of-the-art system on the ShanghaiTech Part B dataset. They extended the targeted uses to other objects, such as vehicles in the TRANCOS dataset. Results demonstrated that CSRNet greatly increased prediction accuracy, with a lower MAE than the previous state-of-the-art approach.
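The effect of dilation in such a back end can be sketched in one dimension: spacing the kernel taps by the dilation factor enlarges the receptive field without adding parameters or pooling. An illustrative sketch, not CSRNet's actual layers:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Valid 1-D dilated convolution: kernel taps are spaced `dilation`
    samples apart, so one output spans (k-1)*dilation + 1 inputs."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # receptive field of one output
    out = np.array([
        sum(kernel[t] * x[i + t * dilation] for t in range(k))
        for i in range(len(x) - span + 1)
    ])
    return out, span

x = np.arange(10, dtype=float)
y1, rf1 = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=1)  # receptive field 3
y2, rf2 = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)  # receptive field 5
```

The same three-weight kernel covers a receptive field of 3 at dilation 1 and of 5 at dilation 2, which is why stacking dilated layers can replace pooling while preserving output resolution.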

Liu et al. [ 50 ] proposed a unified neural network framework, the Deep Recurrent Spatial-Aware Network, which addresses these issues with a region-oriented refinement process based on a learnable spatial transformation module. In particular, the architecture includes a Recurrent Spatial-Aware Refinement (RSAR) module which iteratively performs two steps: (i) a spatial transformer network dynamically locates an attentional region in the crowd density map and renders it at the correct scale and rotation for optimal crowd estimation; and (ii) the density map within the attended region is refined. Comprehensive experiments on four challenging benchmarks demonstrate the effectiveness of the approach. Specifically, it obtains an improvement of 12% on the largest dataset, WorldExpo'10, and 22.8% on the most challenging dataset, UCF_CC_50, relative to the current best-performing approaches.

Idrees et al. [ 38 ] suggested a novel method that simultaneously tackles counting, density map estimation, and localization of people in a crowded image. The formulation rests on the key assumption that the three problems are fundamentally interconnected, so that a decomposable loss function can train a single deep CNN for all of them. Given the need for images and annotations of good quality, the UCF-QNRF dataset was introduced to fix vulnerabilities in previous datasets; it contains manual dot annotations of 1.25 million people. Finally, they evaluated estimation approaches on it, including deep CNNs developed specially for crowd counting. This is the most complex dataset in the field, with the most dynamic situations and the highest number of crowd annotations.

Marsden et al. [ 54 ] suggested ResnetCrowd, a deep residual architecture for simultaneous crowd counting, violent behavior detection, and crowd density level classification. A new 100-image dataset known as Multi-Task Crowd was developed to train and assess the emerging multi-objective system. This new dataset is the first computer vision dataset fully annotated for crowd counting, violence detection, and density level classification. The experiments show that the multi-task approach improved individual task performance for all tasks, particularly violence detection, which improved by up to 9% in ROC Area Under the Curve (AUC). The trained ResnetCrowd model was also evaluated on additional benchmarks, underlining the superior generalization of multi-task models.

Sindagi and Patel [ 65 ] proposed a novel cascaded CNN that jointly learns crowd count classification and density map estimation. Classifying the crowd into count categories amounts to a coarse estimate of the total count in the image, incorporating a high-level prior into the density estimation network. This enables the layers of the network to learn globally discriminative features, which helps to estimate more refined density maps with smaller counting errors. The two tasks are trained jointly, end to end. Extensive experiments on publicly available, highly challenging datasets showed that the proposed approach achieves lower counting errors and better density maps than the latest methods.

In Bansal et al. [ 7 ], three common feature descriptor techniques are employed for experimental work on an object identification system: Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB). The purpose of the article is to compare the performance of these three feature extraction approaches, and in particular whether their combination leads to more efficient object recognition. The authors conducted a comparative investigation of several feature descriptor techniques and classification models for 2D object recognition.

Elbishlawi et al. [ 25 ] discuss approaches to analyzing crowded scenes that are based on deep learning. The approaches evaluated fall into two categories: (1) crowd counting and (2) crowd detection and recognition. Additionally, crowd scene datasets are analyzed. Along with the surveys mentioned above, the article presents an assessment measure for crowd scene analysis methodologies, which quantifies the discrepancy between the estimated and the real crowd counts in crowd scene footage.

A variety of CNN-based approaches were adopted in crowd counting, taking advantage of the powerful capacity of CNN to understand representations. The Wang et al. [ 74 ] methodology, as a pioneer for CNN crowd counting, introduced several convolutional layers to extract features and transmitted these features into a fully connected layer for predicting the number in extremely dense crowds. In further work [ 45 ,  46 , 57 ,  74 , 81 , 84 , and], a network was pre-trained for some scenes, and related training data was chosen to optimize the pre-trained network based on prospective information. The key downside is that it does not always have accessible viewpoint details. Zhang et al. [ 85 ] further suggested (MCNN) architecture to approximate the density diagram, as it noted that the densities and appearances of image patches differ greatly. Diverse columns are deliberately designed to know differences in density by multiple feature resolutions throughout their research. Given multiple image sizes, separating columns to identify density crowds is difficult, resulting in some inefficient divisions due to lack of identification. To simultaneously predict the density grading and construct a map based on high-level knowledge, Sindagi et al. [ 66 ] suggested a multitask system. They also suggested a 5-branch CNN contextual pyramid short for CP-CNN Sindagi and Patel, [ 65 ], which would incorporate popular contextual information to lower the number of errors and high-quality maps. However, CP-CNN cannot be used for real-time scenario analysis. Sam et al. [ 62 ], inspired by MCNN, uses a switch-CNN, in which the switch classifier is educated in choosing the best regression for an input patch. Switch-CNN will use a column network only in conjunction with that patch’s classified results during the prediction process, without including any of the qualified subnetworks. Not only at the overall image level, but even at the image patch level, there are high vector densities. 
Moreover, no single subnetwork sees enough data to be trained efficiently, and the covariate-shift problem remains unsolved. Kang et al. [ 41 ] proposed fusing density predictions from multiscale inputs, while Deb et al. [ 22 ] developed an aggregated multi-column dilated convolution network for perspective-free counting.
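The MAE and RMSE (sometimes reported as MSE) figures quoted throughout the comparisons below can be made concrete with a minimal sketch of the two standard crowd counting metrics; the example counts are illustrative only and are not taken from any of the cited papers:

```python
import numpy as np

def mae(pred_counts, true_counts):
    """Mean Absolute Error between predicted and ground-truth crowd counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))

def rmse(pred_counts, true_counts):
    """Root Mean Squared Error; penalizes large count errors more than MAE."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Illustrative predicted vs. annotated counts on four test images.
pred = [105, 310, 48, 1210]
true = [100, 300, 50, 1250]
print(mae(pred, true))   # (5 + 10 + 2 + 40) / 4 = 14.25
print(rmse(pred, true))  # larger than MAE because of the 40-count outlier
```

Because RMSE squares the per-image errors, a single badly mis-counted dense image dominates it, which is why papers report both metrics.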

3.1 Challenges and future direction

Gao et al. [ ]

• Surveyed CNN-based crowd counting models that estimate density maps, and compared counting performance across datasets.

• The survey attempts to draw rational conclusions and predictions about the future development of crowd counting and, meanwhile, to offer workable solutions for the object-counting problem in other fields.

• Density maps and prediction results of several standard algorithms are provided on the NWPU validation set for comparison and checking; density-map and evaluation tools are also released.

• NWPU dataset for comparison and testing.

Wang et al. [ ]

• Created the NWPU-Crowd dataset and evaluated the output of state-of-the-art (SOTA) methods on it.

• Currently available datasets are too limited to meet the requirements of CNN-based algorithms.

• NWPU dataset

• In particular, MAE dropped across the classes by 36.6%, 25.7%, 22.2%, and 12.7%, respectively.

• The NWPU-Crowd dataset has more crowd scenes than previous datasets.

Ilyas, Shahzad and Kim, [ ]

• Reviewed classical hand-crafted as well as ML- and AI-based approaches to image crowd counting.

• Despite many open questions, such as occlusion, noise, unequal object distribution, and unequal object sizes, neural networks promise advances in crowd counting and perception for smart cameras.

• This is a review paper that summarizes and compares previous work.

Li et al. [ ]

• Devised a multi-view framework using state-of-the-art CNNs.

• Complementary data from multiple cameras gives a clearer view of the observed area, addressing the restricted field of view and occlusion problems of single views.

• Used the PETS2009 dataset.

• Accuracy improved from 83.2% to 89.8%.

Singh et al. [ ]

• Proposed a novel aggregation of ensembles (AOE) to improve the capabilities of ConvNets and classifier pools.

• The main obstacle to detecting anomalies efficiently in crowds is finding feature sets and strategies that generalize to any crowded scene.

• Used the UCSD Ped-1 and UCSD Ped-2 datasets.

• Achieved state-of-the-art accuracy on UCSD Ped-1 (0.946) and UCSD Ped-2 (0.959).

Zeng et al. [ ]

• Proposed the DSPNet framework, built from frontend and backend CNNs. DSPNet analyzes the full RGB image to promote model learning and minimize the loss of contextual detail.

• The novel DSPNet encodes multi-scale features and reduces the loss of context information for dense crowd counting.

• Three public datasets are used: UCF-QNRF, ShanghaiTech, and UCF_CC_50.

• Comparison of DSPNet with state-of-the-art methods on the UCF-QNRF dataset (MAE 107.5 and RMSE 182.7).

• Comparison of DSPNet with state-of-the-art methods on the UCF_CC_50 dataset (MAE 243.3 and RMSE 307.6).

• Comparison of DSPNet with state-of-the-art methods on the ShanghaiTech dataset: Part A (MAE 68.2 and RMSE 107.8) and Part B (MAE 8.9 and RMSE 14.0).

Gao, Wang, and Li, [ ]

• Proposed the Perspective Crowd Counting Network (PCC Net), consisting of a Density Map Estimation (DME) branch for local map estimates and a Random High-level Density Classification (R-HDC) branch.

• Counting the people in a single image is challenging due to high appearance similarity, perspective changes, and severe congestion.

• The public ShanghaiTech dataset is used for this experiment.

• Accuracy: MAE of 11.0 (a 6.2-point improvement) and MSE of 19.0 (an 8.4-point improvement).

Liu, Salzmann and Fua, [ ]

• Devised a large, end-to-end trainable architecture.

• The resulting algorithm outperforms modern crowd counting methods, especially where perspective effects are strong.

• ShanghaiTech dataset Result: Part A (MAE 62.3 and RMSE 100) and Part B (MAE 7.8 and RMSE 12.2).

• UCF-QNRF Dataset Result: (MAE 107 and RMSE 183).

• UCF CC 50 dataset Result: (MAE 212.2 and RMSE 243.7)

• World-Expo’10 dataset Result MAE 7.2 on average

Hossain et al. [ ]

• Considered density map estimation, where each pixel value corresponds to the crowd density at the matching position in the image.

• Scale variation across images is an obstacle for crowd counting.

• This research proposes a new scale-aware attention network for the task.

• Several datasets were used.

• ShanghaiTech Part B dataset result: MAE 16.86 and MSE 28.41.

• Mall dataset result: MAE 1.28 and MSE 1.68.

• UCF_CC_50 dataset result: MAE 271.60 and MSE 391.00.

• Overall accuracy: MAE 16.86 and MSE 28.41.

Liu et al. [ ]

• Considered an end-to-end crowd counting system.

• The system achieves state-of-the-art performance on three challenging crowd counting datasets.

• Using the Mall dataset, DecideNet result: MAE 1.52 and MSE 1.90.

• For the five scenes tested, an average MAE of 9.23 was found.

• This is the best result among all the comparisons in this section, 0.17 better than the second-best Switching-CNN technique.

Li, Zhang and Chen, [ ]

• Proposed a data-driven deep learning framework featuring a congested scene recognition network known as CSRNet.

• The proposed CSRNet consists of two main components: a front-end convolutional neural network (CNN) for 2D feature extraction, and a dilated back-end CNN that enlarges receptive fields and avoids pooling operations by using dilated kernels. Owing to its purely convolutional form, CSRNet is an easily trained model.

• Used four datasets (the ShanghaiTech, UCF_CC_50, WorldExpo'10, and UCSD datasets).

• The results reveal that CSRNet greatly improves output quality, with 15.4% lower MAE than the prior state-of-the-art methodology.

Amirgholipour et al. [ ]

• Proposed an adaptive CCNN (A-CCNN) that captures scene scale variability to enhance accuracy.

• Despite many reported attempts, real-world challenges such as large variations in image scale and extreme occlusion among individuals make this task quite difficult.

• Used two datasets (UCSD and UCF_CC_50).

• A-CCNN performs favorably on the maximal and minimal subsets versus the other techniques, with the lowest MAEs of 1.04 and 1.48.

Marsden et al. [ ]

• Proposed ResnetCrowd, a deep residual architecture for simultaneous crowd counting, violent behavior detection, and crowd density level classification.

• A new Multi-Task Crowd dataset was constructed.

• The trained ResnetCrowd model was also evaluated on additional benchmarks, underlining the superior generalization of multi-objective models.

• On the UMN crowd anomaly detection benchmark, obtained an AUC of 0.84.

Sindagi and Patel, [ ]

• Proposed a new end-to-end cascaded CNN that jointly learns crowd count classification and density map estimation.

• The total count for each image is quantized into classes, and this high-level prior is integrated into the density estimation network.

• Owing to non-uniform spatial variations, estimating the count in heavily populated scenes is an exceedingly challenging task.

• Result comparison: error estimates on the UCF_CC_50 dataset yielded MAE 322.8 and MSE 397.9.

• Result comparison: error estimates on the ShanghaiTech dataset yielded Part A (MAE 101.3 and MSE 152.4) and Part B (MAE 20.0 and MSE 31.1).

Marsden, McGuinness, et al. [ ]

• Built a reliable and practical crowd count estimator using computer vision techniques.

• Employed a fully convolutional paradigm for crowd counting in high-density scenes.

• Used two datasets (ShanghaiTech, UCF_CC_50).

• Results achieved: MAE 126.5 and MSE 173.5 on ShanghaiTech Part A; MAE 23.76 and MSE 33.12 on ShanghaiTech Part B.

• On the UCF_CC_50 dataset, the method improves on the state of the art, reducing MAE and MSE by 11% and 13%.

4 Categories of CNN techniques

4.1 CNN techniques

Categorizing CNN developments at a granular level plays a significant role in understanding them. Combined with the various crowd counting systems, these techniques allow researchers to develop algorithms for remote monitoring and tracking systems in military operations, emergency management, public events, and similar settings. Figure 7 presents the categories of CNN-based crowd analysis techniques.

figure 7

CNN Techniques

Datasets come in two forms: public and private. Public datasets are freely accessible via the internet, while private datasets are normally owned by their respective authors or organizations. We list the five most common and most recognizable datasets and their basic characteristics in Table 1 .

4.2 Basic CNN techniques

This section covers crowd counting architectures built from a simple CNN. Simple CNN approaches may be regarded as the pioneers of deep density analysis, using a basic network design to produce real-time crowd counts. Table 2 displays the basic CNN features, the databases used, and the architectures.

Fu et al. [ 29 ] proposed a two-stage density estimation scheme using a simple CNN model. Their first task was to classify the crowd distribution (i.e., the image's density level); removing redundant connections improved computation speed. Their second task cascaded two classifiers to correct samples the first stage misclassified. Likewise, a patch-based learning approach that divides large images into overlapping patches was proposed by Mundhenk et al. [ 56 ] for counting vehicles in image regions; to minimize MSE, a modification distinguished uncounted vehicles from contextual details. Wang et al. [ 74 ] developed a data augmentation variant for their FCNN to enhance robustness by training on diverse and varied data. Zhang et al. [ 86 ] introduced a CNN model for surveillance videos that counts the number of people crossing a line; to manage the complexity of the main problem, it was split into two sub-problems (crowd density and crowd velocity estimation). In Hu et al. [ 35 ], the authors suggested an approach to estimate medium- to high-density crowds in images. To approximate the total count, a regression summed the average local densities over image regions, and ConvNets were used to learn a feature vector for estimating crowds in their respective local regions. In many applications, including outdoor counting, the authors in [ 73 ] used a simple CNN. Layered boosting and selective sampling (i.e., reducing the effect of low-quality samples) lower computation time and improve counting accuracy: classifiers are iteratively added, each new classifier correcting the faults of the previous one, yielding an ensemble of four networks, each built on its predecessors' errors.

It should be noted that the majority of strategies in this sub-category rely primarily on density estimation rather than direct crowd counting. Owing to their over-simplified design, these methods are not effective under strong occlusion and diverse viewpoints. The density estimation in these techniques can be enhanced by eliminating redundant samples, and the probability of errors can be minimized by iteratively reducing errors across network layers.
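The density-map formulation underlying most of these methods can be illustrated with a minimal sketch: the ground-truth map is built by stamping a normalized Gaussian at each annotated head, so that the map's integral recovers the head count. The kernel size and sigma below are arbitrary illustrative choices, not the settings of any cited paper:

```python
import numpy as np

def gaussian_kernel(size=15, sigma=4.0):
    """2D Gaussian normalized to sum to 1, so each head adds 1 to the map's integral."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def density_map(shape, head_points, size=15, sigma=4.0):
    """Build a ground-truth density map by stamping one Gaussian per annotated head.
    Kernels clipped at image borders lose a little mass, slightly lowering the integral."""
    h, w = shape
    dmap = np.zeros((h, w))
    k = gaussian_kernel(size, sigma)
    r = size // 2
    for (y, x) in head_points:
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        ky0, kx0 = y0 - (y - r), x0 - (x - r)
        dmap[y0:y1, x0:x1] += k[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]
    return dmap

heads = [(20, 30), (40, 60), (50, 50)]  # hypothetical (row, col) head annotations
d = density_map((80, 100), heads)
print(round(d.sum(), 3))  # 3.0: the integral of the map recovers the head count
```

A CNN regressor then learns to predict this map from the image; the predicted count is simply the predicted map's sum.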

4.3 Context CNN technique

This sub-category covers crowd counting tools that leverage local and global contextual information to boost counting precision. In a localized area, the spatial knowledge of a picture means the collective pattern of adjacent pixels (i.e., contextual information). Such techniques are very useful in applications that involve counting, such as the number of flying drones or cars in parking areas. These approaches also help handle the varying resolution and distribution of distance-dependent images. The Context-CNN techniques, with their characteristics, datasets, and architectures, are shown in Table 3 .

It should be noted that dilated convolution allows contextual information to be used in real time. Deeper CNNs are primarily used to improve density map quality and to maximize estimation accuracy through an adaptive distribution network. This contextual knowledge, however, is gained at the cost of greater network sophistication, so techniques in this sub-category are not feasible for real-time applications with low complexity requirements.
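The dilated convolution mentioned above can be sketched in plain NumPy to show why it enlarges the receptive field without adding parameters; this is an illustrative single-channel implementation, not the code of any cited method:

```python
import numpy as np

def dilated_conv2d(img, kernel, dilation=1):
    """'Valid' 2D convolution with a dilated kernel (single channel).
    A k x k kernel with dilation d covers a (d*(k-1)+1)-wide receptive field
    while keeping the same k*k weights."""
    kh, kw = kernel.shape
    eh, ew = dilation * (kh - 1) + 1, dilation * (kw - 1) + 1  # effective size
    H, W = img.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input on a strided grid instead of a dense window
            patch = img[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3)) / 9.0                    # averaging kernel, 9 weights
plain = dilated_conv2d(img, k, dilation=1)   # 5x5 output, 3x3 receptive field
dil = dilated_conv2d(img, k, dilation=2)     # 3x3 output, 5x5 receptive field
print(plain.shape, dil.shape)
```

The same nine weights see a 5x5 region when dilated, which is exactly how CSRNet-style back-ends widen receptive fields without pooling.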

4.4 Scale-CNN technique

Basic-CNN techniques that have been extended to handle size variation (in order to increase robustness and accuracy) are termed Scale-CNN techniques. Size variance means that the resolution affected by multiple viewpoints varies. Contextual knowledge of the image relates neighboring pixels to the overall scene. Strategies within this framework are often very beneficial for tasks that need quantitative statistics, such as counting flying drones or parked vehicles [ 64 , 67 ]. These methods may also be used to handle depth ranges and distributions that depend on the distance in such images. Table 4 describes the features, common databases, and structures of the Scale-CNN techniques.

In Chattopadhyay et al. [ 16 ], for example, the authors studied everyday object counting and drew on the notion of subitizing (the human capacity to provide rapid counting estimates for small numbers of objects). Zhang et al. [ 87 ] proposed an attention model (high probability indicating a head position) for head detection, with multiscale processing suppressing non-head regions. Li et al. [ 45 ] merged the CNN with dilated convolution to increase the accuracy of the density map (dilated kernels replace pooling). In various congested scenarios, a dilated convolutional layer was also used to integrate contextual knowledge (Table 4 ).

Han et al. [ 33 ] proposed a CNN-Markov random field (MRF) crowd counting system for still images. The entire image was divided into small overlapping patches, features were extracted from each patch, and patch counts were regressed with fully connected NNs. Because of the overlaps, adjacent patches are strongly correlated; to increase the overall precision of the crowd count, the MRF used this correlation to smooth counts across adjacent local patches. In Wang et al. [ 75 ], the authors suggested a density-aware network to count objects correctly. A common model, trained on one dataset and then adapted to another, was proposed; the density level determined which of several networks, each trained on different datasets, was selected. The system contained three networks, switching among them according to the estimated density level. Liu et al. [ 50 ] proposed a deep recurrent spatial-aware network, using a spatial transformer module to handle both scale and rotation changes.
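The density-level routing used in Wang et al. [ 75 ] (and by Switch-CNN) can be sketched as a classifier that dispatches each patch to a regressor specialized for its density range. The feature, thresholds, and linear regressors below are purely hypothetical stand-ins for the trained CNN components:

```python
# Hypothetical per-density regressors: in the real systems these are CNN
# columns; here each is a linear stand-in calibrated for its density range.
REGRESSORS = {
    "low":    lambda feat: 2.0 * feat,
    "medium": lambda feat: 10.0 * feat,
    "high":   lambda feat: 50.0 * feat,
}

def classify_density(feat):
    """Stand-in for the switch classifier: bucket a crude foreground-area
    feature into a density level (thresholds are illustrative)."""
    if feat < 1.0:
        return "low"
    if feat < 5.0:
        return "medium"
    return "high"

def routed_count(feat):
    """Route the patch to the regressor trained for its density level."""
    level = classify_density(feat)
    return level, REGRESSORS[level](feat)

print(routed_count(0.5))   # ('low', 1.0)
print(routed_count(7.0))   # ('high', 350.0)
```

Only the selected regressor runs at prediction time, which is the efficiency argument for switching; the cost is that each regressor trains on only a slice of the data.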

It should be noted that dilated convolution allows real-time use of spatial information. In particular, a larger dilated CNN can improve density map quality, and an adaptive density network can optimize measurement accuracy. These gains, however, come at the cost of increased network complexity; therefore, the approaches in this sub-category may not be feasible for modular structures with low complexity specifications.

4.5 Multi task-CNN techniques

Multi-task CNN techniques consider not just crowd counting but related tasks, such as classification, segmentation, uncertainty estimation, and crowd behavior monitoring. We analyze the association between these different tasks and their impact on the results in the multi-task CNN setting.

In [ 6 ], the authors suggested a ConvNet architecture to count penguins. Because of occlusion and varying sizes, a multitask system was recommended to jointly address foreground segmentation and depth prediction uncertainty. The multitask methodology was also studied by Idrees et al. [ 37 ], who connected three major problems: crowd counting, density estimation, and localization; density estimation and localization supported the counting process. A combined deep and shallow FCN was suggested by Zhu et al. [ 89 ]: features taken from a deep FCN were merged through two deconvolution layers to render the output identical in resolution to the input image. Huang et al. [ 36 ] suggested a body-structure-aware CNN methodology instead of relying on visual scale properties. In their study, crowd counting was broken down into a multitask problem: the tasks included extracting significant semantic knowledge and counting, and mapping the input to semantic models (a body map and a density map). Yang et al. [ 81 ] proposed a multi-column neural network for resolving significant differences in scale, with three main improvements. First, up- and down-sampling was used to assess multiscale features. Second, deconvolution was used to correct sampling errors. Third, per-scale costs were reduced to enhance learning. Liu et al. [ 50 ] suggested a self-supervised data augmentation scheme for better accuracy.

It should be noted, first, that cropping smaller patches (containing fewer or the same number of objects as larger ones) can enhance training results. Second, interacting tasks can improve counting precision. Third, deconvolution can make the density map more accurate. Finally, some auxiliary tasks increase the network's overall efficiency and reliability while decreasing real-time resource usage.
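The multi-column fusion used by MCNN-style networks discussed above can be sketched as follows. In the real networks the fusion is a learned 1x1 convolution over column feature maps; here a fixed weighted average over hypothetical column outputs stands in for it:

```python
import numpy as np

def fuse_columns(column_maps, weights=None):
    """Fuse per-column density maps into one estimate.
    In MCNN-style networks this fusion is a learned 1x1 convolution;
    here it is a fixed weighted average for illustration."""
    maps = np.stack(column_maps)  # shape (n_columns, H, W)
    if weights is None:
        weights = np.full(len(column_maps), 1.0 / len(column_maps))
    # contract the column axis against the weight vector -> (H, W)
    return np.tensordot(weights, maps, axes=1)

# Three hypothetical columns disagree on the density of a 2x2 region.
col_small = np.array([[1.0, 0.5], [0.5, 1.0]])   # small receptive field
col_mid   = np.array([[0.8, 0.6], [0.6, 0.8]])
col_large = np.array([[0.6, 0.7], [0.7, 0.6]])   # large receptive field
fused = fuse_columns([col_small, col_mid, col_large])
print(round(fused.sum(), 3))  # 2.8, the fused count estimate
```

Learning the fusion weights lets the network favor whichever receptive-field scale matches the local density, which is the core idea the column design exploits.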

4.6 CNN techniques image view

This group focuses on the study of the input image as well as the network architecture to increase accuracy, which is very useful for medical imaging, drone surveillance of particular areas, and CCTV tracking. Since the slope, tilt, and location of the camera relative to the target play a crucial role in the design of every algorithm, we split the image-view CNN techniques into two sub-categories: Patch-based CNN and Perspective-CNN.

4.7 CNN techniques patch-based

In a patch-based approach, the CNN is trained on cropped patches and applied with a sliding window over the test image. This is especially helpful where density maps must be reliable and cannot be compromised, e.g., in cancer treatment, where both the cell count and the resolution of the affected cells are important. The key objective is to produce better density maps, at a high computational cost.

Cohen et al. [ 19 ] suggested a CNN inspired by the Inception architecture: rather than estimating the whole crowd at once, a smaller network counts occurrences redundantly within each local region. The authors of [ 31 , 49 ] suggested DecideNet because regression-based approaches overestimate the crowd in sparse areas, while counts are underestimated in densely packed zones. The authors in [ 80 ] suggested optimizing the flow of information through various convolution and dilation rates, inspired by skip connections, for crowd counting. Edges and colors were identified by early convolution layers, but this low-level knowledge from an early stage may or may not improve the network's mean absolute error (MAE). U-Net was employed to evaluate how much information is sent to the final layer (convolutional or fully connected), providing a more efficient feature-selection mechanism. Similar in principle to [ 31 , 49 ], the authors of [ 81 ] proposed DigCrowd, a depth-information-guided crowd counting approach for very complex, varied images. Segmentation breaks an image into near and far parts: in the near-view area, detections are counted, while in the far-view zone DigCrowd maps individuals to a density map. In Shami et al. [ 84 ], the authors used a head detector to determine a human head's specific size. After the image was divided into several patches, an SVM classifier labeled patches as crowded or uncrowded, and head-scale regression was performed on each patch. Once the head size is calculated, dividing the patch region by the head size yields the overall number of heads in that patch. In Zhang et al. [ 87 ], the authors proposed a counting network that filters out the background around head regions while attributes are collected and measured at the same time. Zhang et al.
[ 87 ] also introduced a patch-based CNN density estimation methodology with depth-adaptive kernels. Different receptive field sizes in each CNN column handle objects (heads) of various scales; however, summing the density maps at the end may diminish the exactness of the density estimation map. In another work, a skip-connection CNN (SCNN) was suggested by Wang et al. [ 75 ] for crowd counting. The whole network used four multi-scale units to extract different features; each unit was composed of three convolution layers, and the multi-sized units extracted characteristics at various scales. Besides, two patches (with separate scales) from each input image were used in a training scheme without redundancy [ 79 ]; CNNs were trained separately on the two scales to tackle drastic shifts in size. Accordingly, with three regressors specialized in low-, medium-, and high-density images, Sam et al. [ 62 ] suggested a switching CNN in which each input patch is led to a separate regressor by a classifier (the switch) to address density variation.

It should be noted that detection and regression on targeted image patches can be applied sequentially to increase the network's estimation accuracy. In addition, low-level edge and color information can be filtered iteratively to reduce network computing costs.
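The patch-based counting scheme can be sketched as a sliding window whose per-patch estimates are summed. With non-overlapping patches and an exact per-patch count (here the patch sum of a density map, standing in for a per-patch CNN regressor), the global count is recovered exactly:

```python
import numpy as np

def count_by_patches(density, patch=4, count_fn=None):
    """Slide a window over a density map, estimate a count per patch,
    and sum the per-patch estimates. With non-overlapping patches and
    count_fn = patch sum, this recovers the global integral exactly."""
    if count_fn is None:
        count_fn = np.sum  # stand-in for a per-patch CNN regressor
    H, W = density.shape
    total = 0.0
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            total += count_fn(density[i:i + patch, j:j + patch])
    return total

rng = np.random.default_rng(0)
d = rng.random((8, 8))  # a synthetic density map
print(abs(count_by_patches(d) - d.sum()) < 1e-9)  # True: patch sums tile the map
```

With overlapping windows (as several of the surveyed methods use) the overlap regions are counted more than once, so such methods must normalize or smooth the per-patch estimates, e.g., with the MRF scheme of Han et al. [ 33 ].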

5 Conclusion

This paper has reviewed different approaches, techniques, and frameworks used for crowd analysis in video monitoring and surveillance, specifically crowd analysis based on Hajj video surveillance. First, the paper provides a brief discussion of existing deep learning frameworks. Second, it presents a review of selected FCNN and CNN techniques for density estimation; the CNN techniques were categorized into network-, image-, and training-based CNN, and the categories were subdivided into two main branches. Third, we critically reviewed selected research works related to crowd analysis. Lastly, we presented a review of the works in each category, with a focus on the key characteristics, datasets, and architectures used. We believe that this work will contribute towards bridging the research gaps in this field of study.

"True Islam". Quran-Islam.org . Archived from the original on 2013-03-13. Retrieved 2013-07-31.

"A history of hajj tragedies | world news". London: theguardian.com . January 13, 2006. Archived from the original on August 29, 2013. Retrieved 2013-07-31

Albattah W et al (2020) Hajj crowd management using CNN-based approach. Comput, Mat Continua 66(2):2183–2197. https://doi.org/10.32604/cmc.2020.014227


Amirgholipour S et al (2018) A-CCNN: adaptive CCNN for density estimation and crowd counting. 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, pp 948–952. https://doi.org/10.1109/ICIP.2018.8451399

Anjum N, Cavallaro A (2008) Multifeature object trajectory clustering for video analysis. IEEE Trans Circ Syst Video Technol 18(11):1555–1564. https://doi.org/10.1109/TCSVT.2008.2005603

Arteta C, Lempitsky V. Zisserman A (2016) Counting in the wild. In Proceedings of the European Conference on Computer Vision, Amsterdam, the Netherlands, 8–16; pp. 483–498.

Bansal M, Kumar M, Kumar M (2021) 2D object recognition: a comparative analysis of SIFT, SURF and ORB feature descriptors. Multimed Tools Appl 80(12):18839–18857. https://doi.org/10.1007/s11042-021-10646-0

Basharat A, Gritai A, Shah M (2008) Learning object motion patterns for anomaly detection and improved object detection, in: proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–8.

Bendali-Braham M et al (2021) ‘Recent trends in crowd analysis: a review’, machine learning with applications. Elsevier ltd., 4(October 2020), p. 100023. https://doi.org/10.1016/j.mlwa.2021.100023

Boominathan L, Kruthiventi SSS, Venkatesh Babu R (2016) ‘CrowdNet: a deep convolutional network for dense crowd counting’. MM 2016 - Proceed 2016 ACM Multimedia Conf pp. 640–644. https://doi.org/10.1145/2964284.2967300

Brostow GJ, Cipolla R (2006) ‘Unsupervised bayesian detection of independent motion in crowds’, Proceedings of the IEEE computer society conference on computer vision and pattern recognition, 1, pp. 594–601. https://doi.org/10.1109/CVPR.2006.320

Candamo J, … Kasturi R (2010) Understanding transit scenes: a survey on human behavior-recognition algorithms. IEEE Trans Intell Transp Syst 11(1):206–224. https://doi.org/10.1109/TITS.2009.2030963

Chan AB, Vasconcelos N (2009) ‘Bayesian poisson regression for crowd counting’. Proceed IEEE Int Conf Comput Vision, (Iccv). 545–551. https://doi.org/10.1109/ICCV.2009.5459191

Chan AB, Liang ZSJ, Vasconcelos N (2008) ‘Privacy preserving crowd monitoring: counting people without people models or tracking’, 26th IEEE conference on computer vision and pattern recognition. CVPR. https://doi.org/10.1109/CVPR.2008.4587569

Chan TH, … Ma Y (2015) PCANet: a simple deep learning baseline for image classification? IEEE Trans Image Process 24(12):5017–5032. https://doi.org/10.1109/TIP.2015.2475625


Chattopadhyay P et al (2017) ‘Counting everyday objects in everyday scenes’, proceedings - 30th IEEE conference on computer vision and pattern recognition, CVPR 2017, 2017-January, pp. 4428–4437. https://doi.org/10.1109/CVPR.2017.471

Chen K et al (2012) ‘Feature mining for localised crowd counting’, BMVC 2012 - electronic proceedings of the British machine vision conference 2012. https://doi.org/10.5244/C.26.21

Cheng HY, Hwang JN (2011) Integrated video object tracking with applications in trajectory-based event detection. J Visual Comm Image Represent. Elsevier Inc 22(7):673–685. https://doi.org/10.1016/j.jvcir.2011.07.001

Cohen JP et al (2017) ‘Count-ception: counting by fully convolutional redundant counting’, proceedings - 2017 IEEE international conference on computer vision workshops, ICCVW 2017 , 2018-January, pp. 18–26. https://doi.org/10.1109/ICCVW.2017.9

Cui X et al (2011) ‘Abnormal detection using interaction energy potentials’, proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp. 3161–3167. https://doi.org/10.1109/CVPR.2011.5995558

Dargan S, … Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng. Springer Netherlands 27(4):1071–1092. https://doi.org/10.1007/s11831-019-09344-w


Deb D, Ventura J (2018) ‘An aggregated multicolumn dilated convolution network for perspective-free counting’, IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, 2018-June, pp. 308–317. https://doi.org/10.1109/CVPRW.2018.00057

Dollár P et al (2012) Pedestrian detection: an evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761. https://doi.org/10.1109/TPAMI.2011.155

Du D, Qi H, Huang Q, Zeng W, Zhang C (2013) Abnormal event detection in crowded scenes based on structural multi-scale motion interrelated patterns, in: proceedings of the IEEE international conference on multimedia and Expo, pp. 1–6.

Elbishlawi S, Abdelpakey MH, Eltantawy A, Shehata MS, Mohamed MM (2020) Deep learning-based crowd scene analysis survey. J Imaging 6(9):95. https://doi.org/10.3390/jimaging6090095

Feng Y, Yuan Y, Lu X (2017) ‘Learning deep event models for crowd anomaly detection’, Neurocomputing . Elsevier 219(July 2016):548–556. https://doi.org/10.1016/j.neucom.2016.09.063

Fiaschi T, Giannoni E, Taddei ML, Chiarugi P (2012) Globular adiponectin activates motility and regenerative traits of muscle satellite cells. PLoS One 7(5):e34782. https://doi.org/10.1371/journal.pone.0034782

Forsyth D (2014) Object detection with discriminatively trained part-based models’. Computer 47(2):6–7. https://doi.org/10.1109/MC.2014.42

Fu M, … Zhu C (2015) ‘Fast crowd density estimation with convolutional neural networks’, engineering applications of artificial intelligence. Elsevier 43:81–88. https://doi.org/10.1016/j.engappai.2015.04.006

Gao, G. et al. (2020). 'CNN-based density estimation and crowd counting: a Survey', pp. 1–25. Available at: http://arxiv.org/abs/2003.12783 .

Gao J, Wang Q, Li X (2019) PCC net: perspective crowd counting via spatial convolutional network. IEEE Trans Circ Syst Video Technol 30:1–1, 3498. https://doi.org/10.1109/tcsvt.2019.2919139

Gupta S, Kumar M, Garg A (2019) Improved object recognition results using SIFT and ORB feature detector. Multimed Tools Appl 78(23):34157–34171. https://doi.org/10.1007/s11042-019-08232-6

Han K et al (2017) Image crowd counting using convolutional neural network and Markov random field. J Advanc Comput Intell Intell Inform 21(4):632–638. https://doi.org/10.20965/jaciii.2017.p0632

Hossain MA et al (2019) 'Crowd counting using scale-aware attention networks', proceedings - 2019 IEEE winter conference on applications of computer vision , WACV 2019 , pp. 1280–1288. https://doi.org/10.1109/WACV.2019.00141

Hu Y, … Li T (2016) Dense crowd counting from still images with convolutional neural networks. J Visual Commun Image Representation Elsevier Inc 38:530–539. https://doi.org/10.1016/j.jvcir.2016.03.021

Huang S, … Han J (2018) Body structure aware deep crowd counting. IEEE Trans Image Process 27(3):1049–1059. https://doi.org/10.1109/TIP.2017.2740160

Idrees H et al (2013) ‘Multi-source multi-scale counting in extremely dense crowd images’, Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp. 2547–2554. https://doi.org/10.1109/CVPR.2013.329

Idrees H et al (2018) ‘Composition loss for counting, density map estimation and localization in dense crowds’, lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), 11206 LNCS, pp. 544–559. https://doi.org/10.1007/978-3-030-01216-8_33

Ilyas N, Shahzad A, Kim K (2020) Convolutional-neural network-based image crowd counting: review, categorisation, analysis, and performance evaluation. Sensors (Switzerland) 20(1). https://doi.org/10.3390/s20010043

Ji S, … Yu K (2013) ‘3D convolutional neural networks for human action recognition’, IEEE transactions on pattern analysis and machine intelligence. IEEE 35(1):221–231. https://doi.org/10.1109/TPAMI.2012.59

Kang D, Chan A (2019) ‘Crowd counting by adaptively fusing predictions from an image pyramid’, British machine vision conference 2018. BMVC 2018:1–12

Kumar A, Arunnehru J (2021) Crowd behavior monitoring and analysis in surveillance applications: a survey. Turkish J Comput Math Educ 12(7):2322–2336

Lempitsky V, Zisserman A (2010) ‘Learning to count objects in images’, Advances in Neural Information Processing Systems (NIPS), pp. 1–9

Li W, Zhao R, Xiao T, and Wang X (2014), “DeepReID: Deep filter pairing neural network for person re-identification,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 152–159, doi: https://doi.org/10.1109/CVPR.2014.27 .

Li Y, Zhang X, Chen D (2018) ‘CSRNet: dilated convolutional neural networks for understanding the highly congested scenes’, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1091–1100. https://doi.org/10.1109/CVPR.2018.00120

Li Y, Sarvi M, Khoshelham K, Haghani M (2020) Multi-view crowd congestion monitoring system based on an ensemble of convolutional neural network classifiers. J Intell Trans Syst: Technol, Plann, Oper. Taylor & Francis 0(0):1–12. https://doi.org/10.1080/15472450.2020.1746909

Liu C, Yuen J, Torralba A (2011) SIFT Flow: dense correspondence across scenes and its applications. IEEE Trans Pattern Anal Mach Intell 33:978–994

Liu J et al (2018) ‘DecideNet: counting varying density crowds through attention guided detection and density estimation’, proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp. 5197–5206. https://doi.org/10.1109/CVPR.2018.00545

Liu L et al (2018) ‘Crowd counting using deep recurrent spatial-aware network’, IJCAI international joint conference on artificial intelligence, 2018-July, pp. 849–855. https://doi.org/10.24963/ijcai.2018/118 .

Liu W, Salzmann M, Fua P (2019) 'Context-aware crowd counting', proceedings of the IEEE computer society conference on computer vision and pattern recognition, 2019-June, pp. 5094–5103. https://doi.org/10.1109/CVPR.2019.00524

Lu C, Shi J, Jia J (2013) ‘Abnormal event detection at 150 FPS in MATLAB’, Proceedings of the IEEE International Conference on Computer Vision, pp. 2720–2727. https://doi.org/10.1109/ICCV.2013.338

Lu T, Wu L, Ma X, Shivakumara P, Tan CL (2014) Anomaly detection through spatiotemporal context modeling in crowded scenes, in: proceedings of the international conference on pattern recognition, pp. 2203–2208.

Marsden M, McGuinness K et al (2017) 'Fully convolutional crowd counting on highly congested scenes', VISIGRAPP 2017 - proceedings of the 12th international joint conference on computer vision, imaging and computer graphics theory and applications, 5(Visigrapp), pp. 27–33. https://doi.org/10.5220/0006097300270033

Marsden M et al (2017) ‘ResnetCrowd: a residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification’, 2017 14th IEEE international conference on advanced video and signal based surveillance, AVSS 2017. https://doi.org/10.1109/AVSS.2017.8078482

Mundhenk TN, Konjevod G, Sakla WA, Boakye K (2016). A large contextual dataset for classification, detection and counting of cars with deep learning. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–16; pp. 785–800

Onoro-Rubio D, Niepert M, Lopez-Sastre RJ (2019) ‘Learning short-cut connections for object counting’, British Machine Vision Conference (BMVC 2018), pp. 1–12

Ouyang W and Wang X (2013), “Joint deep learning for pedestrian detection,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2056–2063, doi: https://doi.org/10.1109/ICCV.2013.257 .

Pham VQ et al (2015) ‘COUNT forest: co-voting uncertain number of targets using random forest for crowd density estimation’, Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), pp. 3253–3261. https://doi.org/10.1109/ICCV.2015.372

Rodriguez M, Sivic J, Laptev I, Audibert J (2011) ‘Data-driven crowd analysis in videos’, Proceedings of the IEEE International Conference on Computer Vision (ICCV)

Ryan D et al (2009) ‘Crowd counting using multiple local features’, DICTA 2009 - digital image computing: techniques and applications, pp. 81–88. https://doi.org/10.1109/DICTA.2009.22

Sam DB (2017) ‘Switching convolutional neural network for crowd counting’, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5744–5752

Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229.

Shami MB, … Cheung SCS (2019) People counting in dense crowd images using sparse head detections. IEEE Trans Circ Syst for Video Technol 29(9):2627–2636. https://doi.org/10.1109/TCSVT.2018.2803115

Sindagi VA, Patel VM (2017) 'CNN-based cascaded multitask learning of high-level prior and density estimation for crowd counting', 2017 14th IEEE international conference on advanced video and signal based surveillance, AVSS 2017. https://doi.org/10.1109/AVSS.2017.8078491

Sindagi VA, Patel VM (2017) A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recognit Lett. Elsevier B.V. https://doi.org/10.1016/j.patrec.2017.07.007

Sindagi VA, Patel VM (2017) ‘Generating high-quality crowd density maps using contextual pyramid CNNs’, proceedings of the IEEE international conference on computer vision, 2017-October, pp. 1879–1888. https://doi.org/10.1109/ICCV.2017.206

Singh K, … Walia GS (2020) ‘Crowd anomaly detection using aggregation of ensembles of fine-tuned ConvNets’, Neurocomputing. Elsevier B.V. 371:188–198. https://doi.org/10.1016/j.neucom.2019.08.059

Tian Y et al (2020) ‘PaDNet: pan-density crowd counting’, pp. 1–14

Tighe J, Lazebnik S (2010) Superparsing: scalable nonparametric image parsing with superpixels. In: ECCV. Springer

Tuzel O, Porikli F, Meer P (2008) Pedestrian detection via classification on Riemannian manifolds. IEEE Trans Pattern Anal Mach Intell 30(10):1713–1727. https://doi.org/10.1109/TPAMI.2008.75

van den Oord A, Schrauwen B (2014) Factoring variations in natural images with deep Gaussian mixture models, in: Proceedings of the Advances in Neural Information Processing Systems, pp. 3518–3526

Walach E, Wolf L (2016). Learning to count with CNN boosting. In Proceedings of the European Conference on Computer Vision, Amsterdam, the Netherlands, 11–14; pp. 660–676

Wang C et al (2015) ‘Deep people counting in extremely dense crowds’, MM 2015 - proceedings of the 2015 ACM multimedia conference, pp. 1299–1302. https://doi.org/10.1145/2733373.2806337

Wang L et al (2018) ‘Crowd counting with density adaption networks’, pp. 1–5. Available at: http://arxiv.org/abs/1806.10040

Wang Q et al (2020) ‘NWPU-Crowd: a large-scale benchmark for crowd counting’, pp. 1–8. Available at: http://arxiv.org/abs/2001.03360

Wu B, Nevatia R (2007) Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors. Int J Comput Vis 75(2):247–266. https://doi.org/10.1007/s11263-006-0027-7

Wu S, Moore BE, Shah M (2010) ‘Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes’, Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp. 2054–2060. https://doi.org/10.1109/CVPR.2010.5539882

Xu M et al (2019) ‘Depth information guided crowd counting for complex crowd scenes’, Pattern Recognition Letters. Elsevier B.V., 125, pp. 563–569. https://doi.org/10.1016/j.patrec.2019.02.026

Yang B, … Zou L (2018) ‘Counting challenging crowds robustly using a multi-column multi-task convolutional neural network’, signal processing: image communication. Elsevier Ltd 64:118–129. https://doi.org/10.1016/j.image.2018.03.004

Zeng X, … Ye Y (2020) ‘DSPNet: deep scale purifier network for dense crowd counting’, Expert Systems with Applications. Elsevier Ltd 141:112977. https://doi.org/10.1016/j.eswa.2019.112977

Zhang C et al (2015) ‘Cross-scene crowd counting via deep convolutional neural networks’, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 833–841. https://doi.org/10.1109/CVPR.2015.7298684

Zhang Y et al (2016) ‘Single-image crowd counting via multi-column convolutional neural network’, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 589–597. https://doi.org/10.1109/CVPR.2016.70

Zhang Y et al (2018) ‘Auxiliary learning for crowd counting via count-net’, Neurocomputing. Elsevier B.V. 273:190–198. https://doi.org/10.1016/j.neucom.2017.08.018

Zhou S, … Zhang Z (2016) ‘Spatial-temporal convolutional neural networks for anomaly detection and localization in crowded scenes’. Signal Process: Image Comm. Elsevier 47:358–368. https://doi.org/10.1016/j.image.2016.06.007

Zhu J, Feng F and Shen B (2018) ‘People counting and pedestrian flow statistics based on convolutional neural network and recurrent neural network’, Proceedings - 2018 33rd Youth Academic Annual Conference of Chinese Association of Automation, YAC 2018. IEEE, pp. 993–998. doi: https://doi.org/10.1109/YAC.2018.8406516 .

Acknowledgements

Multimedia University, Cyberjaya, Malaysia fully supported this research.

Author information

Authors and Affiliations

Faculty of Computing & Informatics, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia

Md Roman Bhuiyan, Junaidi Abdullah, Noramiza Hashim & Fahmid Al Farid

Corresponding author

Correspondence to Md Roman Bhuiyan.

Ethics declarations

Conflict of interest.

The authors hereby declare that there are no conflicts of interest in this research.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Bhuiyan, M.R., Abdullah, J., Hashim, N. et al. Video analytics using deep learning for crowd analysis: a review. Multimed Tools Appl 81 , 27895–27922 (2022). https://doi.org/10.1007/s11042-022-12833-z

Download citation

Received : 18 January 2021

Revised : 02 February 2022

Accepted : 09 March 2022

Published : 29 March 2022

Issue Date : August 2022

DOI : https://doi.org/10.1007/s11042-022-12833-z


Keywords

  • Deep learning
  • Crowd analysis
  • Abnormal behavior
  • Video surveillance




  • Liu, K.; Chen, B.M. Industrial UAV-based unsupervised domain adaptive crack recognitions: From database towards real-site infrastructural inspections. IEEE Trans. Ind. Electron. 2022 , 70 , 9410–9420. [ Google Scholar ] [ CrossRef ]
  • Wang, W.; Hu, W.; Wang, W.; Xu, X.; Wang, M.; Shi, Y.; Qiu, S.; Tutumluer, E. Automated crack severity level detection and classification for ballastless track slab using deep convolutional neural network. Autom. Constr. 2021 , 124 , 103484. [ Google Scholar ] [ CrossRef ]
  • Xu, Z.; Zhang, X.; Chen, W.; Liu, J.; Xu, T.; Wang, Z. Muraldiff: Diffusion for ancient murals restoration on large-scale pre-training. IEEE Trans. Emerg. Top. Comput. Intell. 2024 , 8 , 2169–2181. [ Google Scholar ] [ CrossRef ]
  • Bradley, D.; Roth, G. Adaptive thresholding using the integral image. J. Graph. Tools 2007 , 12 , 13–21. [ Google Scholar ] [ CrossRef ]
  • Sezgin, M.; Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004 , 13 , 146–168. [ Google Scholar ]
  • Kapur, J.N.; Sahoo, P.K.; Wong, A.K. A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Process. 1985 , 29 , 273–285. [ Google Scholar ] [ CrossRef ]
  • Pal, N.R.; Pal, S.K. A review on image segmentation techniques. Pattern Recognit. 1993 , 26 , 1277–1294. [ Google Scholar ] [ CrossRef ]
  • Flah, M.; Suleiman, A.R.; Nehdi, M.L. Classification and quantification of cracks in concrete structures using deep learning image-based techniques. Cem. Concr. Compos. 2020 , 114 , 103781. [ Google Scholar ] [ CrossRef ]
  • Mazni, M.; Husain, A.R.; Shapiai, M.I.; Ibrahim, I.S.; Anggara, D.W.; Zulkifli, R. An investigation into real-time surface crack classification and measurement for structural health monitoring using transfer learning convolutional neural networks and otsu method. Alex. Eng. J. 2024 , 92 , 310–320. [ Google Scholar ] [ CrossRef ]
  • He, Z.; Xu, W. Deep learning and image preprocessing-based crack repair trace and secondary crack classification detection method for concrete bridges. Struct. Infrastruct. Eng. 2024 , 20 , 1–17. [ Google Scholar ] [ CrossRef ]
  • He, T.; Li, H.; Qian, Z.; Niu, C.; Huang, R. Research on weakly supervised pavement crack segmentation based on defect location by generative adversarial network and target re-optimization. Constr. Build. Mater. 2024 , 411 , 134668. [ Google Scholar ] [ CrossRef ]
  • Su, H.; Wang, X.; Han, T.; Wang, Z.; Zhao, Z.; Zhang, P. Research on a U-Net bridge crack identification and feature-calculation methods based on a CBAM attention mechanism. Buildings 2022 , 12 , 1561. [ Google Scholar ] [ CrossRef ]
  • Kang, D.; Benipal, S.S.; Gopal, D.L.; Cha, Y.-J. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Autom. Constr. 2020 , 118 , 103291. [ Google Scholar ] [ CrossRef ]
  • Lei, Q.; Zhong, J.; Wang, C. Joint optimization of crack segmentation with an adaptive dynamic threshold module. IEEE Trans. Intell. Transp. Syst. 2024 , 25 , 6902–6916. [ Google Scholar ] [ CrossRef ]
  • Lei, Q.; Zhong, J.; Wang, C.; Xia, Y.; Zhou, Y. Dynamic thresholding for accurate crack segmentation using multi-objective optimization. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Turin, Italy, 18 September 2023 ; Springer: Cham, Switzerland, 2023; pp. 389–404. [ Google Scholar ]
  • Vincent, L.; Soille, P. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 1991 , 13 , 583–598. [ Google Scholar ] [ CrossRef ]
  • Huang, H.; Zhao, S.; Zhang, D.; Chen, J. Deep learning-based instance segmentation of cracks from shield tunnel lining images. Struct. Infrastruct. Eng. 2022 , 18 , 183–196. [ Google Scholar ] [ CrossRef ]
  • Fan, Z.; Lin, H.; Li, C.; Su, J.; Bruno, S.; Loprencipe, G. Use of parallel resnet for high-performance pavement crack detection and measurement. Sustainability 2022 , 14 , 1825. [ Google Scholar ] [ CrossRef ]
  • Kong, S.Y.; Fan, J.S.; Liu, Y.F.; Wei, X.C.; Ma, X.W. Automated crack assessment and quantitative growth monitoring. Comput.-Aided Civ. Infrastruct. Eng. 2021 , 36 , 656–674. [ Google Scholar ] [ CrossRef ]
  • Dang, L.M.; Wang, H.; Li, Y.; Park, Y.; Oh, C.; Nguyen, T.N.; Moon, H. Automatic tunnel lining crack evaluation and measurement using deep learning. Tunn. Undergr. Space Technol. 2022 , 124 , 104472. [ Google Scholar ] [ CrossRef ]
  • Andrushia, A.D.; Anand, N.; Lubloy, E. Deep learning based thermal crack detection on structural concrete exposed to elevated temperature. Adv. Struct. Eng. 2021 , 24 , 1896–1909. [ Google Scholar ] [ CrossRef ]
  • Dang, L.M.; Wang, H.; Li, Y.; Nguyen, L.Q.; Nguyen, T.N.; Song, H.-K.; Moon, H. Deep learning-based masonry crack segmentation and real-life crack length measurement. Constr. Build. Mater. 2022 , 359 , 129438. [ Google Scholar ] [ CrossRef ]
  • Nguyen, A.; Gharehbaghi, V.; Le, N.T.; Sterling, L.; Chaudhry, U.I.; Crawford, S. ASR crack identification in bridges using deep learning and texture analysis. Structures 2023 , 50 , 494–507. [ Google Scholar ] [ CrossRef ]
  • Dong, C.; Li, L.; Yan, J.; Zhang, Z.; Pan, H.; Catbas, F.N. Pixel-level fatigue crack segmentation in large-scale images of steel structures using an encoder–decoder network. Sensors 2021 , 21 , 4135. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Jian, L.; Chengshun, L.; Guanhong, L.; Zhiyuan, Z.; Bo, H.; Feng, G.; Quanyi, X. Lightweight defect detection equipment for road tunnels. IEEE Sens. J. 2023 , 24 , 5107–5121. [ Google Scholar ]
  • Liang, H.; Qiu, D.; Ding, K.-L.; Zhang, Y.; Wang, Y.; Wang, X.; Liu, T.; Wan, S. Automatic pavement crack detection in multisource fusion images using similarity and difference features. IEEE Sens. J. 2023 , 24 , 5449–5465. [ Google Scholar ] [ CrossRef ]
  • Alamdari, A.G.; Ebrahimkhanlou, A. A multi-scale robotic approach for precise crack measurement in concrete structures. Autom. Constr. 2024 , 158 , 105215. [ Google Scholar ] [ CrossRef ]
  • Liu, H.; Kollosche, M.; Laflamme, S.; Clarke, D.R. Multifunctional soft stretchable strain sensor for complementary optical and electrical sensing of fatigue cracks. Smart Mater. Struct. 2023 , 32 , 045010. [ Google Scholar ] [ CrossRef ]
  • Dang, D.-Z.; Wang, Y.-W.; Ni, Y.-Q. Nonlinear autoregression-based non-destructive evaluation approach for railway tracks using an ultrasonic fiber bragg grating array. Constr. Build. Mater. 2024 , 411 , 134728. [ Google Scholar ] [ CrossRef ]
  • Yan, M.; Tan, X.; Mahjoubi, S.; Bao, Y. Strain transfer effect on measurements with distributed fiber optic sensors. Autom. Constr. 2022 , 139 , 104262. [ Google Scholar ] [ CrossRef ]
  • Shukla, H.; Piratla, K. Leakage detection in water pipelines using supervised classification of acceleration signals. Autom. Constr. 2020 , 117 , 103256. [ Google Scholar ] [ CrossRef ]
  • Chen, X.; Zhang, X.; Li, J.; Ren, M.; Zhou, B. A new method for automated monitoring of road pavement aging conditions based on recurrent neural network. IEEE Trans. Intell. Transp. Syst. 2022 , 23 , 24510–24523. [ Google Scholar ] [ CrossRef ]
  • Zhang, S.; He, X.; Xue, B.; Wu, T.; Ren, K.; Zhao, T. Segment-anything embedding for pixel-level road damage extraction using high-resolution satellite images. Int. J. Appl. Earth Obs. Geoinf. 2024 , 131 , 103985. [ Google Scholar ] [ CrossRef ]
  • Park, S.E.; Eem, S.-H.; Jeon, H. Concrete crack detection and quantification using deep learning and structured light. Constr. Build. Mater. 2020 , 252 , 119096. [ Google Scholar ] [ CrossRef ]
  • Yan, Y.; Mao, Z.; Wu, J.; Padir, T.; Hajjar, J.F. Towards automated detection and quantification of concrete cracks using integrated images and lidar data from unmanned aerial vehicles. Struct. Control Health Monit. 2021 , 28 , e2757. [ Google Scholar ] [ CrossRef ]
  • Dong, Q.; Wang, S.; Chen, X.; Jiang, W.; Li, R.; Gu, X. Pavement crack detection based on point cloud data and data fusion. Philos. Trans. R. Soc. A 2023 , 381 , 20220165. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Kim, H.; Lee, S.; Ahn, E.; Shin, M.; Sim, S.-H. Crack identification method for concrete structures considering angle of view using RGB-D camera-based sensor fusion. Struct. Health Monit. 2021 , 20 , 500–512. [ Google Scholar ] [ CrossRef ]
  • Chen, J.; Lu, W.; Lou, J. Automatic concrete defect detection and reconstruction by aligning aerial images onto semantic-rich building information model. Comput.-Aided Civ. Infrastruct. Eng. 2023 , 38 , 1079–1098. [ Google Scholar ] [ CrossRef ]
  • Pozzer, S.; Rezazadeh Azar, E.; Dalla Rosa, F.; Chamberlain Pravia, Z.M. Semantic segmentation of defects in infrared thermographic images of highly damaged concrete structures. J. Perform. Constr. Facil. 2021 , 35 , 04020131. [ Google Scholar ] [ CrossRef ]
  • Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2023 , 132 , 103812. [ Google Scholar ] [ CrossRef ]
  • Sharma, V.K.; Mir, R.N. A comprehensive and systematic look up into deep learning based object detection techniques: A review. Comput. Sci. Rev. 2020 , 38 , 100301. [ Google Scholar ] [ CrossRef ]
  • Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 3708–3712. [ Google Scholar ]
  • Yang, C.; Chen, J.; Li, Z.; Huang, Y. Structural crack detection and recognition based on deep learning. Appl. Sci. 2021 , 11 , 2868. [ Google Scholar ] [ CrossRef ]
  • Rajadurai, R.-S.; Kang, S.-T. Automated vision-based crack detection on concrete surfaces using deep learning. Appl. Sci. 2021 , 11 , 5229. [ Google Scholar ] [ CrossRef ]
  • Kim, B.; Yuvaraj, N.; Sri Preethaa, K.; Arun Pandian, R. Surface crack detection using deep learning with shallow CNN architecture for enhanced computation. Neural Comput. Appl. 2021 , 33 , 9289–9305. [ Google Scholar ] [ CrossRef ]
  • O’Brien, D.; Osborne, J.A.; Perez-Duenas, E.; Cunningham, R.; Li, Z. Automated crack classification for the CERN underground tunnel infrastructure using deep learning. Tunn. Undergr. Space Technol. 2023 , 131 , 104668. [ Google Scholar ]
  • Chen, K.; Reichard, G.; Xu, X.; Akanmu, A. Automated crack segmentation in close-range building façade inspection images using deep learning techniques. J. Build. Eng. 2021 , 43 , 102913. [ Google Scholar ] [ CrossRef ]
  • Dong, Z.; Wang, J.; Cui, B.; Wang, D.; Wang, X. Patch-based weakly supervised semantic segmentation network for crack detection. Constr. Build. Mater. 2020 , 258 , 120291. [ Google Scholar ] [ CrossRef ]
  • Buatik, A.; Thansirichaisree, P.; Kalpiyapun, P.; Khademi, N.; Pasityothin, I.; Poovarodom, N. Mosaic crack mapping of footings by convolutional neural networks. Sci. Rep. 2024 , 14 , 7851. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Zhang, Y.; Zhang, L. Detection of pavement cracks by deep learning models of transformer and UNet. arXiv 2023 , arXiv:2304.12596. [ Google Scholar ] [ CrossRef ]
  • Al-Huda, Z.; Peng, B.; Algburi, R.N.A.; Al-antari, M.A.; Rabea, A.-J.; Zhai, D. A hybrid deep learning pavement crack semantic segmentation. Eng. Appl. Artif. Intell. 2023 , 122 , 106142. [ Google Scholar ] [ CrossRef ]
  • Shamsabadi, E.A.; Xu, C.; Rao, A.S.; Nguyen, T.; Ngo, T.; Dias-da-Costa, D. Vision transformer-based autonomous crack detection on asphalt and concrete surfaces. Autom. Constr. 2022 , 140 , 104316. [ Google Scholar ] [ CrossRef ]
  • Huang, S.; Tang, W.; Huang, G.; Huangfu, L.; Yang, D. Weakly supervised patch label inference networks for efficient pavement distress detection and recognition in the wild. IEEE Trans. Intell. Transp. Syst. 2023 , 24 , 5216–5228. [ Google Scholar ] [ CrossRef ]
  • Huang, G.; Huang, S.; Huangfu, L.; Yang, D. Weakly supervised patch label inference network with image pyramid for pavement diseases recognition in the wild. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 7978–7982. [ Google Scholar ]
  • Guo, J.-M.; Markoni, H. Efficient and adaptable patch-based crack detection. IEEE Trans. Intell. Transp. Syst. 2022 , 23 , 21885–21896. [ Google Scholar ] [ CrossRef ]
  • König, J.; Jenkins, M.D.; Mannion, M.; Barrie, P.; Morison, G. Weakly-supervised surface crack segmentation by generating pseudo-labels using localization with a classifier and thresholding. IEEE Trans. Intell. Transp. Syst. 2022 , 23 , 24083–24094. [ Google Scholar ] [ CrossRef ]
  • Al-Huda, Z.; Peng, B.; Algburi, R.N.A.; Al-antari, M.A.; Rabea, A.-J.; Al-maqtari, O.; Zhai, D. Asymmetric dual-decoder-U-Net for pavement crack semantic segmentation. Autom. Constr. 2023 , 156 , 105138. [ Google Scholar ] [ CrossRef ]
  • Wen, T.; Lang, H.; Ding, S.; Lu, J.J.; Xing, Y. PCDNet: Seed operation-based deep learning model for pavement crack detection on 3d asphalt surface. J. Transp. Eng. Part B Pavements 2022 , 148 , 04022023. [ Google Scholar ] [ CrossRef ]
  • Mishra, A.; Gangisetti, G.; Eftekhar Azam, Y.; Khazanchi, D. Weakly supervised crack segmentation using crack attention networks on concrete structures. Struct. Health Monit. 2024 , 23 , 14759217241228150. [ Google Scholar ] [ CrossRef ]
  • Kompanets, A.; Pai, G.; Duits, R.; Leonetti, D.; Snijder, B. Deep learning for segmentation of cracks in high-resolution images of steel bridges. arXiv 2024 , arXiv:2403.17725. [ Google Scholar ]
  • Liu, Y.; Yeoh, J.K. Robust pixel-wise concrete crack segmentation and properties retrieval using image patches. Autom. Constr. 2021 , 123 , 103535. [ Google Scholar ] [ CrossRef ]
  • Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [ Google Scholar ]
  • Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [ Google Scholar ]
  • Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [ Google Scholar ]
  • He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [ Google Scholar ]
  • Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [ Google Scholar ]
  • Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [ Google Scholar ]
  • Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018 , arXiv:1804.02767. [ Google Scholar ]
  • Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020 , arXiv:2004.10934. [ Google Scholar ]
  • Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [ Google Scholar ]
  • Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [ Google Scholar ]
  • Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [ Google Scholar ]
  • Xu, Y.; Li, D.; Xie, Q.; Wu, Q.; Wang, J. Automatic defect detection and segmentation of tunnel surface using modified mask R-CNN. Measurement 2021 , 178 , 109316. [ Google Scholar ] [ CrossRef ]
  • Zhao, W.; Liu, Y.; Zhang, J.; Shao, Y.; Shu, J. Automatic pixel-level crack detection and evaluation of concrete structures using deep learning. Struct. Control Health Monit. 2022 , 29 , e2981. [ Google Scholar ] [ CrossRef ]
  • Li, R.; Yu, J.; Li, F.; Yang, R.; Wang, Y.; Peng, Z. Automatic bridge crack detection using unmanned aerial vehicle and faster R-CNN. Constr. Build. Mater. 2023 , 362 , 129659. [ Google Scholar ] [ CrossRef ]
  • Tran, T.S.; Nguyen, S.D.; Lee, H.J.; Tran, V.P. Advanced crack detection and segmentation on bridge decks using deep learning. Constr. Build. Mater. 2023 , 400 , 132839. [ Google Scholar ] [ CrossRef ]
  • Zhang, J.; Qian, S.; Tan, C. Automated bridge crack detection method based on lightweight vision models. Complex Intell. Syst. 2023 , 9 , 1639–1652. [ Google Scholar ] [ CrossRef ]
  • Ren, R.; Liu, F.; Shi, P.; Wang, H.; Huang, Y. Preprocessing of crack recognition: Automatic crack-location method based on deep learning. J. Mater. Civ. Eng. 2023 , 35 , 04022452. [ Google Scholar ] [ CrossRef ]
  • Liu, Z.; Yeoh, J.K.; Gu, X.; Dong, Q.; Chen, Y.; Wu, W.; Wang, L.; Wang, D. Automatic pixel-level detection of vertical cracks in asphalt pavement based on gpr investigation and improved mask R-CNN. Autom. Constr. 2023 , 146 , 104689. [ Google Scholar ] [ CrossRef ]
  • Li, Z.; Zhu, H.; Huang, M. A deep learning-based fine crack segmentation network on full-scale steel bridge images with complicated backgrounds. IEEE Access 2021 , 9 , 114989–114997. [ Google Scholar ] [ CrossRef ]
  • Alipour, M.; Harris, D.K.; Miller, G.R. Robust pixel-level crack detection using deep fully convolutional neural networks. J. Comput. Civ. Eng. 2019 , 33 , 04019040. [ Google Scholar ] [ CrossRef ]
  • Wang, S.; Pan, Y.; Chen, M.; Zhang, Y.; Wu, X. FCN-SFW: Steel structure crack segmentation using a fully convolutional network and structured forests. IEEE Access 2020 , 8 , 214358–214373. [ Google Scholar ] [ CrossRef ]
  • Hang, J.; Wu, Y.; Li, Y.; Lai, T.; Zhang, J.; Li, Y. A deep learning semantic segmentation network with attention mechanism for concrete crack detection. Struct. Health Monit. 2023 , 22 , 3006–3026. [ Google Scholar ] [ CrossRef ]
  • Sun, Y.; Yang, Y.; Yao, G.; Wei, F.; Wong, M. Autonomous crack and bughole detection for concrete surface image based on deep learning. IEEE Access 2021 , 9 , 85709–85720. [ Google Scholar ] [ CrossRef ]
  • Wang, Z.; Leng, Z.; Zhang, Z. A weakly-supervised transformer-based hybrid network with multi-attention for pavement crack detection. Constr. Build. Mater. 2024 , 411 , 134134. [ Google Scholar ] [ CrossRef ]
  • Chen, T.; Cai, Z.; Zhao, X.; Chen, C.; Liang, X.; Zou, T.; Wang, P. Pavement crack detection and recognition using the architecture of segNet. J. Ind. Inf. Integr. 2020 , 18 , 100144. [ Google Scholar ] [ CrossRef ]
  • Bai, S.; Ma, M.; Yang, L.; Liu, Y. Pixel-wise crack defect segmentation with dual-encoder fusion network. Constr. Build. Mater. 2024 , 426 , 136179. [ Google Scholar ] [ CrossRef ]
  • Wang, W.; Su, C. Semi-supervised semantic segmentation network for surface crack detection. Autom. Constr. 2021 , 128 , 103786. [ Google Scholar ] [ CrossRef ]
  • Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020 , 31 , 759–776. [ Google Scholar ] [ CrossRef ]
  • König, J.; Jenkins, M.D.; Mannion, M.; Barrie, P.; Morison, G. Optimized deep encoder-decoder methods for crack segmentation. Digit. Signal Process. 2021 , 108 , 102907. [ Google Scholar ] [ CrossRef ]
  • Wang, C.; Liu, H.; An, X.; Gong, Z.; Deng, F. Swincrack: Pavement crack detection using convolutional swin-transformer network. Digit. Signal Process. 2024 , 145 , 104297. [ Google Scholar ] [ CrossRef ]
  • Lan, Z.-X.; Dong, X.-M. Minicrack: A simple but efficient convolutional neural network for pixel-level narrow crack detection. Comput. Ind. 2022 , 141 , 103698. [ Google Scholar ] [ CrossRef ]
  • Salton, G. Introduction to Modern Information Retrieval ; McGraw-Hill: New York, NY, USA, 1983. [ Google Scholar ]
  • Jenkins, M.D.; Carr, T.A.; Iglesias, M.I.; Buggy, T.; Morison, G. A deep convolutional neural network for semantic pixel-wise segmentation of road and pavement surface cracks. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; IEEE: Piscataway, NJ, USA; pp. 2120–2124. [ Google Scholar ]
  • Tsai, Y.-C.; Chatterjee, A. Comprehensive, quantitative crack detection algorithm performance evaluation system. J. Comput. Civ. Eng. 2017 , 31 , 04017047. [ Google Scholar ] [ CrossRef ]
  • Li, H.; Wang, J.; Zhang, Y.; Wang, Z.; Wang, T. A study on evaluation standard for automatic crack detection regard the random fractal. arXiv 2020 , arXiv:2007.12082. [ Google Scholar ]


Method | Features | Domain | Dataset | Image Device/Source | Results | Limitations
Canny and YOLOv4 [ ] | Crack detection and measurement | Bridges | 1463 images, 256 × 256 pixels | Smartphone and DJI UAV | Accuracy = 92%; mAP = 92% | The Canny edge detector is sensitive to the choice of threshold
Canny and GM-ResNet [ ] | Crack detection, measurement, and classification | Road | 522 images, 224 × 224 pixels | Concrete crack sub-dataset | Precision = 97.9%; Recall = 98.9%; F1 measure = 98.0%; Accuracy (shadow) = 99.3%; Accuracy (shadow-free) = 99.9% | Detection performance for complex cracks is not yet perfect
Sobel and ResNet50 [ ] | Crack detection | Concrete | 4500 images, 100 × 100 pixels | FLIR E8 | Precision = 98.4%; Recall = 88.7%; F1 measure = 93.2% | -
Sobel and BARNet [ ] | Crack detection and localization | Road | 206 images, 800 × 600 pixels | CrackTree200 dataset | AIU = 19.85%; ODS = 79.9%; OIS = 81.4% | Hyperparameter tuning is needed to balance the penalty weights for different crack types
Canny and DeepLabV3+ [ ] | Crack detection | Road | 2000 × 1500 pixels | Crack500 dataset | MIoU = 77.64%; MAE = 1.55; PA = 97.38%; F1 score = 63% | Detection performance deteriorates in dark environments or when interfering objects are present
Canny and RetinaNet [ ] | Crack detection and measurement | Road | 850 images, 256 × 256 pixels | SDNET 2018 dataset | Precision = 85.96%; Recall = 84.48%; F1 score = 85.21% | -
Canny and Transformer [ ] | Crack detection and segmentation | Buildings | 11,298 images, 450 × 450 pixels | UAVs | GA = 83.5%; MIoU = 76.2%; Precision = 74.3%; Recall = 75.2%; F1 score = 74.7% | Incurs a marginal increase in computational cost across network backbones
Canny and Inception-ResNet-v2 [ ] | Crack detection, measurement, and classification | High-speed railway | 4650 images, 400 × 400 pixels | Track inspection vehicle | High severity level: Precision = 98.37%, Recall = 93.82%, F1 score = 95.99%; Low severity level: Precision = 94.25%, Recall = 98.39%, F1 score = 96.23% | Only the average width was used to define crack severity; the influence of crack length on the detection result was not considered
Canny and U-Net [ ] | Crack detection | Buildings | 165 images | - | SSIM = 14.5392; PSNR = 0.3206; RMSE = 0.0747 | Relies on a large amount of mural data for training and enhancement
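Several of the pipelines above pair a classical edge detector with a deep network, using the edge map as a cheap preprocessing or fusion cue. As a rough illustration of that first stage only, the following is a minimal Sobel gradient-magnitude edge detector in pure NumPy; it is a sketch, not any surveyed method, and the 0.5 relative threshold is an arbitrary choice (the surveyed works rely on library implementations such as OpenCV's Canny):

```python
import numpy as np

def sobel_edges(img: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Binary edge map from a 2-D grayscale image via Sobel gradient magnitude.

    Pixels whose gradient magnitude exceeds `thresh` times the maximum
    magnitude are marked as edges.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img.astype(float), 1, mode="edge")  # replicate borders
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):                # correlate with the 3x3 kernels
        for j in range(3):
            window = pad[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)            # gradient magnitude
    if mag.max() == 0:
        return np.zeros_like(mag, dtype=bool)
    return mag > thresh * mag.max()
```

For a vertical intensity step, the detector fires on the two pixel columns straddling the step and nowhere else; real crack images of course produce far noisier responses, which is why these pipelines follow the edge map with a learned model.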
Method | Features | Domain | Dataset | Image Device/Source | Results | Limitations
Otsu and Keras classifier [ ] | Crack detection, measurement, and classification | Concrete | 4000 images, 227 × 227 pixels | Open dataset | Classifier accuracies = 98.25%, 97.18%, 96.17%; Length error = 1.5%; Width error = 5%; Orientation-angle error = 2% | Can accurately quantify only a single crack per image
Otsu and TL MobileNetV2 [ ] | Crack detection, measurement, and classification | Concrete | 11,435 images, 224 × 224 pixels | Mendeley data—crack detection | Accuracy = 99.87%; Recall = 99.74%; Precision = 100%; F1 score = 99.87% | Dependent on image quality
Otsu, YOLOv7, Poisson noise, and bilateral filtering [ ] | Crack detection and classification | Bridges | 500 images, 640 × 640 pixels | Dataset | Training time = 35 min; Inference time = 8.9 s; Target correct rate = 85.97%; Negative-sample misclassification rate = 42.86% | Does not provide quantified information such as length and area
Adaptive threshold and WSIS [ ] | Crack detection | Road | 320 images, 3024 × 4032 pixels | Photos of cracks | Recall = 90%; Precision = 52%; IoU = 50%; F1 score = 66%; Accuracy = 98% | For small cracks (width under 3 pixels), the model can only confirm their existence and struggles to depict them in detail
Adaptive threshold and U-GAT-IT [ ] | Crack detection | Road | 300 training images and 237 test images | DeepCrack dataset | Recall = 79.3%; Precision = 82.2%; F1 score = 80.7% | Further research is needed to address interference from small cracks, road shadows, and water stains
Local thresholding and DCNN [ ] | Crack detection | Concrete | 125 images, 227 × 227 pixels | Cameras | Accuracy = 93%; Recall = 91%; Precision = 92%; F1 score = 91% | -
Otsu and Faster R-CNN [ ] | Crack detection, localization, and quantification | Concrete | 100 images, 1920 × 1080 pixels | Nikon D7200 camera and Galaxy S9 camera | AP = 95%; mIoU = 83%; RMSE = 2.6 pixels; Length accuracy = 93% | Useful for concrete cracks only; applicability to other crack materials might be limited
Adaptive Dynamic Thresholding Module (ADTM) and Mask DINO [ ] | Crack detection and segmentation | Road | 395 images, 2000 × 1500 pixels | Crack500 | mIoU = 81.3%; mAcc = 96.4%; gAcc = 85.0% | The ADTM module can only handle binary classification problems
Dynamic Thresholding Branch and DeepCrack [ ] | Crack detection and classification | Bridges | 3648 × 5472 pixels | Crack500 | mIoU = 79.3%; mAcc = 98.5%; gAcc = 86.6% | Image-level thresholds lead to misclassification of the background
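The Otsu step shared by several of the methods above selects a single global threshold by maximizing the between-class variance of the grayscale histogram. A minimal NumPy sketch, assuming 8-bit grayscale input (the surveyed works use library implementations such as OpenCV's `THRESH_OTSU`):

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the intensity t in [0, 255] maximizing between-class variance.

    Pixels with value <= t form class 0 (e.g., dark cracks), the rest class 1.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                    # normalized histogram
    omega = np.cumsum(p)                     # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))       # cumulative first moment
    mu_total = mu[-1]                        # global mean intensity
    with np.errstate(divide="ignore", invalid="ignore"):
        # Between-class variance for every candidate threshold at once.
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)         # empty classes contribute 0
    return int(np.argmax(sigma_b))
```

On a bimodal crack image (dark crack pixels on a bright background) the returned threshold falls between the two modes, so `gray > t` separates background from crack candidates.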
Method | Features | Domain | Dataset | Image Device/Source | Results | Limitations
Morphological closing operations and Mask R-CNN [ ] | Crack detection | Tunnel | 761 images, 227 × 227 pixels | MTI-200a | Balanced accuracy = 81.94%; F1 score = 68.68%; IoU = 52.72% | Sample size is relatively small compared with what universal conditions require
Morphological operations and Parallel ResNet [ ] | Crack detection and measurement | Road | 206 images (CrackTree200), 800 × 600 pixels, and 118 images (CFD), 320 × 480 pixels | CrackTree200 and CFD datasets | CrackTree200: Precision = 94.27%, Recall = 92.52%, F1 = 93.08%; CFD: Precision = 96.21%, Recall = 95.12%, F1 = 95.63% | Evaluated only on accurate static images
Closing and CNN [ ] | Crack detection, measurement, and classification | Concrete | 3208 images, 256 × 256 or 128 × 128 pixels | Hand-held DSLR cameras | Relative error = 5%; Accuracy > 95%; Loss < 0.1 | The extraction of crack edges has a larger influence on the results
Dilation and TunnelURes [ ] | Crack detection, measurement, and classification | Tunnel | 6810 images; sizes vary from 10,441 × 2910 to 50,739 × 3140 pixels | Night 4K line-scan cameras | AUC = 0.97; PA = 0.928; IoU = 0.847 | The medial-axis skeletonization algorithm produced many errors because it is susceptible to crack intersections and to image edges where the crack's representation changes
Opening, closing, and U-Net [ ] | Crack detection, measurement, and classification | Concrete | 200 images, 512 × 512 pixels | Canon SX510 HS camera | Precision = 96.52%; Recall = 93.73%; F measure = 96.12%; Accuracy = 99.74%; IoU = 78.12% | Can only detect other crack types that share the same geometry as thermal cracks
Morphological operations and DeepLabV3+ [ ] | Crack detection and measurement | Masonry structure | 200 images, 780 × 355 pixels and 2880 × 1920 pixels | Internet, drones, and smartphones | IoU = 0.97; F1 score = 98%; Accuracy = 98% | Will not detect crack features absent from the dataset (complicated cracks, tiny cracks, etc.)
Erosion, texture analysis techniques, and InceptionV3 [ ] | Crack detection and classification | Bridges | 1706 images, 256 × 256 pixels | Cameras | F1 score = 93.7%; Accuracy = 94.07% | -
U-Net, opening, and closing operations [ ] | Crack detection and segmentation | Bridges | 244 images, 512 × 512 pixels | Cameras | mP = 44.57%; mR = 53.13%; mF1 = 42.79%; mIoU = 64.79% | The model lacks generality, and there are cases of false detection
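The morphological operations these methods attach to their networks are simple set operations on the binary crack mask; closing (dilation followed by erosion) bridges small gaps in a predicted crack. A self-contained NumPy sketch with a fixed 3 × 3 square structuring element (the surveyed papers typically use library routines and task-specific kernel sizes):

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 square structuring element."""
    p = np.pad(mask, 1, mode="constant")          # pad with background
    out = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    for i in range(3):
        for j in range(3):
            out |= p[i:i + h, j:j + w].astype(bool)  # OR over the neighborhood
    return out

def erode(mask: np.ndarray) -> np.ndarray:
    """Binary erosion with a 3x3 square structuring element."""
    p = np.pad(mask, 1, mode="constant", constant_values=True)  # pad foreground
    out = np.ones_like(mask, dtype=bool)
    h, w = mask.shape
    for i in range(3):
        for j in range(3):
            out &= p[i:i + h, j:j + w].astype(bool)  # AND over the neighborhood
    return out

def closing(mask: np.ndarray) -> np.ndarray:
    """Morphological closing: dilation then erosion; fills 1-pixel gaps."""
    return erode(dilate(mask))
```

Applied to a one-pixel break in a thin crack line, closing restores the continuous line while leaving the surrounding background untouched, which is exactly the post-processing role it plays in the pipelines above.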
| Sensor Type | Fusion Method | Advantages | Disadvantages | Application Scenarios |
|---|---|---|---|---|
| Optical sensor [ ] | Data-level fusion | High resolution, rich in details | Susceptible to light and occlusion | Surface crack detection, general environments |
| Thermal sensor [ ] | Feature-level fusion | Suitable for nighttime or low-light environments, detects temperature changes | Low resolution, lack of detail | Nighttime detection, heat-sensitive areas, large-area surface crack detection |
| Laser sensor [ ] | Data-level and feature-level fusion | High-precision 3D point cloud data, accurately measures crack morphology | High equipment cost, complex data processing | Complex structures, precise measurements |
| Strain sensor [ ] | Feature-level and decision-level fusion | High sensitivity to structural changes; durable | Requires contact with the material; installation complexity | Monitoring structural health in bridges and buildings; detecting early-stage crack development |
| Ultrasonic sensor [ ] | Data-level and feature-level fusion | Detects internal cracks in materials, strong penetration | Affected by material and geometric shape, limited resolution | Internal cracks, metal material detection |
| Optical fiber sensor [ ] | Feature-level fusion | High sensitivity to changes in material properties, non-contact measurement | Affected by environmental conditions, requires calibration | Surface crack detection, structural health monitoring |
| Vibration sensor [ ] | Data-level fusion | Detects structural vibration characteristics, strong adaptability | Affected by environmental vibrations, requires complex signal processing | Dynamic crack monitoring, bridges and other structures |
| Multispectral satellite sensor [ ] | Data-level fusion | Rich spectral information | Limited spectral resolution, weather- and lighting-dependent, high cost | Pavement crack detection, bridge and infrastructure monitoring, building facade inspection |
| High-resolution satellite sensors [ ] | Data-level and feature-level fusion | High spatial resolution, wide coverage, frequent revisit times, rich information content | Weather dependency, high cost, data processing complexity, limited temporal resolution | Road and pavement crack detection, bridge and infrastructure monitoring, urban building facade inspection, railway and highway crack monitoring |
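The three fusion levels in the table differ mainly in *where* the modalities are combined: at the raw-data stage, after per-sensor feature extraction, or after per-sensor decisions. A minimal NumPy sketch under those definitions (the arrays, the `simple_features` helper, and the detector scores are all hypothetical, not taken from any cited system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical co-registered readings from two sensors (illustrative only):
optical = rng.random((64, 64))   # normalized optical intensity patch
thermal = rng.random((64, 64))   # normalized thermal intensity patch

# Data-level fusion: combine raw measurements before any feature extraction,
# e.g. stack the two modalities into a single multi-channel input.
data_fused = np.stack([optical, thermal], axis=-1)      # shape (64, 64, 2)

# Feature-level fusion: extract features per modality, then concatenate.
def simple_features(img: np.ndarray) -> np.ndarray:
    gy, gx = np.gradient(img)
    edge_energy = np.hypot(gx, gy)
    return np.array([img.mean(), img.std(), edge_energy.mean()])

feat_fused = np.concatenate([simple_features(optical),
                             simple_features(thermal)])  # shape (6,)

# Decision-level fusion: run an independent detector per modality,
# then merge the per-sensor decisions (here: average the crack scores).
score_optical = 0.8   # placeholder detector outputs
score_thermal = 0.6
decision = (score_optical + score_thermal) / 2 > 0.5

print(data_fused.shape, feat_fused.shape, decision)
```

In practice the trade-off mirrors the table: data-level fusion preserves the most information but requires precise co-registration, while decision-level fusion is robust to sensor mismatch at the cost of discarding cross-modal detail.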
| Scale | Input Size (Pixels × Pixels) | References |
|---|---|---|
| Image-based | 227 × 227 | [ , , , ] |
|  | 224 × 224 | [ ] |
|  | 256 × 256 | [ ] |
|  | 416 × 416 | [ ] |
|  | 512 × 512 | [ ] |
| Patch-based | 128 × 128 | [ , ] |
|  | 200 × 200 | [ ] |
|  | 224 × 224 | [ , , , , ] |
|  | 227 × 227 | [ ] |
|  | 256 × 256 | [ , ] |
|  | 300 × 300 | [ , ] |
|  | 320 × 480 | [ , ] |
|  | 544 × 384 | [ ] |
|  | 512 × 512 | [ , , , ] |
|  | 584 × 384 | [ ] |
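The patch-based input sizes above are typically produced by tiling a full-resolution image into fixed-size windows before training, rather than resizing the whole image. A minimal sketch of non-overlapping tiling (the image dimensions and stride here are illustrative, not taken from any cited dataset):

```python
import numpy as np

def extract_patches(img: np.ndarray, size: int, stride: int) -> list:
    """Slide a size×size window over img with the given stride,
    keeping only windows that fit entirely inside the image."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return patches

# e.g. tiling a hypothetical 2000 × 1500 pavement image into 256 × 256 patches
image = np.zeros((1500, 2000), dtype=np.uint8)  # placeholder image
patches = extract_patches(image, size=256, stride=256)
print(len(patches))  # → 35 (5 rows × 7 columns of full tiles)
```

Overlapping strides (stride < size) are a common variant that avoids cutting cracks at tile boundaries, at the cost of more patches per image.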
| Model | Improvement/Innovation | Dataset | Backbone | Results |
|---|---|---|---|---|
| Faster R-CNN [ ] | Combined with drones for crack detection | 2000 images, 5280 × 2970 pixels | VGG-16 | Precision = 92.03%; Recall = 96.26%; F1 score = 94.10% |
| Faster R-CNN [ ] | A double-head structure is introduced, comprising an independent fully connected head and a convolution head | 1622 images, 1612 × 1947 pixels | ResNet50 | AP = 47.2% |
| Mask R-CNN [ ] | The morphological closing operation was incorporated into the M-R-101-FPN model to form an integrated model | 761 images, 227 × 227 pixels | ResNets and VGG | Balanced accuracy = 81.94%; F1 score = 68.68%; IoU = 52.72% |
| Mask R-CNN [ ] | A PAFPN module and an edge detection branch were introduced | 9680 images, 1500 × 1500 pixels | ResNet-FPN | Precision = 92.03%; Recall = 96.26%; AP = 94.10%; mAP = 90.57%; Error rate = 0.57% |
| Mask R-CNN [ ] | The FPN structure introduces a side-connection method, combines FPN with ResNet-101, and replaces the RoI-Pooling layer with an RoI-Align layer | 3430 images, 1024 × 1024 pixels | ResNet101 | AP = 83.3%; F1 score = 82.4%; Average error = 2.33%; mIoU = 70.1% |
| YOLOv3-tiny [ ] | A structural crack detection and quantification method combined with structured light is proposed | 500 images, 640 × 640 pixels | Darknet-53 | Accuracy = 94%; Precision = 98% |
| YOLOv4 [ ] | Lightweight networks (DenseNet, MobileNet, and GhostNet) replaced the original backbone feature extraction network | 800 images, 416 × 416 pixels | DenseNet, MobileNet v1/v2/v3, and GhostNet | Precision = 93.96%; Recall = 90.12%; F1 score = 92% |
| YOLOv4 [ ] | – | 1463 images, 256 × 256 pixels | Darknet-53 | Accuracy = 92%; mAP = 92% |
| Dataset Name | Number of Images | Image Resolution | Manual Annotation | Scope of Applicability | Limitations |
|---|---|---|---|---|---|
| CrackTree200 [ ] | 206 images | 800 × 600 pixels | Pixel-level annotations for cracks | Crack classification and segmentation | With only ~200 images, the dataset's relatively small size can hinder the model's ability to generalize across diverse conditions, potentially leading to overfitting on the specific examples provided |
| Crack500 [ ] | 500 images | 2000 × 1500 pixels | Pixel-level annotations for cracks | Crack classification and segmentation | Limited number of images compared with larger datasets, which may affect the generalization of models trained on it |
| SDNET 2018 [ ] | 56,000 images | 256 × 256 pixels | Pixel-level annotations for cracks | Crack classification and segmentation | The dataset's focus on concrete surfaces may limit model performance on other types of surfaces or structures |
| Mendeley data—crack detection [ ] | 40,000 images | 227 × 227 pixels | Pixel-level annotations for cracks | Crack classification | The dataset may not cover all types of cracks or surface conditions, which can limit its applicability to a wide range of real-world scenarios |
| DeepCrack [ ] | 2500 images | 512 × 512 pixels | Annotations for cracks | Crack segmentation | The resolution may limit the ability of models to capture very small or subtle crack features |
| CFD [ ] | 118 images | 320 × 480 pixels | Pixel-level annotations for cracks | Crack segmentation | The dataset contains a limited number of samples, which may limit the generalization ability of the model |
| CrackTree260 [ ] | 260 images | 800 × 600 pixels and 960 × 720 pixels | Pixel-level labeling, bounding boxes, or other crack markers | Object detection and segmentation | Because the dataset is small, the model can easily overfit the training data, especially when a complex model is used |
| CrackLS315 [ ] | 315 images | 512 × 512 pixels | Pixel-level segmentation mask or bounding box | Object detection and segmentation | The small dataset may make the model perform poorly in complex scenarios, especially on different types of cracks or uncommon crack features |
| Stone331 [ ] | 331 images | 512 × 512 pixels | Pixel-level segmentation mask or bounding box | Object detection and segmentation | The relatively small number of images limits the generalization ability of the model, especially in deep learning tasks where small datasets tend to cause overfitting |
| Index | Index Value and Calculation Formula | Curve |
|---|---|---|
| True positive (TP) | Number of crack samples correctly predicted as cracks | – |
| False positive (FP) | Number of non-crack samples incorrectly predicted as cracks | – |
| True negative (TN) | Number of non-crack samples correctly predicted as non-cracks | – |
| False negative (FN) | Number of crack samples incorrectly predicted as non-cracks | – |
| Precision | TP / (TP + FP) | PRC |
| Recall | TP / (TP + FN) | PRC, ROC curve |
| F1 score | 2 × Precision × Recall / (Precision + Recall) | F1 score curve |
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Accuracy vs. threshold curve |
| Average precision (AP) | Area under the precision–recall curve | PRC |
| Mean average precision (mAP) | Mean of AP over all classes | – |
| IoU | Area of overlap / area of union between prediction and ground truth | IoU distribution curve, precision–recall curve with IoU thresholds |
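The tabulated indices follow directly from the TP/FP/TN/FN counts, and IoU from the overlap of predicted and ground-truth regions. A minimal sketch using the standard formulas (the counts and box coordinates below are illustrative, not results from any reviewed method):

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard classification metrics from a confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

def box_iou(a: tuple, b: tuple) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(detection_metrics(tp=90, fp=10, tn=80, fn=20))
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # → 25/175 ≈ 0.143
```

AP is then the area under the precision–recall curve swept over detection thresholds, and mAP averages AP over classes (or, in many detection benchmarks, additionally over IoU thresholds).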
Share and Cite

Yuan, Q.; Shi, Y.; Li, M. A Review of Computer Vision-Based Crack Detection Methods in Civil Infrastructure: Progress and Challenges. Remote Sens. 2024 , 16 , 2910. https://doi.org/10.3390/rs16162910


