• Agustin Guerra, Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, USA.
• Vivek Gadhiya, Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, USA.
• Punyaanek Srisurin, Transportation Institute, Chulalongkorn University, Bangkok, Thailand.



Machine Learning, Statistical Learning, Random Forest, Linear Regression, Support Vector Machine, Artificial Neural Network, Crash Prediction


This study used Highway Safety Information System (HSIS) data on crashes that occurred on road segments to develop supervised machine learning prediction models. Five models were developed: Linear Regression (LR), Generalized Additive Model (GAM), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). The five models were compared using the root mean square error (RMSE) and the mean absolute error (MAE) as indicators of model quality. The results indicated that the RF model produced the best crash predictions. The findings suggested that the number of crashes on highway segments increased exponentially with Annual Average Daily Traffic (AADT). In addition, roadway segments with higher design speeds had fewer crashes than segments with lower design speeds. For segments shorter than 5 miles, the number of crashes increased rapidly as segment length increased. For segments longer than 5 miles, however, the number of crashes did not increase substantially with segment length. Also, the more lanes a roadway segment had, the greater the chance of an increase in the number of crashes. Finally, moderate grades showed the highest crash risk, followed by flat and rolling grades, respectively. These findings are useful for transportation professionals to consider when designing highways.
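As an illustration of the two quality indicators named in the abstract, the following minimal Python sketch computes RMSE and MAE from their standard definitions. The crash counts and the names `model_a` and `model_b` are hypothetical and chosen only for illustration; they are not HSIS data and do not reproduce the study's models or results.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: squares each error, so large misses weigh more."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: average size of the errors, all weighted equally."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical crash counts per road segment (illustrative only).
observed = [2, 0, 5, 1]
model_a = [1, 0, 3, 1]  # hypothetical predictions: one large miss
model_b = [2, 1, 4, 2]  # hypothetical predictions: several small misses

for name, predicted in (("model_a", model_a), ("model_b", model_b)):
    print(f"{name}: RMSE={rmse(observed, predicted):.3f}, "
          f"MAE={mae(observed, predicted):.3f}")
```

In this made-up example both sets of predictions have the same MAE, while the set containing the single larger miss has the higher RMSE, which is one reason for reporting the two indicators side by side.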

Author Biographies

Agustin Guerra, Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, USA.

Vivek Gadhiya, Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, USA.


How to Cite

Guerra, A., Gadhiya, V., & Srisurin, P. (2022). CRASH PREDICTION ON ROAD SEGMENTS USING MACHINE LEARNING METHODS. ASEAN Engineering Journal, 12(3), 27-37.