A Statistical Comparison of Feature Selection Techniques for Solar Energy Forecasting Based on Geographical Data
In recent years, solar energy forecasting has attracted growing attention as part of the move toward sustainable, environmentally aware energy use. The topic is of active interest to the scientific community, and machine learning techniques have proven to be a powerful means of building models that learn automatically and predict accurately. Among the machine learning and data mining tools applied to solar energy prediction, feature selection is becoming an essential step for improving the efficiency of model building. In this paper, we examine the potential of feature selection (FS). We provide a detailed taxonomy of feature selection techniques and examine their usability and their ability to handle a solar energy forecasting problem, given meteorological and geographical data. We focus on filter-based, wrapper-based, and embedded feature selection methods. We compare the techniques using three criteria: the number of selected features, stability, and regression accuracy. The experimental results demonstrate how the feature selection methods studied can considerably improve the prediction process and how the selected features vary by method, depending on the given data constraints.
Keywords: feature selection, filter method, wrapper method, embedded method, solar energy forecasting, regression performance, smart environment
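To make two of the comparison criteria named in the abstract concrete, the following is a minimal sketch of a filter-based selector and a stability score. It is not the paper's method. The synthetic data, the Pearson-correlation ranking, the bootstrap resampling, and the mean pairwise Jaccard index are all assumptions chosen for illustration, as are the function names `filter_select`, `stability`, and `make_data`.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (vx * vy) ** 0.5


def filter_select(X, y, k):
    """Filter-style selection: keep the k features most correlated with y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(subsets):
    """Mean pairwise Jaccard similarity over a list of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_data(n, rng):
    """Hypothetical data: 5 features, target depends on features 0 and 1 only."""
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y


rng = random.Random(0)
X, y = make_data(300, rng)

# Re-run the selector on bootstrap resamples to see how much the subset changes.
subsets = []
for _ in range(20):
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    subsets.append(filter_select([X[i] for i in idx], [y[i] for i in idx], k=2))

selected = filter_select(X, y, k=2)
stab = stability(subsets)
print("selected features:", sorted(selected))
print("stability (mean pairwise Jaccard):", round(stab, 3))
```

A stability value near 1 means the selector returns nearly the same subset on every resample; a value near 0 means the subset depends heavily on which samples were drawn. A wrapper or embedded method could be dropped in place of `filter_select` and scored the same way, with regression accuracy measured separately on held-out data.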