# Interpretability versus Explainability: Classification for Understanding Deep Learning Systems and Models

### Abstract

The techniques of explainability and interpretability are not alternatives for many real-world problems, as recent studies often suggest. Interpretable machine learning is not a subset of explainable artificial intelligence, or vice versa. While the former aims to build glass-box predictive models, the latter seeks to understand a black box using an explanatory model, a surrogate model, an attribution approach, relevance importance, or other statistics. There is concern that definitions, approaches, and methods do not match, leading to inconsistent classification of deep learning systems and models for interpretation and explanation. In this paper, we attempt to systematically evaluate and classify the various basic methods of interpretability and explainability used in the field of deep learning. One goal of this paper is to provide specific definitions for interpretability and explainability in deep learning. Another goal is to spell out the various research methods for interpretability and explainability through the lens of the literature to create a systematic classifier for interpretability and explainability in deep learning. We present a classifier that summarizes the basic techniques and methods of explainability and interpretability models. The evaluation of the classifier provides insights into the challenges of developing a complete and unified deep learning framework for interpretability and explainability concepts, approaches, and techniques.

### Keywords

explainable artificial intelligence, interpretable machine learning, deep learning, deep neural networks, interpretability, explainability,### References

1. D. Gunning, Explainable artificial intelligence (XAI), Technical Report, Defense Advanced Research Projects Agency (DARPA), 2017.2. A. Adadi, M. Berrada, Peeking inside the black-box: A survey on explainable artificial intelligence (XAI), IEEE Access, 6: 52138–52160, 2018.

3. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 25: 1097–1105, 2012.

4. Y.A. LeCun, Y. Bengio, G.E. Hinton, Deep learning, Nature, 521(7553): 436–444, 2015.

5. W. Xiong, L. Wu, F. Alleva, F. Droppo, X. Huang, A. Stolcke, The Microsoft 2017 conversational speech recognition system, [in:] 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5934–5938, 2018.

6. A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei, Large-scale video classification with convolutional neural networks, [in:]: IEEE CVPR, pp. 1725–1732, 2014.

7. E. Tjoa, C. Guan, A survey on explainable artificial intelligence (XAI): Towards medical XAI, arXiv, 2019, arXiv:1907.07374v5.

8. G. Montavon, Gradient-based vs. propagation-based explanations: An axiomatic comparison, [in:] W. Samek et al. [Eds.], Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, Springer, pp. 253–265, 2019.

9. L.H. Gilpin, D. Bau, B.Z. Yuan, A. Bajwa, M. Specter, L. Kagal, Explaining explanations: An approach to evaluating interpretability of machine learning, arXiv, 2019, arXiv:1806.00069v3.

10. J. Seo, J. Choe, J. Koo, S. Jeon, B. Kim, T. Jeon, Noise-adding methods of saliency map as series of higher order partial derivative, arXiv, 2018, arXiv:1806.03000v1.

11. D. Castelvecchi, Can we open the black box of AI?, Nature News, 538(7623): 20, 2016.

12. H. Lakkaraju, R. Caruana, E. Kamar, J. Leskovec, Interpretable & explorable approximations of black box models, arXiv, 2017, arXiv:1707.01154v1.

13. D.W. Apley, J. Zhu, Visualizing the effects of predictor variables in black box supervised learning models, arXiv, 2019, arXiv:1612.08468v2.

14. V. Burhrmester, D. Münch, M. Arens, Analysis of explainers of black box deep neural networks for computer vision: A survey, arXiv, 2019, arXiv:1911.12116v1.

15. C. Rudin, Stop explaining black box machine learning models for high stake decisions and use interpretable models instead, arXiv, 2019, arXiv:1811.10154v3.

16. D. Pedreshi, F. Giannotti, R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, Meaningful explanations of black box AI decision systems, [in:] The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.

17. A. Rai, Explainable AI: From black box to glass box, Journal of the Academy of Marketing Science, 48, 137–141, 2020, doi: 10.1007/s11747-019-00710-5.

18. W. Samek, G. Montavon, A. Vedaldi, L.K. Hansen, K.-R. Müller [Eds.], Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, Springer, 2019.

19. W. Landecker, M.D. Thomure, L.M.A. Bettencourt, M. Mitchell, G.T. Kenyon, S.P. Brumby, Interpreting individual classifications of hierarchical networks, [in:] IEEE Symposium on Computational Intelligence, pp. 32–38, 2013.

20. P. Chen, W. Dong, J. Wang, X. Lu, U. Kaymak, Z. Huang, Interpretable clinical prediction via attention-based neural network, BMC Medical Informatics and Decision Making, 20(Suppl 3): 131, 2020, doi: 10.1186/s12911-020-1110-7.

21. Y. Shen et al., To explain or not to explain: A study on the necessity of explanations for autonomous vehicles, arXiv, 2020, arXiv:2006.11684v1.

22. D. Slack, S. Hilgard, E. Jia, S. Singh, H. Lakkaraju, Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods, arXiv, 2020, arXiv:1911.02508v2.

23. A. Holzinger, Interactive machine learning for health informatics: When do we need the human-in-the-loop?, Brain Informatics, 3: 119–131, 2016, doi: 10.1007/s40708-016-0042-6.

24. K.R. Varshney, Trustworthy machine learning and artificial intelligence, ACM XRDS Magazine, 25(3): 26–29, 2019.

25. F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable machine learning, arXiv, 2017, arXiv:1702.08608v2.

26. W. Yuan, P. Liu, G. Neubig, Can we automate scientific reviewing?, arXiv, 2021, arXiv:2102.00176v1.

27. V. Arya et al., One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques, arXiv, 2019, arXiv:1909.03012v2.

28. A.B. Arrieta et al., Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities, and challenges towards AI, Information Fusion, 58: 82–115, 2020.

29. S. Hooker, D. Erhan, P.-J. Kindermans, B. Kim, A benchmark for interpretability methods in deep neural networks, [in:] H. Wallach et al. [Eds.], Advances in Neural Information Processing Systems, Vol. 32, 2019.

30. U. Bhatt, A. McKane, A. Weller, A. Xiang, Machine learning explainability for external stakeholders, [in:] IJCAI – PRICAI Workshop on Explainable Artificial Intelligence (XAI), January 8, 2021.

31. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, [in:] 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.

32. J. Wang, M. Ren, I. Bogunovic, Y. Xiong, R. Urtasun, Cost-efficient online hyperparameter optimization, arXiv, 2021, arXiv:2101.06590v1.

33. Y. LeCun, L. Bottou, L. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86(11): 2278–2324, 1998.

34. B.B. Traore, B. Kamsu-Foguem, F. Tangara, Deep convolution neural network for image recognition, Ecological Informatics, 48: 257–268, 2018.

35. A. Esteva et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature, 542: 115–118, 2017.

36. M. Schwarz, A. Milan, A.S. Periyasamy, S. Behnke, RGB-D object detection and semantic segmentation for autonomous manipulation in clutter, The International Journal of Robotics Research, 37(4–5): 437-451, 2018, doi: 10.1177/0278364917713117.

37. J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, [in:] 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

38. Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdionov, W.W. Cohen, Review networks for caption generation, arXiv, 2016, arXiv:1605.07912v2.

39. J. Johnson, A. Karpathy, L. Fei-Fei, DenseCap: Fully convolutional localization networks for dense captioning, [in:] 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

40. H. Gao et al., Are you talking to a machine? Dataset and methods for multilingual image question answering, [in:] NIPS’15: Proceedings of the 28th International Conference on Neural Information Processing Systems, Vol. 2, pp. 2296–2304, 2015.

41. M. Ren, R. Kiros, R. Zemel, Exploring models and data for image question answering, [in:] Advances in Neural Information Processing Systems 28 (NIPS), pp. 1–9, 2015.

42. S. Antol et al., VQA: Visual question answering, [in:] Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.

43. M. Malinowski, M. Rohrbach, M. Fritz, Ask your neurons: A neural-based approach to answering questions about images, [in:] Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.

44. D. Gordon, A. Kembhavi, M. Rastegari, J. Redmon, D. Fox, A. Farhadi, IQA: Visual question answering in interactive environments, arXiv, 2017, arXiv:1712.03316.

45. A. Das, S. Datta, G. Gkioxari, S. Lee, D. Parikh, D. Batra, Embodied question answering, [in:] Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

46. H. de Vries, F. Strub, S. Chandar, O. Pietquin, H. Larochelle, A.C. Courville, Guess what?! Visual object discovery through multi-modal dialogue, [in:] Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

47. A. Das et al., Visual dialog, [in:] Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

48. S.A. Bargal, A. Zunino, D. Kim, J. Zhang, V. Murion, S. Sclaroff, Excitation backprop for RNNs, arXiv, 2018, arXiv:1711.06778v3.

49. S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 9(8): 1735–1780, 1997.

50. B. DuSell, D. Chiang, Learning context-free languages with nondeterministic stack RNNs, arXiv, 2020, arXiv:2010.04674v1.

51. M. Venzke, D. Klish, P. Kubik, A. Ali, J.D. Missier, Artificial neural networks for sensor data classification on small embedded systems, arXiv, 2020, arXiv:2012.08403v1.

52. L. Arras, G. Montavon, K.-R. Müller, W. Samek, Explaining recurrent neural network predictions in sentiment analysis, [in:] Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 159–168, 2017, doi: 10.18653/v1/w17-5221.3/1/.

53. S. Tao, Deep neural network ensambles, arXiv, 2019, arXiv:1904.05488v2.

54. W. Samek, T. Wiegand, K.-R. Müller, Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models, arXiv, 2017, arXiv:1708.08296v1.

55. P. Angelov, E. Soares, Towards explainable deep neural networks (xDNN), Neural Networks, 130: 185–194, 2020, doi: 10.1016/j.neunet.2020.07.010.

56. S.J. Oh, M. Augustin, B. Schiele, M. Fritz, Towards reverse-engineering black-box neural networks, [in:] 9th International Conference on Learning Representations, Vancouver, Canada, 30 April – 3 May 2018.

57. Z.C. Lipton, The mythos of model interpretability, arXiv, 2017, arXiv:1606.03490v3.

58. A. Holzinger, M. Plass, K. Holzinger, G.C. Crisan, C.-M. Pintea, V. Palade, A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop, arXiv, 2017, arXiv:1708.01104.

59. P.W. Koh, P. Liang, Understanding black-box predictions via influence functions, [in:] Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, Vol. 70, pp. 1885–1894, August 6–11, 2017.

60. D. Mascharka, P. Tran, R. Soklaski, A. Majumdar, Transparency by design: Closing the gap between performance and interpretability in visual reasoning, arXiv, 2018, arXiv:1803.05268v2.

61. V. Beaudouin et al., Flexible and context-specific AI explainability: A multidisciplinary approach, arXiv, 2020, arXiv:2003.07703v1.

62. K. Sokol, P. Flach, Explainability fact sheets: A framework for systematic assessment of explainable approaches, [in:] FAT* ’20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 56–67, 2020, doi: 10.1145/3351095.3372870.

63. F. Xu, H. Uszkoreit, Y. Du, W. Fan, D. Zhao, J. Zhu, Explainable AI: A brief survey on history, research areas, approaches and challenges, [in:] J. Tang et al. [Eds.], Natural Language Processing and Chinese Computing (NLPCC), Lecture Notes in Computer Science, Springer, Cham, Vol. 11839, 2019, doi: 10.1007/978-3-030-32236-6_51.

64. N.C. Thompson, K. Greenwald, K. Lee, G.F. Manso, The computational limits of deep learning, arXiv, 2020, arXiv:2007.05558v1.

65. S. Liu, X. Wang, M. Liu, J. Zhu, Towards better analysis of machine learning models: A visual analytics perspective, Visual Informatics, 1(1): 48–56, 2017, doi: 10.1016/j.visinf.2017.01.006.

66. F.F.J. Kameni, N. Tsopze, Simplifying explanation of deep neural networks with sufficient and necessary feature-sets: Case of text classification, arXiv, 2020, arXiv:2010.03724v2.

67. D. Erhan, A. Courville, Y. Bengio, Understanding representations learned in deep architectures, Department d’Informatique et Recherche Operationnelle, University of Montreal, QC, Canada, 2010.

68. M. Du, N. Liu, X. Hu, Techniques for interpretable machine learning, arXiv, 2019, arXiv:1808.00033v3.

69. S. Watcher, B. Mittelstadt, L. Floridi, Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation, International Data Privacy Law, 7(2): 76–99, 2017.

70. D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, G.-Z. Yang, XAI – Explainable artificial intelligence, Science Robotics, 4(37): eaay7120, 2019, doi: 10.1126/scirobotics.aay7120.

71. A. Holzinger, G. Langs, H. Denk, K. Zatlouk, H. Müller, Causability and explainability of artificial intelligence in medicine, WIREs Data Mining Knowledge Discovery, 9(4): e1312, 2019, doi: 10.1002/widm.1312.

72. G. Ras, M. van Gerven, P. Haselager, Explanation methods in deep learning: Users, values, concerns and challenges, arXiv, 2018, arXiv:1803.07517v2.

73. T. Kulesza, M. Burnett, W. Wong, S. Stumpf, Principles of explanatory debugging to personalize interactive machine learning, [in:] Proceedings of the 20th International Conference on Intelligent User Interfaces (ACM), pp. 126–137, 2015.

74. D. Wang, Q. Yang, A. Abdul, B.Y. Lim, Designing theory-driven user-centric explainable AI, [in:] Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (ACM), Paper no. 601, pp. 1–15, 2019.

75. C.C.S. Liem et al., Psychology meets machine learning: Interdisciplinary perspectives on algorithmic job candidate screening, [in:] Explainable and Interpretable Models in Computing Vision and Machine Learning, H.J. Escalante et al. [Eds.], Springer, Cham, pp. 197–253, 2018.

76. J. Hestness et al., Deep learning scaling is predictable, empirically, arXiv, 2017, arXiv:1712.00409v1.

77. B. Ginsburg et al., Training deep networks with stochastic gradient normalized by layer-wise adaptive second moments, [in:] OpenReview.net, 2019.

78. G. Montavon, W. Samek, K.-R. Müller, Methods for interpreting and understanding deep neural networks, Digital Signal Processing, 73: 1–15, 2018, doi: 10.1016/j.dsp.2017.10.011.

79. T. Miller, Explanation in artificial intelligence: Insight from the social sciences, Artificial Intelligence, 267: 1–38, 2019, doi: 10.1016/j.artint.2018.07.007.

80. R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, [in:] 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, June 23–28, 2014.

81. M. Ancona, E. Ceolini, C. Özitreli, M. Gross, Towards better understanding of gradient-based attribution methods for deep neural networks, arXiv, 2018, arXiv:1711.06104v4.

82. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation, [in:] D.E. Rumelhart, J.L. McClelland [Eds.], Parallel Distributing Processing, MIT Press, pp. 318–362, 1986.

83. P.-J. Kindermans et al., The (un)realibility of saliency methods, [in:] W. Samek et al. [Eds.], Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, LNCS, Springer, Cham, Vol. 11700, pp. 267–280, 2019.

84. R. Roscher, B. Bohn, M.F. Duarte, J. Garcke, Explainable machine learning for scientific insights and discoveries, arXiv, 2020, arXiv:1905.08883v3.

85. M.T. Ribeiro, S. Singh, C. Guestrin, Why should I trust you?: Explaining the predictions of any classifier, [in:] ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, pp. 1135–1144, August 13–17, 2016.

86. M. Alber, Software and application patterns for explanation methods, arXiv, 2019, arXiv:1904.04734v1.

87. K.R. Varshey, H. Alemzadeh, On the safety of machine learning: Cyber-physical systems, decision science, and data products, arXiv, 2017, arXiv:1610.01256v2.

88. X. Zhao et al., A safety framework for critical systems utilising deep neural networks, arXiv, 2020, arXiv:2003.05311v3.

89. A. Weller, Transparency: Motivations and challenges, [in:] W. Samek et al. [Eds.], Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, Springer, pp. 23–40, 2019.

90. M. Raghu, E. Schmidt, A survey of deep learning for scientific discovery, arXiv, 2020, arXiv:2003.11755v1.

91. L.A. Hendricks, A. Rohrbach, B. Schiele, T. Darrell, Generating visual explanations with natural language, Applied AI Letters, 2(4): e55, pp. 1–16, 2021.

92. J.M. Oramas, K. Wang, T. Tuytelaars, Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks, arXiv, 2019, arXiv:1712.06302v3.

93. J. Kaplan et al., Scaling laws for natural language models, arXiv, 2020, arXiv:2001.08361v1.

94. G.G. Towell, J.W. Shavlik, Extracting refined rules from knowledge-based neural networks, Machine Learning, 13: 71–101, 1993.

95. C. Molnar, Interpretable machine learning: A guide for making black box models explainable, 1st ed., Leanpub, https://christophm.github.io/interpretable-ml-book/, accessed on 08.01.2021.

96. S. Kim, M. Jeong, B.C. Ko, Interpretation and simplification of deep forest, arXiv, 2020, arXiv:2001.04721v1.

97. W.-J. Nam, S. Gur, J. Choi, L. Wolf, S.-W. Lee, Relative attribution propagation: Interpreting the comparative contribution of individual units in deep neural networks, arXiv, 2019, arXiv:1904.00605v4.

98. L.K. Hansen, L. Rieger, Interpretability in intelligent systems – A new concept?, [in:] W. Samek et al. [Eds.], Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, LNCS, Springer, Cham, Vol. 11700, pp. 41–49, 2019.

99. R. Fong, A. Vedaldi, Explanations for attributing deep neural network prediction, [in:] W. Samek et al. [Eds.], Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, LNCS, Springer, Cham, pp. 149–168, 2019.

100. Q.V. Liao, D. Gruen, S. Miller, Questioning the AI: Information design practice for explainable AI user experiences, arXiv, 2020, arXiv:2001.02478v2.

101. W.J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asi, B. Yu, Interpretable machine learning: Definitions, methods, and applications, arXiv, 2019, arXiv:1901.04592v1.

102. Z. Yang, A. Zhang, A. Sudjianto, Enhancing explainability of neural networks through architecture constraints, arXiv, 2019, arXiv:1901.03838v2.

103. J. Vaughan, A. Sudjianto, E. Brahimi, J. Chen, V.N. Nair, Explainable neural networks based on additive index models, arXiv, 2018, arXiv:1806.01933v1.

104. S. Chauhan, L. Vig, M. De Filippo De Grazia, M. Corbetta, S. Ahmad, M. Zorzi, A comparison of shallow and deep learning methods for predicting cognitive performance of stroke patients from MRI lesion images, Frontiers in Neuroinformatics, 13: 53, 2019, doi: 10.3389/fninf.2019.00053.

105. N. Tintarev, Explaining recommendations, PhD Dissertation, University of Aberdeen, 2009.

106. G. Chrysostomou, N. Alertas, Improving the faithfulness of attention-based explanations with task-specific information for text classification, arXiv, 2021, arXiv:2105. 02657v2.

107. G. Vilone, L. Longo, Explainable artificial intelligence: A systematic review, arXiv, 2020, arXiv:2006.00093v3.

108. A. Papenmeier, G. Englebienne, C. Seifert, How model accuracy and explanation fidelity influence user trust, arXiv, 2019, arXiv:1907.12652v1.

109. H. Harutyunyan et al., Estimating informativeness of samples with smooth unique information, arXiv, 2021, arXiv:2101.06640v1.

110. S. Rivera, J. Klipfel, D. Weeks, Flexible deep transfer learning by separate feature embeddings and manifold alignment, arXiv, 2020, arXiv:2012.12302v1.

111. R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, D. Pedreschi, F. Giannotti, A survey of methods for explaining black box models, arXiv, 2018, arXiv:1802.01933v3.

112. U. Kamath, J. Liu, Explainable artificial intelligence: An introduction to interpretable machine learning, Springer, 2021.

113. F. Bodria, F. Giannotti, R. Guidotti, F. Naretto, D. Pedreschi, S. Rinzivillo, Benchmarking and survey of explanation methods for black box models, arXiv, 2021, arXiv:2102.13076v1.

114. P. Linardatos, V. Papastefanopoulos, S. Kostianstis, Explainable AI: A review of machine learning interpretability methods, Entropy, 23(1): 18, 2021, doi: 10.3390/e23010018.

115. J. Zhou, A.H. Gandomi, F. Chen, A. Holzinger, Evaluating the quality of machine learning explanations: A survey on methods and metrics, Electronics, 10(5): 593, 2021, doi: 10.3390/electronics10050593.

116. V. Belle, I. Papantonis, Principles and practice of explainable machine learning, arXiv, 2020, arXiv:2009.11698v1.

117. O. Benchekroun, A. Rahimi, Q. Zhang, T. Kodliuk, The need for standardized explainability, arXiv, 2020, arXiv:2010.11273v2.

118. S. Chari, O. Seneviratne, D.M. Gruen, M.A. Foreman, A.K. Das, D.L. McGuinness, Explanation ontology: A model of explanations for user-centered AI, arXiv, 2020, arXiv:2010.01479v1.

119. A. Das, P. Rad, Opportunities and challenges in explainable artificial intelligence (XAI): A survey, arXiv, 2020, arXiv:2006.11371v2.

120. P. Hase, M. Bansal, Evaluating explainable AI: Which algorithmic explanations help users predict model behavior?, arXiv, 2020, arXiv:2005.01831v1.

121. N. Xie, G. Ras, M. van Gerven, D. Doran, Explainable deep learning: A field guide for the uninitiated, arXiv, 2020, arXiv:2004.14545v1.

122. D.V. Carvalho, E.M. Pereira, J.S. Cardoso, Machine learning interpretability: A survey on methods and metrics, Electronics, 8(8): 832, 2019, doi: 10.3390/electronics8080832.

123. K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, arXiv, 2013, arXiv:1312.6034.

124. L.M. Zintgraf, T.S. Cohen, S. Adel, M. Welling, Visualizing deep neural network decisions: Prediction difference analysis, arXiv, 2017, arXiv:1702.04595v1.

125. S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS ONE, 10(7): e0130140, 2015, doi: 10.1371/journal.pone.0130140.

126. D. Bau, J.-Y. Zhu, H. Strobelt, A. Lapedriza, B. Zhou, A. Torralba, Understanding the role of individuals units in a deep neural network, arXiv, 2020, arXiv:2009.05041v2.

127. S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, K.-R. Müller, Unmasking Clever Hans predictors and assessing what machines really learn, Nature Communications, 10: 1096, 2019.

128. D. Bau, B. Zhou, A. Khosla, A. Oliva, A. Torralba, Network dissection: Quantifying interpretability of deep visual representations, arXiv, 2017, arXiv:1704.05796v1.

129. A. Lucieri, M.N. Bajwa, S.A. Braun, M.I. Malik, A. Dengel, S. Ahmed, On interpretability of deep learning based skin lesion classifiers using concept activation vectors, arXiv, 2020, arXiv:2005.02000v1.

130. D. Smilkov, N. Thorat, B. Kim, F. Viégas, M. Wattenberg, SmoothGrad: Removing noise by adding noise, arXiv, 2017, arXiv:1706.03825v1.

131. R.R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, arXiv, 2016, arXiv:1610.02391.

132. M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep networks, arXiv, 2017, arXiv:1703.01365v2.

133. M. Munir, S.A. Siddiqui, F. Kusters, D. Mercier, A. Dengel, S. Ahmed, TSXplain: Demystification of DNN decisions for time-series using natural language and statistical features, arXiv, 2019, arXiv:1905.06175.

134. Z. Zhang, Y. Xie, F. Xing, M. McGough, L. Yang, MDNet: A semantically and visually interpretable medical image diagnosis network, [in:] Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, pp. 6428–6436, July 21–26, 2017.

135. B. Kim et al., Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV), arXiv, 2018, arXiv:1711.11279v5.

136. A. Nguyen, M.R. Martínez, On quantitative aspects of model interpretability, arXiv, 2020, arXix:2007.07584v1.

137. I. Lage et al., An evaluation of the human-interpretability of explanation, arXiv, 2019, arXiv:1902.00006v2.

138. P. Cortez, M.J. Embrechts, Opening black box data mining models using sensitivity analysis, [in:] Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, April 11–15, 2011.

139. M. Sundararajan, A. Taly, Q. Yan, Gradients of counterfactuals, arXiv, 2016, arXiv: 1611.02639v2.

140. R.C. Fong, A. Vedaldi, Interpretable explanation of black boxes by meaningful perturbation, [in:] Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 3449–3457, October 22–29, 2017.

141. R. Sun, Optimization for deep learning: Theory and algorithms, arXiv, 2019, arXiv: 1912.08957v1.

142. M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, arXiv, 2013, arXiv:1311.29013v3.

143. R. Fong, M. Parick, A. Vedaldi, Understanding deep networks via extremal perturbations and smooth masks, arXiv, 2019, arXiv:1910.08485v1.

144. V. Petsiuk, A. Das, K. Saenlo, RISE: Randomized input sampling for explanation of black-box models, arXiv, 2018, arXiv:1806.07421v3.

145. S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, Universal adversarial perturbations, arXiv, 2017, arXiv:1610.08401v3.

146. J. Li, W. Monroe, D. Jurafsky, Understanding neural networks through representation erasure, arXiv, 2017, arXiv:1612.08220v3.

147. D. Baehrens, D. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, K.-R. Müller, How to explain individual classification decisions, Journal of Machine Learning Research, 11: 1803–1831, 2010.

148. A. Shrikumar, P. Greenside, A. Shcherbina, A. Kundaje, Not just a black box: Learning important features through propagation activation differences, arXiv, 2016, arXiv:1605.01713v2.

149. C. Szegedy et al., Intriguing properties of neural networks, arXiv, 2014, arXiv:1312.6199v4.

150. K. Dhamdhere, M. Sundararajan, Q. Yan, How important is a neuron?, arXiv, 2018, arXiv:1805.12233v1.

151. J.T. Springenberg, A. Dosovitsky, T. Brox, M. Riedmiller, Striving for simplicity: The all convolutional net, arXiv, 2015, arXiv:1412.6806v3.

152. K. Leino, S. Sen, A. Datta, M. Fredrikson, L. Li, Influence-directed networks explanations for deep convolutional, arXiv, 2018, arXiv:1802.03788v2.

153. A. Nguyen, J. Yosinski, J. Clune, Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks, arXiv, 2016, arXiv:1602.03616v2.

154. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, arXiv, 2015, arXiv:1512.04150v1.

155. A. Shrikumar, P. Greenside, A. Kundaje, Learning important features through propagating activation differences, arXiv, 2019, arXiv:1704.02685.

156. J. Zhang, Z. Lin, J. Brandt, X. Shen, S. Sclaroff, Top-down neural attention by excitation backprop, arXiv, 2016, arXiv:1608.00507v1.

157. J.K. Tsotsos, S.M. Culhane, W.Y.K. Wai, Y. Lai, N. Davis, F. Nuflo, Modeling visual attention via selective tuning, Artificial Intelligence, 78: 507–545, 1994.

158. G. Liu, D. Gifford, Visualizing feature maps in deep neural networks using DeepResolve. A genomics case study, [in:] International Conference on Machine Learning 2017 – Workshop on Visualization for Deep Learning (ICML), Sydney, Australia, pp. 32–41, 2017.

159. A. Chattopadhyay, A. Sarkar, P. Howlader, V. Balasubramanian, Grad-CAM++: Improved visual explanations for deep convolutional networks, arXiv, 2018, arXiv:1710.11063v3.

160. S. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, [in:] 31st Conference of Neural Information Processing Systems (NIPS), Long Beach, CA, USA, December 4–9, 2017.

161. G. Montavon, S. Bach, A. Binder, W. Samek, K.-R. Müller, Explaining nonlinear classification decisions with deep Taylor decomposition, arXiv, 2015, arXiv:1512.02479v1.

162. L. Kirsch, J. Kunze, D. Barber, Modular networks: Learning to decompose computation, [in:] 32nd Conference on Neural Information Processing (NeurIPS), Montréal, Canada, December 2–8, 2018.

163. P. Manisha, C.V. Jawahar, S. Gujar, Learning optimal redistribution mechanisms through neural networks, arXiv, 2018, arXiv:1801.08808v1.

164. H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans Neural Network, 11(2): 377–389, 2000.

165. B. Zhou, D. Bau, A. Oliva, A. Torralba, Interpreting deep visual, representations via network dissection, arXiv, 2018, arXiv:1711.05611v2.

166. H. Li, J.G. Ellis, L. Zhang, S.-F. Chang, PatternNet: Visual pattern mining with deep neural network, arXiv, 2018, arXiv:1703.06339v2.

167. B. Zhou, D. Bau, A. Oliva, A. Torralba, Comparing the interpretability of deep networks via network dissection, [in:] W. Samek et al. [Eds.], Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, LNCS, Springer, Cham, Vol. 11700, pp. 243–252, 2019.

168. W. Samek, G. Montavon, S. Lapushkin, C.J. Anders, K.-R. Müller, Towards interpretable machine learning: Transparent deep neural networks and beyond, arXiv, 2020, arXiv:2003.07631v1.

169. M. Graziani, V. Andrearczyk, H. Müller, Regression concept vectors for bidirectional explanations in histopathology, [in:] D. Stoyanow et al. [Eds.], Understanding and Interpreting Machine Learning in Medical Image Computing Applications, Springer, pp. 124–132, 2018.

170. A. Ghorbani, J. Wexler, J. Zou, B. Kim, Towards automatic concept-based explanations, arXiv, 2019, arXiv:1902.03129v3.

171. D. Bau et al., GAN dissection: Visualizing and understanding generative adversarial networks, arXiv, 2018, arXiv:1811.10597v2.

172. J. Kauffmann, M. Esders, G. Montavon, W. Samek, K.-R. Müller, From clustering explanations via neural networks, arXiv, 2019, arXiv:1906.07633v1.

173. G. Jeon, H. Jeon, J. Choi, An efficient explorative sampling considering the generative boundaries of deep generative neural networks, arXiv, 2019, arXiv:1912.05827v1.

174. M.D. Zeiler, G.W. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, [in:] 13th International Conference on Computer Vision (ICCV), Barcelon, Spain, November 6–13, 2011.

175. P.-J. Kindermans, K.T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, Learning how to explain neural networks: PatternNet and PatternAttribution, arXiv, 2017, arXiv:1705.05598v2.

176. B.N. Oreshkin, D. Carpov, N. Chapados, Y. Bengio, N-BETAS: Neural basis expansion analysis for interpretable time series forecasting, arXiv, 2020, arXiv:1905.10437v4.

177. U. Schlegel, D. Oelke, D.A. Keim, M. El-Assady, An empirical study of explainable AI techniques on deep learning models for time series tasks, arXiv, 2020, arXiv:2012.04344v1.

178. G. Bologna, Y. Hayashi, Characterization of symbolic rules embedded in deep DIMLP networks: A challenge to transparency of deep learning, Journal of Artificial Intelligence and Soft Computing Research (JAISCR), 7(4): 265–286, 2017, doi: 10.1515/jaiscr-2017-0019.

179. D.R. Kuhn, R.N. Kacker, Y. Lei, D. Simos, Combinatorial methods for explainable AI, [in:] 9th International Workshop on Combinatorial Testing (IWCT), Porto, Portugal, March 23–27, 2020.

180. W. Samek, K.-R. Müller, Towards explainable artificial intelligence, arXiv, 2019, arXiv:1909.12072v1.

181. M.T. Ribeiro, S. Singh, C. Guestrin, Model-agnostic interpretability of machine learning, [in:] Proceedings of Human Interpretability in Machine Learning Workshop (WHI), New York, USA, 2016.

182. S. Liu et al., Actionable attribution maps for scientific machine learning, arXiv, 2020, arXiv:2006.16533v1.

183. M.T. Ribeiro, S. Singh, C. Guestrin, Anchors: High-precision model-agnostic explanations, [in:] Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), New Orleans, USA, pp. 1527–1535, February 2–7, 2018.

184. L. Bottou et al., Counterfactual reasoning and learning systems, arXiv, 2013, arXiv:1209.2355.

185. K. Sokol, P. Flach, Counterfactual explanations of machine learning predictions: Opportunities and challenges for AI safety, [in:] Proceedings of the AAAI Workshop on Artificial Intelligence Safety, Vol. 2301, 2019.

186. P. Hall, On the art and science of explainable machine learning: Techniques, recommendations, and responsibilities, arXiv, 2020, arXiv:1810.02909v4.

187. E.R. Elenberg, A.G. Dimakis, M. Feldman, A. Karbasi, Streaming weak sub-modularity: Interpreting neural networks on the fly, arXiv, 2017, arXiv:1703.02647v3.

188. G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, arXiv, 2015, arXiv:1503.02531v1.

189. Q. Zhang, R. Cao, Y.N. Wu, S.-C. Zhu, Growing interpretable part graphs on ConvNets via multi-shot learning, arXiv, 2017, arXiv:1611.04246v2.

190. Q. Zhang, X. Wang, R. Cao, Y.N. Wu, F. Shi, S.-C. Zhu, Extracting an explanatory graph to interpret a CNN, IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11): 3863–3877, 2020, doi: 10.1109/TPAMI.2020.2992207.

191. R. Guidotti, A. Monreale, S. Ruggieri, D. Pedreschi, F. Turini, F. Giannotti, Local rule-based explanations of black box decision systems, arXiv, 2018, arXiv:1805.10820.

192. J.R. Zilke, E.L. Mencía, F. Janssen, DeepRED – Rule extraction from deep neural networks, [in:] Discovery Science 19th International Conference Proceedings (LNAI), Vol. 9956, pp. 457–473, 2016.

193. M.G. Augasta, T. Kathirvalavakumar, Reverse engineering the neural networks for rule extraction in classification problems, Neural Processing Letters, 35(2): 131–150, 2012.

194. G. Su, D. Wei, K.R. Varshney, D.M. Malioutov, Interpretable two-level Boolean rule learning for classification, arXiv, 2016, arXiv:1511.07361v1.

195. W.J. Murdoch, A. Szlam, Automatic rule extraction from long short term memory networks, [in:] International Conference on Learning Representations, Toulon, France, April 23–26, 2017.

196. Y. Ming, H. Qu, E. Bertini, RuleMatrix: Visualizing and understanding classifiers with rules, arXiv, 2018, arXiv:1807.06228v1.

197. M. Sato, H. Tsukimoto, Rule extraction from neural networks via decision tree induction, [in:] Proceedings of International Joint Conference on Neural Networks (IJCNN), Cat. No. 01CH37222, Vol. 3, pp. 1870–1875, 2001, doi: 10.1109/IJCNN.2001.938448.

198. N. Frosst, G. Hinton, Distilling a neural network into a soft decision tree, arXiv, 2017, arXiv:1711.09784v1.

199. Q. Cao, X. Liang, K. Wang, L. Lin, Linguistically driven graph capsule network for visual question reasoning, arXiv, 2020, arXiv:2003.10065v1.

200. O. Bastani, C. Kim, H. Bastani, Interpretability via model extraction, arXiv, 2018, arXiv:1706.09773v4.

201. S. Tan, R. Caruana, G. Hooker, P. Koch, A. Gordo, Learning global additive explanations for neural nets using model distillation, arXiv, 2018, arXiv:1801.08640v2.

202. H. Bride, J. Dong, J.S. Dong, Z. Hóu, Towards dependable and explainable machine learning using automated reasoning, [in:] 20th International Conference on Formal Engineering Methods (ICFEM), Gold Coast, QLD, Australia, 2018.

203. S. Krishnan, E. Wu, PALM: Machine learning explanations for iterative debugging, [in:] Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017, doi: 10.1145/3077257.3077271.

204. C. Rudin, C. Chen, Z. Chen, H. Huang, L. Semenova, C. Zhong, Interpretable machine learning: Fundamental principles and 10 grand challenges, arXiv, 2021, arXiv:2103.11251v2.

205. J. Andreas, M. Rohrbach, T. Darrell, D. Klein, Neural module networks, arXiv, 2017, arXiv:1511.02799v4.

206. R. Hu, M. Rohrbach, J. Andreas, T. Darrell, K. Saenko, Modeling relationships in referential expressions with compositional modular networks, arXiv, 2016, arXiv:1611.09978.

207. S. Sabour, N. Frosst, G.E. Hinton, Dynamic routing between capsules, [in:] 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 2017.

208. H. Xue, W. Chu, Z. Zhao, D. Cai, A better way to attend: Attention with trees for video question answering, arXiv, 2019, arXiv:1909.02218v1.

209. A. Vaswani et al., Attention is all you need, arXiv, 2017, arXiv:1706.03762v5.

210. X. Liu, K. Duh, L. Liu, J. Gao, Very deep transformers for neural machine translation, arXiv, 2020, arXiv:2008.07772v2.

211. H. Zhang, I. Goodfellow, D. Metaxas, A. Odena, Self-attention generative adversarial networks, arXiv, 2019, arXiv:1805.08318v2.

212. N. Mishra, M. Rohaninejad, X. Chen, P. Abbeel, A simple neural attentive meta-learner, arXiv, 2018, arXiv:1707.03141v3.

213. B. Hoover, H. Strobelt, S. Gehrmann, exBERT: A visual analysis tool to explore learned representations in transformers models, arXiv, 2019, arXiv:1910.05276.

214. R. He, W.S. Lee, H.T. Ng, D. Dahlmeier, Effective attention modeling for aspect-level sentiment classification, [in:] Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, August 20–26, 2018.

215. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv, 2019, arXiv:1810.04805v2.

216. G. Letarte, F. Paradis, P. Giguère, F. Laviolette, Importance of self-attention for sentiment analysis, [in:] Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 267–275, 2018.

217. E. Choi, M.T. Bahadori, J.A. Kulas, A. Schuetz, W.F. Stewart, J. Sun, RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism, arXiv, 2017, arXiv:1608.05745v4.

218. K. Xu et al., Show, attend and tell: Neural image caption generation with visual attention, arXiv, 2016, arXiv:1502.03044v3.

219. O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: A neural image caption generator, arXiv, 2015, arXiv:1411.4555v2.

220. N. Xie, F. Lai, D. Doran, A. Kadav, Visual entailment: A novel task for fine-grained image understanding, arXiv, 2019, arXiv:1901.06706.

221. D.H. Park, L.A. Hendricks, Z. Akata, B. Schiele, T. Darrell, M. Rohrbach, Attentive explanations: Justifying decisions and pointing to the evidence, arXiv, 2016, arXiv:1612.04757.

222. D. Mascharka, P. Tran, R. Soklaski, A. Majumdar, Transparency by design: Closing the gap between performance and interpretability in visual reasoning, arXiv, 2018, arXiv:1803.05268v2.

223. P. Anderson et al., Bottom-up and top-down attention for image captioning and visual question answering, arXiv, 2018, arXiv:1707.07998v3.

224. D. Teney, P. Anderson, X. He, A. van den Hengel, Tips and tricks for visual question answering: Learnings from the 2017 challenge, arXiv, 2017, arXiv:1708.02711v1.

225. L.A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, T. Darrell, Generating visual explanations, arXiv, 2016, arXiv:1603.08507.

226. J. Kim, A. Rohrbach, T. Darrell, J. Canny, Z. Akata, Textual explanations for self-driving vehicles, arXiv, 2018, arXiv:1807.11546v1.

227. H. Liu, Q. Yin, W.Y. Wang, Towards explainable NLP: A generative explanation framework for text classification, arXiv, 2019, arXiv:1811.00196v2.

228. R. Zellers, Y. Bisk, A. Farhadi, Y. Choi, From recognition to cognition: Visual commonsense reasoning, arXiv, 2019, arXiv:1811.10830v2.

229. O. Li, H. Liu, C. Chen, C. Rudin, Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions, arXiv, 2017, arXiv:1710.04806v2.

230. C. Chen, O. Li, C. Tao, A.J. Barnett, J. Su, C. Rudin, This looks like that: Deep learning for interpretable image recognition, arXiv, 2019, arXiv:1806.10574v5.

231. P. Hase, C. Chen, O. Li, C. Rudin, Interpretable image recognition with hierarchical prototypes, arXiv, 2019, arXiv:1906.10651v1.

232. T. Lei, R. Barzilay, T. Jaakkola, Rationalizing neural predictions, arXiv, 2016, arXiv:1606.04155.

233. D. Alvarez-Melis, T.S. Jaakkola, Towards robust interpretability with self-explaining neural networks, arXiv, 2018, arXiv:1806.07538v2.

234. Y. Dong, H. Su, J. Zhu, B. Zhang, Improving interpretability of deep neural networks with semantic information, arXiv, 2017, arXiv:1703.04096v2.

235. M. Alber et al., iNNvestigate neural networks!, arXiv, 2018, arXiv:1808.04260v1.

236. H. Nori, S. Jenkins, P. Koch, R. Caruana, InterpretML: A unified framework for machine learning interpretability, arXiv, 2019, arXiv:1909.09223v1.

237. T. Spinner, U. Schlegel, H. Schäfer, M. El-Assady, explAIner: A visual analytics framework for interactive and explainable machine learning, arXiv, 2019, arXiv:1908.00087v2.

**Computer Assisted Methods in Engineering and Science**, [S.l.], v. 29, n. 4, p. 297–356, July 2022. ISSN 2956-5839. Available at: <https://cames.ippt.pan.pl/index.php/cames/article/view/518>. Date accessed: 03 Mar. 2024. doi: http://dx.doi.org/10.24423/cames.518.

This work is licensed under a Creative Commons Attribution 4.0 International License.