Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach

Fernandez, Carlos; Provost, Foster; Han, Xintian

Computer Science > Machine Learning

arXiv:2001.07417v2 (cs)

[Submitted on 21 Jan 2020 (v1), revised 5 Feb 2020 (this version, v2), latest version 13 Oct 2021 (v5)]

Abstract: Lack of understanding of the decisions made by model-based AI systems is an important barrier to their adoption. We examine counterfactual explanations as an alternative for explaining AI decisions. The counterfactual approach defines an explanation as a set of the system's data inputs that causally drives the decision (meaning that removing them changes the decision) and is irreducible (meaning that removing any proper subset of the inputs in the explanation does not change the decision). We generalize previous work on counterfactual explanations, resulting in a framework that (a) is model-agnostic, (b) can address features with arbitrary data types, (c) can explain decisions made by complex AI systems that incorporate multiple models, and (d) is scalable to large numbers of features. We also propose a heuristic procedure to find the most useful explanations depending on the context. We contrast counterfactual explanations with another alternative: methods that explain model predictions by weighting features according to their importance (e.g., SHAP, LIME). This paper presents two fundamental reasons why explaining model predictions is not the same as explaining the decisions made using those predictions, suggesting we should carefully consider whether importance-weight explanations are well-suited to explain decisions made by AI systems. Specifically, we show that (1) features that have a large importance weight for a model prediction may not actually affect the corresponding decision, and (2) importance weights are insufficient to communicate whether and how features influence system decisions. We demonstrate this with several examples, including three detailed case studies that compare the counterfactual approach with SHAP to illustrate various conditions under which counterfactual explanations explain data-driven decisions better than feature importance weights.
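The definition of a counterfactual explanation given in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical and is not taken from the paper: the toy `decide` function, its weights and threshold, the feature names, and the choice to model "removing" an input as resetting it to a default value of 0. The search is a plain brute-force enumeration of subsets in order of increasing size, not the scalable heuristic procedure the authors propose.

```python
from itertools import combinations

# Hypothetical decision system: approve when a weighted score reaches a threshold.
WEIGHTS = {"income": 50, "tenure": 30, "savings": 30, "age": 5}
THRESHOLD = 58

def decide(features):
    """Return True (approve) when the weighted score reaches the threshold."""
    score = sum(WEIGHTS[name] * value for name, value in features.items())
    return score >= THRESHOLD

def remove(features, subset, default=0):
    """Model 'removing' inputs by resetting them to a default (an assumption of this sketch)."""
    return {k: (default if k in subset else v) for k, v in features.items()}

def counterfactual_explanations(decide, features):
    """Find every irreducible set of inputs whose removal changes the decision."""
    original = decide(features)
    names = list(features)
    found = []
    # Visiting smaller subsets first guarantees irreducibility: any candidate
    # containing an already-found explanation is reducible and is skipped.
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            candidate = frozenset(subset)
            if any(e <= candidate for e in found):
                continue
            if decide(remove(features, candidate)) != original:
                found.append(candidate)
    return found

applicant = {"income": 1, "tenure": 1, "savings": 1, "age": 1}
explanations = counterfactual_explanations(decide, applicant)
for e in explanations:
    print(sorted(e))
```

In this toy case no single input changes the decision when removed, but three pairs of inputs do, so one decision can have several distinct irreducible explanations. The input `age` contributes to the score yet appears in no explanation. The enumeration is exponential in the number of features, so it is only suitable for small illustrations.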

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2001.07417 [cs.LG]
	(or arXiv:2001.07417v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2001.07417

Submission history

From: Carlos Fernandez
[v1] Tue, 21 Jan 2020 09:58:58 UTC (592 KB)
[v2] Wed, 5 Feb 2020 13:28:14 UTC (593 KB)
[v3] Sat, 9 May 2020 03:30:11 UTC (593 KB)
[v4] Tue, 1 Jun 2021 21:52:22 UTC (629 KB)
[v5] Wed, 13 Oct 2021 07:50:39 UTC (627 KB)
