Enhancing Model Explainability and Performance via Attention-guided Gradient Restriction

Published in ECML-PKDD, 2026

In critical domains, both model explainability and predictive accuracy are indispensable for informed decision-making. In natural language processing, one approach to enhancing model interpretability is to train models on extracted rationales: snippets of input text that justify the output. However, this typically leads to a drop in accuracy compared to full-text learning. To improve both explainability and performance, we propose Gradient-restricted Learning through Attention-guided Selective Snippets (GLASS), a novel selective learning method that learns from extracted rationale features while preserving full-context information. In our framework, the model processes the entire input context during the forward pass, while gradient flow is restricted to influential tokens identified by an extractor trained on a small rationale-annotated dataset. We demonstrate on the medical code prediction task that our approach outperforms the full-text learning baseline, surpasses an adversarial robustness training strategy, and achieves scores equal to or higher than those of its supervised counterpart, while exceeding all of them in faithfulness metrics.
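The central mechanism (full-context forward pass, gradients only through selected tokens) can be illustrated with a minimal sketch. It assumes a toy mean-pooled logistic classifier with a single binary label rather than the multi-label medical code predictor used in the paper. The helper names `top_k_mask`, `forward`, and `restricted_grads` are hypothetical, and the top-k selection over precomputed scores only stands in for the trained extractor. It also assumes that "restricting gradient flow" means a stop-gradient at non-rationale token representations, while the output layer still updates from the full-context representation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def top_k_mask(scores: List[float], k: int) -> List[float]:
    """Hypothetical stand-in for the trained rationale extractor: mark the k
    tokens with the highest (e.g. attention-derived) scores as influential."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]


def forward(emb: List[Vector], token_ids: List[int], w: Vector, b: float) -> Tuple[Vector, float]:
    """Full-context forward pass (toy model, an assumption): mean-pool the
    embeddings of ALL tokens, then apply a logistic output layer."""
    n = len(token_ids)
    pooled = [sum(emb[t][d] for t in token_ids) / n for d in range(len(w))]
    z = sum(wd * pd for wd, pd in zip(w, pooled)) + b
    return pooled, 1.0 / (1.0 + math.exp(-z))


def restricted_grads(
    emb: List[Vector], token_ids: List[int], mask: List[float], w: Vector, b: float, y: float
) -> Tuple[float, Vector, float, Dict[int, Vector]]:
    """Binary cross-entropy gradients in which the gradient w.r.t. token
    embeddings flows only through positions with mask == 1."""
    pooled, p = forward(emb, token_ids, w, b)
    dz = p - y  # dL/dz for sigmoid + binary cross-entropy
    grad_w = [dz * pd for pd in pooled]  # output layer sees the full-context representation
    grad_b = dz
    n = len(token_ids)
    grad_emb: Dict[int, Vector] = {}
    for pos, t in enumerate(token_ids):
        if mask[pos] == 0.0:
            continue  # stop-gradient: this token shaped the prediction but gets no update
        g = grad_emb.setdefault(t, [0.0] * len(w))
        for d in range(len(w)):
            g[d] += dz * w[d] / n  # d pooled[d] / d emb[t][d] = 1/n per occurrence
    return p, grad_w, grad_b, grad_emb


# Usage on made-up toy values (not data from the paper).
emb = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
token_ids = [0, 1, 2]
w, b, y = [1.0, -1.0], 0.0, 1.0
mask = top_k_mask([0.7, 0.1, 0.2], k=1)  # [1.0, 0.0, 0.0]
p, gw, gb, ge = restricted_grads(emb, token_ids, mask, w, b, y)
# ge has an entry only for token 0; tokens 1 and 2 influenced p but are not updated
```

In a framework with automatic differentiation, the same effect would typically be obtained by mixing each token representation with a detached copy of itself according to the mask, so the forward value is unchanged while the backward pass skips the masked-out positions.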
