Enhancing Model Explainability and Performance via Attention-guided Gradient Restriction

Published in ECML-PKDD, 2026

In critical domains, both model explainability and predictive accuracy are indispensable for informed decision-making. In natural language processing, one approach to enhancing model interpretability is to train models on extracted rationales, i.e., snippets of input text that justify the output. However, this typically lowers accuracy compared with full-text learning. To improve both explainability and performance, we propose Gradient-restricted Learning through Attention-guided Selective Snippets (GLASS), a novel selective learning method that learns from extracted rationale features while preserving full-context information. In our framework, the model processes the entire input context during the forward pass, while gradient flow is restricted to influential tokens identified by an extractor trained on a small rationale-annotated dataset. We demonstrate on the medical code prediction task that our approach outperforms the full-text learning baseline, surpasses an adversarial robustness training strategy, and achieves scores equal to or higher than the supervised counterpart, while exceeding all of them on faithfulness metrics.
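The core idea of a full-context forward pass combined with token-restricted gradients can be illustrated with a toy example. The sketch below is not the GLASS implementation: it assumes a hypothetical mean-pooling logistic classifier, computes gradients by hand, and uses a hand-written binary `mask` to stand in for the output of the rationale extractor. All function and variable names are illustrative.

```python
import math

def forward(embeddings, weights):
    """Mean-pool ALL token embeddings, then apply a logistic output layer."""
    n = len(embeddings)
    dim = len(weights)
    pooled = [sum(tok[d] for tok in embeddings) / n for d in range(dim)]
    logit = sum(w * p for w, p in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))

def restricted_token_grads(embeddings, weights, label, mask):
    """Binary cross-entropy gradient w.r.t. each token embedding,
    zeroed for tokens the (assumed) extractor did not select."""
    n = len(embeddings)
    prob = forward(embeddings, weights)  # the forward pass sees every token
    dlogit = prob - label                # dL/dlogit for sigmoid + cross-entropy
    full = [[dlogit * w / n for w in weights] for _ in embeddings]
    # Restrict gradient flow: keep rows where mask == 1, zero the rest.
    return [[g * m for g in row] for row, m in zip(full, mask)]

# Toy input: three tokens with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, -0.5]
mask = [1, 0, 1]  # hypothetical extractor output: token 1 is not a rationale

prob = forward(embeddings, weights)
grads = restricted_token_grads(embeddings, weights, label=1.0, mask=mask)

# Changing the masked token still changes the prediction (full context is kept),
# even though that token receives no gradient.
altered = [[1.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
prob_altered = forward(altered, weights)
```

In this sketch the unselected token still influences the prediction but receives a zero gradient, so only the selected tokens would drive updates to their representations. In an autodiff framework the same effect is commonly obtained with a stop-gradient or detach operation on the unselected positions; whether GLASS uses that exact mechanism is not stated in the abstract.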