This paper explores when and why graph attention mechanisms are effective in node classification tasks.
The theoretical analysis shows that the effectiveness of graph attention depends on the relative levels of structure noise and feature noise in the graph. When structure noise exceeds feature noise, attention improves node classification performance.
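The paper's formal noise model is not reproduced here; as a loose illustration of the two noise sources, the sketch below generates a synthetic two-class graph in which structure noise (how close inter-class edge probability is to intra-class edge probability) and feature noise (overlap of class-conditional Gaussian features) can be varied independently. The parameter names (`p_intra`, `p_inter`, `mu`, `sigma`) are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def synthetic_graph(n=200, p_intra=0.05, p_inter=0.02, mu=1.0, sigma=2.0, seed=0):
    """Two-class synthetic graph (illustrative assumption, not the paper's exact model).

    Structure noise grows as p_inter approaches p_intra (inter-class edges become
    as likely as intra-class edges); feature noise grows with sigma relative to
    the class-mean separation 2 * mu.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                                       # class labels
    x = rng.normal(loc=np.where(y == 0, -mu, mu), scale=sigma, size=n)   # 1-D node features

    # Edge probability depends only on whether the two endpoints share a class.
    same = y[:, None] == y[None, :]
    prob = np.where(same, p_intra, p_inter)
    adj = rng.random((n, n)) < prob
    adj = np.triu(adj, k=1)
    adj = adj | adj.T                                                    # undirected, no self-loops
    return adj.astype(float), x, y

# High structure noise, low feature noise: the regime where attention is expected
# to help, since informative features let it down-weight inter-class neighbours.
adj, x, y = synthetic_graph(p_intra=0.05, p_inter=0.05, mu=2.0, sigma=0.5)
```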
Building on this analysis, the paper proposes a novel multi-layer Graph Attention Network (GAT) architecture that outperforms single-layer GATs at achieving perfect node classification.
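The proposed architecture is not specified in this summary; for orientation, the following is a minimal sketch of a generic two-layer GAT using PyTorch Geometric's `GATConv`. The hidden width, number of heads, and dropout rate are illustrative choices, not the paper's design.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class TwoLayerGAT(torch.nn.Module):
    """Generic two-layer GAT (a sketch, not the paper's proposed architecture)."""

    def __init__(self, in_dim, hidden_dim, num_classes, heads=4):
        super().__init__()
        # First attention layer: multi-head, head outputs are concatenated.
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)
        # Second attention layer: single head mapping to class logits.
        self.conv2 = GATConv(hidden_dim * heads, num_classes, heads=1, concat=False)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

# Usage: logits = TwoLayerGAT(in_dim=16, hidden_dim=8, num_classes=2)(x, edge_index)
```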