You can use knowledge graphs to enrich simpler features, but they are often too large and oversensitive, resulting in low specificity.
Node embedding is a method of transforming binary features into a continuous, lower-dimensional vector space.
Pointwise mutual information (PMI) is used to evaluate the relevance of each edge in the knowledge graph, based on how often the edge occurs and how it co-occurs with the target variable.
By removing irrelevant edges with low PMI, the sparsity of the graph can be increased intelligently.
The hyperparameter alpha can be tuned to control the sparsity of the graph, trading it off against generalization error.
Caveat 1: Edges that exhibit sparsity and hold no information should not be removed.
Caveat 2: Edge variables can also be defined as an 'either-or' relationship rather than an 'and' relationship.
Caveat 3: Conditional PMI can be used to check whether an edge between two features is relevant when the first feature is positive.
A use case based on medical Wikipedia history gives a better intuition for why sparsification is needed.
Normalized PMI, which ranges between -1 and 1, is a notable variant of PMI.
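The PMI-based pruning summarized above can be illustrated with a minimal sketch. Several things here are assumptions and do not come from the source: the helper names `pmi`, `npmi`, `edge_pmi`, and `sparsify` are hypothetical, the toy features and target are made up, an edge variable is taken to be the 'and' of its two binary features, and alpha is treated as a plain threshold on the PMI score. Edges that are never active in the data are kept, which is one possible reading of Caveat 1.

```python
import math

def pmi(p_joint, p_a, p_b):
    """PMI = log(p(a, b) / (p(a) * p(b))); -inf if a and b never co-occur."""
    if p_joint == 0:
        return float("-inf")
    return math.log(p_joint / (p_a * p_b))

def npmi(p_joint, p_a, p_b):
    """Normalized PMI = PMI / -log(p(a, b)), bounded between -1 and 1."""
    if p_joint == 0:
        return -1.0
    if p_joint == 1:
        return 0.0  # degenerate case (both always occur); 0.0 is an assumption
    return pmi(p_joint, p_a, p_b) / -math.log(p_joint)

def edge_pmi(rows, target, edge):
    """PMI between an 'and' edge variable and a binary target.

    Returns None when the edge or the target is never positive,
    because the data then gives no evidence either way.
    """
    a, b = edge
    n = len(rows)
    active = [int(r[a] and r[b]) for r in rows]  # edge fires if both features are 1
    p_edge = sum(active) / n
    p_target = sum(target) / n
    p_joint = sum(e and t for e, t in zip(active, target)) / n
    if p_edge == 0 or p_target == 0:
        return None
    return pmi(p_joint, p_edge, p_target)

def sparsify(rows, target, edges, alpha):
    """Keep edges whose PMI is at least alpha, plus edges with no evidence."""
    kept = []
    for edge in edges:
        score = edge_pmi(rows, target, edge)
        if score is None or score >= alpha:
            kept.append(edge)
    return kept

# Toy data (hypothetical): binary features per record and a binary target.
rows = [
    {"fever": 1, "cough": 1, "rash": 0, "chills": 0},
    {"fever": 1, "cough": 1, "rash": 0, "chills": 0},
    {"fever": 1, "cough": 0, "rash": 1, "chills": 0},
    {"fever": 0, "cough": 1, "rash": 1, "chills": 0},
    {"fever": 0, "cough": 0, "rash": 0, "chills": 0},
    {"fever": 0, "cough": 0, "rash": 0, "chills": 0},
]
target = [1, 1, 0, 0, 0, 0]
edges = [("fever", "cough"), ("fever", "rash"), ("cough", "rash"), ("fever", "chills")]

kept_edges = sparsify(rows, target, edges, alpha=0.0)
print(kept_edges)
```

In this sketch, raising `alpha` removes more edges and makes the graph sparser, while lowering it keeps more of them. Swapping `pmi` for `npmi` inside `edge_pmi` would put the scores on a fixed scale from -1 to 1, which can make a threshold easier to compare across datasets.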