Dev


Questions Recognition System using NLP-BERT from Un-labeled Data

  • The article showcases an NLP-BERT-based question recognition system that groups unlabeled question data into clusters without requiring labeled data.
  • The system loads a dataset of questions, cleans the text with regular expressions, and encodes it with the BERT natural language processing model to create embeddings.
  • The embeddings are then clustered with the K-means algorithm, after which each cluster is manually assigned a category for easy interpretation.
  • The question embeddings are then reduced with PCA and plotted to visualize the clusters.
  • The final category results are exported to CSV, and metrics are used to evaluate clustering quality.
  • The article also explains how this system can help evaluate product or customer success through feedback and guide work on existing issues.
  • Libraries like 're', 'pandas', and 'sklearn' are used for cleaning, data manipulation, and clustering.
  • The project also uses a BERT natural language processing library, along with GPUs for fast processing.
  • Cluster labels are mapped to descriptive categories, and samples are checked to verify that the clustering is accurate.
  • The goal is to extract the semantics of the text and simplify the mapping process for downstream applications.
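
The steps summarized above can be sketched roughly as follows. This is a minimal illustration and not the original article's code: the `bert-base-uncased` checkpoint, mean pooling of the last hidden state, the silhouette score as the quality metric, the `question` column name, and every function name here are assumptions made for the example. The heavy libraries (`torch`, `transformers`, `sklearn`, `pandas`) are imported inside the functions that need them so the cleaning and mapping helpers can be used on their own.

```python
import re


def clean_question(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # keep letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()


def embed_questions(questions, model_name="bert-base-uncased", batch_size=32):
    """Return one vector per question by mean-pooling BERT's last hidden state.

    Assumption: the source does not say which checkpoint or pooling was used.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"  # use a GPU if present
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device).eval()

    chunks = []
    with torch.no_grad():
        for start in range(0, len(questions), batch_size):
            batch = tokenizer(
                questions[start:start + batch_size],
                padding=True,
                truncation=True,
                return_tensors="pt",
            ).to(device)
            hidden = model(**batch).last_hidden_state             # (batch, tokens, dim)
            mask = batch["attention_mask"].unsqueeze(-1).float()  # zero out padding
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
            chunks.append(pooled.cpu())
    return torch.cat(chunks).numpy()


def cluster_and_score(embeddings, n_clusters=5, seed=0):
    """Cluster with K-means, project to 2-D with PCA, and score the clustering."""
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.metrics import silhouette_score

    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embeddings)
    coords = PCA(n_components=2, random_state=seed).fit_transform(embeddings)
    score = silhouette_score(embeddings, labels)  # assumed metric; higher is better
    return labels, coords, score


def name_clusters(labels, cluster_names):
    """Map numeric cluster ids to hand-written category names."""
    return [cluster_names.get(int(label), "uncategorized") for label in labels]


def run_pipeline(input_csv, output_csv, cluster_names, n_clusters=5):
    """Load questions, clean, embed, cluster, label, and export to CSV."""
    import pandas as pd

    df = pd.read_csv(input_csv)  # assumes a text column named "question"
    df["clean"] = df["question"].astype(str).map(clean_question)
    embeddings = embed_questions(df["clean"].tolist())
    labels, coords, score = cluster_and_score(embeddings, n_clusters)
    df["cluster"] = labels
    df["category"] = name_clusters(labels, cluster_names)
    df["pc1"], df["pc2"] = coords[:, 0], coords[:, 1]  # 2-D coordinates for plotting
    df.to_csv(output_csv, index=False)
    return score
```

In this sketch the category names would be written by hand after inspecting a few questions from each cluster, for example `run_pipeline("questions.csv", "clustered.csv", {0: "billing", 1: "login"})` with hypothetical file names and categories. The exported `pc1` and `pc2` columns can be passed to any scatter-plot tool to view the clusters.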
