PCA & K-Means for Traffic Data in Python

Towards Data Science

204

Image Credit: Towards Data Science

Principal Component Analysis (PCA) can be applied to traffic data to detect anomalies or to capture the patterns in a transit station's traffic history.
PCA can be applied to reduce dimensionality and can be used for machine learning tasks including clustering, classification, and regression.
The analysis uses the Taipei Metro Rapid Transit System Hourly Traffic Data and keeps only weekday records, since the most interesting patterns occur on weekdays.
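The weekday filtering step could look roughly like the sketch below. The column names (`station`, `date`, `hour`, `riders`) and the tiny sample table are hypothetical and are not taken from the actual dataset; the point is only to show a weekday filter followed by a reshape into a station-by-hour matrix.

```python
import pandas as pd

# Hypothetical long-format records: one row per station, date and hour.
# The real dataset's schema may differ.
records = pd.DataFrame({
    "station": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime([
        "2023-01-02", "2023-01-02", "2023-01-07",
        "2023-01-02", "2023-01-02", "2023-01-08",
    ]),
    "hour": [8, 18, 8, 8, 18, 18],
    "riders": [120, 150, 40, 30, 60, 10],
})

# dayofweek runs Monday=0 ... Sunday=6, so values below 5 are weekdays.
weekdays = records[records["date"].dt.dayofweek < 5]

# One row per station, one column per hour, averaged over the weekdays kept.
station_hour = weekdays.pivot_table(
    index="station", columns="hour", values="riders", aggfunc="mean"
)
```

Each row of `station_hour` is then one station's hourly profile, which is the kind of matrix PCA can be run on.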
PCA helps identify the hours in which the traffic trends of different stations are most distinctive, e.g. commute hours, and those hours can then be used to cluster stations.
PCA produces two output matrices, Z and W: W can be thought of as the weights on each feature (here, each hour), and Z as the low-dimensional representation of each station.
The three principal components generated with PCA resulted in PC_1 weighting more on night hours, PC_2 weighting more on noon, and PC_3 weighting more on morning hours.
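To make the roles of Z and W concrete, here is a minimal NumPy sketch that computes both from a centered data matrix via the singular value decomposition. The input is random synthetic data standing in for a stations-by-hours matrix, and the shapes (10 stations, 24 hours) are assumptions for illustration; it is not the article's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 10 stations x 24 hourly features (not real traffic data).
X = rng.random((10, 24))

# PCA works on mean-centered features.
mean = X.mean(axis=0)
Xc = X - mean

# Rows of Vt are the principal directions, ordered by explained variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
W = Vt[:k]      # shape (3, 24): one weight per hour for each component
Z = Xc @ W.T    # shape (10, 3): each station's coordinates on PC_1..PC_3

# Approximate reconstruction of the hourly profiles from only 3 components.
X_approx = Z @ W + mean
```

Inspecting each row of `W` against the hour axis is how one would read off statements such as "this component weights more on night hours".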
Stations are then clustered with K-Means according to how their passengers are distributed across these three periods.
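A clustering step of this kind might be sketched as follows, assuming scikit-learn is available. The matrix `Z` here is synthetic (three well-separated groups of five points in a 3-component space) and only stands in for the station representations; the choice of three clusters is likewise an assumption for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for Z: 15 "stations" in a 3-component space,
# drawn around three well-separated centers.
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
Z = np.vstack([c + 0.3 * rng.standard_normal((5, 3)) for c in centers])

# n_init restarts guard against a poor random initialisation.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(Z)

# One cluster label per station; centroids live in the same 3-component space.
centroids = kmeans.cluster_centers_
```

Stations sharing a label have similar weights on the three components, i.e. similar shares of traffic in the corresponding periods.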
Taipei Main Station is a huge transit hub, with a high-traffic pattern during morning and evening periods, while Taipei Zoo station has fewer people in either period due to few residents living in its area.
Tuning the hyperparameters of K-Means, such as the number of clusters, can lead to a better grouping of stations.
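One common way to tune the number of clusters is to compare silhouette scores across candidate values. The summary does not say which tuning method the article uses, so the sketch below is an assumption, again run on synthetic data with scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the station representations (three clear groups).
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
Z = np.vstack([c + 0.3 * rng.standard_normal((5, 3)) for c in centers])

# Fit K-Means for several cluster counts and score each partition.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    scores[k] = silhouette_score(Z, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
```

On real station data the scores are rarely this clear-cut, so the chosen value is usually weighed against how interpretable the resulting groups are.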
The article presents examples of how PCA can be used for machine learning analysis, specifically for clustering transit stations depending on traffic patterns in different periods.