The latest iteration of the Python Developer Survey, collected between November 2023 and February 2024, includes a new Data Science section, which allows for a more complete picture of trends over the past year.
Data processing is an essential part of data science, and pandas still tops the list of the most commonly used data processing tools, used by 77% of respondents.
Polars is gaining ground and has been in the spotlight for its advantages in speed and parallel processing, with 10% of respondents reporting that they use it.
Plotly Dash was the most popular visualization dashboard tool; however, HoloViz Panel is gaining traction within the PyData community and could catch up within the next year.
Scikit-LLM is a new library worth paying attention to; it lets you tap into OpenAI models to perform text analysis.
MLOps tools designed for data science projects have emerged and continue to mature; Docker containers are now slightly ahead of Anaconda in the Python installation and upgrade category.
Big data requires distributed computing resources such as Apache Spark and PySpark for better performance and scalability.
Python events like PyCon and EuroPython have shifted focus towards data science with more tracks, talks, and workshops catering to data science use cases.
The fields of data science and machine learning continue to change rapidly.
PyCharm is an integrated development environment that can help data scientists efficiently build their skill set, providing intelligent coding assistance, top-tier debugging, version control, integrated database management, and seamless Docker integration.