The role of a data engineer is like building and maintaining the utilities for a house: the work is essential to making the house functional, and well-maintained plumbing ensures everything flows smoothly. Data analysts are like the people who decide how to use these utilities effectively. Data engineers are responsible for maintaining access to structured, up-to-date data.
A junior data engineer working for an e-scooter service company must collect data from external sources, such as weather and flight information. These sources are needed to make informed decisions about where to place the e-scooters and how to meet changing demand.
There are two common approaches to extracting external data, which the article describes as pumping from a river (web scraping) and connecting to a central water supply (APIs). With web scraping, the website sometimes uses JavaScript to load content dynamically, making the data harder to extract. In the API approach, programs communicate with each other directly. APIs use API keys to authenticate requests so that only authorized users can access the service.
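As a minimal sketch of the API approach, the request below attaches an API key as a URL parameter and decodes a JSON response. The endpoint `api.example.com` and the parameter names `city` and `key` are hypothetical placeholders, not the service the author used; real APIs document their own parameter names and may expect the key in a header instead.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://api.example.com/v1/weather"


def build_request_url(city: str, api_key: str) -> str:
    """Attach the query and the API key to the endpoint as URL parameters."""
    # The parameter names "city" and "key" are assumptions.
    params = {"city": city, "key": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_weather(city: str, api_key: str) -> dict:
    """Send the authenticated request and decode the JSON body."""
    with urlopen(build_request_url(city, api_key), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the URL construction separate from the network call makes it possible to check the request locally before spending any of the API quota.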
Cleaning and organizing raw data into a usable format is known as transformation. The most interesting transformation is converting coordinate data from DMS (degrees, minutes, seconds) format to decimal format, which makes the data compatible with APIs.
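The DMS-to-decimal conversion can be sketched as follows. The function name `dms_to_decimal` and the input layout (numbers followed by a hemisphere letter, as in 52°31′12″N) are assumptions for illustration; the author's own implementation may differ. The arithmetic is degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres.

```python
import re


def dms_to_decimal(dms: str) -> float:
    """Convert a DMS string such as 52°31′12″N to signed decimal degrees."""
    # Pull out the numeric parts in order: degrees, minutes, seconds.
    parts = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", dms)]
    hemisphere = re.search(r"[NSEW]", dms.upper())
    if not 1 <= len(parts) <= 3 or hemisphere is None:
        raise ValueError(f"Unrecognised DMS value: {dms!r}")
    # Treat missing minutes or seconds as zero.
    degrees, minutes, seconds = parts + [0.0] * (3 - len(parts))
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal notation.
    return -decimal if hemisphere.group() in "SW" else decimal


latitude = dms_to_decimal("52°31′12″N")
longitude = dms_to_decimal("13°24′18″E")
```

For the sample values above, 31/60 + 12/3600 adds 0.52 to the 52 degrees of latitude, and 24/60 + 18/3600 adds 0.405 to the 13 degrees of longitude.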
Loading the cleaned data into a relational database for easy access and analysis is the final step. Excel files become disconnected and inefficient as the system grows. MySQL is like a modern plumbing system that delivers water to every part of the house seamlessly and reliably: it is scalable and efficient, and it keeps everything linked together.
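The loading step might look like the sketch below. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in for MySQL, so the SQL dialect differs slightly from what the article's project would need. The table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3


def load_weather(connection: sqlite3.Connection, rows: list) -> None:
    """Create the table if needed and insert the cleaned rows."""
    connection.execute(
        """
        CREATE TABLE IF NOT EXISTS weather (
            city TEXT NOT NULL,
            forecast_time TEXT NOT NULL,
            temperature_c REAL,
            PRIMARY KEY (city, forecast_time)
        )
        """
    )
    # The context manager commits on success and rolls back on error.
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO weather (city, forecast_time, temperature_c) "
            "VALUES (?, ?, ?)",
            rows,
        )


# Illustrative sample rows, not real measurements.
sample_rows = [
    ("Berlin", "2024-05-01 12:00", 17.5),
    ("Berlin", "2024-05-01 15:00", 18.2),
]
connection = sqlite3.connect(":memory:")
load_weather(connection, sample_rows)
load_weather(connection, sample_rows)  # re-running does not duplicate rows
row_count = connection.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
```

The primary key on city and forecast time is one way to keep repeated loads from duplicating data, which matters once the script runs on a schedule.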
Automation is essential to keeping data up to date. Information that changes frequently, such as weather and flight data, needs automation so that it is updated daily and accessible to everyone in the company. A job scheduler such as cron is used to run the Python script daily.
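A scheduled script could be organized as in the sketch below. The function names, the file paths, and the 06:00 run time are hypothetical; the crontab entry in the comment uses the standard five-field cron format (minute, hour, day of month, month, day of week).

```python
import logging
from datetime import datetime, timezone

# Hypothetical crontab entry (added with `crontab -e`) that runs this script
# every day at 06:00; the interpreter and script paths are placeholders:
#   0 6 * * * /usr/bin/python3 /path/to/update_data.py >> /path/to/update.log 2>&1


def refresh_weather() -> None:
    """Placeholder for the weather extract, transform, and load step."""


def refresh_flights() -> None:
    """Placeholder for the flight extract, transform, and load step."""


def run_daily_update(steps: list) -> list:
    """Run each pipeline step in order and return the names of those that ran."""
    completed = []
    for step in steps:
        started = datetime.now(timezone.utc).isoformat()
        logging.info("Running %s at %s", step.__name__, started)
        step()
        completed.append(step.__name__)
    return completed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_daily_update([refresh_weather, refresh_flights])
```

Logging a timestamp for each step gives a simple record of when the scheduled job last ran, which helps when a daily update silently fails.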
Data engineering lays the foundation for decision-makers to dive deeper, uncover insights, and build predictive models. The groundwork laid by data engineers allows data analysts to analyze the data using various techniques and models to make informed decisions.
Visit the author's GitHub repository to find all the scripts, functions, and documentation used in this project to extract, transform, and load data from external APIs, and to dive deeper into the project.
The article presents a basic learning project that helps a new data engineer understand the process of extracting, transforming, and loading data from external APIs and websites. It shows how to collect weather and flight data to help an e-scooter company predict usage patterns and respond to challenges like weather changes or tourist activity.
Together, these roles (data engineers, data analysts, and data scientists) drive data-driven decision-making. The article is inspiring and explains the concepts in a relatable, straightforward way.