Running data integration projects before starting Data Science and Machine Learning (DS/ML) work may not be ideal, because data integrated without a known use can turn out to be unfit for the ML use cases that follow.
A better approach is to integrate data one use case at a time: work backwards from each use case to identify the data it requires, which gives better value for money from integration efforts.
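
Working backwards from a use case can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the use-case names, dataset names, and the `datasets_to_integrate` helper are all hypothetical.

```python
# Hypothetical mapping from each ML use case to the datasets it needs.
# All names are illustrative assumptions, not taken from a real system.
USE_CASES = {
    "churn_prediction": {"crm_accounts", "support_tickets", "billing_history"},
    "demand_forecast": {"sales_orders", "billing_history"},
}


def datasets_to_integrate(use_case, already_integrated):
    """Return the datasets this use case still needs, sorted for stable output."""
    return sorted(USE_CASES[use_case] - set(already_integrated))


# Only the gap for the chosen use case is integrated, nothing more.
missing = datasets_to_integrate("churn_prediction", ["billing_history"])
print(missing)  # ['crm_accounts', 'support_tickets']
```

The point of the sketch is that the integration backlog is derived from the use case, so nothing is integrated that no use case has asked for.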
One driver of premature data integration is that AI/ML use cases are hard to identify when nobody knows what data is available, but this is better solved through communication and dialogue within teams.
Integrating data without a clear ML use case may mean integrating data that is never needed, which increases cost and leaves unused data in storage.
Cultural barriers to data sharing can be better addressed by involving relevant team members in projects and fostering communication rather than mandating data integration.
Setting up a data platform strategy and creating a searchable catalog of dataset descriptions can be a cost-effective way to support data discovery for ML projects.
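
A catalog of dataset descriptions does not need heavy tooling to be useful. The sketch below assumes a hypothetical in-memory catalog with made-up entries and a simple keyword `search` helper; a real catalog would more likely live in a shared tool or database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog record: a description of a dataset, not the data itself."""
    name: str
    owner: str
    description: str


# Hypothetical entries; names, owners, and descriptions are illustrative only.
CATALOG = [
    DatasetEntry("crm_accounts", "sales-ops", "Customer accounts with segment and region"),
    DatasetEntry("support_tickets", "support", "Customer support tickets with timestamps and categories"),
    DatasetEntry("sensor_readings", "plant-it", "Hourly temperature and vibration readings per machine"),
]


def search(catalog, query):
    """Return entries whose name or description contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [
        entry
        for entry in catalog
        if all(term in f"{entry.name} {entry.description}".lower() for term in terms)
    ]


hits = search(CATALOG, "customer tickets")
print([entry.name for entry in hits])  # ['support_tickets']
```

Each entry also records an owner, so a search result points to the team to talk to before any data is moved.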
Data integration should focus on what each use case actually needs, and resolving organizational, political, and technical challenges before ML projects start can help with data access issues.
In summary, integrating data per use case, fostering communication among teams, and using low-cost data discovery tools can improve the success of ML projects.