Image Credit: Unite

Nearly 80% of Training Datasets May Be a Legal Hazard for Enterprise AI

  • A recent paper from LG AI Research finds that nearly 80% of open datasets used to train AI models may pose legal risks due to embedded copyrighted material and undisclosed licensing terms.
  • The paper proposes AI-based compliance agents that scan dataset histories for legal issues faster and more accurately than human lawyers.
  • Only 21% of datasets labeled as commercially usable were deemed legally safe for commercialization after in-depth analysis.
  • Companies developing AI models face an uncertain legal landscape around dataset copyright and licensing.
  • Transparency about dataset sources is becoming a critical issue, with growing concern about hidden copyrighted data in training sets.
  • Initiatives to ensure license compliance in datasets are emerging, but the new research finds errors and uncertainties in dataset licenses.
  • The Nexus Data Compliance framework proposed in the paper uses AI-driven tools such as AutoCompliance to assess legal risk and compliance across dataset dependencies.
  • AutoCompliance identified dependencies and license terms more accurately and efficiently than human experts, highlighting its potential for ensuring dataset compliance.
  • The investigation uncovered numerous cases of non-compliant dataset redistribution, both where redistribution was explicitly prohibited and where license conditions conflicted.
  • The study stresses the need to clearly identify non-compliance in datasets to avoid legal consequences, and suggests ongoing improvements to AI-driven legal review.
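The dependency-aware compliance idea above can be sketched as a recursive walk over a dataset's provenance graph, flagging any ancestor whose license blocks commercial use. Everything in this sketch (the dataset names, the license set, the graph shape) is an illustrative assumption, not the paper's actual AutoCompliance tooling:

```python
# Licenses assumed (for illustration) to forbid commercial use anywhere
# in a dataset's dependency chain.
NON_COMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0", "research-only"}

def commercial_risks(dataset, graph, licenses, _seen=None):
    """Walk the dataset's dependency graph and collect every dataset in its
    ancestry whose license blocks commercial redistribution."""
    if _seen is None:
        _seen = set()
    if dataset in _seen:          # guard against cycles and repeated parents
        return []
    _seen.add(dataset)
    risks = []
    if licenses.get(dataset) in NON_COMMERCIAL:
        risks.append(dataset)
    for parent in graph.get(dataset, []):
        risks.extend(commercial_risks(parent, graph, licenses, _seen))
    return risks

# Toy example: a dataset labeled commercially usable that was derived,
# two hops up the chain, from a non-commercial web crawl.
graph = {"corpus-v2": ["corpus-v1"], "corpus-v1": ["web-crawl-nc"]}
licenses = {
    "corpus-v2": "CC-BY-4.0",
    "corpus-v1": "CC-BY-4.0",
    "web-crawl-nc": "CC-BY-NC-4.0",
}
print(commercial_risks("corpus-v2", graph, licenses))  # ['web-crawl-nc']
```

This mirrors the finding that a dataset's own license tag is not enough: the risk only surfaces once the full dependency chain is examined.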
