
Unite


Jailbreaking Text-to-Video Systems with Rewritten Prompts

  • Researchers have tested a method that rewrites blocked prompts for text-to-video systems so that they pass safety filters without changing their meaning, exposing the fragility of current safeguards.
  • Various closed-source video models try to prevent users from generating unwanted content, but determined individuals have found ways to coerce these systems into producing restricted material.
  • A new collaboration between researchers in Singapore and China has introduced an optimization-based jailbreak method for text-to-video models, which tricked systems such as Kling by discreetly rewriting prompts.
  • The method rewrites prompts so that they circumvent safety filters while keeping their original meaning, using an iterative optimization process with three objectives.
  • Tested on Pika, Luma, Kling, and Open-Sora, the approach outperformed previous methods at breaking system safeguards, highlighting the limitations of current safety filters in text-to-video models.
  • The study, conducted by eight researchers from various universities, used ChatGPT-4o to rewrite prompts so that they bypass safety filters, and reports that the resulting prompts were effective at evading detection.
  • A prompt mutation strategy made the bypass more consistent, leading the system to select prompts that remained effective across multiple uses.
  • The method was designed to preserve the meaning of the original input while bypassing safety filters, so that the rewritten prompts stay semantically aligned with the originals.
  • Notably, Open-Sora proved highly vulnerable to adversarial prompts, underscoring the need for stronger safety mechanisms in open-source models of this kind.
  • Compared with baseline approaches, the method achieved higher attack success rates and stronger semantic alignment with the input prompts across the tested text-to-video models.
  • The study emphasizes the need for more advanced safety measures in text-to-video models and reports that the new method balances attack success with semantic fidelity to the original prompt.
