Large language models (LLMs) are drawing scrutiny over their potential misuse in offensive cybersecurity, particularly through vibe coding, a practice in which users rely on language models to rapidly generate code from natural-language prompts.
There are concerns that the trend toward vibe coding may lower the barrier to entry for malicious actors and broaden the range of security threats.
While most commercial LLMs have safeguards against malicious use, some open-source models may be fine-tuned to bypass restrictions.
A recent study by researchers at UNSW Sydney and CSIRO evaluated LLMs' ability to generate exploits, with GPT-4o showing high cooperation.
The results showed that the LLMs were willing to assist with exploit generation, although none produced a working exploit for a known vulnerability.
Developments like WhiteRabbitNeo aim to help security researchers level the playing field with potential adversaries.
LLMs struggle to retain context beyond the current conversation, and the quality of their guardrails against harmful prompts varies considerably between models.
In the exploit-generation tests, models such as ChatGPT behaved cooperatively, but the code they produced varied in effectiveness and often contained errors.
The study highlighted a gap between LLMs' willingness to assist and their effectiveness in generating functional exploits, pointing to architectural limitations.
Future research may involve working with real-world exploits and more advanced models to gauge how far LLMs' exploit-generation capabilities can be pushed.