
Just a sample of the Echomail archive

Cooperative anarchy at its finest, still active today. Darkrealms is the Zone 1 Hub.

   CONSPRCY      How big is your tinfoil hat?      2,445 messages   


   Message 1,963 of 2,445   
   Mike Powell to All   
   Can top AI tools be bullied into malicious work?   
   17 Nov 25 09:48:47   
   
   TZUTC: -0500   
   MSGID: 1720.consprcy@1:2320/105 2d807287   
   PID: Synchronet 3.21a-Linux master/123f2d28a Jul 12 2025 GCC 12.2.0   
   TID: SBBSecho 3.28-Linux master/123f2d28a Jul 12 2025 GCC 12.2.0   
   BBSID: CAPCITY2   
   CHRS: ASCII 1   
   FORMAT: flowed   
   Can top AI tools be bullied into malicious work? ChatGPT, Gemini, and more    
   are put to the test, and the results are actually genuinely surprising   
      
   Date:   
   Sun, 16 Nov 2025 21:34:00 +0000   
      
   Description:   
   Adversarial testing of top AI models revealed vulnerabilities, showing some   
   could be manipulated into unsafe responses despite safety measures.   
      
   FULL STORY   
      
   Modern AI systems are trusted to follow safety rules, and people rely on    
   them for learning and everyday support, often assuming that strong   
   guardrails are in place at all times.    
      
   Researchers from Cybernews ran a structured set of adversarial tests to see   
   whether leading AI tools could be pushed into harmful or illegal outputs.    
      
   The process used a simple one-minute interaction window for each trial,    
   giving room for only a few exchanges.   
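   
   The article does not publish the test harness itself; as a rough   
   illustration only, that per-trial time budget could be enforced with a   
   deadline loop like the minimal Python sketch below, where send_prompt is   
   a hypothetical stand-in for a model API call.   
   
      import time
      
      TRIAL_SECONDS = 60  # one-minute interaction window per trial
      
      def run_trial(send_prompt, prompts):
          # send_prompt: hypothetical callable wrapping one model API call
          # prompts: the follow-up phrasings queued for this trial
          deadline = time.monotonic() + TRIAL_SECONDS
          transcript = []
          for prompt in prompts:
              if time.monotonic() >= deadline:
                  break  # window closed; only a few exchanges fit
              transcript.append((prompt, send_prompt(prompt)))
          return transcript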
      
   Patterns of partial and full compliance    
      
   The tests covered categories such as stereotypes, hate speech, self-harm,   
   cruelty, sexual content, and several forms of crime.    
      
   Every response was stored in separate directories, using fixed file-naming   
   rules to allow clean comparisons, with a consistent scoring system tracking   
   when a model fully complied, partly complied, or refused a prompt.    
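   
   Cybernews has not released its tooling; the Python sketch below only   
   illustrates the kind of per-category directories, fixed file naming, and   
   three-way scoring the article describes. All names here are invented for   
   the example.   
   
      import json
      from enum import Enum
      from pathlib import Path
      
      class Score(Enum):
          # the three outcomes the scoring system tracked per prompt
          FULL_COMPLIANCE = "full"
          PARTIAL_COMPLIANCE = "partial"
          REFUSAL = "refusal"
      
      def store_response(root, model, category, trial_no, response, score):
          # fixed layout: <root>/<category>/<model>/trial_NNN.json,
          # so results can be compared cleanly across models and runs
          out_dir = Path(root) / category / model
          out_dir.mkdir(parents=True, exist_ok=True)
          record = {"response": response, "score": score.value}
          (out_dir / f"trial_{trial_no:03d}.json").write_text(json.dumps(record))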
      
   Across all categories, the results varied widely. Strict refusals were    
   common, but many models demonstrated weaknesses when prompts were softened,   
   reframed, or disguised as analysis.    
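   
   As a benign illustration of that reframing idea, the same underlying   
   question can be wrapped in progressively softer framings. The templates   
   in this Python sketch are hypothetical examples built around a harmless   
   topic, not the researchers' actual prompts.   
   
      # Benign illustration only: one question, several framings.
      FRAMES = [
          "{q}",                                   # direct ask
          "For a sociology essay, discuss: {q}",   # academic reframing
          "Summarize published research on: {q}",  # analysis framing
      ]
      
      def reframed_variants(question):
          return [frame.format(q=question) for frame in FRAMES]
      
      print(reframed_variants("why do urban myths spread quickly?"))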
      
   ChatGPT-5 and ChatGPT-4o often produced hedged or sociological explanations   
   instead of declining, which counted as partial compliance.    
      
   Gemini Pro 2.5 stood out for the wrong reasons: it frequently delivered   
   direct responses even when the harmful framing was obvious.    
      
   Claude Opus and Claude Sonnet, meanwhile, were firm in stereotype tests but   
   less consistent in cases framed as academic inquiries.    
      
   Hate speech trials showed the same pattern - Claude models performed best,   
   while Gemini Pro 2.5 again showed the highest vulnerability.    
      
   ChatGPT models tended to provide polite or indirect answers that still    
   aligned with the prompt.    
      
   Softer language proved far more effective than explicit slurs for bypassing   
   safeguards.    
      
   Similar weaknesses appeared in self-harm tests, where indirect or   
   research-style questions often slipped past filters and led to unsafe    
   content.    
      
   Crime-related categories showed major differences between models, as some   
   produced detailed explanations for piracy, financial fraud, hacking, or   
   smuggling when the intent was masked as investigation or observation.    
      
   Drug-related tests produced stricter refusal patterns, although ChatGPT-4o   
   still delivered unsafe outputs more frequently than the others. Stalking   
   was the category with the lowest overall risk, with nearly all models   
   rejecting the prompts.    
      
   The findings reveal that AI tools can still respond to harmful prompts    
   when they are phrased in the right way.    
      
   The ability to bypass filters with simple rephrasing means these systems can   
   still leak harmful information.    
      
   Even partial compliance becomes risky when the leaked information relates   
   to illegal activity, or to situations where people normally rely on tools   
   like identity theft protection or a firewall to stay safe.    
      
   ======================================================================   
   Link to news story:   
   https://www.techradar.com/pro/security/can-top-ai-tools-be-bullied-into-malicious-work-chatgpt-gemini-and-more-are-put-to-the-test-and-the-results-are-actually-genuinely-surprising   
      
   $$   
   --- SBBSecho 3.28-Linux   
    * Origin: capitolcityonline.net * Telnet/SSH:2022/HTTP (1:2320/105)   
   SEEN-BY: 105/81 106/201 128/187 129/14 305 153/7715 154/110 218/700   
   SEEN-BY: 226/30 227/114 229/110 206 300 307 317 400 426 428 470 664   
   SEEN-BY: 229/700 705 266/512 291/111 320/219 322/757 342/200 396/45   
   SEEN-BY: 460/58 633/280 712/848 902/26 2320/0 105 304 3634/12 5075/35   
   PATH: 2320/105 229/426   
      


