
Just a sample of the Echomail archive

Cooperative anarchy at its finest, and still active today. Darkrealms is the Zone 1 Hub.

   CONSPRCY      How big is your tinfoil hat?      2,445 messages   


   Message 1,755 of 2,445   
   Mike Powell to All   
   Researchers find a way to   
   16 Sep 25 10:35:13   
   
   TZUTC: -0500   
   MSGID: 1504.consprcy@1:2320/105 2d2ebf91   
   PID: Synchronet 3.21a-Linux master/123f2d28a Jul 12 2025 GCC 12.2.0   
   TID: SBBSecho 3.28-Linux master/123f2d28a Jul 12 2025 GCC 12.2.0   
   BBSID: CAPCITY2   
   CHRS: ASCII 1   
   FORMAT: flowed   
   Researchers find a way to address the problem of AI forgetting how to behave   
   safely   
      
   Date:   
   Mon, 15 Sep 2025 23:00:00 +0000   
      
   Description:   
   Open-source AI models used on phones and in cars can lose their
   safeguards, but university scientists have found that retraining these
   reduced models restores the protections.
      
   FULL STORY   
      
   Researchers at the University of California, Riverside are addressing the
   problem of weakened safety in open-source artificial intelligence models
   when they are adapted for smaller devices.
      
   As these systems are trimmed to run efficiently on phones, cars, or other   
   low-power hardware, they can lose the safeguards designed to stop them from   
   producing offensive or dangerous material.    
      
   The UCR team examined what happens when a model's exit layer is changed
   from its default position.
      
   Weakened safety guardrails   
      
   Their results, presented at the International Conference on Machine Learning   
   in Vancouver, Canada, showed that safety guardrails weaken once the exit    
   point is moved, even if the original model had been trained not to provide   
   harmful information.    
      
   The reason models are adjusted in this way is simple. Exiting earlier makes   
   inference faster and more efficient, since the system skips layers. But those   
   skipped layers may have been critical to filtering unsafe requests.    
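
   To make that trade-off concrete, here is a minimal sketch of early-exit
   inference, assuming a generic stack of transformer blocks. The model, the
   exit_layer parameter, and all names are illustrative, not the UCR
   implementation.

   # Minimal early-exit sketch (illustrative, not the UCR implementation).
   # Running only the first exit_layer blocks cuts inference cost, but any
   # safety behavior learned in the skipped blocks is lost with them.
   import torch
   import torch.nn as nn

   class EarlyExitLM(nn.Module):
       def __init__(self, vocab=32000, dim=512, n_layers=24):
           super().__init__()
           self.embed = nn.Embedding(vocab, dim)
           self.blocks = nn.ModuleList(
               nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
               for _ in range(n_layers))
           self.head = nn.Linear(dim, vocab)   # shared output head

       def forward(self, tokens, exit_layer=None):
           h = self.embed(tokens)
           stop = exit_layer or len(self.blocks)
           for block in self.blocks[:stop]:    # layers past stop are skipped
               h = block(h)
           return self.head(h)

   model = EarlyExitLM()
   x = torch.randint(0, 32000, (1, 16))
   full_logits = model(x)                  # all 24 layers
   fast_logits = model(x, exit_layer=12)   # half the compute, weaker guardrails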
      
   "Some of the skipped layers turn out to be essential for preventing unsafe
   outputs," said Amit Roy-Chowdhury, professor of electrical and computer
   engineering and senior author of the study. "If you leave them out, the
   model may start answering questions it shouldn't."
      
   To solve this, the researchers retrained the model's internal structure
   so that it retains the ability to identify and block unsafe material,
   even when trimmed.
      
   This approach does not involve external filters or software patches, but   
   changes how the model interprets dangerous inputs.    
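
   The story does not spell out the training procedure, but the idea of
   baking refusals into every depth can be sketched as fine-tuning with a
   loss applied at each candidate exit point. The exit depths, the data
   names, and the single shared head below are assumptions for illustration,
   reusing the EarlyExitLM sketch above.

   # Hypothetical sketch: fine-tune so every exit depth emits the refusal.
   # unsafe_prompt_ids / refusal_ids stand in for a real safety-alignment
   # dataset; both are (batch, seq) token tensors of the same length here.
   import torch.nn.functional as F

   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
   exit_points = [6, 12, 18, 24]        # depths that must stay safe

   def alignment_step(unsafe_prompt_ids, refusal_ids):
       optimizer.zero_grad()
       loss = torch.zeros(())
       for depth in exit_points:
           logits = model(unsafe_prompt_ids, exit_layer=depth)
           # same refusal target at every depth, so no truncated variant
           # of the model forgets how to decline the request
           loss = loss + F.cross_entropy(
               logits.view(-1, logits.size(-1)), refusal_ids.view(-1))
       loss.backward()
       optimizer.step()
       return loss.item()

   step_loss = alignment_step(x, torch.randint(0, 32000, (1, 16)))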
      
   "Our goal was to make sure the model doesn't forget how to behave safely
   when it's been slimmed down," said Saketh Bachu, UCR graduate student and
   co-lead author of the study.
      
   The team tested their method on LLaVA 1.5, a vision language model.    
      
   When its exit layer was moved earlier than intended, the system responded to   
   harmful prompts, including detailed bomb-making instructions.    
      
   After retraining, the reduced model consistently refused to provide unsafe   
   answers.    
      
   "This isn't about adding filters or external guardrails," Bachu said.
      
   "We're changing the model's internal understanding, so it's on good
   behavior by default, even when it's been modified."
      
   Bachu and co-lead author Erfan Shayegani called the work "benevolent
   hacking," a way to strengthen models before vulnerabilities are exploited.
      
   "There's still more work to do," Roy-Chowdhury said. "But this is a
   concrete step toward developing AI in a way that's both open and
   responsible."
      
   ======================================================================   
   Link to news story:   
   https://www.techradar.com/pro/researchers-find-a-way-to-address-the-problem-of-ai-forgetting-how-to-behave-safely
      
   $$   
   --- SBBSecho 3.28-Linux   
    * Origin: capitolcityonline.net * Telnet/SSH:2022/HTTP (1:2320/105)   
   SEEN-BY: 105/81 106/201 128/187 129/14 305 153/7715 154/110 218/700   
   SEEN-BY: 226/30 227/114 229/110 111 206 300 307 317 400 426 428 470   
   SEEN-BY: 229/664 700 705 266/512 291/111 320/219 322/757 342/200 396/45   
   SEEN-BY: 460/58 712/848 902/26 2320/0 105 304 3634/12 5075/35   
   PATH: 2320/105 229/426   
      



(c) 1994, bbs@darkrealms.ca