... darkrealms ...

Cooperative anarchy at its finest, still active today. Darkrealms is the Zone 1 Hub.
CONSPRCY
How big is your tinfoil hat?
2,445 messages
Message 1,933 of 2,445
Mike Powell to All
MS built a fake online ma
09 Nov 25 10:28:15
   TZUTC: -0500   
   MSGID: 1690.consprcy@1:2320/105 2d75efa9   
   PID: Synchronet 3.21a-Linux master/123f2d28a Jul 12 2025 GCC 12.2.0   
   TID: SBBSecho 3.28-Linux master/123f2d28a Jul 12 2025 GCC 12.2.0   
   BBSID: CAPCITY2   
   CHRS: ASCII 1   
   FORMAT: flowed   
   Microsoft built a fake online marketplace to see how its AI agents would work   
   selling unsupervised - and let's just say the results were... unsurprising   
      
   Date:   
   Sat, 08 Nov 2025 19:34:00 +0000   
      
   Description:   
   Microsoft's Magentic Marketplace shows AI tools still cannot reliably act
   independently in complex multi-agent simulations.   
      
   FULL STORY   
      
   A new Microsoft study has raised questions about whether current AI agents
   are suited to operating without full human supervision.
      
   The company recently built a synthetic environment, the "Magentic
   Marketplace", designed to observe how AI agents perform in unsupervised
   situations.
      
   The project took the form of a fully simulated ecommerce platform which   
   allowed researchers to study how AI agents behave as customers and businesses   
   - with possibly predictable results.
      
   Testing the limits of current AI models    
      
   The project included 100 customer-side agents interacting with 300   
   business-side agents, giving the team a controlled setting to test agent   
   decision-making and negotiation skills.    
      
   The source code for the marketplace is open source; therefore, other   
   researchers can adopt it to reproduce experiments or explore new variations.    
      
   Ece Kamar, CVP and managing director of Microsoft Research's AI Frontiers Lab,
   noted this research is vital for understanding how AI agents collaborate and   
   make decisions.    
      
   The initial tests used a mix of leading models, including GPT-4o, GPT-5, and   
   Gemini-2.5-Flash.    
      
   The results were not entirely unexpected, as several models showed    
   weaknesses.    
      
   Customer agents could easily be influenced by business-side agents into   
   selecting products, revealing potential vulnerabilities when agents interact   
   in competitive environments.    
      
   The agents' efficiency dropped sharply when faced with too many options,
   overwhelming their attention span and leading to slower or less accurate   
   decisions.    
      
   AI agents also struggled when asked to work toward shared goals, as the    
   models were often unsure which agent should take on which role, which reduced   
   their effectiveness in joint tasks.    
      
   Their performance improved only when step-by-step instructions were
   provided.
      
   "We can instruct the models - like we can tell them, step by step. But if we
   are inherently testing their collaboration capabilities, I would expect these
   models to have these capabilities by default," Kamar noted.
      
   The results show AI tools still need substantial human guidance to function   
   effectively in multi-agent environments.    
      
   Although such agents are often promoted as capable of independent
   decision-making and collaboration, the results show unsupervised agent
   behavior remains unreliable, so humans must improve coordination mechanisms
   and add safeguards against AI manipulation.
      
   Microsoft's simulation shows that AI agents remain far from operating
   independently in competitive or collaborative scenarios and may never achieve   
   full autonomy.    
      
   ======================================================================   
   Link to news story:   
   https://www.techradar.com/pro/microsoft-built-a-fake-marketplace-to-see-how-its-ai-agents-would-work-selling-unsupervised-and-lets-just-say-the-results-were-unsurprising
      
   $$   
   --- SBBSecho 3.28-Linux   
    * Origin: capitolcityonline.net * Telnet/SSH:2022/HTTP (1:2320/105)   
   SEEN-BY: 105/81 106/201 128/187 129/14 305 153/7715 154/110 218/700   
   SEEN-BY: 226/30 227/114 229/110 206 300 307 317 400 426 428 470 664   
   SEEN-BY: 229/700 705 266/512 291/111 320/219 322/757 342/200 396/45   
   SEEN-BY: 460/58 633/280 712/848 902/26 2320/0 105 304 3634/12 5075/35   
   PATH: 2320/105 229/426