
Just a sample of the Echomail archive

Cooperative anarchy at its finest, still active today. Darkrealms is the Zone 1 Hub.

   EARTH      Uhh, that 3rd rock from the sun?      8,931 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 8,780 of 8,931   
   ScienceDaily to All   
   Learning the language of molecules to predict their properties
   07 Jul 23 22:30:28   
   
   MSGID: 1:317/3 64a8e669   
   PID: hpt/lnx 1.9.0-cur 2019-01-08   
   TID: hpt/lnx 1.9.0-cur 2019-01-08   
    Learning the language of molecules to predict their properties    
      
     Date:   
         July 7, 2023   
     Source:   
         Massachusetts Institute of Technology   
     Summary:   
         A new framework uses machine learning to simultaneously predict   
         molecular properties and generate new molecules using only a small   
         amount of data for training.   
      
      
      
   ==========================================================================   
   FULL STORY   
   ==========================================================================   
   Discovering new materials and drugs typically involves a manual,
   trial-and-error process that can take decades and cost millions
   of dollars. To streamline this process, scientists often use machine   
   learning to predict molecular properties and narrow down the molecules   
   they need to synthesize and test in the lab.   
      
   Researchers from MIT and the MIT-IBM Watson AI Lab have developed a new,
   unified framework that can simultaneously predict molecular properties
   and generate new molecules much more efficiently than popular
   deep-learning approaches.
      
   To teach a machine-learning model to predict a molecule's biological   
   or mechanical properties, researchers must show it millions of labeled   
   molecular structures -- a process known as training. Due to the expense   
   of discovering molecules and the challenges of hand-labeling millions   
   of structures, large training datasets are often hard to come by, which   
   limits the effectiveness of machine-learning approaches.   
      
   By contrast, the system created by the MIT researchers can effectively   
   predict molecular properties using only a small amount of data. Their   
   system has an underlying understanding of the rules that dictate how   
   building blocks combine to produce valid molecules. These rules capture   
   the similarities between molecular structures, which helps the system   
   generate new molecules and predict their properties in a data-efficient   
   manner.   
      
   This method outperformed other machine-learning approaches on both   
   small and large datasets, and was able to accurately predict molecular   
   properties and generate viable molecules when given a dataset with fewer   
   than 100 samples.   
      
   "Our goal with this project is to use some data-driven methods to   
   speed up the discovery of new molecules, so you can train a model to do   
   the prediction without all of these cost-heavy experiments," says lead
   author Minghao Guo, an electrical engineering and computer science
   (EECS) graduate student.
      
   Guo's co-authors include MIT-IBM Watson AI Lab research staff members   
   Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song   
   '23 and Adithya Balachandran '23; and senior author Wojciech Matusik, a   
   professor of electrical engineering and computer science and a member   
   of the MIT-IBM Watson AI Lab, who leads the Computational Design   
   and Fabrication Group within the MIT Computer Science and Artificial   
   Intelligence Laboratory (CSAIL). The research will be presented at the   
   International Conference on Machine Learning.
      
   Learning the language of molecules
   
   To achieve the best results
   with machine-learning models, scientists need training datasets with   
   millions of molecules that have similar properties to those they hope to   
   discover. In reality, these domain-specific datasets are usually very   
   small. So, researchers use models that have been pretrained on large   
   datasets of general molecules, which they apply to a much smaller,   
   targeted dataset. However, because these models haven't acquired much
   domain-specific knowledge, they tend to perform poorly.
      
   The MIT team took a different approach. They created a machine-learning   
   system that automatically learns the "language" of molecules -- what   
   is known as a molecular grammar -- using only a small, domain-specific   
   dataset. It uses this grammar to construct viable molecules and predict   
   their properties.   
      
   In language theory, one generates words, sentences, or paragraphs based   
   on a set of grammar rules. You can think of a molecular grammar the   
   same way. It is a set of production rules that dictate how to generate   
   molecules or polymers by combining atoms and substructures.   
      
   Just like a language grammar, which can generate a plethora of sentences   
   using the same rules, one molecular grammar can represent a vast number   
   of molecules.   
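   
   To make the analogy concrete, here is a toy sketch in Python of what a
   set of production rules can look like. The placeholder symbols, rules,
   and SMILES-like fragments below are invented for illustration; they are
   not the grammar the MIT system learns.
   
   import random
   
   # Hypothetical production rules: each nonterminal (<mol>, <side>)
   # expands into SMILES-like fragments that may contain further
   # nonterminals. Purely illustrative chemistry.
   RULES = {
       "<mol>": ["c1ccccc1<side>",    # benzene ring plus a side group
                 "C(<side>)C<side>",  # short chain, two side groups
                 "CCO"],              # terminal-only fragment
       "<side>": ["O", "N", "Cl", "<mol>", ""],
   }
   
   def tokenize(expansion):
       """Split an expansion into nonterminals and terminal chunks."""
       parts, buf, i = [], "", 0
       while i < len(expansion):
           if expansion[i] == "<":
               j = expansion.index(">", i) + 1
               if buf:
                   parts.append(buf)
                   buf = ""
               parts.append(expansion[i:j])
               i = j
           else:
               buf += expansion[i]
               i += 1
       if buf:
           parts.append(buf)
       return parts
   
   def generate(symbol="<mol>", depth=0, max_depth=4):
       """Expand nonterminals recursively until only terminals remain."""
       if symbol not in RULES:
           return symbol  # terminal fragment: emit as-is
       options = RULES[symbol]
       if depth >= max_depth:  # cut recursion so generation terminates
           options = [o for o in options if "<" not in o]
       return "".join(generate(tok, depth + 1, max_depth)
                      for tok in tokenize(random.choice(options)))
   
   random.seed(0)
   for _ in range(3):
       print(generate())  # three molecule-like strings, same rules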
      
   Molecules with similar structures use the same grammar production rules,   
   and the system learns to understand these similarities.   
      
   Since structurally similar molecules often have similar properties,   
   the system uses its underlying knowledge of molecular similarity to   
   predict properties of new molecules more efficiently.   
      
   "Once we have this grammar as a representation for all the different   
   molecules, we can use it to boost the process of property prediction,"   
   Guo says.   
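   
   One rough way to picture that boost: once every molecule in the dataset
   has a derivation under the shared grammar, counting how often each rule
   fires gives a compact feature vector, and even a simple linear model on
   those counts can be fit from very little data. The derivations, rule
   indices, and property values in this sketch are invented.
   
   import numpy as np
   
   # Hypothetical derivations: the sequence of rule indices (0..3) used
   # to build each of four training molecules, plus a made-up property
   # value for each (say, a solubility score).
   derivations = [[0, 2, 2, 3], [1, 2, 3], [0, 0, 2, 3, 3], [1, 3]]
   properties = np.array([1.8, 0.9, 2.6, 0.4])
   
   def rule_counts(derivation, n_rules=4):
       """Bag-of-rules vector: entry i counts how often rule i fired."""
       v = np.zeros(n_rules)
       for r in derivation:
           v[r] += 1.0
       return v
   
   X = np.stack([rule_counts(d) for d in derivations])
   
   # Ridge-regularized least squares; the small penalty keeps the solve
   # stable with only four training samples.
   lam = 0.1
   w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                       X.T @ properties)
   
   # A new molecule is scored from its derivation alone -- no lab work.
   print(f"predicted property: {rule_counts([0, 2, 3, 3]) @ w:.2f}")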
      
   The system learns the production rules for a molecular grammar using   
   reinforcement learning -- a trial-and-error process where the model is   
   rewarded for behavior that gets it closer to achieving a goal.   
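   
   As a crude illustration of the trial-and-error idea (a simple stochastic
   hill-climb, not the paper's actual reinforcement-learning setup), one
   can toggle candidate rules in and out of the grammar and keep changes
   that raise a reward. The reward function below is a stand-in.
   
   import random
   
   CANDIDATES = list(range(10))  # ten hypothetical production rules
   
   def reward(rule_set):
       """Stand-in reward: pretend rules 1, 4, and 7 are the ones that
       generate valid, diverse molecules, and penalize bloated sets."""
       return len({1, 4, 7} & rule_set) - 0.1 * len(rule_set)
   
   random.seed(0)
   best = set(CANDIDATES)  # start from the full rule set
   best_r = reward(best)
   for _ in range(200):
       trial = set(best)
       trial ^= {random.choice(CANDIDATES)}  # toggle one rule in or out
       if reward(trial) >= best_r:           # keep improvements and ties
           best, best_r = trial, reward(trial)
   
   print(f"selected rules: {sorted(best)}, reward: {best_r:.1f}")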
      
   But because there could be billions of ways to combine atoms and
   substructures, learning the grammar's production rules would be too
   computationally expensive for anything but the tiniest dataset.
      
   The researchers decoupled the molecular grammar into two parts. The first   
   part, called a metagrammar, is a general, widely applicable grammar   
   they design manually and give the system at the outset. Then it only   
   needs to learn a much smaller, molecule-specific grammar from the domain   
   dataset. This hierarchical approach speeds up the learning process.   
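   
   A sketch of why the decoupling shrinks the search: if the fixed,
   hand-designed metagrammar supplies general rule templates, the system
   only has to learn which domain-specific fragments fill them. The
   templates and fragments below are invented for illustration.
   
   # Fixed metagrammar: general, hand-designed rule templates, each with
   # a {frag} hole. Given to the system at the outset.
   METAGRAMMAR = ["ring({frag})", "chain({frag})", "branch({frag})"]
   
   # Learned part: the small set of domain-specific fragments the search
   # actually has to discover from the dataset.
   learned_fragments = ["OH", "NH2", "CH3"]
   
   # The full grammar is every template filled with every fragment, so
   # the search learned 3 fragments instead of 9 complete rules.
   full_grammar = [t.format(frag=f)
                   for t in METAGRAMMAR
                   for f in learned_fragments]
   print(full_grammar)  # ['ring(OH)', 'ring(NH2)', ..., 'branch(CH3)']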
      
   Big results, small datasets
   
   In experiments, the researchers' new system
   simultaneously generated viable molecules and polymers, and predicted   
   their properties more accurately than several popular machine-learning   
   approaches, even when the domain-specific datasets had only a few hundred   
   samples. Some other methods also required a costly pretraining step that   
   the new system avoids.   
      
   The technique was especially effective at predicting physical properties   
   of polymers, such as the glass transition temperature: the temperature
   at which a material shifts from a hard, glassy state to a soft, rubbery
   one. Obtaining this information manually is often extremely costly
   because the experiments require very high temperatures and pressures.
      
   To push their approach further, the researchers cut one training set   
   down by more than half -- to just 94 samples. Their model still achieved   
   results that were on par with methods trained using the entire dataset.   
      
   "This grammar-based representation is very powerful. And because the   
   grammar itself is a very general representation, it can be deployed   
   to different kinds of graph-form data. We are trying to identify other   
   applications beyond chemistry or material science," Guo says.   
      
   In the future, they also want to extend their current molecular grammar   
   to include the 3D geometry of molecules and polymers, which is key   
   to understanding the interactions between polymer chains. They are   
   also developing an interface that would show a user the learned grammar   
   production rules and solicit feedback to correct rules that may be wrong,   
   boosting the accuracy of the system.   
      
   This work is funded, in part, by the MIT-IBM Watson AI Lab and its member
   company, Evonik.
   
   Paper: "Hierarchical Grammar-Induced Geometry for Data-Efficient
   Molecular Property Prediction"
   
   Story Source: Materials provided by Massachusetts Institute of
   Technology. Original written by Adam Zewe. Note: Content may be edited
   for style and length.
      
      
   ==========================================================================   
      
      
   Link to news story:   
   https://www.sciencedaily.com/releases/2023/07/230707153847.htm   
      
   --- up 1 year, 18 weeks, 4 days, 10 hours, 50 minutes   
    * Origin: -=> Castle Rock BBS <=- Now Husky HPT Powered! (1:317/3)   
   SEEN-BY: 15/0 106/201 114/705 123/120 153/7715 218/700 226/30 227/114   
   SEEN-BY: 229/110 112 113 307 317 400 426 428 470 664 700 291/111 292/854   
   SEEN-BY: 298/25 305/3 317/3 320/219 396/45 5075/35   
   PATH: 317/3 229/426   
      



(c) 1994, bbs@darkrealms.ca