Abstract:
Polymer materials, composed of fundamental elements such as carbon, hydrogen, oxygen, and nitrogen, exhibit remarkable structural diversity from the molecular to the macroscopic scale. This diversity arises primarily from variations in atomic connectivity (linear, branched, or crosslinked architectures), chain packing (crystalline, semi-crystalline, or amorphous phases), and morphology (such as porosity, surface roughness, and phase-separated domains). While it provides tremendous opportunities for designing materials with precisely tailored mechanical, thermal, and electrical properties, it also poses significant challenges for conventional experimental characterization and computational modeling approaches (such as molecular dynamics simulations and density functional theory calculations), which are inefficient for exploring the vast, high-dimensional chemical space of possible polymer structures, a space that could number in the millions when all monomer combinations and architectural variations are considered. This limitation has catalyzed a paradigm shift in polymer science toward data-driven informatics: a transition from traditional empirical trial-and-error methodologies, which often require months or years of iterative experimentation, to more predictive, efficient, and collaborative research frameworks powered by artificial intelligence (AI) and machine learning (ML) that can rapidly screen candidates and identify promising material formulations in a fraction of the time.
This review systematically examines recent advances in AI-guided polymer informatics by analyzing representative literature published over the past three years, and synthesizes the current state of the field across several critical dimensions: monomer representation methods, including SMILES/BigSMILES strings for linear notation, graph-based encodings for topological relationships, and 3D representations for morphological features; data augmentation techniques and transfer learning strategies; polymer property prediction models and inverse design algorithms; emerging applications of large language models (LLMs) for polymer literature mining and knowledge extraction; and model interpretability tools that help researchers understand and trust model predictions. The review places particular emphasis on the computational technologies that are reshaping the field, such as graph neural networks (GNNs) for capturing complex topological relationships between monomer units, Transformer architectures for processing sequential polymer representations and identifying critical structural motifs, and multimodal learning frameworks for integrating diverse data types. These technologies offer solutions to persistent challenges such as the scarcity of high-quality labeled experimental data and the inherent complexity of polymer topological structures. They are increasingly being implemented through user-friendly platforms and cloud-based services that make them accessible to researchers without extensive programming expertise, while maintaining rigorous validation protocols to ensure scientific reliability.
Looking toward the future, the paper outlines several key research priorities that will shape the next generation of polymer informatics: the development of standardized polymer data formats and repositories, advanced modeling of coupled physicochemical processes, robust multiscale modeling frameworks, novel applications of generative artificial intelligence for materials discovery, and application-oriented material design strategies. The aim is to provide researchers across academia and industry with a comprehensive reference and actionable insights to accelerate innovation in polymer science and engineering. The integration of these computational approaches with experimental validation is creating unprecedented opportunities for accelerated materials discovery.