Abstract:
In the era of the digital economy, data has become a foundational production factor that drives social and technological innovation. Federated learning (FL) enables collaborative model training while keeping data localized, following the principle of “moving models instead of data.” This shifts data circulation from a regime of “centralized data sharing” to one of “collaborative model computation.” However, existing FL architectures lack effective mechanisms for data rights confirmation, benefit allocation, and delineation of legal responsibilities associated with data authorization and FL execution. These deficiencies reveal an urgent need to integrate legal enforceability with technological transparency, an approach referred to here as “law-chain integration.” To address this issue, we propose a smart legal contract-driven approach for data authorization and execution in FL. We design an FL governance framework based on a specification language for smart contracts (SPESC), which supports the publication, assignment, and monitoring of FL tasks through contractual clauses. SPESC provides a formal bridge that maps complex legal stipulations concerning usage rights, liability, and dispute resolution into verifiable, executable smart contract code on the blockchain. The framework introduces a “whole-chain” management concept for data elements, covering their life cycle from initial authorization through final model deployment. Within this framework, we design an authorization and execution management platform whose data authorization module implements a cyclical “offer–acceptance–execution–arbitration” process via standardized contract templates. This automation transforms the traditionally ambiguous legal process into an auditable and predictable technical workflow.
By integrating decentralized identifiers and blockchain technology, the platform authenticates the identities of contracting parties and enforces data authorization through self-executing contract clauses. These clauses specify the precise scope of data usage, the duration of authorization, and the terms for access and revocation. Breach and arbitration clauses are also incorporated to supervise data ownership confirmation and rights allocation, ensuring that local training nodes and central model aggregation nodes comply with their data usage and operational rights. Furthermore, the federated computation module uses contract templates to configure computing tasks within the federated system and to oversee the responsibilities and accountability of participants during execution. The contracts establish clear quality standards and ensure that model updates adhere to predefined protocols, making the entire training process verifiable and accountable. Experimental evaluations demonstrate the feasibility of automated execution and on-chain traceability of data authorization clauses, ensuring identity compliance and transparency in FL. In addition, we propose a federated computing contract template that enables the evaluation of node-selection algorithms. Experimental results show that model training within this framework remains stable and converges rapidly. Quantitatively, the proposed FedMSNS algorithm improves accuracy by approximately 5% over traditional methods and reaches a convergence accuracy of 98% within 30 rounds. These findings highlight the potential of the proposed framework to support the digital transformation of the data-factor market by establishing a credible, compliant, and technically robust basis for data-factor circulation. Our work provides a foundational legal and technical solution for developing decentralized data collaboration ecosystems.
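
As an illustration of the authorization clauses summarized above, the following minimal Python sketch models a clause that carries a usage scope, a validity window, and a revocation flag, and that moves through the “offer–acceptance–execution–arbitration” stages. It is not SPESC code and not the platform's implementation: the class name `AuthorizationClause`, the transition table `ALLOWED`, and the in-memory `log` (a stand-in for an on-chain trace) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Stage(Enum):
    OFFER = "offer"
    ACCEPTANCE = "acceptance"
    EXECUTION = "execution"
    ARBITRATION = "arbitration"


# Assumed transition table: the abstract names the four stages and calls the
# process cyclical, but the exact transitions below are hypothetical.
ALLOWED = {
    Stage.OFFER: {Stage.ACCEPTANCE},
    Stage.ACCEPTANCE: {Stage.EXECUTION},
    Stage.EXECUTION: {Stage.ARBITRATION, Stage.OFFER},
    Stage.ARBITRATION: {Stage.OFFER},
}


@dataclass
class AuthorizationClause:
    allowed_purposes: frozenset  # scope of data usage
    valid_from: datetime         # start of the authorization period
    valid_until: datetime        # end of the authorization period
    revoked: bool = False
    stage: Stage = Stage.OFFER
    log: list = field(default_factory=list)  # stand-in for an on-chain trace

    def advance(self, nxt: Stage) -> None:
        """Move to the next stage, rejecting transitions not in ALLOWED."""
        if nxt not in ALLOWED[self.stage]:
            raise ValueError(f"illegal transition {self.stage.value} -> {nxt.value}")
        self.log.append((self.stage.value, nxt.value))
        self.stage = nxt

    def revoke(self) -> None:
        """Withdraw the authorization; later permits() calls return False."""
        self.revoked = True

    def permits(self, purpose: str, at: datetime) -> bool:
        """True only while executing, unrevoked, in scope, and within the window."""
        return (
            self.stage is Stage.EXECUTION
            and not self.revoked
            and purpose in self.allowed_purposes
            and self.valid_from <= at <= self.valid_until
        )
```

In this sketch, a request to use data is granted only when the contract is in the execution stage, the purpose falls within the authorized scope, the request time lies inside the validity window, and the authorization has not been revoked. Every stage change is appended to the log so that the sequence of contract events can be replayed.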