Issue 
Natl Sci Open
Volume 2, Number 1, 2023



Article Number  20220043  
Number of page(s)  17  
Section  Information Sciences  
DOI  https://doi.org/10.1360/nso/20220043  
Published online  10 January 2023 
RESEARCH ARTICLE
DeceFL: a principled fully decentralized federated learning framework
^{1} School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
^{2} School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
^{3} Department of Applied Mathematics, University of Waterloo, Waterloo N2L 3G1, Canada
^{4} State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
^{5} School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, China
^{6} School of Electrical Engineering and Computer Science, and Digital Futures, KTH Royal Institute of Technology, Stockholm 10044, Sweden
^{7} Shenyang Institute of Automation (SIA), Chinese Academy of Sciences, Shenyang 110169, China
^{8} School of Mathematics, Frontiers Science Center for Mobile Information Communication and Security, Southeast University, Nanjing 211189, China
^{9} Purple Mountain Laboratories, Nanjing 211111, China
^{10} AVIC Chengdu Aircraft Industrial (Group) Co., Ltd., Chengdu 610091, China
^{*} Corresponding author (email: yye@hust.edu.cn)
Received:
2
August
2022
Revised:
5
October
2022
Accepted:
28
October
2022
Traditional machine learning relies on a centralized data pipeline for model training in various applications; however, data are inherently fragmented. Such a decentralized nature of databases presents the serious challenge for collaboration: sending all decentralized datasets to a central server raises serious privacy concerns. Although there has been a joint effort in tackling such a critical issue by proposing privacypreserving machine learning frameworks, such as federated learning, most stateoftheart frameworks are built still in a centralized way, in which a central client is needed for collecting and distributing model information (instead of data itself) from every other client, leading to high communication burden and high vulnerability when there exists a failure at or an attack on the central client. Here we propose a principled decentralized federated learning algorithm (DeceFL), which does not require a central client and relies only on local information transmission between clients and their neighbors, representing a fully decentralized learning framework. It has been further proven that every client reaches the global minimum with zero performance gap and achieves the same convergence rate O(1/T) (where T is the number of iterations in gradient descent) as centralized federated learning when the loss function is smooth and strongly convex. Finally, the proposed algorithm has been applied to a number of applications to illustrate its effectiveness for both convex and nonconvex loss functions, timeinvariant and timevarying topologies, as well as IID and NonIID of datasets, demonstrating its applicability to a wide range of realworld medical and industrial applications.
Key words: decentralized federated learning / smart manufacturing / control systems privacy
© The Author(s) 2023. Published by China Science Publishing & Media Ltd. and EDP Sciences.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
INTRODUCTION
An urgent challenge for artificial intelligence (AI) applications today consists of the following dilemma concerning data privacy: on the one hand, a large number of sophisticated algorithms have been proposed to broaden the applicability of AI to various applications such as medicine, manufacturing and more [13]; on the other hand, regulations such as general data protection regulation (GDPR) restrict data sharing, thus limiting the performance of AI algorithms [4]. As a result, models that are trained and evaluated on a limited amount of data since privacy could have biases [5]. This has become a wellknown bottleneck in AI applications, especially medical AI [6].
Promising privacypreserving methods such as federated learning can help maintain the performance of AI algorithms, while preserving the data stored locally [7]. Inspired by this, there has been a surge of interests in both the theory and applications of federated learning [8]. Federated averaging (FedAvg), the leading algorithm in the field of federated learning, was proposed in 2016 by researchers at Google [9, 10], and its variant, dynamic FedAvg, was proposed in ref. [11]. The studies [911] mentioned above were based on the analysis of firstorder stationary points. The secondorder optimality guarantee of federated learning was later established in ref. [12]. Through crowded efforts and comprehensive surveys [1318], widely used federated learning methods were established, the challenges and related applications were introduced, and a large number of valuable research directions were outlined. A notable example has been demonstrated in a report by Kaissis et al. [19], in which a convolutional neural network was trained over the public Internet with encryption from medical images using a secure federated learning framework.
Despite these breakthroughs, classical federated learning algorithms have a major drawback: the need for a central client, which could cause privacy, communication, computation, and resilience issues [20]. Much effort, therefore, has been invested to reduce the communication and computation complexity as well as increase the robustness and resilience of centralized federated learning algorithms. In order to deal with the challenge of system constraints, the authors of ref. [21] applied sparse technology to reduce the communication and computation costs in the training process. In addition, the authors of ref. [22] proposed an optimal tradeoff control algorithm between the local update and the global parameter aggregation in the resource constrained system. Recently, federated schemes were extended to a timevarying centralized scheme, where a changing leader is selected based on certain rules, which is a firm step to full decentralization [23].
The current research mainly discussed decentralized federated learning from three different perspectives. Firstly, the initial work [24] realized fully asynchronous and decentralized machine learning by introducing gossip protocol into deep learning. Furthermore, Daily et al. [25] proposed a random gradient descent algorithm based on GossipGraD communication protocol for extended deep learning algorithms on largescale systems. In addition, Hu et al. [26] proposed a decentralized federated learning system based on model segmented gossip aggregation. Secondly, being specialized for graph convolutional networks, Scardapane et al. [27] proposed an algorithmic framework for distributed training considering the case that data were collected by a group of agents communicating through the sparse network topology. Similarly, Pei et al. [28] proposed a new decentralized federated graph neural network model based on peertopeer network structure and realized the decentralized learning with privacy protection based on the DiffieHellman key exchange method. For decentralized federated learning in general, similar to refs. [21, 22], the refs. [29, 30] proposed a decentralized distributed learning algorithm by introducing the concept of belief into the parameter space of the client model in which the client updates its belief by aggregating the information from onehop neighbors to realize the optimal model of cooperative learning. A fully decentralized federated learning algorithm, IPLS, was proposed by Pappas et al. [31], which is based on model segmentation and partially based on the interplanetary file system. Based on the popular Time Slotted Channel Hopping radio interface of the IEEE 802.15.4e standard, Savazzi et al. [32] proposed a realtime framework for analyzing decentralized federated learning systems running on industrial wireless networks. Thirdly, in terms of decentralization via blockchain, similar to ref. [23], Wu et al. [33] proposed a decentralized federated learning framework that realized the decentralization of all aspects of system. The ref. [34] proposed a decentralized privacy protection indepth learning framework based on blockchain, which uses the local credibility mutual evaluation mechanism to realize the research on collaborative fairness in indepth learning. Yapp et al. [35] proposed a federated edge learning (FEL) framework with efficient blockchain authorization and communication, which enhances the security and scalability of FEL's largescale implementation.
However, these methods suffer from one of the following drawbacks: (1) lacks of theoretical guarantee, making it unreliable for practical use; (2) lacks of temporal flexibility of communication topology; (3) suffers from potential leakage of private information. More specifically, the comparison between these existing stuides and the decentralized framework method proposed in this paper is shown in Supplementary Information Table S1. This paper proposes a principled fully decentralized federated learning algorithm (DeceFL), in which each client is guaranteed to achieve the same performance as the centralized algorithm in terms of training/test accuracy, when the global objective function is smooth and strongly convex. Empirical experiments have demonstrated that the same claim also holds for nonconvex global objective functions, thus revealing its potential to be applied to a wider class of applications such as those using deep learning. In addition to all features that other stateoftheart federated learning and swarm learning algorithms [23] possess, DeceFL has additional desirable features beyond classical centralized or decentralized federated learning and swarm learning, namely: (1) full decentralization: at any iteration, there is no central client that receives all other clients’ information, therefore removing monopoly and preventing attacks; (2) principled algorithms: it has been proved that zero performance gap between decentralized and centralized algorithms can be achieved under certain condition, therefore guaranteeing global performance; (3) flexible topology design: any connected timevarying communication graph suffices to achieve the same performance, therefore allowing clients to decide when and who to communicate with subject to their physical constraints; (4) equal rights for all: all clients can have the trained model, therefore incentivizing clients to participate; (5) privacypreserving: every client does not communicate its gradient to other client, therefore preventing data leakage, even there is adversary in the communication channel [36].
THE PROPOSED FULLY DECENTRALIZED FEDERATED LEARNING FRAMEWORK
Problem formulation
We first formulate the decentralized federated learning problem. Assume that there are K clients with local data in the form for standard machine learning tasks: for . The conventional training of AI models (denoted by ) can be formulated as the following global learning problem (let and ): In such a centralized framework, clients have to send their datasets to a central client. In contrast, in a decentralized framework, clients would like to collaboratively train a model using the same objective function F, in which the kth client does not send its data to others. We define the performance gap as a nonnegative metric, which quantifies the degenerative performance between a centralized model and a decentralized one: The goal is to make Δ as small as possible and in the ideal case to make Δ=0.
The proposed DeceFL algorithm
To solve the optimization problem in a decentralized way, we model the communication network between clients as an undirected connected ^{1)} graph , where represents the set of clients, and represents the set of communication channels, each connecting two distinct clients. For each pair (i, j), the corresponding element in the adjacency matrix W, i.e., W_{ij}, indicates whether there is a communication channel between the ith client and the jth client. Specifically, when W_{ij}>0, there is information communication between clients i and j, while W_{ij}=0 means none. For client i, when W_{ij}>0, then client j is called a neighbor of client i. The set of all such clients j is represented as , i.e., . Define the local loss function as the userspecified loss function on the dataset with model parameters w in , then can be rewritten as Let the client k hold a local copy of the global variable w, which is denoted by . Specifically, the update rule of DeceFL is, for each client k=1, …, K, (1) where is the learning rate, and the initial condition can be arbitrarily chosen. Every client shares with its neighbors (which is a subset of all other clients) its model parameters rather than data. Specifically, every client runs its local training algorithm, e.g., gradient descent, and it only communicates its own estimate of the global parameter with its neighbors. Once a client receives other estimates from neighboring clients, it averages out other estimates, adds to its local gradient and generates its estimate in the next iteration. The above process will be repeated until convergence. As shown in Figure 1, in DeceFL, each client completes the update by receiving and transmitting directly with neighboring clients and local gradient calculation, without needing the aggregation and transmission of a thirdparty central client at any iteration. Thus, it is fully decentralized.
We stack the w_{k}(t) and in eq. (1) into vectors, i.e., define and . Then, we can compactly rewrite eq. (1) as (2) where and is the identity matrix. Next, we analyze the convergence of DeceFL under the following assumption about the global cost function, which is consistent with those made in the convergence analysis of FedAvg [37].
Assumption 1. For each k=1, …, K, assume that F_{k} is L_{k}smooth and μ_{k}strongly convex, where L_{k}, . That is, F_{k} is differentiable and the gradient is L_{k}Lipschitz continuous, i.e., for any , (3) and (4)
When Assumption 1 holds, the global objective function is Lsmooth and μstrongly convex, where and . Clearly, .
Assumption 1 is standard and satisfied by typical loss functions in machine learning, including l_{2} regularized linear and logistic regressions. In order to analyze the convergence of the algorithm, we define the average sequence , where is a vector in which all elements are 1. Denote as the spectral norm of , where in algebraic graph theory (see Supplementary Information).
Theorem 1. Consider algorithm (1), where the learning rate is chosen by , in which and satisfying . Denote the gap between the local function value at the initial point w(0) and local optimal value as , where . Then the following inequality can be obtained under Assumption 1: (5) and (6) where and .
Proof. See Supplementary Information.
In ref. [38], the authors studied the convergence of FedAvg and established the convergence rate O(1/T) (where T is the number of iterations in gradient descent) for strongly convex and smooth problems. Here Theorem 1 guarantees the proposed DeceFL algorithm has the same convergence rate as FedAvg under timeinvariant connected communication topologies. In addition, in the Supplementary Information, it has been further demonstrated that the proposed DeceFL converges even for timevarying topologies, in which case the underlying topology does not need to be connected for all iterations (as illustrated in Figure 1). Finally, FedAvg and SL can be viewed as special cases of DeceFL, the proof of convergence for DeceFL also warrants that of centralized federated learning and SL.
Figure 1 Illustration of key concepts in different stateoftheart federated learning frameworks. (A) Classical Federated Learning: a central client is needed to receive and transmit all essential information to other clients. It is equivalent to an alltoall network without such a central center, i.e., every client in the network can receive information from all other clients. (B) Swarm Learning: there is no such a universal central client, but a potentially different central client is selected in every iteration. Mathematically, it is equivalent to FedAvg with varying central clients. (C) The proposed Decentralized Federated Learning: there is no need for a central client in any iteration. Any connected timeinvariant or timevarying topology (under certain mild condition given in the Supplementary Information) has guaranteed performance, therefore unifying the classical federated learning and swarm learning. 
An illustrative example
We use a simple yet important example to illustrate the problem and demonstrate the advantages of the proposed decentralized framework as compared with centralized learning, federated learning and swarm learning. Consider the case where K clients would like to compute the average of every client's private value , i.e., . This task is simple if there exists a central client. However, it becomes challenging when a central client is not available. This is the wellknown decentralized consensus problem [3941], which has many engineering applications including synchronization, PageRank, state estimation, load balancing, and more.
We can convert the consensus problem to the following optimization problem (7) with its optimal value being coincided with w^{*}. Rather than computing the mean, we convert it to solve the optimization problem (7) using algorithms FedAvg, SL and the proposed DeceFL respectively.
FedAvg algorithm
In the classical federated learning algorithm, i.e., FedAvg, there is a central server to collect the local parameter information of each client for average aggregation, and to assign it to each client, which is equivalent to a complete graph of K clients without a central server after simple derivation (Figure 1A): (8) where t represents the tth iteration, w_{k}(t) represents the estimate of global optimum for the kth client at iteration t, and is learning rate in the gradient descent algorithm. In essence, every client iteratively updates its estimate based on all others’ estimates together with its current gradient. Using derivation in the Supplementary Information based on dynamical system and algebraic graph theory, it can be shown that the system reaches the steadystate, i.e., , if and is satisfied for any .
SL algorithm
The SL algorithm considers the situation where there is no central server: in each iteration, a random leader is dynamically selected from the clients to aggregate the model parameters from all clients (including itself) and assign them to each client as shown in Figure 1B. Mathematically, this is exactly the same as centralized federated learning as shown in the Supplementary Information.
DeceFL algorithm
Each client communicates parameters through the topology of the undirected connected graph shown in Figure 1C. According to the weighted aggregation of information obtained from neighboring clients, the local update is completed according to the following iteration: (9) where is the weighted matrix of the undirected connected graph at iteration . Using derivation in the Supplementary Information based on dynamical system and algebraic graph theory, it can be shown that the system reaches the steadystate, i.e., , if and are satisfied^{2)} for any .
Summary
It can be shown that all methods can reach consensus with zero performance gap, i.e., Δ=0. The convergence speeds of all methods are O(1/T). As shown in the Supplementary Information Figure S1, these methods converge to the consensus value exponentially. This is consistent with the theoretical results. The information used in three different algorithms is however distinct: FedAvg and SL need global information at every iteration, while DeceFL only needs local information from clients’ neighbors. In addition, the first two algorithms can be viewed as special cases of the proposed DeceFL algorithm by setting W to the corresponding adjacency matrix correspondingly. Specifically, when for all t, the proposed DeceFL algorithm becomes FedAvg and SL.
RESULTS
Experiments were carried out on realworld biomedical and industrial applications, which demonstrated the effectiveness and the wide applicability of DeceFL, as a fully decentralized framework. We benchmarked the performance of DeceFL, in comparison with FedAvg and SL that demand much more communication costs and strongly rely on a highly restricted communication topology. Furthermore, the superiority of DeceFL on robustness in the presence of communication topology interference (random node or edge malfunction) was shown in two experiments with timevarying communication topologies. The overall performance of DeceFL was corroborated by these practical applications.
Application to medicine: collaborative prediction of leukaemia
First, we used the dataset of peripheral blood mononuclear cell (PBMC) transcriptomes from ref. [23], named “dataset A2”, as a benchmark example to compare three federated learning frameworks: DeceFL, FedAvg and SL. The experiment setup is consistent with that of SL: the dataset is divided into a training set and a test set at the ratio of 8:2, and the dataset owned by each node was obtained from the training set. The logistic regression model with l_{2} regularization and the 8layer fully connected deep neural network (as in ref. [23]) were selected for our experiments detailed in the MATERIALS AND METHODS Section.
First, we benchmarked DeceFL against FedAvg and SL in the IID setup of dataset A2 (Figure 2A), that is, the sample size of each node is the training set sample size divided by the number of nodes, which ensures that each node has the same number of samples and the ratio of the positive to the negative samples is approximately 1∶1. DeceFL used multiple connected communication topologies with various connectivity probability values (p=0.3, 0.5, 0.7, 0.9). This benchmark shows that DeceFL can reach the same performance as FedAvg and SL which use a (temporary) central client to gather all information from every client. FedAvg and SL only perform better during the transient period that DeceFL takes to converge due to its decentralized nature. Whereas DeceFL relaxes significantly on the choice of communication topology, which, in contrast to FedAvg and SL, no longer requires a fixed topology and thus has higher flexibility to deal with various demands from clients (e.g., one client could refuse to directly communicate with certain clients). Second, the similar comparative study was repeated with the NonIID setup of dataset A2 (Figure 2B). The NonIID setup explicitly designs, for the local data associated with each node, the sample size and the ratio between positive and negative samples (Supplementary Information). It allows us to benchmark performances on balanced/unbalanced, sufficient/deficient local training data. We obtained very similar results as in the IID setup, where DeceFL presents a similar performance as FedAvg and SL, after DeceFL reaches consensus in decentralized computation. It also shows the superiority of DeceFL over SL, which however demands huge amounts of communication costs for the selected central client at every round and relies heavily on the strong assumption of a stable fullyconnected communication structure. Any bit of malfunction of clients or communication paths could melt down the whole SL process, since at each round a client is delegated to collect information from all other clients.
To show DeceFL functions well when an intervention to decentralized infrastructure happens, we conducted two experiments with timevarying graphs that take into account malfunction of clients and communication paths. First, communication topology was altered in runtime (Figure 3A), that is, the adjacency matrix that describes how clients communicate with each other varied over time. This timevarying experiment further shows that the conditions of DeceFL can even be weakened and generalized to such an extent that the communication graph at each time is not necessary to be connected as long as within a fixed period the information can be transmitted between any pair of clients. To calibrate the performance of DeceFL, we assume the ideal communication topology (no structural intervention) in the run of FedAvg (that is, a central node with consistent connections to every client). Surprisingly, both experimental results Figure 3C (IID) and Figure 3D (NonIID) show that DeceFL in such a scenario keeps similar performance as FedAvg (with an ideal setup). In other words, DeceFL can be so robust that random malfunction of a portion of edges may merely deteriorate DeceFL running processes.
As another source of structural intervention, the second experiment considers the removal and supplement of nodes, which is a significant issue of federated learning in practice and in business. In a longrunning and dataintensive machine learning project including hundreds of clients, that clients may drop out or new clients may join in the middle must not deteriorate or break such a timeconsuming computation process. As shown in Figure 3B, during the first 50 rounds, we used an ErdosRenyi graph of 6 nodes; then for the 51–100 rounds, the graph was of 8 nodes by adding 2 extra nodes; and in the rest rounds, the graph was randomly removed by 2 nodes. Under such node interventions, experimental results Figure 3E and 3F show robust performance of DeceFL which is similar to FedAvg (that does not consider node interventions). Two experiments manifest the robustness of DeceFL on interventions of computation infrastructure in a decentralized framework.
Figure 2 DeceFL to predict leukaemia from A2 benchmark dataset [23]. (A) Data were divided into IID samples for all clients. (B) Data were divided into NonIID unbalanced samples. (C) Different topologies for FedAvg, SL and DeceFL respectively. The topology for SL must hold in every iteration when any other node is selected as a central client. (D)–(G) Performance of three algorithms on IID/NonIID setups over logistic regression/neural networks. DeceFL presents a similar performance as FedAvg and SL in both IID and NonIID scenarios, except during the transient period that DeceFL takes to reach consensus. In contrast to fixed topology in FedAvg and SL, DeceFL provides much more freedom on the choice of communication topology. 
Figure 3 DeceFL to predict leukaemia from A2 benchmark dataset [23]. (A) Timevarying communication topology that consists of a sequence of graphs each of which is not connected while the lumpsum graph over a fixed period is connected. (B) Timevarying communication topology that adds or removes nodes over time. (C), (D) Performance of DeceFL with edgevarying graphs on the IID and NonIID setups of dataset A2 using logistic regression, with reference performance of FedAvg that uses full information. (E), (F) Performance of DeceFL with nodevarying graphs on the IID and NonIID setups of dataset A2 using logistic regression, with reference performance of FedAvg that uses full information. These timevarying experiments manifest the robustness of DeceFL in the presence of interventions to communication topology. The dropout or supplement of clients in the middle of DeceFL computation will not break or deteriorate the learning process, which is particularly essential in practice for largescale timeconsuming dataintensive applications. 
Application to smart manufacturing: collaborative detection of bearing faults
Another application of DeceFL lies in the manufacturing sector: modern manufacturing is heavily influenced by AI technologies with extraordinary increase of computational power and data size. To raise productivity and reduce operational costs, a critical challenge is fault diagnosis in machining operations [42]. AIbased algorithms have the potentials to detect fault locations and even to predict faults in advance, which allow replacing regular maintenance with realtime datadriven predictive maintenance and further reduce unnecessary maintenance costs and guarantee reliability. A general fault detection framework has been proposed in ref. [43], which, however, needs fullcycle measurements of large amounts of machines that are most likely unavailable for a single factory. Data generated by multiple factories could be sufficient to perform preventive maintenance, while sensitive data (security or business related) are less likely to be shared in practice. The fully decentralized framework DeceFL provides a way for multiple factories to develop a global model, which generates mutual benefit from private local data without having to resort to data sharing in public.
This experiment practices such a decentralized fault diagnosis application in manufacturing, using Case Western Reserve University's (CWRU) bearing data, which comprises ball bearing test data for normal and faulty bearings, specified in the METHODS Section. Specifically, we used three types of bearings data: 7 inches, 14 inches and 21 inches; and chose the drive end defects, which includes outer race defect, inner race defect, and ball defect. We chose the outer race defect appearing at the 6 o’clock (centered) position. Thus, there are in total ten distinct conditions: 9 faulty classes (3 bearing types times 3 defect types) and the normal condition. All data in use was collected at 12,000 samples/second for drive end bearing experiments. It was generated by using 4 types of motor speed: 1797 r/min, 1772 r/min, 1750 r/min and 1730 r/min. The data from 1730 r/min is reserved for test.
Assume that there are 4 factories, as clients illustrated in Figure 4C, which collect their private fullcycle bearing data. The training data associated with each client were prepared in the IID (Figure 4A) and the NonIID setup (Figure 4B). A 10way classification problem is considered, 9 fault cases (B007, IR007, OR007, B014, IR014, OR014, B021, IR021, OR021) and 1 normal case. Learning used two methods, regularized logistic regression as a strongly convex method, and deep neural network (DNN) as a nonconvex method. In the usage of logistic regression, as guaranteed in theory, DeceFL in Figure 4D and 4F confirms the similar performance as FedAvg after its transient periods. For the case of DNN, as a nonconvex method, although there is no theoretical guarantee, DeceFL in Figure 4E and 4G shows competitive performance to FedAvg. The slight performance gap in test between DeceFL and FedAvg in Figure 4F may be mostly caused by the chosen type of DNN, multilayer perceptrons as used in ref. [23], which has many wellknown defects; and more reasons are discussed and explored by more experiments in the Supplementary Information. Comprehensive experiments were conducted and can be found in the Supplementary Information, with more clients (that is, more factories), more learning methods, and another 4way classification problem. Overall DeceFL manifests competitive performance on multiclass classification for industrial fault diagnosis applications, with implementations of (non)convex methods in a fully decentralized framework that breaks through the barrier of data privacy.
Figure 4 DeceFL to detect bearing faults from CWRU benchmark dataset. (A) Data were divided into IID samples for all 4 clients. (B) Data were divided into NonIID unbalanced samples. Each client locally specified its data size and sample distribution. (C) Illustration of communication topology for FedAvg, SL and DeceFL. (D), (E) Performance of DeceFL on IID data using logistic regression and DNN, respectively, with reference performance of FedAvg and SL. (F), (G) Performance of DeceFL on NonIID data using logistic regression and DNN, respectively, with reference performance of FedAvg and SL. (D)–(G) The boxplots at bottom illustrate the performance comparison between DeceFL and each client that was trained independently (that is, each client trained its own model only using its associated local data without communicating with any other clients). The experiments on CWRU benchmark dataset confirms the similar performance of DeceFL, as a fully decentralized framework, in comparison to FedAvg and SL, for industrial datasets, while DeceFL offers more freedom on the choice of communication topology to handle practical issues in organizing clients for largescale federated learning applications. 
DISCUSSION
In this paper, we propose a decentralized federated learning framework for privacypreserving collaborative learning. The decentralized architecture eliminates a large number of drawbacks—due to having a central client—that stateoftheart privacypreserving algorithms have. The convergence of the DeceFL algorithm is analyzed in detail, showing that DeceFL guarantees convergence and has the similar convergence rate as the centralized federated learning algorithm. The convergence performance of the algorithm is verified by training neural networks over different datasets. Compared with other stateoftheart privacypreserving algorithms such as FedAvg and SL [23], the proposed DeceFL algorithm is guaranteed to reach the global optimum with a similar rate as the centralized federated learning algorithm under certain conditions. In addition, there has developed a sizable literature as surveyed in ref. [44], which can be adapted to cope with quantization errors and noises that could happen over communication networks.
There is no doubt that such a decentralized federated learning framework will become increasingly popular in the nearest future for almost all AI applications given the privacy regulations. Yet our algorithm has a number of limitations that need to be taken into consideration for future development.
First of all, application of privacy algorithms (for example, blockchain or homomorphic encryption [45]) has not been considered in this study. However, similar to the centralized federated learning and swarm learning framework, it should be straightforward to apply such techniques for data privacy protection in the proposed DeceFL framework to make communication secure.
Secondly, similar to the setup of federated learning and swarm learning, all clients in the network need to know the form of the global objective function. The proposed formulation is different from those used in multiparty computation [46], where model and data can be separated [47]. Future work lies in the integration of such techniques to the proposed DeceFL algorithm to make the global objective function unknown to clients.
Finally, all clients are assumed to be collaborative, it would be interesting to further investigate whether the proposed decentralized federated learning framework is vulnerable to semihonest or malicious clients that are not collaborative [48], which could exist in the realworld applications.
MATERIALS AND METHODS
Data preprocessing
Essential data preprocessing was performed for CWRU dataset, including class balancing and normalization, feature extraction by Fourier transform. The dataset has 10 classes in total, which vary in sample size. Hence samples in certain classes were deleted to balance sample sizes over all classes. The original data is timeseries data, which was firstly divided by every 300 points and resulted in a family of time series. Each time series chose DE and FE features respectively, and produced 600 points. For every time series of each feature, we performed fast Fourier transform (FFT), which yielded 150 points. Thus each time series of both DE and FE has in total 300 points. The motivation to use FFT is to handle the mismatch of time stamps of sequential data. After FFT the training and test data were then normalized by removing the mean and scaling to unit variance (the test data are normalized by the normalizer of the training).
Performance metrics
The common performance metric “accuracy” is used for assessment of classification, (10) where TP, TN, FP and FN denote the number of true positive, true negative, false positive and false negative samples, respectively.
Implementation of learning methods
To ensure in benchmark DeceFL can work well for either strongly convex learning methods or nonconvex methods, we adopted two algorithms: logistic regression with l_{2} regularization, and deep neural network (DNN), as used in ref. [23]. For logistic regression, every node runs 1 epoch in each communication round, with batchsize 64. It uses the SGD optimizer, with weight decay coefficient 10^{4} for the realization of l_{2} regularization. The learning rate is fixed to 0.001. For DNN, every node runs 5 epochs (dataset A2) or 10 epochs (CWRU dataset) in each communication round, with batchsize 64. It uses the SGD optimizer, with weight decay coefficient 10^{4}. The initial learning rate is 0.1, which is decayed by multiplying 0.1 every 20 communication rounds. This DNN has 8 hidden layers, whose dimensions are 256, 512, 512, 256, 256, 128, 128, 64, respectively. The dropout rate is set to 0.3. Most experiments use softmax as the activation function in the output layer for classification, except the logistic regression for dataset A2 uses sigmoid. The total number of running rounds is selected by visualization effects of convergence for all methods in comparison.
Data availability
The peripheral blood mononuclear cell (PBMC)derived transcriptome dataset, named as “dataset A2” in ref. [23], was used, which was originally generated with Affymetrix HGU133 2.0 microarrays (8348 individuals), by inspection of all publicly available datasets at National Center for Biotechnology Information (NCBI) Gene Expression Omnibus. To perform the IID experiments, the initial preparation of dataset randomly dropped negative samples, resulting in 5176 samples, such that the whole dataset is balanced, i.e., the ratio of the positive to the negative samples is 1∶1. CWRU Bearing dataset refers to the data of ball bearing test for normal and faulty bearings from the Case Western Reserve University, available on https://engineering.case.edu/bearingdatacenter. All source codes are openly available on GitHub (https://github.com/HAIRLAB/DeceFL).
Acknowledgments
We thank Mr. Anthony Haynes, Mr. Cai Huang for editing.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 92167201, 52188102, 62133003, 61991403, 61991404, and 61991400) and Jiangsu Industrial Technology Research Institute (JITRI)
Author contributions
Idea was conceived by Y.Y. Theory was developed by J.L., R.C., L.X., X.Y., T.Y., Y.Y. Simulation codes were developed by D.J., C.S., M.W. and were reviewed by Z.Y. Experiments were designed by Y.Y., Z.Y., and were performed by D.J., Z.Y., C.S., M.W., Y.Y., F.H., R.C. Projects were supervised by Y.Y., S.S., H.D. Funding was acquired by Y.Y., J.L., H.Z., H.D. The original draft was written by Y.Y., Z.Y., J.L., R.C., X.Y. and all authors provided critical review of the manuscript and approved the final draft.
Conflict of interest
The authors declare no competing interests.
Supplementary information
The supporting information is available online at https://doi.org/10.1360/nso/20220043. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
In this work, we consider that the information communication between clients is mutual for notational simplicity; therefore, the adjacency matrix is symmetric. Furthermore we assume that the underlying topology is connected, i.e., for any two clients k and j, there is at least one path from k to j.
References
 Lecun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521: 436–444. [Article] [CrossRef] [PubMed] [Google Scholar]
 Yuan Y, Tang X, Zhou W, et al. Data driven discovery of cyber physical systems. Nat Commun 2019; 10: 4894. [Article] [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
 Yan L, Zhang HT, Goncalves J, et al. An interpretable mortality prediction model for COVID19 patients. Nat Mach Intell 2020; 2: 283–288. [Article] [CrossRef] [Google Scholar]
 Price Ii WN, Cohen IG. Privacy in the age of medical big data. Nat Med 2019; 25: 37–43. [Article] [CrossRef] [PubMed] [Google Scholar]
 DeGrave AJ, Janizek JD, Lee SI. AI for radiographic COVID19 detection selects shortcuts over signal. Nat Mach Intell 2021; 3: 610–619. [Article] [CrossRef] [Google Scholar]
 Roberts M, Driggs D, Thorpe M, et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID19 using chest radiographs and CT scans. Nat Mach Intell 2021; 3: 199–217. [Article] [CrossRef] [Google Scholar]
 Konečný J, McMahan B, Ramage D. Federated optimization: Distributed optimization beyond the datacenter. ArXiv: 1511.03575. [Google Scholar]
 Yang Q, Liu Y, Chen T, et al. Federated machine learning. ACM Trans Intell Syst Technol 2019; 10: 1–19. [Article] [Google Scholar]
 Konečný J, Brendan McMahan H, Yu FX, et al. Federated learning: Strategies for improving communication efficiency. ArXiv: 1610.05492. [Google Scholar]
 McMahan B, Moore E, Ramage D, et al. Communicationefficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, 2017, 54: 12731282. [Google Scholar]
 Rizk E, Vlaski S, Sayed AH. Dynamic federated learning. ArXiv: 2002.08782. [Google Scholar]
 Vlaski S, Rizk E, Sayed AH. Secondorder guarantees in federated learning. In: Proceedings of the 2020 54th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, 2020, 915922. [Google Scholar]
 Bonawitz K, Eichner H, Grieskamp W, et al. Towards federated learning at scale: System design. ArXiv: 1902.01046. [Google Scholar]
 Gu R, Yang S, Wu F. Distributed machine learning on mobile devices: A survey. ArXiv: 1909.08329. [Google Scholar]
 Kairouz P, Brendan McMahan H, Avent B, et al. Advances and open problems in federated learning. ArXiv: 1912.04977. [Google Scholar]
 Li T, Sahu AK, Talwalkar A, et al. Federated learning: Challenges, methods, and future directions. IEEE Signal Process Mag 2020; 37: 50–60. [Article] [NASA ADS] [Google Scholar]
 Liu J, Huang J, Zhou Y, et al. From distributed machine learning to federated learning: a survey. Knowl Inf Syst 2022; 64: 885–917. [Article] [Google Scholar]
 Bellet A, Guerraoui R, Taziki M, et al. Personalized and private peertopeer machine learning. ArXiv: 1705.08435. [Google Scholar]
 Kaissis G, Ziller A, PasseratPalmbach J, et al. Endtoend privacy preserving deep learning on multiinstitutional medical imaging. Nat Mach Intell 2021; 3: 473–484. [Article] [CrossRef] [Google Scholar]
 Lian X, Zhang C, Zhang H, et al. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In: Proceedings of 31th Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, 2017, 30: 53305340. [Google Scholar]
 Tang Z, Shi S, Chu X. Communicationefficient decentralized learning with sparsification and adaptive peer selection. ArXiv: 2002.09692. [Google Scholar]
 Wang S, Tuor T, Salonidis T, et al. Adaptive federated learning in resource constrained edge computing systems. IEEE J Sel Areas Commun 2019; 37: 1205–1221. [Article] [CrossRef] [Google Scholar]
 WarnatHerresthal S, Schultze H, Shastry KL, et al. Swarm Learning for decentralized and confidential clinical machine learning. Nature 2021; 594: 265–270. [Article] [Google Scholar]
 Blot M, Picard D, Cord M, et al. Gossip training for deep learning. ArXiv: 1611.09726. [Google Scholar]
 Daily JA, Vishnu A, Siegel CM, et al. Gossipgrad: Scalable deep learning using gossip communication based asynchronous gradient descent. ArXiv: 1803.05880. [Google Scholar]
 Hu C, Jiang J, Wang Z. Decentralized federated learning: A segmented gossip approach. ArXiv: 1908.07782. [Google Scholar]
 Scardapane S, Spinelli I, Lorenzo PD. Distributed training of graph convolutional networks. IEEE Trans Signal Inf Process over Networks 2021; 7: 87–100. [Article] [CrossRef] [MathSciNet] [Google Scholar]
 Pei Y, Mao R, Liu Y, et al. Decentralized federated graph neural networks. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), Montreal, 2021. [Google Scholar]
 Lalitha A.Fully decentralized federated learning. In: Proceedings of Annual Conference on Neural Information Processing Systems (NeurIPS), Montreal, 2018. [Google Scholar]
 Lalitha A, Kilinc OC, Javidi T, et al. Peertopeer federated learning on graphs. ArXiv: 1901.11173. [Google Scholar]
 Pappas C, Chatzopoulos D, Lalis S, et al. Ipls: A framework for decentralized federated learning. In: Proceedings of 2021 IFIP Networking Conference, Espoo, 2021. [Google Scholar]
 Savazzi S, Kianoush S, Rampa V, et al. A joint decentralized federated learning and communications framework for industrial networks. In: Proceedings of 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), 2020, 17. [Google Scholar]
 Wu X, Wang Z, Zhao J, et al. FedBC: Blockchainbased decentralized federated learning. In: Proceedings of 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, 2020, 217221. [Google Scholar]
 Lyu L, Yu J, Nandakumar K, et al. Towards fair and privacypreserving federated deep models. IEEE Trans Parallel Distrib Syst 2020; 31: 2524–2541. [Article] [CrossRef] [Google Scholar]
 Yapp AZH, et al. Communicationefficient and scalable decentralized federated edge learning. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), Montreal, 2021. [Google Scholar]
 Zhu L, Liu Z, Han S. Deep leakage from gradients. In: Proceedings Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, 2019. [Google Scholar]
 Koloskova A, Stich SU, Jaggi M. Decentralized stochastic optimization and gossip algorithms with compressed communication. ArXiv: 1902.00340. [Google Scholar]
 Li X, Huang K, Yang W, et al. On the convergence of FedAvg on nonIID data. ArXiv: 1907.02189. [Google Scholar]
 OlfatiSaber R, Murray RM. Consensus problems in networks of agents with switching topology and timedelays. IEEE Trans Automat Contr 2004; 49: 1520–1533. [Article] [CrossRef] [MathSciNet] [Google Scholar]
 Ren W, Beard RW. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans Automat Contr 2005; 50: 655–661. [Article] [CrossRef] [Google Scholar]
 Jadbabaie A, Jie Lin A, Morse AS. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans Automat Contr 2003; 48: 988–1001. [Article] [Google Scholar]
 Isermann R.Faultdiagnosis systems: an introduction from fault detection to fault tolerance. Heidelberg: Springer Science & Business Media, 2005. [Google Scholar]
 Yuan Y, Ma G, Cheng C, et al. A general endtoend diagnosis framework for manufacturing systems. Natl Sci Rev 2020; 7: 418–429. [Article] [CrossRef] [PubMed] [Google Scholar]
 Yang T, Yi X, Wu J, et al. A survey of distributed optimization. Annu Rev Control 2019; 47: 278–305. [Article] [CrossRef] [MathSciNet] [Google Scholar]
 DeMillo RA.Foundations of secure computation. Technical Report. Georgia Institute of Technology, 1978. [Google Scholar]
 Yao AC.Protocols for secure computations. In: Proceedings of 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), IEEE, Washington D C, 1982, 160164. [Google Scholar]
 Yu Y, Xie X. Privacypreserving computation in the postquantum era. Natl Sci Rev 2021; 8: nwab115. [Article] [CrossRef] [PubMed] [Google Scholar]
 Lyu L, Yu H, Yang Q. Threats to federated learning: A survey. ArXiv: 2003.02133. [Google Scholar]
All Figures
Figure 1 Illustration of key concepts in different stateoftheart federated learning frameworks. (A) Classical Federated Learning: a central client is needed to receive and transmit all essential information to other clients. It is equivalent to an alltoall network without such a central center, i.e., every client in the network can receive information from all other clients. (B) Swarm Learning: there is no such a universal central client, but a potentially different central client is selected in every iteration. Mathematically, it is equivalent to FedAvg with varying central clients. (C) The proposed Decentralized Federated Learning: there is no need for a central client in any iteration. Any connected timeinvariant or timevarying topology (under certain mild condition given in the Supplementary Information) has guaranteed performance, therefore unifying the classical federated learning and swarm learning. 

In the text 
Figure 2 DeceFL to predict leukaemia from A2 benchmark dataset [23]. (A) Data were divided into IID samples for all clients. (B) Data were divided into NonIID unbalanced samples. (C) Different topologies for FedAvg, SL and DeceFL respectively. The topology for SL must hold in every iteration when any other node is selected as a central client. (D)–(G) Performance of three algorithms on IID/NonIID setups over logistic regression/neural networks. DeceFL presents a similar performance as FedAvg and SL in both IID and NonIID scenarios, except during the transient period that DeceFL takes to reach consensus. In contrast to fixed topology in FedAvg and SL, DeceFL provides much more freedom on the choice of communication topology. 

In the text 
Figure 3 DeceFL to predict leukaemia from A2 benchmark dataset [23]. (A) Timevarying communication topology that consists of a sequence of graphs each of which is not connected while the lumpsum graph over a fixed period is connected. (B) Timevarying communication topology that adds or removes nodes over time. (C), (D) Performance of DeceFL with edgevarying graphs on the IID and NonIID setups of dataset A2 using logistic regression, with reference performance of FedAvg that uses full information. (E), (F) Performance of DeceFL with nodevarying graphs on the IID and NonIID setups of dataset A2 using logistic regression, with reference performance of FedAvg that uses full information. These timevarying experiments manifest the robustness of DeceFL in the presence of interventions to communication topology. The dropout or supplement of clients in the middle of DeceFL computation will not break or deteriorate the learning process, which is particularly essential in practice for largescale timeconsuming dataintensive applications. 

In the text 
Figure 4 DeceFL to detect bearing faults from CWRU benchmark dataset. (A) Data were divided into IID samples for all 4 clients. (B) Data were divided into NonIID unbalanced samples. Each client locally specified its data size and sample distribution. (C) Illustration of communication topology for FedAvg, SL and DeceFL. (D), (E) Performance of DeceFL on IID data using logistic regression and DNN, respectively, with reference performance of FedAvg and SL. (F), (G) Performance of DeceFL on NonIID data using logistic regression and DNN, respectively, with reference performance of FedAvg and SL. (D)–(G) The boxplots at bottom illustrate the performance comparison between DeceFL and each client that was trained independently (that is, each client trained its own model only using its associated local data without communicating with any other clients). The experiments on CWRU benchmark dataset confirms the similar performance of DeceFL, as a fully decentralized framework, in comparison to FedAvg and SL, for industrial datasets, while DeceFL offers more freedom on the choice of communication topology to handle practical issues in organizing clients for largescale federated learning applications. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.