Adaptive Federated Optimization
In non-federated settings, adaptive optimization methods have desirable convergence properties. Can federated versions of these adaptive optimizers, including Adagrad, Adam, and Yogi, facilitate better convergence in the presence of heterogeneous data?
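To make "federated version of an adaptive optimizer" concrete, here is a minimal sketch of one server round, assuming the common formulation in which the server averages the clients' model deltas and treats that average as a pseudo-gradient for an Adagrad-, Adam-, or Yogi-style update. The function name `server_update`, its parameters, and the default hyperparameter values are hypothetical and chosen for illustration only; they are not taken from the text above.

```python
import math

def server_update(x, m, v, client_deltas, method="adam",
                  lr=0.1, beta1=0.9, beta2=0.99, tau=1e-3):
    """One server round (illustrative sketch, not a reference implementation).

    x: global model, m: momentum, v: second-moment estimate (lists of floats).
    client_deltas: one list per client, each = local model - global model.
    """
    n, dim = len(client_deltas), len(x)
    # Average the client deltas; this average plays the role of a gradient.
    delta = [sum(d[i] for d in client_deltas) / n for i in range(dim)]

    new_x, new_m, new_v = [], [], []
    for i in range(dim):
        d2 = delta[i] ** 2
        mi = beta1 * m[i] + (1 - beta1) * delta[i]
        if method == "adagrad":
            vi = v[i] + d2                        # accumulate squared deltas
        elif method == "adam":
            vi = beta2 * v[i] + (1 - beta2) * d2  # exponential moving average
        elif method == "yogi":
            sign = (v[i] > d2) - (v[i] < d2)      # sign(v - delta^2)
            vi = v[i] - (1 - beta2) * d2 * sign   # additive, sign-controlled
        else:
            raise ValueError(f"unknown method: {method}")
        new_m.append(mi)
        new_v.append(vi)
        # tau (assumed adaptivity constant) keeps the denominator positive.
        new_x.append(x[i] + lr * mi / (math.sqrt(vi) + tau))
    return new_x, new_m, new_v
```

For example, calling `server_update([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [[1.0, -1.0], [3.0, -3.0]], method="yogi")` moves the first coordinate up and the second down, following the sign of the averaged delta. The three methods differ only in how `v` is updated, which is what controls the per-coordinate step size on the server.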