Statistical and Computational Aspects of Modern Machine Learning Problems



Continuous technological and communication advancements have enabled the collection of and access toever-growing large scale datasets. There is a significant amount of research and development being devotedto machine learning problems in general, and the underlying optimization algorithms in particular, that areformulated on these large scale datasets. However, lack of adequate computational resources, in particularstorage, can severely limit, or even prevent, any attempts at solving such optimization problems in a traditionalstand-alone way, e.g., using a single machine. This can be remedied through distributed computing, in whichresources across a network of stand-alone computational nodes are “pooled” together so as to scale to theproblem at hand. Optimization algorithms designed for distributed environments can efficiently leverage thecomputational resources of a network of machines. A significant issue with this framework of computing is thatinter-machine communication can be expensive. Therefore, in such environments it is necessary to designalgorithms that use a low number of communication rounds. In this talk, I will discuss the significant potential ofcommunication-efficient distributed second-order optimization methods and discuss my research in developingsuch methods that remedy many of the shortcomings of the existing methods.