Please note: This PhD defence will take place in DC 2310 and online.
Shubhankar Mohapatra, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Xi He
Data privacy is one of the top concerns in data science. The most widely used notion of privacy in practice is differential privacy, which offers a guaranteed bound on the loss of privacy even under worst-case assumptions. Many algorithms have been built to perform learning tasks, such as training machine learning models or generating synthetic data, under differential privacy. In practice, these algorithms must operate within an assigned privacy budget. This budget requires practitioners to limit the number of computations on the private dataset, including routine procedures such as data cleaning, hyperparameter tuning, and model training. Several tools can perform these tasks independently when the dataset is non-private. However, these tools do not translate easily to differential privacy and often do not account for the cumulative privacy cost.
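For context, the guarantee and the way budgets accumulate can be sketched with the standard textbook formulation (background, not a contribution of this thesis): a randomized mechanism M satisfies ε-differential privacy if, for all neighbouring datasets D and D' differing in one record, and every set of outputs S,

\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S],

and, by sequential composition, running k mechanisms with budgets ε_1, …, ε_k on the same data consumes at most

\varepsilon_{\text{total}} \;=\; \sum_{i=1}^{k} \varepsilon_i,

which is why data cleaning, hyperparameter tuning, and model training must all share a single budget.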
In this thesis, we explore practical problems that a data science practitioner may face when deploying a differentially private learning framework, from data collection to model training. In particular, we are interested in real-world data quality problems, such as missing data, inconsistent data, and wrongly labelled data, and in machine learning pipeline requirements such as hyperparameter tuning. We envision building a general-purpose private learning framework that accepts real data as input and supports learning tasks such as training a highly accurate private machine learning model or creating a synthetic version of the dataset, all with end-to-end differential privacy guarantees. We hope this work will make differentially private learning more accessible to data science practitioners and easily deployable in day-to-day applications.
To attend this PhD defence in person, please go to DC 2310. You can also attend virtually on Zoom.