Weekly Digs #2
Digging into private machine learning this week brought us the training of logistic and boosting models on encrypted data, plus an update on how to ensure that the final model itself doesn't leak too much about the training data. And for those who prefer to first sit back and enjoy a presentation, we also noticed a gem on applying secure computation to real-world use cases, including some of the unexpected obstacles that can occur.
News
- @mvaria's talk about a real-world application of MPC at this year's ENIGMA conference is online and well worth a watch! Via @lcyqn.
Papers
- Scalable Private Learning with PATE
Follow-up work to the celebrated Student-Teacher approach of ensuring privacy of training data via differential privacy, now with better privacy bounds and hence less added noise. This is partially achieved by switching to Gaussian noise and to more advanced (trusted) aggregation mechanisms; a sketch of the noisy aggregation step follows after this list.
- Privacy-Preserving Logistic Regression Training
Fits a logistic model on homomorphically encrypted data using the Newton-Raphson iterative method, but with a fixed and approximated Hessian matrix (see the sketch below). Performance is evaluated on the iDASH cancer detection scenario.
- Privacy-Preserving Boosting with Random Linear Classifiers for Learning from User-Generated Data
Presents the SecureBoost framework for combining boosting algorithms with secure computation. The former uses randomly generated linear classifiers as base classifiers (also sketched below) and the latter comes in three variants: RLWE+GC, Paillier+GC, and SecretSharing+GC. Performance experiments are provided for both the model itself and its secure versions.
- Machine learning and genomics: precision medicine vs. patient privacy
Non-technical paper illustrating that secure computation techniques are finding their way into otherwise unrelated research areas, and hitting a home run with "data access restrictions are a burden for researchers, particularly junior researchers or small labs that do not have the clout to set up collaborations with major data curators".
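For the PATE entry, here is a minimal plaintext sketch of the noisy aggregation step, assuming the Gaussian noisy-max variant: each teacher votes for a label, Gaussian noise is added to the per-label vote counts, and only the noisy winner is released to the student. Function names and the noise scale are illustrative, not taken from the paper's code.

```python
import numpy as np

def noisy_aggregate(teacher_votes, num_labels, sigma, rng):
    """Return the label with the highest noise-perturbed vote count."""
    counts = np.bincount(teacher_votes, minlength=num_labels).astype(float)
    counts += rng.normal(scale=sigma, size=num_labels)  # Gaussian rather than Laplace noise
    return int(np.argmax(counts))

# Example: 250 teachers voting over 10 labels on a single student query.
rng = np.random.default_rng(0)
votes = rng.integers(0, 10, size=250)
print(noisy_aggregate(votes, num_labels=10, sigma=40.0, rng=rng))
```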
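For the logistic regression paper, a small plaintext sketch of Newton-Raphson with a fixed Hessian, assuming Böhning's bound `H = X^T X / 4` so that the inverse is computed only once and reused in every iteration; under homomorphic encryption the sigmoid would additionally be replaced by a low-degree polynomial approximation. Names and iteration counts are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fixed_hessian_logreg(X, y, iters=10):
    """Fit logistic weights with a Hessian that is fixed across all iterations."""
    n, d = X.shape
    H_inv = np.linalg.inv(X.T @ X / 4.0)  # inverted once, reused every step
    beta = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (y - sigmoid(X @ beta))  # gradient of the log-likelihood
        beta = beta + H_inv @ grad            # Newton step with the fixed Hessian
    return beta

# Tiny synthetic example with labels in {0, 1}.
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]
y = (sigmoid(X @ np.array([0.5, 2.0, -1.0])) > rng.uniform(size=200)).astype(float)
print(fixed_hessian_logreg(X, y))
```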
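And for SecureBoost, a rough sketch of boosting over randomly generated linear classifiers, here phrased as plain AdaBoost on plaintext data with all of the secure-computation machinery (RLWE+GC, Paillier+GC, SecretSharing+GC) omitted; the candidate count and the weighting scheme are assumptions for illustration.

```python
import numpy as np

def boost_random_linear(X, y, rounds=50, candidates=20, rng=np.random.default_rng(0)):
    """AdaBoost over random hyperplanes; labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                # per-sample weights
    ensemble = []                           # list of (alpha, weights, bias)
    for _ in range(rounds):
        best = None
        for _ in range(candidates):         # draw several random hyperplanes,
            v, b = rng.normal(size=d), rng.normal()
            pred = np.sign(X @ v + b)
            err = w @ (pred != y)           # keep the lowest weighted error
            if best is None or err < best[0]:
                best = (err, v, b, pred)
        err, v, b, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)      # re-weight misclassified samples up
        w /= w.sum()
        ensemble.append((alpha, v, b))
    return ensemble

def predict(ensemble, X):
    return np.sign(sum(a * np.sign(X @ v + b) for a, v, b in ensemble))

# Usage on a toy linearly separable problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
model = boost_random_linear(X, y)
print((predict(model, X) == y).mean())  # training accuracy
```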
Blogs
- Uber's differential privacy .. probably isn't
@frankmcsherry takes a look at Uber's SQL differential privacy project and shares experience gained from implementing differential privacy in Microsoft's PINQ.