home | copyright ©2016, tim@menzies.us

overview | syllabus | src | submit | chat


Review 8: Privacy

  1. Why do many organizations decline to share data?
  2. What is generalization for privacy? Give an example.
  3. What is suppression for privacy? Give an example.
  4. What is perturbation for privacy? Give an example.
  5. Define k-anonymity. For k=2, show 3 rows in a training set that (a) satisfy k=2 anonymity; (b) do not satisfy k=2 anonymity.
  6. According to Brickell+Shmatikov and Grechanik et al., data mining efficacy drops as data privacy is increased (using standard methods). Why is this so?
  7. A data set contains F features, one of which is a class attribute. Explain a method that could be used to replace that data set with a smaller data set (one with fewer features).
  8. A data set contains R rows. Explain a method that could be used to replace that data set with a smaller data set (one with fewer rows).
  9. If a data set is replaced with 25% of its columns and 10% of its rows, what is its percent privacy (measured in terms of private cells)?
  10. Explain, with an example, the 1st law of trusted data sharing ("don’t share everything; just the 'corners'").
  11. Explain why there needs to be a 2nd law of trusted data sharing ("anonymize the data in the 'corners'").
  12. Explain, with an example, how to operationalize the 3rd law of trusted data sharing ("never mutate across 'decision boundary'"). In your explanation, make sure you also explain what happens if this third law is violated.
  13. Describe the LACE2 "pass the parcel" procedure. Your answer needs to mention how LACE2 uses privacy and anomaly detection.
  14. LACE2 is a "supervised privacy algorithm" while k-anonymity is an "unsupervised privacy algorithm".
    • What are the differences between these two classes of algorithms?
    • What can be done with unsupervised privacy algorithms that can't be done with supervised privacy algorithms?
    • Empirically, what are the results describing the benefits of unsupervised vs supervised privacy?
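
As a study aid for question 5, here is a minimal sketch of a k-anonymity check. It uses the standard definition (every combination of quasi-identifier values must occur in at least k rows). The function name `is_k_anonymous`, the toy rows, and the choice of `age` and `zip` as quasi-identifiers are all hypothetical and are not taken from the course materials.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical toy data: "age" and "zip" are assumed quasi-identifiers,
# "defects" is an assumed class attribute.
QI = ["age", "zip"]

# Generalized rows: ages replaced by ranges, zip codes partly suppressed.
generalized = [
    {"age": "30-39", "zip": "275**", "defects": "yes"},
    {"age": "30-39", "zip": "275**", "defects": "no"},
    {"age": "30-39", "zip": "275**", "defects": "no"},
]

# Raw rows: each (age, zip) combination is unique, so each row is identifiable.
raw = [
    {"age": 31, "zip": "27513", "defects": "yes"},
    {"age": 36, "zip": "27540", "defects": "no"},
    {"age": 38, "zip": "27513", "defects": "no"},
]

print(is_k_anonymous(generalized, QI, k=2))  # True
print(is_k_anonymous(raw, QI, k=2))          # False
```

With only three rows and k=2, the check passes only when all three rows share the same quasi-identifier values, since a 2+1 split leaves one row in a group of size one.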