Programmirovanie (Programming), 2023, No. 4, pp. 58-74

APPLICATION OF COMPUTER SIMULATION TO THE PROBLEM OF PERSONAL DATA ANONYMIZATION. STATE OF THE ART AND KEY POINTS

A. V. Borisov a*, A. V. Bosov a**, A. V. Ivanov a***

a Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences
ul. Vavilova 44, korp. 2, Moscow, 119333 Russia

* E-mail: aborisov@ipiran.ru
** E-mail: avbosov@ipiran.ru
*** E-mail: aivanov@ipiran.ru

Received 20.01.2023
Revised 25.02.2023
Accepted for publication 02.03.2023

Abstract

This paper presents the first part of a study on the automated processing of personal data for the purposes of anonymization and analysis. This part is a survey: it aims to analyze the state of research in the field and to systematize the available results. We present the results of an analysis of a wide range of anonymization issues, which shaped a systematic understanding of the state of research and justified the choice of direction for further study. First, definitions are given for the main terms and concepts used in connection with personal data anonymization, including in relation to the legislation of the Russian Federation. The research directions are grouped into four sections: anonymization methods, implementation problems, applications of anonymized data processing, and de-anonymization issues. For each group of anonymization methods (randomization, grouping, data distribution, and application control), the main algorithms are described and their advantages and disadvantages are analyzed. The implementation problems concern such notions as the utility of anonymized data, the limits of applicability of universal algorithms, and reliability with respect to preserving the anonymity of personal data subjects. Among the applied solutions that have created demand for anonymized data processing, medical, biological, and genetic research and law enforcement are discussed. The final part mentions the most high-profile cases of de-anonymization and gives a brief review of press coverage.
