Sharing big data safely : managing data security / Ted Dunning and Ellen Friedman
- Author:
- Dunning, Ted, 1956-
- Published:
- Sebastopol : O'Reilly Media, 2015.
- Edition:
- First edition.
- Physical Description:
- 1 online resource
- Additional Creators:
- Friedman, B. Ellen
- Contents:
- Cover; Copyright; Table of Contents; Preface; Who Should Use This Book; Chapter 1. So Secure It's Lost; Safe Access in Secure Big Data Systems; Chapter 2. The Challenge: Sharing Data Safely; Surprising Outcomes with Anonymity; The Netflix Prize; Unexpected Results from the Netflix Contest; Implications of Breaking Anonymity; Be Alert to the Possibility of Cross-Reference Datasets; New York Taxicabs: Threats to Privacy; Sharing Data Safely; Chapter 3. Data on a Need-to-Know Basis; Views: A Secure Way to Limit What Is Seen; Why Limit Access?; Apache Drill Views for Granular Security; How Views Work; Summary of Need-to-Know Methods; Chapter 4. Fake Data Gives Real Answers; The Surprising Thing About Fake Data; Keep It Simple: log-synth; Log-synth Use Case 1: Broken Large-Scale Hive Query; Log-synth Use Case 2: Fraud Detection Model for Common Point of Compromise; What Thieves Do; Why Machine Learning Experts Were Consulted; Using log-synth to Generate Fake User Histories; Summary: Fake Data and log-synth to Safely Work with Secure Data; Chapter 5. Fixing a Broken Large-Scale Query; A Description of the Problem; Determining What the Synthetic Data Needed to Be; Schema for the Synthetic Data; Generating the Synthetic Data; Tips and Caveats; What to Do from Here?; Chapter 6. Fraud Detection; What Is Really Important?; The User Model; Sampler for the Common Point of Compromise; How the Breach Model Works; Results of the Entire System Together; Handy Tricks; Summary; Chapter 7. A Detailed Look at log-synth; Goals; Maintaining Simplicity: The Role of JSON in log-synth; Structure; Sampling Complex Values; Structuring and De-structuring Samplers; Extending log-synth; Using log-synth with Apache Drill; Choice of Data Generators; R is for Random, and Benchmark Systems; Probabilistic Programming; Differential Privacy Preserving Systems; Future Directions for log-synth; Chapter 8. Sharing Data Safely: Practical Lessons; Appendix A. Additional Resources; Log-synth Open Source Software; Apache Drill and Drill SQL Views; General Resources and References; Cheapside Hoard and Treasures; Codes and Cipher; Netflix Prize; Problems with Data Sharing; Additional O'Reilly Books by Dunning and Friedman; About the Authors; Strata+Hadoop World.
- Summary:
- "Many big data-driven companies today are moving to protect certain types of data against intrusion, leaks, or unauthorized eyes. But how do you lock down data while granting access to people who need to see it? In this practical book, authors Ted Dunning and Ellen Friedman offer two novel and practical solutions that you can implement right away. Ideal for both technical and non-technical decision makers, group leaders, developers, and data scientists, this book shows you how to: share original data in a controlled way so that different groups within your organization only see part of the whole. You'll learn how to do this with the new open source SQL query engine Apache Drill; provide synthetic data that emulates the behavior of sensitive data. This approach enables external advisors to work with you on projects involving data that you can't show them"--Back cover.
- ISBN:
- 9781491953648 electronic bk.
- 1491953640 electronic bk.
- 9781491953631 electronic bk.
- 1491953632 electronic bk.
- 9781491952122
- 1491952121
- Bibliography Note:
- Includes bibliographical references.