This question is part of a case study (Case 11).
CASE STUDY: HealthData Inc
Overview:
Industry: Healthcare Analytics
Size: 1000 employees
Environment:
- Co-located data center
- Hadoop cluster
- SFTP servers
- 50 TB patient data
Requirements:
- ML models for diagnostics
- Secure data sharing portals
- Break data silos
Exec Statements:
- CEO: Need compute for ML.
- CRO: HIPAA compliance is top priority.
- CTO: Managed services needed to replace Hadoop.
Tech Reqs:
- Strict HIPAA compliance
- Automated PHI de-identification
- Comprehensive audit logging
- CMEK
- Network isolation (no public internet)
Constraints:
- US data sovereignty
- 7-year retention (immutable)
- Easy auditor access
QUESTION: To replace the on-premises Hadoop cluster with a managed service while minimizing migration effort, which GCP service should you recommend?
GCP PCA · Question 14 · Security Design
QUESTION: How should you design the architecture to automate the de-identification of Protected Health Information (PHI) as data is ingested?
Answer options:
Use Cloud Storage triggers to invoke a Cloud Function that calls the Cloud DLP API to de-identify data before moving it to BigQuery.
Write a custom Python script on a Compute Engine VM that uses regex to find and replace PHI.
Use BigQuery authorized views to hide columns containing PHI from analysts.
Enable default encryption at rest (Google-managed keys) on the BigQuery dataset.
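The pipeline described in the first option (Cloud Storage trigger, Cloud Function, DLP API, BigQuery) can be sketched in Python as follows. This is a minimal sketch, not a reference implementation. It assumes the `google-cloud-dlp`, `google-cloud-storage` and `google-cloud-bigquery` client libraries. `PROJECT_ID`, `BQ_TABLE`, the info-type list, every function name, and the idea that each uploaded object is a single UTF-8 text record are all hypothetical. The request builder is plain Python, so it can be inspected without credentials. Only `deidentify_text` and `on_object_finalized` call Google Cloud.

```python
"""Sketch of the option-1 pipeline: GCS upload -> Cloud Function -> DLP -> BigQuery.

Hypothetical: PROJECT_ID, BQ_TABLE, PHI_INFO_TYPES, all function names, and the
assumption that each uploaded object is one UTF-8 text record.
"""

PROJECT_ID = "example-project"                    # hypothetical
BQ_TABLE = "example-project.clinical.notes_deid"  # hypothetical project.dataset.table

# Illustrative built-in DLP info types; a real PHI inventory needs review.
PHI_INFO_TYPES = [
    "PERSON_NAME",
    "PHONE_NUMBER",
    "EMAIL_ADDRESS",
    "US_SOCIAL_SECURITY_NUMBER",
]


def build_deidentify_request(project_id: str, text: str, location: str = "global") -> dict:
    """Build a deidentify_content request that replaces each finding with its info type name."""
    info_types = [{"name": name} for name in PHI_INFO_TYPES]
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "inspect_config": {"info_types": info_types},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": info_types,
                        "primitive_transformation": {"replace_with_info_type_config": {}},
                    }
                ]
            }
        },
        "item": {"value": text},
    }


def deidentify_text(project_id: str, text: str) -> str:
    """Send text to the DLP API and return the de-identified string."""
    from google.cloud import dlp_v2  # lazy import: only needed when calling the API

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(request=build_deidentify_request(project_id, text))
    return response.item.value


def on_object_finalized(cloud_event) -> None:
    """Hypothetical entry point for a Cloud Storage 'object finalized' event.

    In a deployed 2nd-gen function this would be decorated with
    @functions_framework.cloud_event.
    """
    from google.cloud import bigquery, storage

    data = cloud_event.data  # carries the "bucket" and "name" of the new object
    raw = storage.Client().bucket(data["bucket"]).blob(data["name"]).download_as_text()
    clean = deidentify_text(PROJECT_ID, raw)

    # Only the de-identified text is written to BigQuery.
    errors = bigquery.Client().insert_rows_json(
        BQ_TABLE, [{"source_object": data["name"], "text": clean}]
    )
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

Keeping the request builder separate from the API call makes the de-identification policy (which info types, which transformation) easy to review and test on its own. The function would be attached to the landing bucket through an Eventarc trigger on the `google.cloud.storage.object.v1.finalized` event.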