Easy · 1 mark · Multiple Choice
Subtask 1.2: Technical Requirements · Data & Analytics · Dataproc · Hadoop · Migration

GCP PCA · Question 11 · Technical Requirements

CASE STUDY: HealthData Inc

Overview:
Industry: Healthcare Analytics
Size: 1000 employees

Environment:

  • Co-located data center
  • Hadoop cluster
  • SFTP servers
  • 50 TB patient data

Requirements:

  • ML models for diagnostics
  • Secure data sharing portals
  • Break data silos

Exec Statements:

  • CEO: Need compute for ML.
  • CRO: HIPAA compliance is top priority.
  • CTO: Managed services needed to replace Hadoop.

Tech Reqs:

  • Strict HIPAA compliance
  • Automated PHI de-identification
  • Comprehensive audit logging
  • CMEK
  • Network isolation (no public internet)

Constraints:

  • US data sovereignty
  • 7-year retention (immutable)
  • Easy auditor access

QUESTION: To replace the on-premises Hadoop cluster with a managed service while minimizing migration effort, which GCP service should you recommend?

Answer options:

A.

Cloud Dataflow

B.

Cloud Dataproc

C.

BigQuery

D.

Compute Engine with custom Hadoop installation

How to approach this question

Identify the GCP managed service that runs Apache Hadoop and Spark ecosystems.

Full Answer

B. Cloud Dataproc ✓ Correct

Cloud Dataproc is Google's fully managed Apache Spark and Hadoop service. It allows organizations to migrate existing Hadoop workloads to the cloud quickly without rewriting them, while Google handles cluster provisioning, management, and scaling.
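As an illustration of how this answer also lines up with the case study's tech requirements (CMEK, network isolation), a Dataproc cluster can be created with customer-managed encryption keys and no public IPs, and existing Hadoop jobs submitted to it unchanged. This is a hedged sketch: the project, region, subnet, KMS key, bucket, and cluster names below are hypothetical placeholders, not part of the case study.

```shell
# Sketch only: placeholder project/region/subnet/key/bucket names.

# Create a Dataproc cluster with:
#   --no-address       -> no public IPs (network isolation requirement)
#   --gce-pd-kms-key   -> CMEK for cluster persistent disks
gcloud dataproc clusters create healthdata-cluster \
    --region=us-central1 \
    --subnet=projects/my-project/regions/us-central1/subnetworks/private-subnet \
    --no-address \
    --gce-pd-kms-key=projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key

# Existing Hadoop MapReduce jobs run as-is, minimizing migration effort:
gcloud dataproc jobs submit hadoop \
    --cluster=healthdata-cluster \
    --region=us-central1 \
    --jar=file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    -- wordcount gs://my-bucket/input gs://my-bucket/output
```

The point of the sketch is the migration story: the job JAR and its arguments are the same ones used on premises; only the submission command changes.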

Common mistakes

Choosing Cloud Dataflow (A) because it is the more 'modern' option, which ignores the constraint to minimize migration effort: Dataflow runs Apache Beam pipelines, so existing Hadoop and Spark jobs would have to be rewritten rather than lifted and shifted.
