AWS Certified Data Engineer - Associate (DEA-C01) Question Answer
At Passitcerts, we prioritize keeping our resources up to date with the latest changes in the AWS Certified Data Engineer - Associate (DEA-C01) exam provided by Amazon. Our team actively monitors any adjustments in exam objectives, question formats, or other key updates, and we quickly revise our practice questions and study materials to reflect these changes. This dedication ensures that our clients always have access to the most accurate and current content. By using these updated questions, you can approach the AWS Certified Data Engineer exam with confidence, knowing you're fully prepared to succeed on your first attempt.
Passing your certification by successfully completing the AWS Certified Data Engineer - Associate (DEA-C01) exam will open up exciting career opportunities in your field. This certification is highly respected by employers and showcases your expertise in the industry. To support your preparation, we provide genuine AWS Certified Data Engineer - Associate (DEA-C01) questions that closely mirror those you will find in the actual exam. Our carefully curated question bank is regularly updated to ensure it aligns with the latest exam patterns and requirements. By using these authentic questions, you'll gain confidence, enhance your understanding of key concepts, and greatly improve your chances of passing the exam on your first attempt. Preparing with our reliable question bank is the most effective way to ensure success in earning your AWS Certified Data Engineer certification.
Many other providers include outdated questions in their materials, which can lead to confusion or failure on the actual exam. At Passitcerts, we ensure that every question in our practice tests is relevant and reflects the current exam structure, so you’re fully equipped to tackle the test. Your success in the AWS Certified Data Engineer exam is our top priority, and we strive to provide you with the most reliable and effective resources to help you achieve it.
In this rapidly evolving world, you need a skill that makes you stand out in the cloud industry. The AWS Certified Data Engineer - Associate (DEA-C01) exam is your opportunity to prove you can manage data on Amazon Web Services (AWS), one of the biggest players in cloud computing. Launched in beta in late 2023 and fully available by March 2024, this certification tests your ability to move, transform, and secure data for businesses. Whether you’re new to AWS or looking to level up, this is your chance to stand out in a growing field.
Let’s be honest—it’s not an easy task. You’ll need a solid plan to tackle it, and that’s where Passitcerts comes in. Our AWS Certified Data Engineer - Associate Exam Dumps give you DEA-C01 Real Exam Questions to study, so you’re ready from the start. Add some hands-on AWS practice, and our Data Engineer Associate exam prep PDFs from Passitcerts will guide you straight to success. We’re here to make it easy for you—grab our braindumps and take the win!
Let’s get right to it. Here’s what AWS shares about the exam:
| Detail | Info |
|---|---|
| Time | 130 minutes |
| Passing Score | 720/1000 (scaled score) |
| Cost | $150 USD (may vary by location) |
| Format | 65 questions (multiple choice and multiple response), online or in person |
Our Data Engineer Associate practice dumps from Passitcerts are built with real questions, so you’re prepared when the time comes.
This certification proves you can take charge of data on AWS—moving it, shaping it, and keeping it secure. It’s ideal for anyone who loves solving problems, helping teams use data better, or protecting valuable information. Job boards like Indeed show data engineering roles spiked 45% from 2021 to 2023, and that demand is only growing into 2030. In 2025, companies need cloud experts, and this credential puts you front and center—with Passitcerts’ help, you’ll get there fast.
The exam focuses on four main areas of AWS data engineering. Here’s the breakdown straight from AWS:
| Area | Weight | What It Covers |
|---|---|---|
| Data Ingestion and Transformation | 34% | Moving and transforming data with AWS |
| Data Store Management | 26% | Storing and retrieving data efficiently |
| Data Security and Governance | 18% | Protecting data and meeting regulations |
| Data Operations and Support | 22% | Watching for issues and fixing them |
These exam domains cover a lot of ground, so a focused study plan is key. Our DEA-C01 dumps from Passitcerts give you real exam questions to study, and our AWS practice tests tie it all together.
The DEA-C01 covers a lot, and data ingestion and transformation alone takes up a third of it—prepping can feel daunting. That’s why Passitcerts’ AWS Certified Data Engineer - Associate Exam Dumps are here to help:
Our DEA-C01 dumps PDF shows you the real questions and the concepts you need to understand. Plenty of people have passed with Passitcerts—you’re next!
Pass this exam, and the rewards roll in. Data skills will be in huge demand in the coming years, especially on AWS. Here’s what you could earn, based on job trends:
| Role | Yearly Pay (2025 est.) |
|---|---|
| Data Engineer | $100,000–$130,000 |
| AWS Data Specialist | $95,000–$125,000 |
| Data Security Pro | $90,000–$120,000 |
Our DEA-C01 dumps from Passitcerts don’t just get you through—they prep you for these roles. Start now, because the biggest opportunities are waiting for you.
The AWS Certified Data Engineer - Associate exam is your chance to master AWS data skills and build a career that pays off. AWS practice is a good start, and guides help, but our DEA-C01 practice dumps from Passitcerts are the fastest way to succeed the first time. The exam fee is small compared to the rewards and opportunities that come with passing, so it’s worth every penny—don’t let the chance slip away! Get our dumps, study smart, and take your place in the cloud world.
Passitcerts provides the most up-to-date AWS Certified Data Engineer - Associate (DEA-C01) certification questions and answers. Here are a few sample questions:
A company has five offices in different AWS Regions. Each office has its own human resources (HR) department that uses a unique IAM role. The company stores employee records in a data lake that is based on Amazon S3 storage. A data engineering team needs to limit access to the records. Each HR department should be able to access records for only employees who are within the HR department's Region. Which combination of steps should the data engineering team take to meet this requirement with the LEAST operational overhead? (Choose two.)
A. Use data filters for each Region to register the S3 paths as data locations.
B. Register the S3 path as an AWS Lake Formation location.
C. Modify the IAM roles of the HR departments to add a data filter for each department's Region.
D. Enable fine-grained access control in AWS Lake Formation. Add a data filter for each Region.
E. Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region.
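For background on the Lake Formation data filters these options refer to, here is a minimal, hypothetical boto3 sketch of defining a row-level filter. The account ID, database, table, filter name, and Region value are placeholders, not details from the question.

```python
# Hypothetical sketch: creating a row-level data filter in AWS Lake Formation with boto3.
# All names and the account ID below are illustrative placeholders.
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.create_data_cells_filter(
    TableData={
        "TableCatalogId": "111122223333",        # AWS account ID that owns the Glue Data Catalog
        "DatabaseName": "hr_database",           # placeholder database name
        "TableName": "employee_records",         # placeholder table name
        "Name": "us_east_1_hr_filter",           # one filter per Region in a scenario like this
        "RowFilter": {
            "FilterExpression": "region = 'us-east-1'"  # rows visible through this filter
        },
        "ColumnWildcard": {},                    # all columns; use ColumnNames to restrict columns
    }
)
```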
A healthcare company uses Amazon Kinesis Data Streams to stream real-time health data from wearable devices, hospital equipment, and patient records. A data engineer needs to find a solution to process the streaming data. The data engineer needs to store the data in an Amazon Redshift Serverless warehouse. The solution must support near real-time analytics of the streaming data and the previous day's data. Which solution will meet these requirements with the LEAST operational overhead?
A. Load data into Amazon Kinesis Data Firehose. Load the data into Amazon Redshift.
B. Use the streaming ingestion feature of Amazon Redshift.
C. Load the data into Amazon S3. Use the COPY command to load the data into Amazon Redshift.
D. Use the Amazon Aurora zero-ETL integration with Amazon Redshift.
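As background for the Redshift streaming ingestion option, the sketch below submits the documented CREATE EXTERNAL SCHEMA ... FROM KINESIS pattern through the Redshift Data API. The stream name, IAM role ARN, workgroup, and database are placeholders, and the exact SQL may need adjusting to your data format.

```python
# Hypothetical sketch: enabling Redshift streaming ingestion from a Kinesis data stream.
import boto3

redshift_data = boto3.client("redshift-data")

sql_statements = [
    # External schema that maps the Kinesis stream into Redshift (placeholder role ARN).
    """CREATE EXTERNAL SCHEMA kinesis_schema
       FROM KINESIS
       IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftStreamingRole'""",
    # Materialized view that auto-refreshes with near real-time stream data.
    """CREATE MATERIALIZED VIEW health_stream_mv AUTO REFRESH YES AS
       SELECT approximate_arrival_timestamp,
              JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
       FROM kinesis_schema."health-data-stream" """,
]

# Submit both statements to a Redshift Serverless workgroup (placeholder names).
redshift_data.batch_execute_statement(
    WorkgroupName="analytics-workgroup",
    Database="dev",
    Sqls=sql_statements,
)
```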
A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information. The data engineer must identify and remove duplicate information from the legacy application data. Which solution will meet these requirements with the LEAST operational overhead?
A. Write a custom extract, transform, and load (ETL) job in Python. Use the DataFrame.drop_duplicates() function by importing the Pandas library to perform data deduplication.
B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.
C. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
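To illustrate the exact-match deduplication that option A refers to, here is a minimal pandas sketch. The S3 path and key columns are hypothetical; Glue's FindMatches ML transform (option B) is the managed route when duplicates are not exact matches.

```python
# Minimal sketch of exact-match deduplication with pandas; paths and columns are placeholders.
import pandas as pd

# Read the legacy export (placeholder path); s3fs must be installed for direct S3 reads.
df = pd.read_csv("s3://legacy-app-bucket/exports/records.csv")

# Drop rows that are exact duplicates across the chosen business-key columns.
deduped = df.drop_duplicates(subset=["customer_id", "record_date"], keep="first")

# Write the cleaned data back to the data lake (placeholder path).
deduped.to_csv("s3://legacy-app-bucket/clean/records_deduped.csv", index=False)
```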
A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR. Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon S3 for data lake storage. Use S3 access policies to restrict data access by rows and columns. Provide data access through Amazon S3.
B. Use Amazon S3 for data lake storage. Use Apache Ranger through Amazon EMR to restrict data access by rows and columns. Provide data access by using Apache Pig.
C. Use Amazon Redshift for data lake storage. Use Redshift security policies to restrict data access by rows and columns. Provide data access by using Apache Spark and Amazon Athena federated queries.
D. Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation.
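As background for the Lake Formation option, the following hypothetical boto3 sketch grants a team SELECT access restricted to specific columns; row-level access would use a data cells filter with the same API. The role ARN, database, table, and column names are placeholders.

```python
# Hypothetical sketch: column-restricted SELECT grant through AWS Lake Formation.
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/AnalyticsTeamRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",                            # placeholder database
            "Name": "orders",                                      # placeholder table
            "ColumnNames": ["order_id", "order_date", "region"],   # columns the team may read
        }
    },
    Permissions=["SELECT"],
)
```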
A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution. A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations. The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes. Which solution will meet these requirements?
A. Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
B. Change the distribution key to the table column that has the largest dimension.
C. Upgrade the reserved node from ra3.4xlarge to ra3.16xlarge.
D. Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
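For the distribution-key option, here is a minimal sketch of redistributing an existing table with ALTER TABLE ... ALTER DISTKEY through the Redshift Data API. The cluster, database, user, table, and column names are placeholders.

```python
# Hypothetical sketch: changing a table's distribution key on a provisioned Redshift cluster.
import boto3

redshift_data = boto3.client("redshift-data")

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",   # placeholder provisioned cluster
    Database="dev",
    DbUser="admin",
    # Pick a high-cardinality, evenly distributed column to spread rows across nodes.
    Sql="ALTER TABLE sales ALTER DISTKEY order_id;",
)
```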
A company is developing an application that runs on Amazon EC2 instances. Currently, the data that the application generates is temporary. However, the company needs to persist the data, even if the EC2 instances are terminated. A data engineer must launch new EC2 instances from an Amazon Machine Image (AMI) and configure the instances to preserve the data. Which solution will meet this requirement?
A. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume that contains the application data. Apply the default settings to the EC2 instances.
B. Launch new EC2 instances by using an AMI that is backed by a root Amazon Elastic Block Store (Amazon EBS) volume that contains the application data. Apply the default settings to the EC2 instances.
C. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data. Apply the default settings to the EC2 instances.
D. Launch new EC2 instances by using an AMI that is backed by an Amazon Elastic Block Store (Amazon EBS) volume. Attach an additional EC2 instance store volume to contain the application data. Apply the default settings to the EC2 instances.
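To show the mechanics behind the EBS-backed choices, here is a hypothetical boto3 sketch that launches an instance and attaches an extra EBS data volume with DeleteOnTermination disabled so the data outlives the instance. The AMI ID, instance type, and device name are placeholders.

```python
# Hypothetical sketch: launching an EC2 instance with a persistent EBS data volume.
import boto3

ec2 = boto3.client("ec2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder EBS-backed AMI
    InstanceType="m6i.large",
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/sdf",     # extra data volume for the application data
            "Ebs": {
                "VolumeSize": 100,                # GiB
                "VolumeType": "gp3",
                "DeleteOnTermination": False,     # keep the data when the instance is terminated
            },
        }
    ],
)
```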
A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3 data lake. The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on one or two columns of the dataset. The data analysts rarely query the entire file. Which solution will meet these requirements MOST cost-effectively?
A. Use an AWS Glue PySpark job to ingest the source data into the data lake in .csv format.
B. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to ingest the data into the data lake in JSON format.
C. Use an AWS Glue PySpark job to ingest the source data into the data lake in Apache Avro format.
D. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to write the data into the data lake in Apache Parquet format.
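For the Parquet option, the sketch below is a minimal AWS Glue PySpark job that reads the .csv source and writes it to the lake in Apache Parquet, so Athena scans only the queried columns. The bucket paths are placeholders.

```python
# Hypothetical sketch of an AWS Glue PySpark job: .csv in, Parquet out. Paths are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the 15-column .csv source from S3 (placeholder path).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://raw-bucket/csv/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write columnar Parquet so Athena reads only the one or two queried columns.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://data-lake-bucket/parquet/"},
    format="parquet",
)

job.commit()
```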
A data engineer uses Amazon Redshift to run resource-intensive analytics processes once every month. Every month, the data engineer creates a new Redshift provisioned cluster. The data engineer deletes the Redshift provisioned cluster after the analytics processes are complete every month. Before the data engineer deletes the cluster each month, the data engineer unloads backup data from the cluster to an Amazon S3 bucket. The data engineer needs a solution to run the monthly analytics processes that does not require the data engineer to manage the infrastructure manually. Which solution will meet these requirements with the LEAST operational overhead?
A. Use AWS Step Functions to pause the Redshift cluster when the analytics processes are complete and to resume the cluster to run new processes every month.
B. Use Amazon Redshift Serverless to automatically process the analytics workload.
C. Use the AWS CLI to automatically process the analytics workload.
D. Use AWS CloudFormation templates to automatically process the analytics workload.
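As background for the Redshift Serverless option, here is a hypothetical Data API sketch that runs a monthly workload against a serverless workgroup, with no cluster to create or delete. The workgroup, database, and stored procedure are placeholders.

```python
# Hypothetical sketch: running a monthly workload on Amazon Redshift Serverless via the Data API.
import boto3

redshift_data = boto3.client("redshift-data")

response = redshift_data.execute_statement(
    WorkgroupName="monthly-analytics",     # placeholder Redshift Serverless workgroup
    Database="analytics",
    Sql="CALL run_monthly_rollup();",      # placeholder stored procedure for the monthly job
)

# Check the statement status; serverless compute is billed only while queries run.
status = redshift_data.describe_statement(Id=response["Id"])["Status"]
print(status)
```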
A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies. A data engineer wants to cost optimize the company's use of Amazon Athena without adding any additional infrastructure costs. Which solution will meet these requirements with the LEAST operational overhead?
A. Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.
B. Use the query result reuse feature of Amazon Athena for the SQL queries.
C. Add an Amazon ElastiCache cluster between the BI application and Athena.
D. Change the format of the files that are in the dataset to Apache Parquet.
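To illustrate the Athena query result reuse option, here is a minimal boto3 sketch that caps reused results at 60 minutes to line up with the BI refresh policy. The workgroup, database, query, and output location are placeholders.

```python
# Hypothetical sketch: Athena query with result reuse enabled. All names are placeholders.
import boto3

athena = boto3.client("athena")

athena.start_query_execution(
    QueryString="SELECT region, SUM(amount) FROM transactions GROUP BY region",
    QueryExecutionContext={"Database": "finance_db"},
    WorkGroup="bi-workgroup",
    ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/bi/"},
    ResultReuseConfiguration={
        "ResultReuseByAgeConfiguration": {
            "Enabled": True,
            "MaxAgeInMinutes": 60,   # matches the BI application's 1-hour refresh policy
        }
    },
)
```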
A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling. Which solution will meet this requirement?
A. Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.
B. Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.
C. Turn on concurrency scaling in the settings during the creation of a new Redshift cluster.
D. Turn on concurrency scaling for the daily usage quota for the Redshift cluster.
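For the WLM queue-level option, the following hypothetical sketch updates a cluster parameter group's wlm_json_configuration so that concurrency_scaling is set to auto for a queue. The parameter group name and queue settings are placeholders, and a real WLM configuration would list all of the cluster's queues.

```python
# Hypothetical sketch: enabling concurrency scaling on a WLM queue via a parameter group update.
import json

import boto3

redshift = boto3.client("redshift")

wlm_config = [
    {
        "query_concurrency": 5,
        "concurrency_scaling": "auto",   # queue-level switch for concurrency scaling
    }
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="custom-wlm-params",   # placeholder parameter group attached to the cluster
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
        }
    ],
)
```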
A company has a production AWS account that runs company workloads. The company's security team created a security AWS account to store and analyze security logs from the production AWS account. The security logs in the production AWS account are stored in Amazon CloudWatch Logs. The company needs to use Amazon Kinesis Data Streams to deliver the security logs to the security AWS account. Which solution will meet these requirements?
A. Create a destination data stream in the production AWS account. In the security AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the production AWS account.
B. Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the security AWS account.
C. Create a destination data stream in the production AWS account. In the production AWS account, create an IAM role that has cross-account permissions to Kinesis Data Streams in the security AWS account.
D. Create a destination data stream in the security AWS account. Create an IAM role and a trust policy to grant CloudWatch Logs the permission to put data into the stream. Create a subscription filter in the production AWS account.
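As background for the cross-account delivery pattern in these options, here is a hypothetical boto3 sketch: a CloudWatch Logs destination in front of the Kinesis stream in the security account, and a subscription filter in the production account. All account IDs, ARNs, and names are placeholders, and the two clients are assumed to carry credentials for their respective accounts.

```python
# Hypothetical sketch: cross-account CloudWatch Logs delivery to a Kinesis data stream.
import json

import boto3

# --- Security account: wrap the Kinesis stream in a CloudWatch Logs destination. ---
logs_security = boto3.client("logs")  # assumes credentials for the security account

destination = logs_security.put_destination(
    destinationName="SecurityLogDestination",
    targetArn="arn:aws:kinesis:us-east-1:222233334444:stream/security-logs",
    roleArn="arn:aws:iam::222233334444:role/CWLtoKinesisRole",  # role CloudWatch Logs assumes
)

logs_security.put_destination_policy(
    destinationName="SecurityLogDestination",
    accessPolicy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "111122223333"},      # placeholder production account ID
            "Action": "logs:PutSubscriptionFilter",
            "Resource": destination["destination"]["arn"],
        }],
    }),
)

# --- Production account: subscribe the log group to the destination. ---
logs_production = boto3.client("logs")  # assumes credentials for the production account

logs_production.put_subscription_filter(
    logGroupName="/prod/security-events",              # placeholder log group
    filterName="ToSecurityAccount",
    filterPattern="",                                  # empty pattern forwards all events
    destinationArn=destination["destination"]["arn"],
)
```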
A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options. The company's current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache HBase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS. Which extract, transform, and load (ETL) service will meet these requirements?
A. AWS Glue
B. Amazon EMR
C. AWS Lambda
D. Amazon Redshift
A data engineering team is using an Amazon Redshift data warehouse for operational reporting. The team wants to prevent performance issues that might result from long-running queries. A data engineer must choose a system table in Amazon Redshift to record anomalies when a query optimizer identifies conditions that might indicate performance issues. Which table view should the data engineer use to meet this requirement?
A. STL_USAGE_CONTROL
B. STL_ALERT_EVENT_LOG
C. STL_QUERY_METRICS
D. STL_PLAN_INFO
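For reference, Redshift system views such as STL_ALERT_EVENT_LOG can be queried like any other table; the hypothetical Data API sketch below pulls the last day of optimizer alerts. The cluster, database, and user names are placeholders.

```python
# Hypothetical sketch: reading recent query-optimizer alerts from STL_ALERT_EVENT_LOG.
import boto3

redshift_data = boto3.client("redshift-data")

redshift_data.execute_statement(
    ClusterIdentifier="reporting-cluster",   # placeholder cluster
    Database="dev",
    DbUser="admin",
    Sql="""
        SELECT query, event, solution, event_time
        FROM stl_alert_event_log
        WHERE event_time > DATEADD(day, -1, GETDATE())
        ORDER BY event_time DESC;
    """,
)
```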
A media company wants to improve a system that recommends media content to customers based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform. The company wants to minimize the effort and time required to incorporate third-party datasets. Which solution will meet these requirements with the LEAST operational overhead?
A. Use API calls to access and integrate third-party datasets from AWS Data Exchange.
B. Use API calls to access and integrate third-party datasets from AWS
C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.
D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).
A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently. The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database. Which AWS service should the company use to meet these requirements?
A. AWS Lambda
B. AWS Database Migration Service (AWS DMS)
C. AWS Direct Connect
D. AWS DataSync
You should have hands-on time with data warehousing, ETL processes, data modeling, and big data tools. Knowing cloud data services and databases well is also a must—it’s all about doing, not just reading.
Begin with a solid base in data concepts, pick up languages like SQL and Python, and get familiar with cloud platforms like AWS, Azure, or Google Cloud. This certification backs up your skills and helps you land entry-level roles.
It’s AWS all the way here, but understanding the basics of Azure or Google Cloud can help you see the bigger picture, since they all handle data engineering in their own ways.
Use AWS’s free tier to try out data tasks, join hackathons, tackle real projects, or pitch in on open-source efforts. Getting your hands dirty is the best way to learn.
Think building data pipelines, setting up data lakes, turning messy data into something usable, and making sure it’s reliable for business insights and analytics—that’s where it shines.
Keep an eye on industry blogs, sign up for newsletters, attend webinars or conferences, join online groups, and read up on new research—it’s a field that moves fast.
You could step into roles like data engineer, data architect, big data engineer, or analytics engineer. With time, you might lead big projects or teams as a senior pro.