Cloudera Certified Developer for Apache Hadoop Sample Questions:
1. You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is a data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited for this kind of user?
A) Pig
B) Sqoop
C) Oozie
D) Hive
E) Flume
F) Hue
G) Hadoop Streaming
2. You need a distributed, scalable data store that gives you random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?
A) Pig
B) Sqoop
C) Oozie
D) Hive
E) Flume
F) Hue
G) HBase
3. Your cluster has 10 DataNodes, each with a single 1 TB hard drive. You utilize all your disk capacity for HDFS, reserving none for MapReduce. You implement default replication settings. What is the storage capacity of your Hadoop cluster (assuming no compression)?
A) about 10 TB
B) about 3 TB
C) about 11 TB
D) about 5 TB
4. In a large MapReduce job with m mappers and r reducers, how many distinct copy operations will there be in the sort/shuffle phase?
A) r
B) m × r (i.e., m multiplied by r)
C) m+r (i.e., m plus r)
D) m
E) m^r (i.e., m to the power of r)
5. You need to create a GUI application to help your company's salespeople add and edit customer information. Would HDFS be appropriate for this customer information file?
A) No, because HDFS is optimized for write-once, streaming access for relatively large files.
B) Yes, because HDFS is optimized for fast retrieval of relatively small amounts of data.
C) Yes, because HDFS is optimized for random access writes.
D) No, because HDFS can only be accessed by MapReduce applications.
Solutions:
Question 1: D
Question 2: G
Question 3: B
Question 4: B
Question 5: A
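The arithmetic behind Questions 3 and 4 can be checked with a short sketch. The function names below are hypothetical and exist only for illustration. The sketch assumes the HDFS default replication factor of 3 and that each reducer fetches one partition from every mapper's output. It ignores real-world overheads such as non-HDFS reserved space.

```python
def usable_capacity_tb(datanodes, disk_tb_per_node, replication=3):
    """Approximate usable HDFS capacity: raw disk divided by the replication factor."""
    raw_tb = datanodes * disk_tb_per_node
    return raw_tb / replication


def shuffle_copy_count(mappers, reducers):
    """Each reducer copies its partition from every mapper, giving m * r copies."""
    return mappers * reducers


# Question 3: 10 DataNodes with 1 TB each, default replication of 3
print(round(usable_capacity_tb(10, 1), 2))  # 3.33, i.e. "about 3 TB"

# Question 4: an illustrative job with 100 mappers and 10 reducers
print(shuffle_copy_count(100, 10))  # 1000
```

With replication set to 3, every block is stored three times, so 10 TB of raw disk holds roughly 3.3 TB of distinct data. That is why answer B is the closest choice for Question 3.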



By Abner


