1) What is meant by Hadoop?

Hadoop is a distributed computing platform written in Java. Its core features are modeled on the Google File System and MapReduce.

2) Describe the platform and Java version required to run Hadoop.

Java 1.6.x or any later version works with Hadoop. Linux and Windows are the supported operating systems, though BSD, Mac OS X and Solaris are also known to work.

3) What are the hardware specifications for Hadoop?

Hadoop typically runs on dual-processor or dual-core machines with 4-8 GB of ECC RAM. The exact requirements depend on the workflow design.

4) Describe the most common input formats defined in Hadoop.

The most common input formats in Hadoop are:

  1. TextInputFormat
  2. KeyValueTextInputFormat
  3. SequenceFileInputFormat

TextInputFormat is the default input format.

5) How do you categorize big data?

Big data is categorized by the following features:

  • Volume
  • Velocity
  • Variety

10) Name the command used to retrieve the status of the daemons running in a Hadoop cluster.

The ‘jps’ command lists the Java processes running on a node, which shows which Hadoop daemons are currently up.

11) What is InputSplit? Explain.

When running a job, Hadoop splits the input files into chunks and assigns each chunk to a mapper for processing. Each such chunk is called an InputSplit.


12) Explain TextInputFormat.

With TextInputFormat, each line of a text file is a record. The key is the byte offset of the line and the value is the content of the line. For example, Key: LongWritable, Value: Text.
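The key/value layout can be illustrated with a minimal sketch. This is not Hadoop code; `text_records` is a hypothetical helper that mimics how a line-oriented input format would pair each line with its byte offset.

```python
def text_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of the input.

    Hypothetical helper for illustration only: the offset plays the role
    of the LongWritable key and the line text the role of the Text value.
    """
    offset = 0
    for raw in data.splitlines(keepends=True):
        # Strip the line terminator from the value, but count it in the offset.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


if __name__ == "__main__":
    for key, value in text_records(b"hello\nbig data\n"):
        print(key, value)
    # prints:
    # 0 hello
    # 6 big data
```

The second key is 6 because the first line occupies five bytes plus one newline byte.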

13) What is meant by SequenceFileInputFormat in Hadoop?

SequenceFileInputFormat is used to read sequence files. A sequence file is a compressed binary file format suited to passing data from the output of one MapReduce job to the input of another MapReduce job.

14) How many InputSplits are made by the Hadoop framework?

Assuming a 64 MB block size and three input files of 64 KB, 65 MB and 127 MB, Hadoop makes 5 splits in total:

  • One split for the 64 KB file
  • Two splits for the 65 MB file, and
  • Two splits for the 127 MB file
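The counts above follow from roughly one split per block. A minimal sketch of that arithmetic, assuming a 64 MB block size and one file of each listed size (64 KB, 65 MB and 127 MB); `count_splits` is a hypothetical helper, and the real framework applies additional rules when sizing the last split, so actual counts can differ.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumed block size; configurable in a real cluster


def count_splits(file_size: int, split_size: int = BLOCK_SIZE) -> int:
    """Simplified model: one split per started block (ceiling division)."""
    if file_size <= 0:
        raise ValueError("file_size must be positive in this sketch")
    return -(-file_size // split_size)


if __name__ == "__main__":
    sizes = [64 * 1024, 65 * MB, 127 * MB]
    per_file = [count_splits(s) for s in sizes]
    print(per_file, sum(per_file))  # prints: [1, 2, 2] 5
```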

15) Describe the use of RecordReader in Hadoop.

An InputSplit defines a slice of work but does not describe how to access it. The RecordReader class is responsible for loading the data from its source and converting it into key-value pairs suitable for reading by the Mapper.
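To make the division of labor concrete, here is a simplified sketch of a line-oriented record reader working on one split. It is a model, not Hadoop's implementation: `read_split` is a hypothetical function, and the input is assumed to be newline-terminated text held in memory. A split that does not start at byte 0 skips its first (possibly partial) line, and every split reads one line past its end, so each line is produced exactly once across all splits.

```python
def read_split(data: bytes, start: int, length: int):
    """Yield (byte_offset, line_text) records for the split [start, start+length)."""
    end = start + length
    pos = start
    if start != 0:
        # The line in progress at `start` belongs to the previous split.
        newline = data.find(b"\n", start)
        if newline == -1:
            return
        pos = newline + 1
    # Keep reading while the line *starts* at or before the split end.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


if __name__ == "__main__":
    data = b"aa\nbb\ncc\n"
    print(list(read_split(data, 0, 4)))  # prints: [(0, 'aa'), (3, 'bb')]
    print(list(read_split(data, 4, 5)))  # prints: [(6, 'cc')]
```

The second split begins in the middle of "bb", so it discards that fragment and starts at the next full line.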

16) Describe JobTracker in Hadoop.

JobTracker is a service within Hadoop that runs MapReduce jobs on the cluster.

17) Explain WebDAV in Hadoop.

WebDAV is a set of extensions to HTTP that supports editing and uploading files. On most operating systems, WebDAV shares can be mounted as filesystems, so HDFS can be accessed as a standard filesystem by exposing it over WebDAV.

18) What is Sqoop in Hadoop?

Sqoop is a tool for transferring data between Hadoop HDFS and a relational database management system (RDBMS). Using Sqoop, you can import data from an RDBMS such as Oracle or MySQL into HDFS, as well as export data from HDFS files to an RDBMS.

19) List the functionalities of JobTracker.

These are the main tasks of JobTracker:

  • Accepting jobs from the client.
  • Communicating with the NameNode to determine the location of the data.
  • Locating TaskTracker nodes with free slots.
  • Submitting work to the chosen TaskTracker node and monitoring the progress of each task.
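The node-selection step can be sketched as a toy function. This is only an illustration under stated assumptions: `pick_tracker`, the host names, and the slot counts are hypothetical, and the preference for a node that already holds the data is a simplification of how a real scheduler weighs locality.

```python
from typing import Dict, Optional, Set


def pick_tracker(free_slots: Dict[str, int], data_hosts: Set[str]) -> Optional[str]:
    """Choose a TaskTracker host for one task.

    free_slots: host name -> number of free task slots (hypothetical input).
    data_hosts: hosts reported as holding the task's input data.
    """
    candidates = [host for host, free in free_slots.items() if free > 0]
    if not candidates:
        return None  # no capacity anywhere; the task has to wait
    # Prefer a host that already stores the data, else take any free host.
    local = [host for host in candidates if host in data_hosts]
    return (local or candidates)[0]


if __name__ == "__main__":
    slots = {"node1": 0, "node2": 2, "node3": 1}
    print(pick_tracker(slots, {"node1", "node3"}))  # prints: node3
    print(pick_tracker(slots, {"node1"}))           # prints: node2
```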

20) What is the use of TaskTracker?

TaskTracker is a node in the cluster that accepts tasks, such as Map, Reduce and Shuffle operations, from the JobTracker.

