Big data Hadoop interview questions and answers for freshers and experienced - Part 2

Big data Hadoop interview questions and answers for freshers and experienced



Hadoop interview questions and answers for freshers and experienced - Part 1


31. Using the Linux command line, how will you copy a file from your local directory to HDFS?

  • hadoop fs -put localfile hdfsfile

32.What platforms and Java versions does Hadoop run on?

  •  Java 1.6.x or higher, preferably from Sun. Linux and Windows are the supported operating systems, but BSD, Mac OS/X, and OpenSolaris are known to work. (Windows requires the installation of Cygwin).



33.Is there an easy way to see the status and health of a cluster?

  • There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. 
  • By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.
  • The JobTracker status page will display the state of all nodes, as well as the job queue and status about all currently running jobs and tasks.
  • The NameNode status page will display the state of all nodes and the amount of free space, and provides the ability to browse the DFS via the web.
  • You can also see some basic HDFS cluster health data by running:
  • $ bin/hadoop dfsadmin -report

34.Do I have to write my job in Java?

  • No. There are several ways to incorporate non-Java code.

35.How do I submit extra content (jars, static files, etc) for my job to use during runtime?

  • The distributed cache feature is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework will copy the necessary files from a URL (either hdfs: or http:) on to the slave node before any tasks for the job are executed on that node.
  • The files are only copied once per job and so should not be modified by the application.
  • Copying content into the lib directory on the nodes is discouraged; changes in that directory require the Hadoop services to be restarted. A minimal driver sketch using the distributed cache follows.
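
The sketch below is a minimal, hedged example of shipping side data with a job using the newer mapreduce API (Job.addCacheFile / Job.addFileToClassPath); on older releases the equivalent calls live on the DistributedCache class. The class name, job name and HDFS paths are made up for illustration.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment: ship a read-only lookup file and an extra jar
// to every node that runs tasks for this job.
public class CacheFileDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "job-with-side-data");
        job.setJarByClass(CacheFileDriver.class);

        // Copied once per job to each task node; tasks read it as a local file.
        job.addCacheFile(new URI("hdfs:///user/hadoop/lookup.txt"));

        // Extra jar added to the task classpath.
        job.addFileToClassPath(new Path("/user/hadoop/extra-lib.jar"));

        // ... set mapper, reducer, input and output paths as usual ...
    }
}
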
36. How do I change the final output file names to desired names rather than partition names like part-00000, part-00001?

  • You can subclass the OutputFormat class and write your own. Look at the code of TextOutputFormat, MultipleOutputFormat, etc. for reference; it may be that only minor changes to one of the existing OutputFormat classes are needed.
  • To do that, subclass the class in question and override only the methods you need to change, as in the sketch below.
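
A minimal sketch of this approach, assuming the newer mapreduce API: subclass TextOutputFormat and override getDefaultWorkFile so reducer output files are named result-00000, result-00001, and so on instead of part-r-00000. The class name and the "result" prefix are arbitrary choices for illustration.

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Names each task's output file "result-<partition number>" in the job's work directory.
public class CustomNameTextOutputFormat<K, V> extends TextOutputFormat<K, V> {

    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
        FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
        int partition = context.getTaskAttemptID().getTaskID().getId();
        return new Path(committer.getWorkPath(), String.format("result-%05d%s", partition, extension));
    }
}

Register it in the driver with job.setOutputFormatClass(CustomNameTextOutputFormat.class).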

37.How do you gracefully stop a running job?

  • hadoop job -kill <JOBID>

38.How the HDFS Blocks are replicated?

  • HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size.
  • The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. 
  • The NameNode makes all decisions regarding replication of blocks. HDFS uses a rack-aware replica placement policy. In the default configuration there are 3 copies of each data block: 2 copies are stored on DataNodes in the same rack and the 3rd copy on a node in a different rack.

39.How the Client communicates with HDFS?

  • Client communication with HDFS happens through the Hadoop HDFS API. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file on HDFS. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.
  •  Client applications can talk directly to a DataNode, once the NameNode has provided the location of the data.

40. What is the HDFS block size? How is it different from the traditional file system block size?
  • In HDFS, data is split into blocks and distributed across multiple nodes in the cluster.
  • Each block is typically 64 MB or 128 MB in size, and each block is replicated multiple times (three times by default), with replicas stored on different nodes.
  • HDFS uses the local file system to store each HDFS block as a separate file, so the HDFS block size cannot be directly compared with the traditional file system block size: an HDFS block is a much larger, logical unit of distribution and replication.

41. When are the reducers started in a MapReduce job?
  • In a MapReduce job, reducers do not start executing the reduce method until all map tasks have completed. Reducers start copying intermediate key-value pairs from the mappers as soon as they are available, but the programmer-defined reduce method is called only after all the mappers have finished.

42. If reducers do not start before all mappers finish, why does the progress of a MapReduce job show something like Map(60%) Reduce(15%)? Why is reducer progress displayed when the mappers are not finished yet?
  • Reducers start copying intermediate key-value pairs from the mappers as soon as they are available.
  • The progress calculation also takes into account this data transfer, which is done by the reduce tasks, so reduce progress starts showing up as soon as any intermediate key-value pair from a mapper is available to be transferred to a reducer.
  • Although the reducer progress is updated, the programmer-defined reduce method is still called only after all the mappers have finished.

43. What is the Hadoop MapReduce API contract for a key and value class?
  • The key must implement the org.apache.hadoop.io.WritableComparable interface.
  • The value must implement the org.apache.hadoop.io.Writable interface. A minimal custom key class is sketched below.
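
The following is a minimal sketch of a custom key type that satisfies this contract; the class name YearKey and its single int field are hypothetical, chosen only to keep the example short.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A key wrapping a single int. Keys need serialization (write/readFields)
// plus an ordering (compareTo) so the framework can sort them.
public class YearKey implements WritableComparable<YearKey> {

    private int year;

    public YearKey() { }                      // no-arg constructor required for deserialization

    public YearKey(int year) { this.year = year; }

    @Override
    public void write(DataOutput out) throws IOException { out.writeInt(year); }

    @Override
    public void readFields(DataInput in) throws IOException { year = in.readInt(); }

    @Override
    public int compareTo(YearKey other) {
        return (year < other.year) ? -1 : ((year == other.year) ? 0 : 1);
    }

    @Override
    public int hashCode() { return year; }    // used by the default HashPartitioner

    @Override
    public boolean equals(Object o) {
        return o instanceof YearKey && ((YearKey) o).year == year;
    }
}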

44. What are combiners? When should I use a combiner in my MapReduce job?
  • Combiners are used to increase the efficiency of a MapReduce program.
  • They aggregate the intermediate map output locally on each mapper node, which reduces the amount of data that needs to be transferred to the reducers.
  • You can use your reducer code as a combiner if the operation performed is commutative and associative.
  • The execution of the combiner is not guaranteed: Hadoop may or may not execute it, and if required it may execute it more than once. Therefore your MapReduce job should not depend on the combiner executing. A complete word-count driver that reuses its reducer as a combiner is sketched below.
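
A minimal, self-contained word-count driver illustrating the point, assuming the library mapper and reducer classes available in newer Hadoop releases (TokenCounterMapper and IntSumReducer); the class name and the use of command-line arguments for paths are arbitrary.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Word count that reuses its reducer as a combiner: summing counts is
// commutative and associative, so running the combiner zero, one or many
// times cannot change the final result.
public class WordCountWithCombiner {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count with combiner");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenCounterMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}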

45. Where is the mapper output (intermediate key-value data) stored?
  • The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each individual mapper node.
  • This is typically a temporary directory location which can be set up in the configuration by the Hadoop administrator. The intermediate data is cleaned up after the Hadoop job completes.

46. Name the most common InputFormats defined in Hadoop. Which one is the default?

  • The following are the most common InputFormats defined in Hadoop:
  1. TextInputFormat
  2. KeyValueInputFormat
  3. SequenceFileInputFormat
  • TextInputFormat is the Hadoop default.

47. What is the difference between the TextInputFormat and KeyValueInputFormat classes?
  • TextInputFormat: reads lines of text files and provides the byte offset of each line as the key to the Mapper, and the actual line as the value.
  • KeyValueInputFormat: reads text files and parses each line into key, value pairs. Everything up to the first tab character is sent as the key to the Mapper, and the remainder of the line is sent as the value. A driver fragment using this format is sketched below.
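
A minimal driver sketch, assuming the new-API class org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat: it switches the job from the default TextInputFormat and changes the key/value separator from tab to a comma. The property name shown is the one used by newer releases; older releases used key.value.separator.in.input.line. The class and job names are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// Use key/value style input where each line is "key,value".
public class KeyValueInputDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
        Job job = Job.getInstance(conf, "kv-input-example");
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        // ... set mapper, reducer, input and output paths as usual ...
    }
}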

48. What is an InputSplit in Hadoop?

  • When a Hadoop job is run, it splits the input files into chunks and assigns each split to a mapper for processing. Each such chunk is called an InputSplit.

49. How is the splitting of files invoked in the Hadoop framework?
  • It is invoked by the Hadoop framework by running the getSplits() method of the InputFormat class (such as FileInputFormat) defined by the user.

50. Consider this case scenario: in an M/R system,
  • the HDFS block size is 64 MB
  • the input format is FileInputFormat
  • we have 3 files of sizes 64 KB, 65 MB and 127 MB
  • How many input splits will be made by the Hadoop framework?
  • Hadoop will make 5 splits, as follows (see the arithmetic below):
  • 1 split for the 64 KB file
  • 2 splits for the 65 MB file
  • 2 splits for the 127 MB file
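
As a quick check of the arithmetic, assuming each file is split independently and the split size equals the 64 MB block size: the 64 KB file fits in one block, so 1 split; the 65 MB file needs ceil(65/64) = 2 splits; the 127 MB file needs ceil(127/64) = 2 splits; 1 + 2 + 2 = 5 splits in total.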

51. What is the purpose of the RecordReader in Hadoop?
  • The InputSplit defines a slice of work but does not describe how to access it. The RecordReader class actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat.

52. After the map phase finishes, the Hadoop framework does "Partitioning, Shuffle and Sort". Explain what happens in this phase.

  • Partitioning: partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine, for all of its output (key, value) pairs, which reducer will receive them. It is necessary that for any key, regardless of which mapper instance generated it, the destination partition is the same.
  • Shuffle: after the first map tasks have completed, the nodes may still be performing several more map tasks each, but they also begin exchanging the intermediate outputs from the map tasks with the reducers that require them. This process of moving map outputs to the reducers is known as shuffling.
  • Sort: each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before they are presented to the Reducer.

53. If no custom partitioner is defined in Hadoop, how is data partitioned before it is sent to the reducer?
  • The default partitioner computes a hash value for the key and assigns the partition based on this result, as sketched below.
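
For reference, the default HashPartitioner is essentially the one-liner below (the surrounding class is written out here only so the snippet compiles on its own).

import org.apache.hadoop.mapreduce.Partitioner;

// What the built-in HashPartitioner does: mask off the sign bit of the key's
// hash code and take it modulo the number of reduce tasks.
public class HashLikePartitioner<K, V> extends Partitioner<K, V> {

    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}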

54. What is a Combiner
  • The Combiner is a "mini-reduce" process which operates only on data generated by a mapper.
  • The Combiner will receive as input all data emitted by the Mapper instances on a given node.
  • The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers.

55. Give an example scenario where a combiner can be used and one where it cannot be used.
  • There can be several examples; the following are the most common ones.
  • Scenario where you can use a combiner:
  • Getting the list of distinct words in a file
  • Scenario where you cannot use a combiner:
  • Calculating the mean of a list of numbers (see the worked example below)
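
Worked example of why the mean breaks: the mean of 1, 2 and 9 is (1 + 2 + 9) / 3 = 4, but if a combiner first averaged 1 and 2 on one mapper, the reducer would compute mean(1.5, 9) = 5.25, which is wrong. Averaging is not associative, so the reducer logic cannot be reused as a combiner; such a job would have to combine (sum, count) pairs instead.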

56.What is job tracker
  • Job Tracker is the service within Hadoop that runs Map Reduce jobs on the cluster

57. What are some typical functions of Job Tracker
  • The following are some typical tasks of Job Tracker
  • Accepts jobs from clients
  • It talks to the NameNode to determine the location of the data
  • It locates TaskTracker nodes with available slots at or near the data
  • It submits the work to the chosen Task Tracker nodes and monitors progress of each task by receiving heartbeat signals from Task tracker

58.What is task tracker
  • Task Tracker is a node in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker.

59. Whats the relationship between Jobs and Tasks in Hadoop
  • One job is broken down into one or many tasks in Hadoop.

60. Suppose Hadoop spawned 100 tasks for a job and one of the tasks failed. What will Hadoop do?
  • It will restart the task on some other TaskTracker, and only if the task fails more than 4 times (the default setting, which can be changed) will it kill the job.

61. Hadoop achieves parallelism by dividing the tasks across many nodes, so it is possible for a few slow nodes to rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?
  • Speculative execution

62. How does speculative execution work in Hadoop?
  • The JobTracker makes different TaskTrackers process the same input.
  • When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon those tasks and discard their outputs.
  • The reducers then receive their inputs from whichever mapper completed successfully first. A sketch for turning speculative execution off for a single job follows.
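
Speculative execution is on by default; the hedged fragment below turns it off for one job. The property names shown are the newer ones; older releases used mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution. The class and job names are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Disable speculative execution for both map and reduce tasks of this job.
public class NoSpeculationDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.speculative", false);
        conf.setBoolean("mapreduce.reduce.speculative", false);
        Job job = Job.getInstance(conf, "no-speculation-example");
        // ... set mapper, reducer, input and output paths as usual ...
    }
}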

How to call a non static method from a static method in Java

  • A class is a template; instance variables get memory only when an object is created.
  • If we create two objects, the variables get memory in both objects, so instance variables get memory whenever an object is created.
  • When we declare variables as static, no per-object memory is allocated, because static means class level: the member belongs to the class, not to any object. We can still access static variables and static methods from objects.
  • In our scenario we are calling a non static method from a static method in Java (the reverse case, calling a static method from a non static method, is covered further below).
  • If we are calling a non static method, we need an object so that the call is made on that object's non static method.
  • Non static methods are executed on an object, so whenever we want to call a non static method from a static method we need to create an instance and call the method on it.
  • If we call a non static method directly from a static method without creating an object, the compiler throws an error.



Program #1: Java example program to call non static method from static method. 


[Screenshot: compiler error when a non static method is called directly from a static method; a reconstruction of that program follows.]
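
The code below is a hedged reconstruction of the program shown in the screenshot (based on Program #2 further down): it calls nonStaticMethod() directly from the static method, so it does not compile.

package com.instanceofjava.staticinterviewquestions;

// This version does NOT compile: a non static method is referenced
// directly from a static method without an object.
public class StaticMethodDemo {

    void nonStaticMethod() {
        System.out.println("non static method");
    }

    public static void staticMethod() {
        nonStaticMethod(); // compile error: cannot make a static reference to a non-static method
    }

    public static void main(String[] args) {
        StaticMethodDemo.staticMethod();
    }
}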

  • In the above program we are trying to call a non static method of the class from a static method, so the compiler throws an error:
  • Cannot make a static reference to the non-static method nonStaticMethod() from the type StaticMethodDemo
  • So without an object we cannot call a non static method of a class.
  • Check the example program below: it calls the non static method from the static method by creating an object of the class and calling the non static method on that object.

Program #2: Java example program to call non static method from static method.

package com.instanceofjava.staticinterviewquestions;
//www.instanceofjava.com

public class StaticMethodDemo {

    void nonStaticMethod() {
        System.out.println("non static method");
    }

    public static void staticMethod() {
        new StaticMethodDemo().nonStaticMethod();
    }

    public static void main(String[] args) {
        StaticMethodDemo.staticMethod();
    }
}
   
Output:

non static method

Calling a static method from a non static method in Java

  • Static means class level and non static means object level.
  • Non static variables get memory in each and every object, allocated dynamically when the object is created.
  • Static variables are not part of any object; all static variables get memory when the class itself is loaded.
  • Like static variables, we have static methods; we can access static methods without creating an object.
  • Static methods are class level, and we can still access static methods inside non static methods.
  • We can also call static methods without using an object, just by using the class name.
  • So the answer to the question "is it possible to call static methods from non static methods in Java" is yes.
  • Calling a static method from a non static method means calling a single, class-level method from an object's method, which is always possible.


Program #1: Java example program to call static method from non static method.


package com.instanceofjava.staticinterviewquestions;

public class StaticMethodDemo {

    void nonStaticMethod() {
        System.out.println("Hi i am non static method");
        staticMethod();
    }

    public static void staticMethod() {
        System.out.println("Hi i am static method");
    }

    public static void main(String[] args) {
        StaticMethodDemo obj = new StaticMethodDemo();
        obj.nonStaticMethod();
    }
}
Output:

Hi i am non static method
Hi i am static method

  • In the above program we created an object of the class, called a non static method on that object, and inside that non static method called a static method.
  • So it is always possible to access static variables and static methods inside non static methods.


Top 60 Hadoop interview questions and answers for freshers and experienced - Part 1

Hadoop interview questions and answers for freshers and experienced


1.What is HDFS?

  • HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes), and provide high-throughput access to this information.
  • Files are stored in a redundant fashion across multiple machines to ensure their durability in the face of failure and their high availability to highly parallel applications.


 
2.What are the Hadoop configuration files?

  1.     hdfs-site.xml
  2.     core-site.xml
  3.     mapred-site.xml


3.How NameNode Handles data node failures?

  • The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly.
  • When the NameNode notices that it has not received a heartbeat message from a DataNode after a certain amount of time, the DataNode is marked as dead. The blocks that were stored on the dead DataNode become under-replicated, so the NameNode begins replicating them to other DataNodes.
  • The NameNode takes responsibility for the replication of the data blocks from one DataNode to another. The replication data transfer happens directly between DataNodes; the data never passes through the NameNode.


4.What is MapReduce in Hadoop?

  • Hadoop MapReduce is a specially designed framework for distributed processing of large data sets on clusters of commodity hardware. 
  • The framework itself can take care of scheduling tasks, monitoring them and reassigning of failed tasks.

5.What is the responsibility of NameNode in HDFS ?

  • NameNode is the master daemon that maintains the metadata for the blocks stored on DataNodes. Every DataNode sends a heartbeat and a block report to the NameNode.
  • If the NameNode does not receive a heartbeat from a DataNode, it identifies that DataNode as dead. The NameNode is a single point of failure: if the NameNode goes down, the HDFS cluster is inaccessible.

6. What is the responsibility of the SecondaryNameNode in HDFS?

  • SecondaryNameNode is a master daemon that performs housekeeping work for the NameNode.
  • SecondaryNameNode is not a standby for the NameNode, but it does keep a backup of the NameNode's metadata.

7.What is the DataNode in HDFS?

  • DataNode is the slave daemon of the NameNode and stores the actual data blocks. Each DataNode stores a number of blocks (64 MB each by default).

8.What is the JobTracker in HDFS?

  • JobTracker is a master daemon that assigns tasks to TaskTrackers on the DataNodes where it can find the data blocks for the input file.

9.How can we list all job running in a cluster?

  • $ hadoop job -list

10.How can we kill a job?

  • $ hadoop job -kill jobid

11. What's the default port that the JobTracker listens on?

  •  http://localhost:50030

12. What's the default port where the DFS NameNode web UI listens?

  •     http://localhost:50070

13. What is Hadoop Streaming?

  • Streaming is a generic API that allows programs written in virtually any language to be used as Hadoop Mapper and Reducer implementations, as in the example below.
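
A hedged command-line sketch (the exact file name and location of the streaming jar vary by release and distribution, and the input/output paths are made up):

$ hadoop jar hadoop-streaming.jar \
    -input /user/hadoop/streaming-input \
    -output /user/hadoop/streaming-output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc

Here /bin/cat acts as an identity mapper and /usr/bin/wc as the reducer; any executable that reads stdin and writes stdout can play either role.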


14. What is Distributed Cache in Hadoop?

  • Distributed Cache is a facility provided by the Map/Reduce framework to cache files (text, archives, jars and so on) needed by applications during execution of the job.
  • The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node.

15. What is the benefit of Distributed Cache? Why can't we just keep the file in HDFS and have the application read it?

  • The distributed cache is much faster: it copies the file to all TaskTrackers once, at the start of the job.
  • If a TaskTracker then runs 10 or 100 mappers or reducers, they all use the same local copy from the distributed cache.
  • On the other hand, if the MR job reads the file from HDFS directly, every mapper accesses it from HDFS, so a TaskTracker running 100 map tasks will read the file from HDFS 100 times. HDFS is not very efficient when used like this.


16. Is it possible to provide multiple inputs to Hadoop? If yes, how can you give multiple directories as input to a Hadoop job?

  • Yes. The input format class provides methods to add multiple directories as input to a Hadoop job, as in the fragment below.
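
A minimal driver sketch, assuming the new-API FileInputFormat; the directory names and class name are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Feed two input directories to the same job.
public class MultiInputDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-input-example");
        FileInputFormat.addInputPath(job, new Path("/data/logs/2013"));
        FileInputFormat.addInputPath(job, new Path("/data/logs/2014"));
        // or equivalently, as one comma-separated list:
        // FileInputFormat.addInputPaths(job, "/data/logs/2013,/data/logs/2014");
        // ... set mapper, reducer and output path as usual ...
    }
}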

17.What will a hadoop job do if you try to run it with an output directory that is already present? Will it overwrite it - warn you and continue - throw an exception and exit

  • The hadoop job will throw an exception and exit.


18. How can you set an arbitrary number of mappers to be created for a job in Hadoop?

  • This is a trick question. You cannot set it directly; the number of mappers is determined by the number of input splits.

19. How can you set an arbitrary number of reducers to be created for a job in Hadoop?

  • You can either do it programmatically by using the method setNumReduceTasks in the JobConf class, or set it up as a configuration setting. A sketch using the newer API follows.
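
A hedged fragment using the newer mapreduce API (the reducer count of 10, the class name and the job name are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Fix the number of reduce tasks for one job.
public class TenReducersDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "ten-reducers-example");
        job.setNumReduceTasks(10);   // programmatic; old API: JobConf.setNumReduceTasks(10)
        // equivalent configuration setting: mapreduce.job.reduces=10
        // (older releases used mapred.reduce.tasks)
        // ... set mapper, reducer, input and output paths as usual ...
    }
}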

20. How will you write a custom partitioner for a Hadoop job?

  • To have Hadoop use a custom partitioner you will have to do, at a minimum, the following three things (a sketch follows the list):
  1. Create a new class that extends the Partitioner class
  2. Override the getPartition method
  3. In the wrapper that runs the MapReduce job, either add the custom partitioner to the job programmatically using the method setPartitionerClass, or add it to the job as a config file (if your wrapper reads from a config file or Oozie)
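
A minimal sketch following those three steps. The key/value types (Text, IntWritable) and the routing rule (partition by the key's first letter) are arbitrary choices for illustration.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends all keys that start with the same (lower-cased) character to the same reducer.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String k = key.toString();
        if (numReduceTasks == 0 || k.isEmpty()) {
            return 0;
        }
        int first = Character.toLowerCase(k.charAt(0));
        return first % numReduceTasks;
    }
}

In the driver, step 3 is then a single call: job.setPartitionerClass(FirstLetterPartitioner.class).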

21.How did you debug your Hadoop code?

  • There can be several ways of doing this but most common ways are
  1.     By using counters
  2.     The web interface provided by Hadoop framework

22.What does the term "Replication factor" mean

  • Replication factor is the number of times a file needs to be replicated in HDFS


23.What is the default replication factor in HDFS

  • The default replication factor is 3

24. What is the typical block size of an HDFS block

  • The default HDFS block size is 64 MB or 128 MB.

25.What is the benefit of having such big block size (when compared to block size of linux file system like ext)

  • It allows HDFS to decrease the amount of metadata storage required per file (the list of blocks per file is smaller as the size of the individual blocks increases). Furthermore, it allows for fast streaming reads of data, by keeping large amounts of data laid out sequentially on the disk.

26.Why is it recommended to have few very large files instead of a lot of small files in HDFS

  • This is because the NameNode holds the metadata of each and every file in HDFS, and it loads all of that metadata into memory for speed. More files means more metadata, so having a lot of small files may make the metadata large enough to exceed the memory available on the NameNode.

27. What alternate way does HDFS provide to recover data in case a NameNode, without backup, fails and cannot be recovered?

  • There is no way. If Namenode dies and there is no backup then there is no way to recover data

28. Describe how an HDFS client reads a file in HDFS. Will it talk to the DataNode or the NameNode? How will the data flow?

  • To open a file, a client contacts the Name Node and retrieves a list of locations for the blocks that comprise the file.
  • These locations identify the Data Nodes which hold each block. Clients then read file data directly from the Data Node servers, possibly in parallel.
  • The Name Node is not directly involved in this bulk data transfer, keeping its overhead to a minimum.

29. Using the Linux command line, how will you list the files in an HDFS directory?

  •      hadoop fs -ls

30. Using the Linux command line, how will you create a directory in HDFS?

  •     hadoop fs -mkdir
