Using GenericOptionsParser and ToolRunner to read configuration files in Hadoop

import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test {

    public static class ConfigurationPrinter extends Configured implements Tool {

        static {
            // Pick up the HDFS and MapReduce configuration files in addition
            // to the core ones that Configuration already knows about.
            Configuration.addDefaultResource("hdfs-default.xml");
            Configuration.addDefaultResource("hdfs-site.xml");
            Configuration.addDefaultResource("mapred-default.xml");
            Configuration.addDefaultResource("mapred-site.xml");
        }

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            for (Entry<String, String> entry : conf) {
                System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
            }
            return 0;
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ConfigurationPrinter(), args);
        System.exit(exitCode);
    }
}

We make ConfigurationPrinter a subclass of Configured, which is an implementation
of the Configurable interface. All implementations of Tool need to implement
Configurable (since Tool extends it), and subclassing Configured is often the easiest way
to achieve this. The run() method obtains the Configuration using Configurable’s
getConf() method, and then iterates over it, printing each property to standard output.
The static block makes sure that the HDFS and MapReduce configurations are picked
up in addition to the core ones (which Configuration knows about already).
ConfigurationPrinter’s main() method does not invoke its own run() method directly.
Instead, we call ToolRunner’s static run() method, which takes care of creating a
Configuration object for the Tool before calling its run() method. ToolRunner also uses
a GenericOptionsParser to pick up any standard options specified on the command
line and set them on the Configuration instance. We can see the effect of picking up
the properties specified in conf/hadoop-localhost.xml by running the following
command:
% hadoop ConfigurationPrinter -conf conf/hadoop-localhost.xml \
  | grep mapred.job.tracker=
mapred.job.tracker=localhost:8021
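
If you already have a Configuration object, ToolRunner also offers a three-argument
overload, ToolRunner.run(conf, tool, args), that uses your instance instead of creating
one; the parsed command-line options are applied on top of it. A minimal sketch reusing
the ConfigurationPrinter class above (the color default here is just an illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class PreconfiguredDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("color", "red"); // an in-code default; -D color=... overrides it
        // ToolRunner still runs GenericOptionsParser over args, so standard
        // options such as -D and -conf are set on this Configuration instance.
        int exitCode = ToolRunner.run(conf, new Test.ConfigurationPrinter(), args);
        System.exit(exitCode);
    }
}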

GenericOptionsParser also allows you to set individual properties. For example:
% hadoop ConfigurationPrinter -D color=yellow | grep color
color=yellow

The -D option is used to set the configuration property with key color to the value
yellow. Options specified with -D take priority over properties from the configuration
files. This is very useful: you can put defaults into configuration files and then override
them with the -D option as needed. A common example is setting the number of
reducers for a MapReduce job via -D mapred.reduce.tasks=n, which overrides the
number of reducers set on the cluster or in any client-side configuration files.
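
To see why -D wins, note that GenericOptionsParser translates each -D option into a
call to Configuration.set() after any -conf resources have been added, and an explicit
set() shadows values loaded from resource files. A rough sketch of that mechanism
(my-conf.xml is a hypothetical resource file):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PriorityDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource(new Path("my-conf.xml")); // suppose this file sets color=red
        conf.set("color", "yellow");               // what -D color=yellow amounts to
        System.out.println(conf.get("color"));     // prints "yellow"
    }
}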

GenericOptionsParser and ToolRunner options

-D property=value
    Sets the given Hadoop configuration property to the given value. Overrides any
    default or site properties in the configuration, and any properties set via the
    -conf option.

-conf filename …
    Adds the given files to the list of resources in the configuration. This is a
    convenient way to set site properties, or to set a number of properties at once.

-fs uri
    Sets the default filesystem to the given URI. Shortcut for -D fs.default.name=uri

-jt host:port
    Sets the jobtracker to the given host and port. Shortcut for
    -D mapred.job.tracker=host:port

-files file1,file2,…
    Copies the specified files from the local filesystem (or any filesystem if a scheme
    is specified) to the shared filesystem used by the jobtracker (usually HDFS) and
    makes them available to MapReduce programs in the task’s working directory. (See
    “Distributed Cache” on page 239 for more on the distributed cache mechanism for
    copying files to tasktracker machines.)

-archives archive1,archive2,…
    Copies the specified archives from the local filesystem (or any filesystem if a
    scheme is specified) to the shared filesystem used by the jobtracker (usually HDFS),
    unarchives them, and makes them available to MapReduce programs in the task’s
    working directory.

-libjars jar1,jar2,…
    Copies the specified JAR files from the local filesystem (or any filesystem if a
    scheme is specified) to the shared filesystem used by the jobtracker (usually HDFS),
    and adds them to the MapReduce task’s classpath. This option is a useful way of
    shipping JAR files that a job depends on.
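
If you would rather not go through ToolRunner, you can also drive GenericOptionsParser
yourself: construct it with a Configuration and the raw arguments, and it applies the
standard options to the Configuration, leaving everything it does not recognize in
getRemainingArgs(). A minimal sketch (the class name DirectParser is mine):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class DirectParser {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Consumes -D, -conf, -fs, -jt, etc. and applies them to conf.
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        String[] remaining = parser.getRemainingArgs();
        System.out.println("color = " + conf.get("color", "<unset>"));
        System.out.println("remaining args: " + String.join(" ", remaining));
    }
}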
