NAME

Do It Yourself Annotator - diya.pm


VERSION

1.0


SYNOPSIS

A simple diya script:

  use diya;
  $pipeline = diya->new;
  $pipeline->read_conf;
  $pipeline->run;

The script can be run like this:

  diya-script.pl --conf diya.conf seq1.fa seq2.fa ...


DESCRIPTION

diya is an open source tool used to build annotation pipelines. A pipeline is a series of steps linking the various stages of sequence annotation into a concise process. The software is designed to use sequences as input. These could be complete genomes or the result of shotgun sequencing of a genome library. A possible output would be a fully annotated genomic sequence in GenBank format.

You can also use this GenBank file as input for conversion to GFF, then load the GFF into a backend database for viewing with tools like GBrowse.

A pipeline may be executed on a single computer or on a cluster. Currently diya only supports the Sun Grid Engine platform if you are using a cluster.

In a nutshell

All diya pipelines are made up of parser or script steps that are executed in a specific order. The details on the parser and script steps for a given pipeline are contained in a single XML configuration, or conf, file.

The diya.pm Perl module is the controller module for a diya annotation pipeline. This module reads the configuration file that describes the pipeline, executes each step in the pipeline, and launches specific parser modules when required. It also keeps track of the input and output files and keeps all these files in a single output directory.

parser steps

A parser step performs the analysis in the pipeline. For every diya parser step there is a bioinformatics application that produces output and a corresponding Perl module that parses the application output and creates an annotated GenBank file. A parser step can act at any time in the pipeline.

script steps

A script step in diya is simpler than a parser step. Its output is not parsed so it does not require a corresponding Perl module. A script step may do something like move a file, format a database, or send an alert. A script step can act at any time in the pipeline.

Code functionality

This is a rough description of how a pipeline works in the diya.pm code:

  1. A diya script is launched from the command-line
  2. A pipeline object is created by new()
  3. Variables such as the input sequence files and the conf file name are set
  4. The diya home directory is set using $DIYAHOME or the current working directory
  5. The pipeline object reads the configuration file with read_conf()
  6. The run() method is called and iterates over all the steps in the pipeline
  7. The pipeline object locates or creates the output directory
  8. The pipeline object sets the input sequence file
  9. The pipeline performs format conversion on the input file, if necessary
  10. A command string is constructed using the information in the configuration file
  11. The command is executed, creating a program output file
  12. The pipeline creates a parser object and the output file is parsed
  13. The pipeline proceeds to the next step
  14. When the pipeline finishes, run() starts again with the next sequence

A good way to watch what diya is doing is to run it with verbose set to 1. For example:

  diya.pl --conf diya.conf --verbose 1


INSTALLATION

The details are in the INSTALL file. diya uses BioPerl, and you will need to install some other Perl modules from CPAN in addition.

The $DIYAHOME variable

Consider setting the $DIYAHOME environment variable. By default diya uses this directory when it looks for a diya configuration file and when it creates output directories. If you do not have this set then make sure to tell diya where your configuration file is using -conf or -use_conf. Both options are described below.
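
The lookup order can be pictured with a short standalone sketch. It is not taken from diya.pm: the helper names diya_home and default_conf are hypothetical, and the sketch assumes only what this document states, namely that $DIYAHOME is used when it is set and the current working directory is used otherwise.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper: prefer $DIYAHOME, fall back to the current directory.
sub diya_home {
    my $home = $ENV{DIYAHOME};
    return ( defined $home && length $home ) ? $home : getcwd();
}

# Hypothetical helper: where a default diya.conf would be looked for.
sub default_conf {
    return File::Spec->catfile( diya_home(), 'diya.conf' );
}

print "diya home:    ", diya_home(),    "\n";
print "default conf: ", default_conf(), "\n";
```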

diya tests

This package comes with a number of test scripts in the t/ directory that run automatically if you type:

  >perl Makefile.PL
  >make
  >make test

Most of the test scripts run bioinformatics applications, specifically blastall, formatdb, tRNAscan-SE, and glimmer3. The scripts are written such that they will skip many tests if these applications are not found in /usr/local/bin. If you want to run these tests and you have these applications installed then you may need to edit the *conf files found in t/data to enter the correct paths.


THE CONFIGURATION FILE

Most of the information about the pipeline is stored in a configuration file in XML format. The configuration file that comes with the package is called diya.conf but you can create your own configuration files and call them whatever you want. There are also example *conf files in the t/data and examples directories in this package.

The configuration file contains different sections. These sections can appear in any order in the file.

An example configuration file

 <?xml version="1.0" encoding="UTF-8"?>
 <conf>
   <script>
     <name>download</name>
     <executable>download-genome.pl</executable>
     <command>-id MYID -out OUTPUTFILE</command>
     <home>/Users/bosborne/diya/diya/branches/0.4.0/examples</home>
     <inputfrom></inputfrom>
   </script>
   <parser>
     <name>blastp</name>
     <executable>blastall</executable>
     <command>-i INPUTFILE -d MYDB -p blastp -o OUTPUTFILE</command>
     <home>/usr/local/bin</home>
     <inputformat>fasta</inputformat>
     <inputfrom>download</inputfrom>
   </parser>
   <run>
     <mode>serial</mode>
   </run>
   <order>
     <names>download blastp</names>
   </order>
 </conf>

You might run the pipeline using this configuration file like this:

 diya.pl -conf download-blastp.conf -set MYDB=/opt/gb/at.fa -set MYID=3

The order section

This section tells diya what script and parser steps to run and in what order. The names of the steps are separated by spaces. For example:

  <order>
    <names>tRNAscanSE glimmer blastall</names>
  </order>

You will see below that every parser or script section has a line for its name, like this:

  <name>tRNAscanSE</name>

You use that same name in the names line. This means that there has to be a corresponding parser or script section for each name in the names section.
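
The rule that every name in the names line needs a matching section can be checked before a run. The following is a minimal standalone sketch, not diya's own validation code; the helper missing_steps is hypothetical and the step names are taken from the examples in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names in <names> that have no matching
# <parser> or <script> section. An empty list means the order is usable.
sub missing_steps {
    my ( $names, @defined ) = @_;
    my %known = map { $_ => 1 } @defined;
    return grep { !$known{$_} } split ' ', $names;
}

# The <name> values found in the conf file's parser and script sections.
my @sections = qw(tRNAscanSE glimmer blastall);

my @missing = missing_steps( 'tRNAscanSE glimmer blastp', @sections );
print "No section for: @missing\n" if @missing;
```

Here blastp is reported because the conf file in this sketch defines a blastall step but no step named blastp.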

The run section

This section tells diya whether to run the pipeline on a cluster or not. For example:

  <run>
    <mode>serial</mode>
  </run>

You can set diya to run in sge or serial mode. These are the only two possible values.

The parser sections

This section describes a parser step. An example for the application tRNAscan-SE:

  <parser>
    <executable>tRNAscan-SE</executable>
    <home>/usr/local/bin</home>
    <command>-B -o OUTPUTFILE INPUTFILE</command>
    <name>tRNAscanSE</name>
    <inputformat>fasta</inputformat>
    <inputfrom></inputfrom>
  </parser>

executable

The name of the application. This should be the actual name, not a synonym. Required.

home

The directory where the application is found. Required.

command

The command that has to be run, without the application name. The file names in the command do not have to be real file names. Instead you can use the words INPUTFILE and OUTPUTFILE in place of actual input and output file names. See more on this in WRITING YOUR OWN conf* FILES.

name

The arbitrary name for the step. It does not have to be the same as the executable but if the step is a parser then this has to be the same as the name of the Perl module that parses the executable output. The only rule is that no punctuation or spaces are allowed in the name. For example, a name could be tRNAscanSE, but not tRNAscan-SE (the reason for this is that spaces and punctuation are not allowed in a Perl module name). Required.

Because the name is arbitrary, you can have different steps in a pipeline that use the same application or script in different ways, and assign a different name to each of these steps.
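
A parser step name can be checked against this rule with a simple pattern. This is a sketch under an assumption: it reads "no punctuation or spaces" as "letters and digits only, starting with a letter". The helper valid_parser_name is hypothetical and is not part of diya.

```perl
use strict;
use warnings;

# Hypothetical check: letters and digits only, so that the name can also
# serve as the name of the Perl module that parses the output.
sub valid_parser_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z][A-Za-z0-9]*\z/ ? 1 : 0;
}

for my $name (qw(tRNAscanSE tRNAscan-SE blastpCDS)) {
    printf "%-12s %s\n", $name, valid_parser_name($name) ? 'ok' : 'not allowed';
}
```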

inputformat

The sequence format for the input file. Optional. If no inputformat is set then fasta format is assumed.

If inputformat is set then diya will determine the format of the input file for the given step. If this format is different from the inputformat of the step then diya will create a new file of the correct format and make it the new input file for the step.

inputfrom

This is optional. Use this if you want the output file from one parser or script step to be used as the input file for another parser or script step. For example, if the input file for 'stepA' should be created by 'stepB' do this:

  <parser>
    <name>stepA</name>
    <inputfrom>stepB</inputfrom>
    <executable>mixmaster</executable>
    <home>/usr/local/bin</home>
    <command>INPUTFILE</command>
    <inputformat></inputformat>
  </parser>

If you do not specify inputfrom for a step then it is assumed that the input file for that step comes from the command-line. For example, if you run diya like this:

   diya-script.pl --conf diya.conf seq1.fa

then the input file will be 'seq1.fa' when there is no inputfrom for a given step.

The script sections

A script step simply executes and its output is not parsed. For example, you may need to copy a sequence file from some location before running a pipeline, or you may want to send the pipeline output somewhere or send an email alert after the pipeline is done. You would write script steps for these purposes. An example:

 <script>
   <name>formatdb</name>
   <executable>formatdb.sh</executable>
   <command>INPUTFILE</command>
   <home>/Users/bosborne/diya/branches/0.4.0/examples</home>
   <inputfrom>extractCDS</inputfrom>
 </script>

executable

The name of the script. This should be the actual name, not a synonym. Required.

name

The name for this step in the pipeline. Required.

home

The directory where the script is found. Required.

command

The command that has to be run, without the executable name. Optional.

inputfrom

This is optional. Use this if you want the output file from a parser or script step to be used as the input file for a script step.


INITIALIZATION OPTIONS

You can pass options to your pipeline object when you create it with the new() method.

verbose

When you set verbose to 1 the pipeline object will print out useful diagnostic messages. Set verbose to true like this:

  my $pipeline = diya->new(-verbose => 1)

Setting verbose is optional. The default value is 0, or false.

mode

The values are serial and sge. Set mode like this:

  my $pipeline = diya->new(-mode => 'sge')

Setting mode is optional. The default value is serial.

use_conf

Specify the configuration file for the pipeline. The conf file can have any sort of name as long as it has the correct format. An example:

  my $pipeline = diya->new( -use_conf => "~/myconf.conf" )

Setting use_conf is optional. If it is not set then diya will look for a file named diya.conf in your $DIYAHOME directory or in the current working directory.

outputdir

Specify the output directory for the pipeline. An example:

  my $pipeline = diya->new( -outputdir => "~/myfiles" )

Setting outputdir is optional. If it is not set then diya will create an output directory in your $DIYAHOME directory using a timestamp, for example "2008-06-29-11:35:38-diya".


USING A DIYA SCRIPT

You will run diya using a fairly simple script since most of the details are in the configuration file. The diya.pl script that comes with diya is an example.

diya scripts are run like this:

        % diya.pl [options] [input files]

Command-line options

--verbose

Set the verbosity level, 0 or 1.

 diya.pl --verbose 1

--mode [serial|sge]

Run the batch in serial mode, or sge mode if SGE is available.

 diya.pl --mode serial

--conf

Use the given conf file. This option takes precedence: if a conf file is also specified in the new() method, that file is ignored.

 diya.pl --conf new-diya.conf

--outputdir

Set the output directory.

 diya.pl --outputdir /tmp/mypipeline

Using --set to modify a command

You can also modify your commands dynamically from the command-line. For example, you might want to run blastall and create an output file with a specific name. Here is an example blastall command from a *conf file:

  <command>-p blastp -d ran.fa -i INPUTFILE -o MYOUTPUTFILE</command>

You could run diya.pl like this:

  diya.pl --set MYOUTPUTFILE=blastp.out

And an output file called blastp.out would be created.

You can add these "wild card" words anywhere you want to in the command line of the *conf file. The only rule is that you should not use the words INPUTFILE, OUTPUTFILE, or OUTPUTDIR. These are already being used by diya. One way to make sure your "wild card" is unique is to prefix it with 'MY'. We also suggest capitalizing these words, for clarity.
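
The following standalone sketch shows one way such KEY=VALUE pairs could be collected and substituted into a command. It is an illustration only, not the code in diya.pm: the helper apply_set and the sample argument list are hypothetical, and Getopt::Long is used here as an assumption about how such options might be read.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for @ARGV so that the sketch runs on its own.
my @args = qw(--set MYDB=/opt/gb/at.fa --set MYOUTPUTFILE=blastp.out seq1.fa);

# A hash destination makes Getopt::Long collect KEY=VALUE pairs.
my %set;
GetOptionsFromArray( \@args, 'set=s' => \%set )
    or die "Could not parse options\n";

# Hypothetical helper: replace each user-defined word in the command.
# The \b anchors match whole words only, so a reserved word such as
# OUTPUTFILE is never matched inside MYOUTPUTFILE.
sub apply_set {
    my ( $command, $set ) = @_;
    for my $word ( sort keys %$set ) {
        $command =~ s/\b\Q$word\E\b/$set->{$word}/g;
    }
    return $command;
}

my $command = '-p blastp -d MYDB -i INPUTFILE -o MYOUTPUTFILE';
print apply_set( $command, \%set ), "\n";
print "Remaining arguments: @args\n";
```

In this sketch INPUTFILE is left untouched because it is not one of the words given with --set, and seq1.fa remains in the argument list as an input file.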

Using --set to set a variable in a Perl module

Suppose you want your Perl module to be able to get some value from the command-line and use it as a variable, e.g. $MYDATABASE. First add the variable name to the @EXPORT_OK array in diya.pm. Then modify the use diya; line in your Perl module, for example:

 use diya qw($MYDATABASE);

After these modifications you should be able to do the following:

 diya.pl --set MYDATABASE=ncbi Seq.fa

And the variable $MYDATABASE will have the value ncbi in your Perl module when diya.pl runs.

When you use --set you are creating global variables that can be used in your own Perl modules so make sure that your variable names do not collide with diya variables. One way to do this is to use variable names that are all capitalized, or prefix the name with 'MY'.


WRITING YOUR OWN conf* FILES

THE CONFIGURATION FILE section discusses the structure of the *conf file but in order to create your own files you will need to understand some of the internal details of diya.

When diya runs it can create the names of input and output files. This makes it easy for diya to keep track of files since one of its jobs is to pass the output of one step to the next step as input. diya uses a timestamp and the name of the step to create file names, for example:

  2008_08_07_10_19_53-create-fasta-db.out

The meaning of INPUTFILE

The file above was created by the create-fasta-db step, as you can see from its name. This file could be the input for some other step, and you would indicate this by using the inputfrom field. For example:

  <parser>
    <inputfrom>create-fasta-db</inputfrom>
    <executable>blastall</executable>
    <home>/usr/local/bin</home>
    <command>-p blastp -i INPUTFILE -d MYDATABASE -o OUTPUTFILE </command>
    <name>blastpCDS</name>
    <inputformat></inputformat>
  </parser>

The block above says that the INPUTFILE should come from the create-fasta-db step. When diya runs and the actual command is constructed this part of the command line:

    -p blastp -i INPUTFILE

will be transformed into something like:

    -p blastp -i /tmp/2008_08_07_10_19_53-create-fasta-db.out

INPUTFILE has a second meaning, which is the name of the sequence file passed to the diya script. Recall that you can run a diya script like this:

  mydiya.pl --conf my.conf NC_123456.fa

If a given step has no inputfrom value then the value of INPUTFILE will be the name of the sequence file set from the command-line, or "NC_123456.fa" in the example above.

This does not mean that you have to use INPUTFILE in each step. It means that when INPUTFILE is present in a command line it will be substituted in one of these two ways, depending on whether or not there is an inputfrom value.
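
The two meanings of INPUTFILE amount to a simple choice. The sketch below is a standalone illustration of that choice and is not the code in diya.pm; the helper resolve_inputfile and the data structures it takes are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the file that INPUTFILE stands for in one step.
#   $step       - hash ref with an optional 'inputfrom' key
#   $outputs    - hash ref: step name => output file already produced
#   $cmdline_in - sequence file given on the command line
sub resolve_inputfile {
    my ( $step, $outputs, $cmdline_in ) = @_;
    my $from = $step->{inputfrom};

    # No inputfrom value: use the sequence file from the command line.
    return $cmdline_in unless defined $from && length $from;

    # Otherwise use the output file of the named step.
    die "No output recorded for step '$from'\n"
        unless exists $outputs->{$from};
    return $outputs->{$from};
}

my %outputs = (
    'create-fasta-db' => '/tmp/2008_08_07_10_19_53-create-fasta-db.out',
);

print resolve_inputfile( { inputfrom => 'create-fasta-db' }, \%outputs, 'NC_123456.fa' ), "\n";
print resolve_inputfile( { inputfrom => '' }, \%outputs, 'NC_123456.fa' ), "\n";
```

The first call prints the output file of the create-fasta-db step and the second prints NC_123456.fa.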

The meaning of OUTPUTFILE

The OUTPUTFILE from one step will frequently be used as the INPUTFILE to another step. Thus you may need to explicitly create a file with the name contained in OUTPUTFILE using the command line. An example script block:

 <script>
   <name>create-fasta-db</name>
   <executable>createdb.pl</executable>
   <command>-o OUTPUTFILE</command>
   <home>~/scripts</home>
   <inputfrom></inputfrom>
 </script>

When this step runs a command like this will be executed:

  ~/scripts/createdb.pl -o /tmp/2008_08_07_10_19_53-create-fasta-db.out

Here ~/scripts/createdb.pl will use the file name provided by diya and put its output into that file. An alternative is to redirect the output of an application into an output file. For example:

 <script>
   <name>create-fasta-db</name>
   <executable>createdb.pl</executable>
   <command> > OUTPUTFILE</command>
   <home>~/scripts</home>
   <inputfrom></inputfrom>
 </script>

The meaning of OUTPUTDIR

By default diya creates an output directory to contain all the input and output files created during a pipeline run, something like:

   2008_11_21_22_02_19_diya/

You can get this name in order to use it in your script or parser steps, something like:

 <script>
   <name>move-file</name>
   <executable>move-file.pl</executable>
   <command>-o OUTPUTDIR</command>
   <home>~/scripts</home>
   <inputfrom></inputfrom>
 </script>

In this example the name of the diya output directory will be passed to the script.
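
Putting the three reserved words together, the construction of a command can be sketched as follows. This is a standalone illustration that assumes the command has the form "home/executable command", as in the createdb.pl example above. It is not the _make_command() method itself, and the helper build_command is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build "<home>/<executable> <command>" for one step,
# replacing the three reserved words with real paths.
sub build_command {
    my ( $step, %file ) = @_;
    my $command = defined $step->{command} ? $step->{command} : '';
    for my $word (qw(INPUTFILE OUTPUTFILE OUTPUTDIR)) {
        next unless defined $file{$word};
        $command =~ s/\b$word\b/$file{$word}/g;
    }
    return "$step->{home}/$step->{executable} $command";
}

my $outputdir = '/tmp';
my $stamp     = strftime( '%Y_%m_%d_%H_%M_%S', localtime );

my %step = (
    name       => 'create-fasta-db',
    executable => 'createdb.pl',
    home       => '~/scripts',
    command    => '-o OUTPUTFILE',
);

# The output file name follows the <timestamp>-<step name>.out pattern.
print build_command(
    \%step,
    OUTPUTFILE => "$outputdir/$stamp-$step{name}.out",
    OUTPUTDIR  => $outputdir,
), "\n";
```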


CREATING AND LOADING GFF

The authors use diya to annotate sequences and save these annotations as GenBank files. They routinely convert these files to GFF, load the GFF into Bio::DB::GFF databases, and visualize the annotations using GBrowse. The conversion script used is diya-genbank2gff3.pl in the scripts directory. This script is a modification of the genbank2gff3.pl script that comes with BioPerl.


AUTHORS

Andrew Stewart, andrew.stewart@med.navy.mil
Brian Osborne, briano@bioteam.net


CONTRIBUTORS

Tim Read, timothy.read@med.navy.mil


METHODS

Public methods are listed first, followed by private methods prefixed with '_'.

new

 Name    : new
 Usage   : $diya = diya->new()
 Function: create a diya pipeline object
 Returns : a diya object
 Args    : -verbose   (optional), 1 or 0, set the verbosity level
           -use_conf  (optional), name of the conf file to be used
           -outputdir (optional), the directory where results will reside
           -mode      (optional), 'serial' is default 
 Example : my $pipeline = diya->new(-verbose   => 1,
                                    -use_conf  => "latest.conf",
                                    -outputdir => 'mydir',
                                    -mode      => 'sge' );

read_conf

 Name    : read_conf
 Usage   : $diya->read_conf("my_conf_file")
 Function: read a diya conf file
 Returns : 1 on success
 Args    : the name of the conf file to be read (optional)
 Example : $pipeline->read_conf()  or  $pipeline->read_conf("latest.conf")

run

 Name    : run
 Usage   : $diya->run()
 Function: run a diya pipeline
 Returns : 1 on success
 Args    : 
 Example : $pipeline->run

new_parser

 Name    : new_parser
 Usage   : $parser = $module->new_parser
 Function: instantiate a new parser object 
 Returns : a new parser object
 Args    : none
 Example :

order

 Name    : order
 Usage   : $diya->order( @array ) or $order = $diya->order
 Function: get or set the order of the steps to be run
 Returns : array of step names
 Args    : To set, pass an array of one or more step names. The parsers and
           scripts must exist in the conf file in the <parser> and
           <script> sections
 Example : $pipeline->order( qw(tRNAscanSE blastall) )  or 
           $pipeline->order('tRNAscanSE') or
           @my_order = $self->order

write_conf

 Name    : write_conf
 Usage   : $diya->write_conf("my_new_conf_file")
 Function: write a conf file - if no name is supplied then the new file will
           be given a name of format <timestamp>-diya.conf
           (e.g.  2008-06-29-11:35:38-diya.conf )
 Returns : the name of the conf file that was written
 Args    : the name of the conf file that will be written (optional)
 Example : $pipeline->write_conf()  or $pipeline->write_conf("version2.conf")

verbose

 Name    : verbose
 Usage   : $diya->verbose($num) or $verbose_level = $diya->verbose
 Function: get or set the verbose level
 Returns : the verbose level
 Args    : 
 Example : $pipeline->verbose(1)

project

 Name    : project
 Usage   : $diya->project($num) or $project = $diya->project
 Function: get or set the NCBI project number
 Returns : the NCBI project number
 Args    : 
 Example : $pipeline->project(1355)

outputdir

 Name    : outputdir
 Usage   : $diya->outputdir()
 Function: get or set the name of the output directory, where all of 
           the files created by the pipeline will be written. The 
           object will try to create the directory if it does not exist.
           If no output directory is specified then an output directory will
           be created based on a timestamp, e.g. "2008-06-29-11:35:38-diya"
 Returns : name of the output directory or 0 if no output directory is set
 Args    : 
 Example : $diya->outputdir("pipe-output")

mode

 Name    : mode
 Usage   : 
 Function: get or set the mode corresponding to a pipeline
 Returns : "serial" or "sge"
 Args    : 
 Example : $pipeline->mode("serial")

cleanup

 Name    : cleanup
 Usage   : $diya->cleanup
 Function: remove extraneous files created when a pipeline is run
 Returns : 1 on success
 Args    : none
 Example :

inputfile

 Name    : inputfile
 Usage   : $diya->inputfile('NC.gbk')
 Function: Get or set the names of the input sequence files
 Returns : 
 Args    : 
 Example : $self->inputfile("234.fa") or $self->inputfile( qw(234.fa AB.fa) )

_next_inputfile

 Name    : _next_inputfile
 Usage   : 
 Function: Get the name of the next input sequence file, remove the 
           last from the queue
 Returns : 
 Args    : 
 Example :

_execute

 Name    : _execute
 Usage   : $self->_execute($command)
 Function: encapsulate the serial and sge execution logic 
 Returns : none
 Args    : command
 Example :

_reconstruct_sequence

 Usage   : _reconstruct_sequence
 Function: reconstruct the sequence object. Used only when mode is sge. 
           When mode is sge, the _execute() will generate an intermediate
           script performing the $parse->parse($diya). In the intermediate script,
           this method is called. Please see _execute 
 Returns : none
 Args    : none
 Example : $self->_reconstruct_sequence()

_check_executable

 Usage   : _check_executable
 Function: checks to see that the executable exists
 Returns : 1 or die
 Args    : Name of step
 Example : $self->_check_executable($step)

_check_input_sequence

 Usage   : _check_input_sequence
 Function: checks to see that the format of the input sequence file is correct,
           if the format is not correct then it creates a sequence file
           of the correct format
 Returns : The name of the file of the correct format
 Args    : Name of step
 Example : $self->_check_input_sequence($step)

_check_inputfile

 Usage   : _check_inputfile
 Function: records the input file name for the step - this may be
           the input sequence for the entire pipeline but a step may also
           use the output of another step as input
 Returns : the name of the input file for the step
 Args    : name of step
 Example : $self->_check_inputfile($step)

_check_outputdir

 Name    : _check_outputdir
 Usage   : $diya->_check_outputdir
 Function: checks that there is an output directory - if no output directory is defined
           then a directory name will be made using a timestamp 
           (e.g.  "2008-06-29-11:35:38-diya") and this directory will be in the diya home
           directory - if an output directory is defined but does not exist then we will 
           attempt to create it
 Returns : name of output directory, on success
 Args    : none
 Example :

_make_command


 Name    : _make_command
 Usage   : $pipeline->_make_command($step)
 Function: to create a complete command using information from the conf file 
           and any command-line options - private method called by run()
 Returns : a command, ready to execute
 Args    : the name of the parser step (e.g. "tRNAscanSE")
 Example : $pipeline->_make_command($parser)

_make_outputfilename

 Name    : _make_outputfilename
 Usage   : $diya->_make_outputfilename($parser)
 Function: create an output file name with parser step name and timestamp,
           private method, called by run()
 Returns : output file name
 Args    : step name
 Example :

_lastsgeid

 Name    : _lastsgeid
 Usage   : $diya->_lastsgeid()
 Function: set or get the last sge job id submitted by current process,
           used only internally for job id tracking.
 Returns : last sge job id submitted by current process.
 Args    : 
 Example : $pipeline->_lastsgeid(53)

_outputfile

 Name    : _outputfile
 Usage   : $file = $self->_outputfile($parser)
 Function: get the output file name for a given parser step, 
           private method called by run() or a parser module
 Returns : output file name
 Args    : step name
 Example : $file = $self->_outputfile($step)

_greeting

 Name    : _greeting
 Usage   : $diya->_greeting
 Function: print a greeting - private method, called by new()
 Returns : nothing
 Args    : 
 Example :

_help

 Name    : _help
 Usage   : $diya->_help
 Function: print the POD - works only if DIYA is installed
 Returns : nothing
 Args    : 
 Example :

_diyahome

 Name    : _diyahome
 Usage   : 
 Function: add the path to the diya home directory to the object -  the path to the 
           diya package comes from the env $DIYAHOME. If this is not set then try to 
           use the current working directory. Private method, called by new()
 Returns : the diya home directory
 Args    : none
 Example :

_conf

 Name    : _conf
 Usage   : 
 Function: get or set a hash representing the conf file - private method, 
           called by read_conf() and write_conf()
 Returns : a hash reference representing the conf file
 Args    : 
 Example : $conf = $diya->_conf

_executable

 Name    : _executable
 Usage   : 
 Function: return the executable name corresponding to a parser
           or script, private method called by _make_command
           and _check_executable
 Returns : executable name
 Args    : parser or script name 
 Example : $exe = $self->_executable($name)

_parsers

 Name    : _parsers
 Usage   : 
 Function: return all the parser names in the conf file,
           private method called by order()
 Returns : array of parser names  
 Args    : none
 Example : @parsers = $self->_parsers

_scripts

 Name    : _scripts
 Usage   : 
 Function: return all the script names in the conf file,
           private method called by order()
 Returns : array of script names  
 Args    : none
 Example : @scripts = $self->_scripts

_sequence

 Name    : _sequence
 Usage   : $diya->_sequence($seq) or $seq = $diya->_sequence
 Function: get or set the DNA sequence object, private method used by parser
           modules and by _check_input_sequence() 
 Returns : the sequence object
 Args    : 
 Example : $pipeline->_sequence($seq)

_home

 Name    : _home
 Usage   : 
 Function: return the home or location corresponding to an executable, 
           private method called by _make_command and _check_executable
 Returns :  
 Args    : parser or script name 
 Example : $path = $self->_home($exe)

_inputfrom

 Name    : _inputfrom
 Usage   : 
 Function: return the inputfrom field corresponding to a parser or
           script
 Returns :  
 Args    : parser or script name
 Example : $path = $self->_inputfrom($module)

_command

 Name    : _command
 Usage   : 
 Function: return the command corresponding to a step,
           private method called by _make_command()
 Returns :  
 Args    : step name
 Example : $args = $self->_command($step)

_inputformat

 Name    : _inputformat
 Usage   : 
 Function: return the file format required by a parser step,
           private method called by run()
 Returns : format name, or 0 if no value is found
 Args    : parser step name
 Example : $format = $self->_inputformat($step)

_use_conf

 Name    : _use_conf
 Usage   : $diya->_use_conf("conf_file")
 Function: add the name of the conf file being used to the object - private method, 
           called by read_conf()
 Returns : the name of the conf file being used
 Args    : 
 Example :

_initialize

 Name    : _initialize
 Usage   : $diya->_initialize
 Function: add parameters to the diya object - strips the dash used by named 
           parameters, private method called by new()
 Returns : 
 Args    : 
 Example :

_get_type

 Name    : _get_type
 Usage   : $type = $diya->_get_type($step)
 Function: return 'script' or 'parser', private method called by run()
 Returns : 'script' or 'parser'
 Args    : 
 Example :

_load_app_module

 Name    : _load_app_module
 Usage   : $diya->_load_app_module($module)
 Function: call require on the given module, private method called by run()
 Returns : full name of module, e.g. "diya::tRNAscanSE"
 Args    : name of the module, e.g. 'tRNAscanSE'
 Example :

_get_options

 Name    : _get_options
 Usage   : 
 Function: get command-line options - the --set option is used to
           create globals that can be imported into a 
           parser module or used directly in this module
 Returns : 1 on success 
 Args    : none 
 Example : At the command-line: diya.pl --set REFD=~/mydb.ref