Skip to content

Latest commit

 

History

History
66 lines (43 loc) · 3.16 KB

File metadata and controls

66 lines (43 loc) · 3.16 KB

Examples

parameter check: ##preparation. ###Register your job and input/output data sets

	 export PASSKEY=mysecretkey

         ./utility.sh RunUpdate "insert into job values ('paramjob', 'OUT','paramsfile','',now())"
         ./utility.sh RunUpdate "insert into job values ('paramjob', 'IN','sometable','',now())"
	 ./utility.sh RunUpdate "insert into dataset values('sometable','localhost','dataflow','testdata','testtable','etl',null)";
	 ./utility.sh RunUpdate "insert into dataset values('paramsfile','localhost','file:','testdata','testtable','etl',null)"
 ### force the depency sometable to be met
 ./utility.sh RunUpdate "delete from datastatus where tableid='sometable'"
 ./utility.sh RunUpdate "insert into datastatus values('sometable','otherjob','1.0','OUT','READY',now())"

### what we just did
  Setting the environment with PASSKEY allows us to run these utilities without passing it on the command line

  We registered a job with an id of 'paramjob', assigned the dataset named 'sometable' to the input and the dataset named 'paramsfile' to output
  We associated some values for hostname,schema,tablename,user and pssword to the 'sometable' data set
  We associated some values for hostname,schema,tablename,user and pssword to the 'paramfile' data set
  Our script could use the values for 'sometable' to  connect to a jdbc database.
  Our script could use the values for 'paramfile' to specify a path and basefilename to the output.
  We cleaned up the datastatus to make this test repeatable. 
  We added a datastatus with arbitrary dataid, status READY and OUT for data set sometable so that the partition 1.0 can be consumed.
  

    ## Running the job
  ``` ./RunJob paramjob 'bash -c "set|grep -e sometable -e paramsfile -e ^dataid"' ```

### what we just did  

  We provided the jobid='paramjob' so that the framework can look up the associated data sets.
  We provided a command to execute in that environment.
  In real life, we would put a path to a shell script or a binary, but the command works as well and we use it to peek into the environment.
  The automatic variables will always be prefixed by the datasetid, so we grep for those. 

 
     ### What we expect to see:

   All the automatic variables that are supplied to this jobid when it runs.	
   This lets us know what variables we can use in our script and to smoketest that they were assigned properly

## Variation. Second test

  Run the RunJob command a second time without touching any tables. Now it correctly refuses to run, because it has already run for this dataid.
  Verify this by listing the datastatus table.

  ```      ./utility.sh RunSQL "select * from datastatus where jobid='paramjob'" ```

  See that we now have a line indicating that job finished successfully 

## Variation.  Writing to the output file

     We can use the output file parameters to simulate a real extract and see how one might use the OUT dataset variables:

  ``` ./RunJob paramjob 'bash -c "set|grep -e sometable -e paramsfile -e ^dataid" >  ${paramsfile_tablename}_${dataid}.dat"' ```

 Now we see a file named 'testtable_1.0.dat' after the run.