Please enable JavaScript to view this site.

Navigation: Quickstart Guides > HDFS Quickstart > Create HDFS Apply Engine Script

Construct the HDFS Apply Engine Script

Scroll

The following examples replicate DB2 changed data (CDC) for the IVP EMP and DEPT tables into HDFS in JSON or AVRO format. An example of the JSON output can be seen above in Determine HDFS Output Format. The example also includes a filter for the EMPLOYEE table, only updates to employees with a bonus over $5,000 will cause the record to be written to HDFS.  All changes to the DEPT table are Replicated with no filter applied.

Example 1

----------------------------------------------------------------------

-- Name: DB2TOHDFS z/OS DB2 To HDFS on Linux

-- Client/Project: client/project

----------------------------------------------------------------------

--  SUBSTITUTION PARMS USED IN THIS SCRIPT:

--    %(ENGINE) - ENGINE Name

--    %(SHOST) - Source HOST of the Capture/Publisher

--    %(SPORT) - Source HOST SQDaemon PORT

--    %(PUBNM) - Source Capture/Publisher Agent Name

--    %(THOST) - Target HDFS HOST

--    %(TPORT) - Target HDFS HOST PORT

----------------------------------------------------------------------

--       Change Log:

----------------------------------------------------------------------

-- 2017-06-01 INITIAL RELEASE

----------------------------------------------------------------------

JOBNAME DB2TOHDFS;

 

OPTIONS

   CDCOP('I','U','D')                  -- Set CHANGE OP Constants

  ,USE AVRO COMPATIBLE NAMES

;

----------------------------------------------------------------------

--       Data Definition Section

----------------------------------------------------------------------

----------------------------------------------------------------------

--       Source Descriptions

----------------------------------------------------------------------

BEGIN GROUP DB2_SOURCE;

DESCRIPTION DB2SQL ./DB2DDL/EMP.ddl AS EMP;

DESCRIPTION DB2SQL ./DB2DDL/DEPT.ddl AS DEPT;

END GROUP;

 

----------------------------------------------------------------------

--       Target Descriptions

----------------------------------------------------------------------

 

----------------------------------------------------------------------

--       Datastore Section

----------------------------------------------------------------------

----------------------------------------------------------------------

--       Source Datastore

----------------------------------------------------------------------

DATASTORE cdc://%(SHOST):%(SPORT)/%(PUBNM)/%(ENGINE)

        OF UTSCDC

        AS CDCIN

        DESCRIBED BY GROUP DB2_SOURCE

;

----------------------------------------------------------------------

--       Target Datastore(s)

----------------------------------------------------------------------

DATASTORE hdfs://%(THOST):%(TPORT)/*.dat

        OF JSON                           -- specify JSON format

        AS TARGET

        DESCRIBED BY GROUP DB2_SOURCE     -- use source for REPLICATE

        STAGING SIZE 4G                   -- file size 4G max

        STAGING DELAY 60                 -- rotate every 60 minutes

;

----------------------------------------------------------------------

--       Field Specification Section

----------------------------------------------------------------------

DATEFORMAT 'ISO';

 

----------------------------------------------------------------------

--       Procedure Section

----------------------------------------------------------------------

CREATE PROC P_EMP AS SELECT

{

  IF EMP.BONUS > '5000'

  {

      REPLICATE(TARGET, EMP)

  }

}

FROM CDCIN;

 

----------------------------------------------------------------------

--       Main Section

----------------------------------------------------------------------

PROCESS INTO TARGET

SELECT

{

  CASE RECNAME(CDCIN)

      WHEN 'EMP' CALLPROC(P_EMP)

      WHEN 'DEPT' REPLICATE(TARGET, DEPT)

}

FROM CDCIN;

 

 

Example 2

Switching the Output format to AVRO, including a Schema Repository, requires only the following elements of the script above to be added or altered:

------------------------------------------------------------

--       SCHEMA ID LIST

------------------------------------------------------------

SCHEMA ID FOR

  CDC HEADER 1

,EMP <employee_Schema_ID>

,DEPT <department_Schema_IDn>

 

 

----------------------------------------------------------------------

--       Target Datastore(s)

----------------------------------------------------------------------

DATASTORE hdfs://%(THOST):%(TPORT)/*.dat

        OF AVRO                           -- specify AVRO format

        AS TARGET

        DESCRIBED BY GROUP DB2_SOURCE     -- use source for REPLICATE

        STAGING SIZE 4G                   -- file size 4G max

        STAGING DELAY 60                 -- rotate every 60 minutes