
While the very nature of Kafka and its cluster-based architecture is to accommodate large volumes of data from Producers, issues can still occur. Various factors may make a contingency plan for a Kafka cluster desirable or even a requirement. One such contingency scenario is described below.


The principal issue associated with an extended Kafka outage traces back to the source of the data, where the Connect CDC SQData Capture and Publishing occur. High end-to-end throughput is achieved through careful management of the captured data and by avoiding the I/O operations required when that transient data must be "landed", in other words written to disk, before it is consumed and written to its eventual target, in this case Kafka.


When the Apply or Replicator Engine is unable to write to Kafka, that eventually translates into the need to hold the captured data and/or slow its capture at the source. That can become problematic, particularly when the source itself generates a very high volume of Change Data. When an Engine stops, data cannot be published; committed units-of-work (UOWs) continue to be captured and the data ordinarily held in memory must be written to a transient storage area. Depending on the Capture, that may be a z/OS high performance LogStream or disk storage dedicated for this purpose. Eventually the transient data area will be exhausted. When that happens the Capture will slow and eventually stop reading the database log. This state, while neither normal nor desirable, is acceptable and manageable. The real problem occurs when the source database log files are archived, moved to slower storage or even deleted. When that happens the Capture will be unable to continue from where it left off and the possibility of data loss becomes reality. Talk to Precisely about the best practices we recommend for log archiving.


If a contingency plan for this situation is required, the following two approaches may be considered. While each does "land" the data, it does so on disk that is often much lower in cost than storage on the source machine:


Using SQDUtil

Stop the Apply Engine, if it is still running.

1.Start SQDUtil, using the same source URL as the Engine, with the "move" parameter to reconnect to the MF publisher, which will resume publishing from the start of the last "uncommitted" Unit-of-Work. The file system written to by SQDUTIL must be able to accommodate the published "cdcraw" data, which will be maintained in its original form, EBCDIC in the case of z/OS source change data.

2.When Kafka becomes operational, start a special version of the Engine whose source URL points to the file created by SQDUTIL. If multiple files were created, the Engine will need to be restarted after each file is processed, using either the same input file name or a reparsed special Engine (depending on how the files are managed).

3.Once the single or last file has been processed, the special Engine will stop normally.

4.Start the normal Apply Engine to resume normal operation.

Using a special Engine and a Kafka utility

1.Start a special version of the Engine where the target URL has been modified to write JSON formatted data to files. Depending on the nature of your normal target URL (whether it uses a single topic, an "*" dynamic topic, or SETURL specified topics), one or more files will be created, potentially with timestamped file names.

2.When Kafka becomes operational, use a Kafka utility (kcat comes to mind) to read the JSON file(s) and write all the data to Kafka (see the sketch after this list).

3.Once the last file has been processed, start the normal Apply Engine to resume operation.
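
For the replay in step 2, kcat can produce the contents of each file (one message per line) to the appropriate topic, or a small producer script can be used. The following is a minimal sketch of that replay step in Python using the confluent-kafka client. The broker address, file location, the one-JSON-document-per-line layout, and the convention that the topic name is the file name prefix are all assumptions that would need to be adjusted to match the files your special Engine actually produced.

    import glob
    import os

    from confluent_kafka import Producer   # pip install confluent-kafka

    # Assumed broker address; replace with your cluster's bootstrap servers.
    producer = Producer({"bootstrap.servers": "broker1:9092"})

    def delivery_report(err, msg):
        # Report any record the broker did not acknowledge.
        if err is not None:
            print(f"Delivery failed for record to {msg.topic()}: {err}")

    # Hypothetical layout: one file per topic, named <topic>.<timestamp>.json,
    # with one JSON document per line.
    for path in sorted(glob.glob("/data/kafka_outage/*.json")):
        topic = os.path.basename(path).split(".")[0]   # topic name assumed to be the file prefix
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                record = line.strip()
                if record:
                    producer.produce(topic, value=record.encode("utf-8"), callback=delivery_report)
                    producer.poll(0)    # serve delivery callbacks

    producer.flush()    # block until every message has been acknowledged

If no per-message keying or ordering logic is needed, driving kcat directly over the same files may be simpler; the sketch above is only meant to illustrate what the replay step involves.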


Some things to be considered:

1.None of this may be necessary, given both Kafka's fault tolerance and the capacity of the source to hold the unpublished data. You should consult your Kafka support team before going down either path.

2.The files created by the SQDUTIL approach may require more or less space than the JSON formatted files created by the second approach. This depends on the source data volume and on the (typically significantly smaller) number of target fields derived from each source segment.

3.The number of files created by the second approach will vary based on the style of your normal target URL and on any need to create more than one file due to size or time constraints, potentially complicating the scheduling and operation of the Kafka utility.

4.If the normal destination utilizes AVRO rather than traditional JSON formatting, there will be no Confluent Schema Registry involvement when writing to files; in that case only the SQDUtil option can be used.

Please review the real need for any sort of contingency and then schedule a call with Precisely Support to discuss what steps to take next.