[SOLVED] How to convert a non-templated Beam job to a templated job and run it on the GCP Dataflow runner?

Issue

This Content is from Stack Overflow. Question asked by Siddhanta Rath

I was able to run the non-templated Beam job directly on the GCP Dataflow runner using the command below:

java -jar <jar_name> \
  --runner=DataFlowRunner \
  --gcpTempLocation=gs://some/gcs/location \
  --stagingLocation=gs://some/gcs/location/stage \
  --tempLocation=gs://some/gcs/location/temp \
  --region=<region_name> \
  --project=<project_name> \
  --subnetwork=<subnet_name> \
  --jobName=<job_name>

I wanted to templatize the same job, so I used the command below to stage the template in the GCS bucket:

java -jar <jar_name> \
  --runner=DataFlowRunner \
  --gcpTempLocation=gs://some/gcs/location \
  --stagingLocation=gs://some/gcs/location/stage \
  --templateLocation=gs://some/gcs/location/templates/<job_name> \
  --region=<region_name> \
  --project=<project_name>

However, I receive the following error while creating the job template:

18:11:05.004 [main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Template successfully created.
Exception in thread "main" java.lang.UnsupportedOperationException: The result of template creation should not be used.
    at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getJobId(DataflowTemplateJob.java:41)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:559)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:540)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:324)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:253)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:212)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:206)
    at com.gojek.de.jobs.EventFilterJob.main(EventFilterJob.java:72)

When I then run the Dataflow job from the GCS template, the Dataflow runner cannot launch a job instance from the template.

I can see that the template was created in the GCS bucket, so I am not sure why the job run failed.
Also, can we directly convert a non-template Beam job to a template job?

Note: I cannot run the Maven command given in the documentation, as our project is Gradle-based.



Solution

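When you create a template, you cannot call DataflowPipelineJob::waitUntilFinish, because no job is attached to that run. The stack trace suggests this is what happens here: EventFilterJob.main calls waitUntilFinish on the result of template creation, which throws the UnsupportedOperationException right after the "Template successfully created" message.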

Take a look at the WordCount example for a working template.
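
One way to apply this is to skip the wait whenever the job is started with --templateLocation. The following is a minimal sketch, not the original job's code: the class TemplateAwareLauncher and its two helper methods are hypothetical names, and the Beam call is represented by a plain Runnable so the example compiles without Beam on the classpath. In the real main method, the Runnable would wrap waitUntilFinish() on the result returned by pipeline.run().

```java
import java.util.Arrays;

// Hypothetical helper: decides whether it is safe to wait on the pipeline result.
public class TemplateAwareLauncher {
    private static final String TEMPLATE_FLAG = "--templateLocation=";

    /** Returns true when the arguments request template staging rather than a job launch. */
    static boolean isTemplateStaging(String[] args) {
        return Arrays.stream(args)
                .anyMatch(arg -> arg.startsWith(TEMPLATE_FLAG)
                        && arg.length() > TEMPLATE_FLAG.length());
    }

    /** Runs waitAction only for real job launches; returns whether it waited. */
    static boolean waitUnlessTemplate(String[] args, Runnable waitAction) {
        if (isTemplateStaging(args)) {
            // Template creation has no job attached, so there is nothing to wait for.
            return false;
        }
        waitAction.run();
        return true;
    }

    public static void main(String[] args) {
        // In the actual job this would be: () -> result.waitUntilFinish()
        // where result is the value returned by pipeline.run().
        boolean waited = waitUnlessTemplate(args,
                () -> System.out.println("Waiting for the job to finish"));
        System.out.println(waited
                ? "Job launched and awaited"
                : "Template staged, not waiting");
    }
}
```

With Beam available, the same check can be made on the parsed options instead of the raw arguments, for example by testing whether getTemplateLocation() on DataflowPipelineOptions returns a non-empty value before calling waitUntilFinish().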


This question was asked on Stack Overflow by Siddhanta Rath and answered by Bruno Volpato. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.
