PipelineCsvAvroProtobuf - Read a CSV file from GCS, convert it to Protobuf, and write it back to GCS. Uses AvroIO to write the file.
PipelineAvroProtobufParquet - Read a Protobuf file from GCS, convert it to Parquet, and write it back to GCS. The input file is the one generated by PipelineCsvAvroProtobuf.
PipelineProtobufParquet - Read a Protobuf file from GCS, convert it to Parquet, and write it back to GCS. The Protobuf file was NOT written using Avro.
PipelineCsvParquet - Read a CSV file from GCS, convert it to Parquet, and write it back to GCS.
PipelinePubSubBtBq - Read device telemetry data from Pub/Sub and write it to BigQuery and Bigtable. Avro is used to define the data schema.
PipelineDbBq - Read from a MySQL database using JdbcIO and write to BigQuery using BigQueryIO. Uses the employees table of the Employee database.
PipelineDbNestedBQ - Read from a MySQL database using JdbcIO, build nested, repeated records, and write them to BigQuery using BigQueryIO. Uses the employees table of the Employee database.
PipelineCsvAvroBq - Read a CSV file from GCS, parse each line with OpenCSV, and write to BigQuery using BigQueryIO. The CSV data is the employees table of the Employee database. Avro is used to define the data schema.
PipelineDbInterleaveSpanner - Read from a MySQL database using JdbcIO and write to Spanner using SpannerIO. Uses the employees table of the Employee database. Two interleaved tables must be pre-created in Spanner before running the pipeline. Avro is used to define the data schema.
- emp table with schema:
  - *emp_id: INT64 NOT NULL
  - birth_date: STRING
  - first_name: STRING
- dept table with schema:
  - emp_id: INT64 NOT NULL
  - *dept: INT64
  - join_date: STRING
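The two Spanner tables listed above could be pre-created with DDL along the following lines. This is only a sketch under stated assumptions: it reads the asterisk as marking a key column, uses `STRING(MAX)` because the schema above gives no string length, and adds `ON DELETE CASCADE` as an optional choice. The class name `InterleavedSchemaSketch` is hypothetical and not part of this repository. Spanner does require the child table's primary key to begin with the parent's primary key, which is why `dept` is keyed on `(emp_id, dept)`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: holds the DDL for the two interleaved tables.
public class InterleavedSchemaSketch {

    // Parent table. Assumption: emp_id is the primary key (marked with * above).
    static final String CREATE_EMP = String.join("\n",
            "CREATE TABLE emp (",
            "  emp_id INT64 NOT NULL,",
            "  birth_date STRING(MAX),",   // length is an assumption
            "  first_name STRING(MAX)",
            ") PRIMARY KEY (emp_id)");

    // Child table interleaved in emp. Its key must start with the parent key.
    // Assumption: dept is the second key column (marked with * above).
    static final String CREATE_DEPT = String.join("\n",
            "CREATE TABLE dept (",
            "  emp_id INT64 NOT NULL,",
            "  dept INT64,",
            "  join_date STRING(MAX)",
            ") PRIMARY KEY (emp_id, dept),",
            "  INTERLEAVE IN PARENT emp ON DELETE CASCADE"); // cascade is optional

    // The parent must be created before the child, so order matters.
    static List<String> ddlStatements() {
        return Arrays.asList(CREATE_EMP, CREATE_DEPT);
    }

    public static void main(String[] args) {
        for (String statement : ddlStatements()) {
            System.out.println(statement + ";");
            System.out.println();
        }
    }
}
```

The printed statements can be pasted into the Spanner console or passed to whichever admin tool is in use; adjust key columns and string lengths to match the actual data.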
mvn clean package
To deploy a pipeline, see the deployDF.py file.