How to convert HDFS text files to Parquet using Talend


From the palette, add the following three components:

tHdfsConnection

tFileInputDelimited

tFileOutputParquet
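
To make the tFileInputDelimited step more concrete, here is a minimal Python sketch of what that component does conceptually. It splits each line on a field separator and converts the fields according to a schema before the rows are handed to tFileOutputParquet. The three-column schema, the `;` separator, the `header` parameter and the `parse_delimited` function are hypothetical illustrations, not the code Talend generates. The actual Parquet write, which tFileOutputParquet performs in the job, is not shown because it needs a Parquet library.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

# Hypothetical schema (assumption): column name and converter, in file order.
# In a Talend job this role is played by the schema set on tFileInputDelimited.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("id", int),
    ("name", str),
    ("amount", float),
]


def parse_delimited(text: str, delimiter: str = ";", header: int = 0) -> List[Dict[str, object]]:
    """Split delimited text into typed rows, skipping the first `header` records.

    The ';' default separator and the header count are illustrative assumptions.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows: List[Dict[str, object]] = []
    for record_no, fields in enumerate(reader):
        if record_no < header or not fields:
            continue  # skip header records and blank lines
        if len(fields) != len(SCHEMA):
            raise ValueError(
                f"record {record_no + 1}: expected {len(SCHEMA)} fields, got {len(fields)}"
            )
        # Convert each raw string field with the converter declared in SCHEMA.
        rows.append({name: convert(value) for (name, convert), value in zip(SCHEMA, fields)})
    return rows


if __name__ == "__main__":
    sample = "id;name;amount\n1;alice;10.5\n2;bob;3\n"
    # Prints: [{'id': 1, 'name': 'alice', 'amount': 10.5}, {'id': 2, 'name': 'bob', 'amount': 3.0}]
    print(parse_delimited(sample, header=1))
```

In the real job, typed rows like these are what flow from tFileInputDelimited into tFileOutputParquet, which stores them in Parquet's columnar layout.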


PS: You can do this in a standard job or in a MapReduce job.

Spark jobs in Talend don’t support Parquet yet (as of v6.0.1).

Which big data formats are supported?

2 Responses so far.

  1. Paul Friedman says:

    Quick question:

    I can see how to add the Parquet components to a MapReduce job, but they disappear from the palette when I edit a standard job.

    Is there a way to add tFileInputParquet to a standard job?

    Thanks.

    —Paul

    • ahallam says:

      Hi Paul,

      You can’t use the Parquet components in a standard job; they should be removed from the palette for standard jobs.
      You can use the Parquet format only from a MapReduce or Spark job in Talend.

      You need to put your data into your big data platform using tHdfsOutput, then convert it to Parquet or Avro format with a Big Data Batch job (MapReduce or Spark).
      If you have any other questions, please let us know.

      Regards
      Amine Hallam
