How to convert HDFS text files to Parquet using Talend

On the palette, add the following three components:

- tHDFSConnection
- tFileInputDelimited
- tFileOutputParquet

PS: you can do this in a standard job or in a MapReduce job. Spark on Talend does not support Parquet yet (as of v6.0.1).

Which big data formats are supported?

Hadoop MapReduce Tutorial with Python

Prerequisites:

- Download Canopy
- On the console, install the mrjob package: !pip install mrjob

Create the MapReduce job class:

    from mrjob.job import MRJob

    class MRRatingCounter(MRJob):
        def mapper(self, key, line):
            # Each input line is tab-separated: userID, movieID, rating, timestamp
            (userID, movieID, rating, timestamp) = line.split('\t')
            yield rating, 1

        def reducer(self, rating, occurrences):
            # Sum all the 1s emitted for this rating
            yield rating, sum(occurrences)

    if __name__ == '__main__':
        MRRatingCounter.run()
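To see what the job computes without a Hadoop cluster, here is a minimal pure-Python simulation of the same map, shuffle, and reduce phases. The sample lines are invented for illustration; they follow the tab-separated userID/movieID/rating/timestamp layout the mapper expects:

```python
from collections import defaultdict

# Invented sample data in the format the job expects:
# userID \t movieID \t rating \t timestamp
lines = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
]

# Map phase: emit (rating, 1) for every input line
mapped = []
for line in lines:
    user_id, movie_id, rating, timestamp = line.split("\t")
    mapped.append((rating, 1))

# Shuffle phase: group the emitted counts by rating key
groups = defaultdict(list)
for rating, count in mapped:
    groups[rating].append(count)

# Reduce phase: sum the counts for each rating
result = {rating: sum(counts) for rating, counts in groups.items()}
print(result)  # {'3': 2, '1': 2, '2': 1}
```

In a real run, mrjob performs the shuffle for you: Hadoop groups all values emitted under the same key before calling the reducer, which is why the reducer receives an iterable of occurrences per rating.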