How to convert HDFS text files to Parquet using Talend

On the palette, add the following three components:

- tHdfsConnection
- tFileInputDelimited
- tFileOutputParquet

Note: you can do this in a standard job or in a MapReduce job. Spark on Talend does not yet support Parquet (as of v6.0.1). Which big data formats are supported?

Read More »

Hadoop MapReduce Tutorial with Python

Prerequisites: download Canopy, then install the mrjob package from the console:

!pip install mrjob

Create the MapReduce job class:

from mrjob.job import MRJob

class MRRatingCounter(MRJob):
    def mapper(self, key, line):
        (userID, movieID, rating, timestamp) = line.split('\t')
        yield rating, 1

    def reducer(self, rating, occurrences):
        yield rating, sum(occurrences)

if __name__ […]
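To see what the mapper and reducer above actually compute, here is a minimal pure-Python simulation of the same counting logic, runnable without Hadoop or mrjob. The tab-separated sample lines are hypothetical, mimicking MovieLens-style `user \t movie \t rating \t timestamp` rows.

```python
from collections import defaultdict

# Hypothetical sample input lines (userID, movieID, rating, timestamp)
sample_lines = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
]

def mapper(line):
    # Same logic as MRRatingCounter.mapper: emit (rating, 1) per line
    user_id, movie_id, rating, timestamp = line.split("\t")
    yield rating, 1

def reducer(pairs):
    # Same logic as MRRatingCounter.reducer: sum the 1s per rating key
    counts = defaultdict(int)
    for rating, one in pairs:
        counts[rating] += one
    return dict(counts)

pairs = [pair for line in sample_lines for pair in mapper(line)]
print(reducer(pairs))  # {'3': 2, '1': 1}
```

In a real run, mrjob performs the shuffle step between these two phases, grouping all values for each rating key before the reducer sees them.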

Read More »