Split file rows into multiple files depending on a column’s values
In this tutorial we will show you how to split a file into multiple smaller files based on the values of one specific column. The approach works in two steps:
Collect all the distinct values of the “pivot” column chosen to split the file.
Then, for each value of this column, extract the records that match the current value and save them in a new file.
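The two steps above can be sketched in Python as follows. The tutorial itself builds this logic as a job out of components, so this is an independent illustration of the same idea, not the original job. It assumes a comma-separated file with a header row. The function name `split_by_column` and the output naming pattern `<column>_<value>.csv` are hypothetical choices made for this example, and column values are used in file names as-is.

```python
import csv
from pathlib import Path


def split_by_column(input_path, pivot, output_dir):
    """Write one CSV per distinct value of `pivot` (hypothetical helper).

    Returns a dict mapping each value to the number of rows written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: read the file once to collect the distinct pivot values.
    with open(input_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        values = sorted({row[pivot] for row in reader})

    counts = {}
    # Step 2: for each value, re-read the file and keep only matching rows.
    for value in values:
        out_path = output_dir / f"{pivot}_{value}.csv"
        with open(input_path, newline="") as src, open(out_path, "w", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            count = 0
            for row in csv.DictReader(src):
                if row[pivot] == value:
                    writer.writerow(row)
                    count += 1
        counts[value] = count
    return counts
```

This version re-reads the input once per distinct value, which mirrors the "for each value, extract the matching records" loop described above. It keeps only one output file open at a time, at the cost of scanning the input repeatedly.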
Let’s take a file (Chicago crime data). We will keep only a small sample of rows (around 20,000) and create a separate file for each value of the District column.
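For the Chicago sample, here is a variant that takes the sample and splits it in a single pass, keeping one output file open per District value. This is a swapped-in single-pass technique, not the two-step loop used by the tutorial's job. It rests on several assumptions. The "sample" is simply the first 20,000 data rows, because the tutorial does not say how its rows were chosen. The file is comma-separated with a header that contains a `District` column. The function name `sample_and_split` is hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path


def sample_and_split(input_path, output_dir, pivot="District", sample_size=20_000):
    """Split the first `sample_size` rows into one CSV per `pivot` value.

    Hypothetical helper: the sample is assumed to be the leading rows of the file.
    Returns a dict mapping each value to the number of rows written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    writers = {}   # pivot value -> csv.DictWriter for that value's file
    handles = []   # every file opened, so all can be closed at the end
    counts = {}
    with open(input_path, newline="") as src:
        reader = csv.DictReader(src)
        try:
            # islice stops after `sample_size` data rows (the header is not counted).
            for row in islice(reader, sample_size):
                value = row[pivot]
                if value not in writers:
                    # First time this value appears: open its file and write the header.
                    fh = open(output_dir / f"{pivot}_{value}.csv", "w", newline="")
                    handles.append(fh)
                    writer = csv.DictWriter(fh, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    writers[value] = writer
                writers[value].writerow(row)
                counts[value] = counts.get(value, 0) + 1
        finally:
            for fh in handles:
                fh.close()
    return counts
```

Usage with a hypothetical input file name: `sample_and_split("crimes.csv", "by_district")`. Reading the input only once is practical here because a column like District has a modest number of distinct values. For a column with very many distinct values, the two-pass approach avoids holding many files open at once.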
Here is the schema of the file:
For this we need the following components:
Download this job: