Talend data preparation tutorial [Chicago Crimes data]

In this tutorial, we look at Talend Data Prep, using the Chicago Crimes data from 2001 to the present.

The first step is to import the data. You might have noticed that Talend Data Prep cannot handle huge files: the Chicago Crimes data is a 2 GB file, so in this case the import fails.

Import the Dataset




We will reduce the data to a smaller sample using Talend Open Studio (something I think Talend Data Prep will do automatically in the future).

1-Filter the first N rows of a file using Talend

Let’s take the first 20,000 rows.
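If you do not have Talend Open Studio at hand, the same row filter can be sketched in a few lines of Python. This is only a rough equivalent of the step above, not what the Talend job does internally; the function name `take_first_rows` and the file names in the usage comment are made up for illustration. It streams the file, so the 2 GB source never has to fit in memory.

```python
import csv
from itertools import islice


def take_first_rows(src_path, dst_path, n_rows=20_000):
    """Copy the header plus the first n_rows data rows of a CSV file.

    Returns the number of data rows written (header not counted).
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        # The first line is assumed to be the header row.
        header = next(reader, None)
        if header is None:
            return 0  # empty source file
        writer.writerow(header)

        written = 0
        # islice stops after n_rows rows, so the rest of the file is never read.
        for row in islice(reader, n_rows):
            writer.writerow(row)
            written += 1
        return written


# Usage (hypothetical file names):
# take_first_rows("chicago_crimes.csv", "chicago_crimes_sample.csv", 20_000)
```

Note that this keeps the first rows in file order, so the sample is not random; that matches the "first N rows" filter used here.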






Click on “Add Dataset”

This is how the dataset looks now:



Now for some analysis of the file, based on the statistics that Data Prep offers.

With one click on each column, we can answer a real analysis question.

What are the most dangerous streets in Chicago?

Clicking on the Block column, which represents the address, displays the following chart at the bottom right:



We can clearly identify the most dangerous street in Chicago with only one click. Great.

What are the most frequent crimes?
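What the column chart shows is essentially a frequency count of the values in one column. As a rough sketch of the same idea outside Data Prep, assuming the sample has been saved as a CSV with a header row, a small helper can count the values of any column. The function name `top_values`, the file name, and the column name `Primary Type` in the usage comment are assumptions for illustration; check them against your own file.

```python
import csv
from collections import Counter


def top_values(csv_path, column, k=10):
    """Return the k most frequent values of `column` as (value, count) pairs."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Rows where the column is missing or empty are skipped.
        counts = Counter(row[column] for row in reader if row.get(column))
    return counts.most_common(k)


# Usage (hypothetical file and column names):
# top_values("chicago_crimes_sample.csv", "Block", k=5)
# top_values("chicago_crimes_sample.csv", "Primary Type", k=5)
```

Keep in mind that the counts describe only the rows in the sample, not the full dataset.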


