Analyzing Yelp data with Pig

The reading for this module includes a Pig script for Yelp data, which contains the commands and example from the video ‘Other Pig Commands’. Run the script for Yelp data in your environment. See the comments at the beginning to remind you how to run […]


Classification Algorithms – Naive Bayes Review

What is the main Naive Bayes assumption?
Knowledge about the value of the class attribute indicates value of another attribute
Knowledge about the value of a particular attribute doesn’t tell us anything about the value of another attribute
Knowledge about the value of a […]


MDM CDC: Custom redundancy check

This component creates a hidden column that holds a hash key generated from the combination of multiple columns. It is used when you want to compare an incoming feed to the existing MDM: String-to-String comparisons on the lookup make the process slow, so to avoid them you need to use the CDC component.
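The idea of comparing one hash key instead of many string columns can be sketched in Python. This is only an illustration of the general technique, not the component's actual implementation: the function names `row_hash` and `detect_changes`, the choice of SHA-256, and the unit-separator delimiter are all assumptions made for the example.

```python
import hashlib

# Hypothetical helper: build one hash key from several columns of a row.
# The separator keeps ("ab", "c") and ("a", "bc") from hashing to the same value.
def row_hash(row, columns, sep="\x1f"):
    # Assumption: None is treated as an empty string, so the two are not distinguished.
    parts = ["" if row.get(c) is None else str(row[c]) for c in columns]
    return hashlib.sha256(sep.join(parts).encode("utf-8")).hexdigest()

# Hypothetical helper: compare an incoming feed against hashes already stored
# for the existing records, keyed by a business key.
def detect_changes(incoming, existing_hashes, key, columns):
    inserts, updates = [], []
    for row in incoming:
        new_hash = row_hash(row, columns)
        old_hash = existing_hashes.get(row[key])
        if old_hash is None:
            inserts.append(row)      # key not seen before
        elif old_hash != new_hash:
            updates.append(row)      # key known, tracked columns changed
        # equal hashes: the row is unchanged and is skipped
    return inserts, updates
```

With this approach the lookup compares a single fixed-length value per record, at the cost of storing the hash alongside each existing record.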
