Ambari
-
Ambari provides Dashboard:
- Enable Admin Users:
ssh yourname@127.0.0.1 -p 2222
su root
ambari-admin-password-reset
Pig Concept
- Usage of Pig:
- Grunt
- Script
- Ambari
example -> find the oldest 5-star movie
- New Script in Pig View
- Load data:
ratings = LOAD 'ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime: int);
metadata = LOAD 'ml-100k/u.item' USING PigStorage('|') AS (movieID:int, movieTitle:chararray, releaseDate:chararray,
videoRelease:chararray, imdbLink:chararray);
- FOREACH/GENERATE:
nameLookup = FOREACH metadata GENERATE movieTitle, ToUnixTime(ToDate(releaseDate, 'dd-MM-yyyy')) AS releaseTime;
- Group By
ratingsByMovie = Group ratings BY movieID;
*Return Result:
avgRatings = Foreach ratingsByMovie Generate group AS movieID, AVG(ratings.rating) AS avgRating;
fiveStarMovies= Filter avgRatings By avgRating > 4.0;
fiveStarsWithData = join fiveStarMovies by movieID, nameLookup by movieID;
oldestFiveStarMovie = order fiveStarsWithData by nameLookup::releaseTime;
dump oldestFiveStarMovie;
-
Result: (Runtime - several minutes)
-
With Tez, it can shrink into 1 minute