MapReduce
- MapReduce is a framework for developing distributed applications in Java, Scala, and other programming languages.
- The distributed filesystem used with MapReduce (HDFS, in Hadoop) is not a replacement for a relational database management system.
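
The MapReduce programming model can be illustrated with a minimal single-process sketch in Python. This is not Hadoop's API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and only mimic the three steps a real framework would distribute across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["Hadoop and Spark", "Spark and MapReduce"]))
```

In a real cluster, each call to map_phase or reduce_phase could run on a different node, and the shuffle would move data over the network; here everything runs in one process to keep the idea visible.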
Hadoop and Spark
Hadoop and Spark are two different distributed software frameworks.
- A Spark job can be an arbitrary DAG (directed acyclic graph) of map and/or reduce/shuffle phases. Spark can run with or without Hadoop.
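
To make the DAG idea concrete, here is a rough sketch in plain Python, assuming Python 3.9+ for the standard graphlib module. It is not Spark's API: run_dag, STAGES, DEPS, and the stage functions are hypothetical names, and the scheduler simply runs each stage once all of its upstream stages have finished.

```python
from collections import Counter
from graphlib import TopologicalSorter


def tokenize(inputs):
    """Map-style stage: split every line into lowercase words."""
    (lines,) = inputs
    return [word.lower() for line in lines for word in line.split()]


def count(inputs):
    """Reduce-style stage: count occurrences of each word."""
    (words,) = inputs
    return Counter(words)


def lengths(inputs):
    """Map-style stage: record the length of each distinct word."""
    (words,) = inputs
    return {word: len(word) for word in words}


def summary(inputs):
    """Join-style stage: combine the outputs of two upstream stages."""
    counts, lens = inputs
    return {word: (counts[word], lens[word]) for word in counts}


# Hypothetical job description: stage name -> function, stage name -> upstream stages.
STAGES = {"tokenize": tokenize, "count": count, "lengths": lengths, "summary": summary}
DEPS = {"tokenize": [], "count": ["tokenize"], "lengths": ["tokenize"],
        "summary": ["count", "lengths"]}


def run_dag(stages, deps, source):
    """Run stages in dependency order; root stages read the source data."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[upstream] for upstream in deps[name]] or [source]
        results[name] = stages[name](inputs)
    return results


if __name__ == "__main__":
    print(run_dag(STAGES, DEPS, ["Spark and Hadoop", "Spark"])["summary"])
```

Unlike a fixed map-then-reduce pipeline, the tokenize output here feeds two independent stages whose results are later joined, which is the kind of shape a DAG-based job allows.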
Benefits for big data processing
- reliability
- scalability
- distributed processing
- parallelism