Aggregation Pipeline:
- A pipeline of predefined operators
- Some operators ($match, $project) can be executed in parallel when sharding is used
- Generally very fast, because the stages are built-in operators rather than user-supplied JavaScript
- You can iterate over the output or use the $out operator to store it in a collection
- easy to develop and debug: Start with just one operator, look at the output, continue with the next operator, and so on.
- collection-oriented: For the whole collection, execute the first operator in the pipeline. The output is a collection. For that collection, execute the second operator, and so on.
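The collection-oriented model described above can be sketched in plain JavaScript. This is only a toy model and not how MongoDB implements its stages: `matchStage`, `groupSumStage`, `runPipeline` and the `orders` data are hypothetical names made up for illustration, and each "stage" is simply a function from an array of documents to a new array.

```javascript
// Toy model: a stage takes a whole collection (array) and returns a new one.
// All helper names here are hypothetical, not part of any MongoDB API.

// Stand-in for $match: keep documents whose fields equal the given values.
function matchStage(criteria) {
  return (docs) =>
    docs.filter((doc) =>
      Object.entries(criteria).every(([field, value]) => doc[field] === value)
    );
}

// Stand-in for $group with a $sum accumulator.
function groupSumStage(keyField, sumField) {
  return (docs) => {
    const totals = new Map();
    for (const doc of docs) {
      const key = doc[keyField];
      totals.set(key, (totals.get(key) || 0) + doc[sumField]);
    }
    return [...totals].map(([key, total]) => ({ _id: key, total }));
  };
}

// Run the stages in order; the output of one stage is the input of the next.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const orders = [
  { customer: "ann", status: "paid", amount: 10 },
  { customer: "bob", status: "open", amount: 5 },
  { customer: "ann", status: "paid", amount: 7 },
  { customer: "bob", status: "paid", amount: 3 },
];

// Develop step by step: look at the output of the first stage alone ...
const afterMatch = runPipeline(orders, [matchStage({ status: "paid" })]);
// ... then append the next stage and look again.
const totals = runPipeline(orders, [
  matchStage({ status: "paid" }),
  groupSumStage("customer", "amount"),
]);
console.log(afterMatch.length, totals);
```

Against a real (here hypothetical) `orders` collection, the same two stages would be written in the mongo shell as `db.orders.aggregate([{ $match: { status: "paid" } }, { $group: { _id: "$customer", total: { $sum: "$amount" } } }])`.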
MapReduce:
- user-defined map and reduce functions, written in JavaScript
- you can code “anything” in it (please do not access the database within map and reduce!)
- map and reduce functions are executed in parallel when sharding is used
- the reduce output can be written into a sharded collection (the shard key is _id)
- when you use the mapReduce method e.g. in the Java Client API, you can iterate over the result. But even there, map and reduce are JavaScript functions (as Strings) and not Java functions.
- document-oriented: for each document in the collection, call the map function. When finished, for each distinct key emitted in the map function calls, call the reduce function with the list of values as a second parameter. The output of the reduce function is one document.
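The document-oriented flow described above can be sketched as a word count in plain JavaScript. This is a simplified model and not MongoDB's implementation: `runMapReduce`, `mapWords`, `reduceCounts` and the sample documents are hypothetical. In MongoDB itself, map takes no arguments, reads the current document through `this`, and calls a global `emit(key, value)`; in this sketch `emit` is passed in as a parameter so that the code runs on its own.

```javascript
// Toy model of MapReduce: map is called once per document, reduce once per
// distinct key. Helper names are hypothetical, not part of any MongoDB API.

// map: `this` is the current document; emit one (word, 1) pair per word.
function mapWords(emit) {
  for (const word of this.text.toLowerCase().split(/\s+/)) {
    if (word) emit(word, 1);
  }
}

// reduce: receives a key and the list of values emitted for that key.
function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

function runMapReduce(docs, map, reduce) {
  const emitted = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Phase 1: call map for each document in the collection.
  for (const doc of docs) map.call(doc, emit);
  // Phase 2: one output document per distinct key. Like MongoDB, skip the
  // reduce call when a key has only a single value.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const pages = [{ text: "to be or not to be" }, { text: "To do" }];
const wordCounts = runMapReduce(pages, mapWords, reduceCounts);
console.log(wordCounts);
```

Because reduce may be skipped for single-value keys, and in MongoDB may also be called more than once for the same key, its return value should have the same shape as the values emitted by map, as it does here (a number in, a number out).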
It is recommended to use the aggregation pipeline instead of MapReduce. The pipeline is fast and powerful, and MongoDB has more opportunities to optimize it. If you need to write your own code (e.g. for String manipulation, generating random numbers, word count, …), it can be easier to write a MapReduce job.