===================================
$out Aggregation Pipeline Operator
===================================

:Title: $out Aggregation Pipeline Operator
:Author: Emily Stolfo
:Status: Draft
:Type: Standards
:Last Modified: January 12, 2015

.. contents::

--------------------
Server Specification
--------------------

The aggregation framework is used to process data and return computed results.

Before release 2.6, the result of an aggregation was a single document. That document had a 'result' field with an array containing the aggregation results. It was therefore subject to the BSON Document size limit, which is 16 megabytes at the time of release 2.6.

The **$out** pipeline option is new in 2.6 and allows you to specify the name of a collection to which the result of the aggregation should be written. With the **$out** option, the result set is not subject to the 16 MB limit.

A new collection will be created if the one specified in **$out** does not already exist in the current database. Note that if the collection already exists, its existing contents will be replaced by the results of the aggregation.

The **$out** option is sometimes referred to as "unsharded **$out**" because it only allows results to be piped to a non-sharded collection. The collection on which the aggregation is performed, on the other hand, can be sharded.

References:

* SERVER ticket: https://jira.mongodb.org/browse/SERVER-3253
* DRIVERS ticket: https://jira.mongodb.org/browse/DRIVERS-111

Server Return Value
'''''''''''''''''''

If the aggregation with **$out** specified completes successfully, the result will be a document in the following format::

    {
        "result" : [ ],
        "ok" : 1
    }

Fields
~~~~~~

**result**
    This field will be an empty array.

**ok**
    1 indicates successful aggregation, 0 indicates failure.

Notes and Restrictions
''''''''''''''''''''''

**1. The $out collection cannot be sharded**

The collection to which the aggregation is piped cannot be sharded.
An error will be returned if an aggregation is attempted with **$out** specified as a sharded collection, or if that collection becomes sharded while the aggregation is running. The error is the following::

    aggregate failed: {
        "errmsg" : "exception: namespace 'records.users' is sharded so it can't be used for $out'",
        "code" : 17017,
        "ok" : 0
    }

**2. The $out option must be the last pipeline operator**

The pipeline passed to the aggregation is presumably already specified as an ordered list. The **$out** option must be last in the list; otherwise, an error will be returned::

    aggregate failed: {
        "code" : 16991,
        "ok" : 0,
        "errmsg" : "exception: $out can only be the final stage in the pipeline"
    }

**3. Both the cursor option and $out option are specified**

If both a cursor and **$out** are requested, the results will be written to the collection specified but no cursor will be created (cursor id == 0)::

    {
        "cursor" : {
            "id" : NumberLong(0),
            "ns" : "records.users",
            "firstBatch" : [ ]
        },
        "ok" : 1
    }

----------
Driver API
----------

The **$out** option is provided just as any other pipeline operator is::

    pipeline = [{ $project: { uid: 1, email: 1 } }, { $out: "users" }]
    collection.aggregate(pipeline)

Driver return value
'''''''''''''''''''

The driver will return the raw document received from the server::

    {
        "result" : [ ],
        "ok" : 1
    }

It is up to the user to decide whether to instantiate a collection using the name specified in the **$out** operator.

Read preferences
''''''''''''''''

The only replica set member that can be used with **$out** is the primary, because the operation writes to a collection. If a read preference other than primary is specified, the driver will route the aggregation to the primary and log a warning that it has done so.

See DRIVERS ticket: https://jira.mongodb.org/browse/DRIVERS-84

If **$out** is not specified, the read preference will be respected.
When deciding how to handle this scenario, recall what your driver does with Map-Reduce when ``out`` is specified and is not 'inline'.

Reason for warning: Rerouting the aggregation with **$out** to the primary can cause a problem. The output collection is written on the primary, and the user may then query it with a non-primary read preference before replication has completed. The user therefore risks querying the collection before it is fully replicated.
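
The routing rule in this section can be illustrated with a minimal Python sketch. It is not taken from any driver: the helper names ``has_out_stage``, ``out_collection_name`` and ``effective_read_preference`` are hypothetical, the read preference is assumed to be a plain mode string such as ``"primary"`` or ``"secondary"``, and the pipeline is assumed to be a list of single-key dicts.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("out_sketch")

PRIMARY = "primary"


def has_out_stage(pipeline):
    """Return True if the final stage of the pipeline is $out."""
    return bool(pipeline) and "$out" in pipeline[-1]


def out_collection_name(pipeline):
    """Return the $out target collection name, or None if there is no $out."""
    return pipeline[-1]["$out"] if has_out_stage(pipeline) else None


def effective_read_preference(pipeline, requested):
    """Pick the read preference mode to use for an aggregate command.

    With $out as the final stage the command writes to a collection, so it
    is routed to the primary and a warning is logged. Without $out the
    requested mode is respected.
    """
    if has_out_stage(pipeline) and requested != PRIMARY:
        logger.warning(
            "aggregation with $out rerouted from %r to primary; the output "
            "collection may not yet be replicated to other members",
            requested,
        )
        return PRIMARY
    return requested
```

The sketch only inspects the last stage because the server rejects a pipeline in which **$out** appears anywhere else. A user who needs to read the results right away could instantiate the collection named by ``out_collection_name`` and query it on the primary.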