I decided to practice writing parsers and working with MongoDB, but ran into the following problem.

There is a collection of products. For example, take the following documents from it:

 {title: 'acer aspire 6420', source: 'amazon', price: 300}
 {title: 'acer aspire 6420', source: 'ebay', price: 320}

From these, I need to obtain the following document:

 {title: 'acer aspire 6420', amazon: 300, ebay: 320} 

that is, group by title and show the price in each store.

While experimenting, I wrote the following Python code:

 from bson.code import Code

 reducer = Code("""
     function(origin, res) {
         res[origin.source] = origin.price
     }
 """)

 db.products.group(
     key={"source": 1, 'title': 1, 'price': 1},
     condition={'title': 'acer aspire 6420'},
     initial={},
     reduce=reducer)

but in the end I get this result:

 {title: 'acer aspire 6420', source: 'amazon', price: 300, amazon: 300, ebay: 320}
 {title: 'acer aspire 6420', source: 'ebay', price: 320, amazon: 300, ebay: 320}

Please tell me which direction to dig in, or how to implement this so that the result looks like this:

 {title: 'acer aspire 6420', amazon: 300, ebay: 320} 

I googled and read the documentation, really. I just don’t really understand how to tie it all together correctly.

  • If you receive an exhaustive answer, mark it as accepted (the check mark next to the chosen answer). - Nicolas Chabanovsky

1 answer

An option using the aggregation framework (for the mongo shell):

 db.products.aggregate([
     {$match: {title: "acer aspire 6420"}},
     {$project: {_id: 0, title: 1, price: {source: "$source", value: "$price"}}},
     {$group: {_id: "$title", prices: {$push: "$price"}}}
 ])

This produces the output:

 { "_id" : "acer aspire 6420", "prices" : [ { "source" : "amazon", "value" : 300 }, { "source" : "ebay", "value" : 320 } ] } 

The first stage filters by the product's title, the next one builds a price field as an object with two fields, source and value, and the last stage groups by title, pushing every price entry into the newly created prices array field.
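
To make the three stages more concrete, here is a rough plain-Python imitation run on the two sample documents from the question. It does not touch MongoDB at all; the name `run_stages` is made up for illustration, and the code only mimics what `$match`, `$project` and `$group` with `$push` do for this particular pipeline.

```python
# The two sample documents from the question.
PRODUCTS = [
    {"title": "acer aspire 6420", "source": "amazon", "price": 300},
    {"title": "acer aspire 6420", "source": "ebay", "price": 320},
]


def run_stages(docs, title):
    """Imitate the $match -> $project -> $group pipeline in plain Python."""
    # $match: keep only documents with the requested title.
    matched = [d for d in docs if d["title"] == title]

    # $project: keep the title and fold source/price into one sub-document.
    projected = [
        {"title": d["title"],
         "price": {"source": d["source"], "value": d["price"]}}
        for d in matched
    ]

    # $group with $push: one output document per title,
    # with all price sub-documents collected into a list.
    groups = {}
    for d in projected:
        groups.setdefault(d["title"], []).append(d["price"])
    return [{"_id": t, "prices": p} for t, p in groups.items()]
```

Calling `run_stages(PRODUCTS, "acer aspire 6420")` gives a single document with a two-element `prices` list, the same shape as the shell output above.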

Your original task needs the value of source to become a field name in the result, but for now the MongoDB aggregation framework does not support dynamic names for created fields ( https://jira.mongodb.org/browse/SERVER-5947 ), so you can post-process the result with map :

 db.products.aggregate([
     {$match: {title: "acer aspire 6420"}},
     {$project: {_id: 0, title: 1, price: {source: "$source", value: "$price"}}},
     {$group: {_id: "$title", prices: {$push: "$price"}}}
 ]).map(function(e) {
     var r = {};
     r.title = e._id;
     // turn every {source, value} pair into a field named after the store
     e.prices.forEach(function(i) { r[i.source] = i.value; });
     return r;
 })

This outputs:

 [ { "title" : "acer aspire 6420", "amazon" : 300, "ebay" : 320 } ] 

In this case, unlike the first option, information can be lost: if, for example, there are two different prices for amazon, the later one overwrites the earlier one.
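
If duplicate prices matter, one possible way around this is to collect a list of prices per store instead of a single value. Below is a small Python sketch of that idea, assuming the input has the shape of the first option's output and that no store is literally called "title"; the function name `pivot_keep_all` is hypothetical.

```python
def pivot_keep_all(grouped):
    """Turn {_id, prices: [{source, value}, ...]} into {title, <source>: [values]}."""
    rows = []
    for doc in grouped:
        row = {"title": doc["_id"]}
        for price in doc["prices"]:
            # Append instead of assigning, so a second price for the same
            # store is kept rather than overwriting the first one.
            row.setdefault(price["source"], []).append(price["value"])
        rows.append(row)
    return rows
```

The price of this variant is that every store field becomes a list, even when there is only one price.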

In Python, I think you can do the same by analogy.
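
A rough Python counterpart might look like the sketch below. It assumes PyMongo, where `Collection.aggregate` takes the pipeline as a list of dicts and returns an iterable of result documents. The function name `pivot_prices` is made up here, and the database call is left in a comment because the connection details are not part of the question.

```python
# Same pipeline as in the mongo shell example, written as Python dicts.
PIPELINE = [
    {"$match": {"title": "acer aspire 6420"}},
    {"$project": {"_id": 0, "title": 1,
                  "price": {"source": "$source", "value": "$price"}}},
    {"$group": {"_id": "$title", "prices": {"$push": "$price"}}},
]


def pivot_prices(grouped):
    """Python counterpart of the .map(...) callback from the shell example."""
    rows = []
    for doc in grouped:
        row = {"title": doc["_id"]}
        for price in doc["prices"]:
            # The store name becomes the field name; a repeated store
            # overwrites the earlier value, as in the shell version.
            row[price["source"]] = price["value"]
        rows.append(row)
    return rows


# With a PyMongo database handle `db` (assumed, not shown here):
#     result = pivot_prices(db.products.aggregate(PIPELINE))
```

Keeping the reshaping in a separate function means it can be tried on plain dicts without a running server.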

  • Yes, there are no problems with rewriting it in Python. Thank you very much, very helpful - mavokiko
  • If the fields are dynamic and you cannot change the structure of the documents, the best solution is mapReduce. But I suggest changing the document structure, because mapReduce can cause a drop in performance in your application - styvane