Tuesday, March 26, 2013

Solved: Pymongo findAndModify example collection object not callable


I just blew an hour with a silly mistake (or two!)

I have a MongoDB collection with a bunch of documents that I'm accessing via pymongo.  Each doc holds its data in an array, and I want to append an element to the array in the latest doc.  Appending is the normal part: I use $push.  Finding the right doc to modify was harder.

Below is the terminal session where I did this.  I made several mistakes, and I hope you find it useful to see them.

The biggest one is that THERE IS NO findAndModify in pymongo.  It's called 'find_and_modify()'.   I kept getting the dreaded error: 

TypeError: 'Collection' object is not callable. If you meant to call the 'findAndUpdate' method on a 'Database' object it is failing because no such method exists.

This was reasonable - there is no such function. 
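
Why the error talks about a 'Collection' at all: pymongo turns any unknown attribute on a database or collection into a (sub-)collection, so mm.delme2.findAndModify quietly becomes a collection named 'delme2.findAndModify', and the parentheses then try to call it. The tracebacks in the session below show the error coming out of __call__ in collection.py. The toy class here only mimics that behavior for illustration; ToyCollection is a made-up name, not pymongo's actual code.

```python
class ToyCollection(object):
    """Toy stand-in that mimics how unknown attributes become sub-collections."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. for names that are
        # not real methods. The result is a new, dotted "sub-collection".
        return ToyCollection(self.name + "." + attr)

    def __call__(self, *args, **kwargs):
        # Calling a collection object is always an error; the last part of
        # the dotted name is the method the caller probably meant.
        raise TypeError("'ToyCollection' object is not callable. "
                        "No such method: %r" % self.name.split(".")[-1])

    def find_and_modify(self, *args, **kwargs):
        # A real method: found by normal lookup, so __getattr__ never runs.
        return "find_and_modify called"


delme2 = ToyCollection("delme2")
print(delme2.find_and_modify())      # real method, works
try:
    delme2.findAndModify()           # unknown name -> sub-collection -> call fails
except TypeError as exc:
    print(exc)
```

So a misspelled method name never fails at the attribute lookup; it only fails once you try to call the result.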

The other mistakes below might be useful too, since I think they're probably common.
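
One of those is the 'no such cmd: sort' failure in the session below. MongoDB reads the first key of a command document as the command name, and a plain Python 2.6 dict does not keep keys in the order you typed them; the error text shows 'sort' ended up first. Here is a minimal sketch of one way around it, assuming pymongo's Database.command(name, value, **kwargs) form, which puts the command name first for you. The helper name push_via_command and the FakeDatabase stand-in are hypothetical and exist only so the example runs without a server.

```python
def push_via_command(db, collection_name, query, value):
    """Run findAndModify as a raw command with the command name first."""
    # Passing the name and its value positionally keeps 'findAndModify'
    # as the first key; the remaining options go in as keyword arguments.
    return db.command('findAndModify', collection_name,
                      query=query,
                      update={'$push': {'vals': value}},
                      sort={'start': -1},
                      new=True)


class FakeDatabase(object):
    """Hypothetical stand-in that records the call instead of talking to MongoDB."""

    def command(self, name, value=1, **kwargs):
        return {'name': name, 'value': value, 'options': kwargs}


recorded = push_via_command(FakeDatabase(), 'delme2', {'metrictypeid': 20}, 5)
print("%s -> %s" % (recorded['name'], recorded['value']))
```

With a real connection, the first argument would be the database object (mm in the session).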

[esm@myboxname:scripts]$ python
Python 2.6.6 (r266:84292, Jul 20 2011, 10:22:43) 
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymongo import Connection
>>> c = Connection('myboxname', 40000)
>>> mm = c.mymassivecollection
>>> ret = mm.delme2.insert({'metrictypeid': 20, 'start':1000, 'vals':[3,4]});
>>> ret
ObjectId('5150c9bd64524c04169fd612')
>>> mm.delme2.insert({'metrictypeid': 20, 'start':1010, 'vals':[3,4]});
ObjectId('5150c9cd64524c04169fd613')
>>> mm.delme2.insert({'metrictypeid': 20, 'start':1020, 'vals':[3,4]});
ObjectId('5150c9d264524c04169fd614')
>>> mm.delme2.insert({'metrictypeid': 20, 'start':1030, 'vals':[3,4]});
ObjectId('5150c9d764524c04169fd615')
>>> mm.delme2.insert({'metrictypeid': 10, 'start':1030, 'vals':[3,4]});
ObjectId('5150c9dd64524c04169fd616')
>>> mm.delme2.find()

>>> [x for x in mm.delme2.find()]
[{u'vals': [3, 4], u'start': 1000, u'_id': ObjectId('5150c9bd64524c04169fd612'), u'metrictypeid': 20}, {u'vals': [3, 4], u'start': 1010, u'_id': ObjectId('5150c9cd64524c04169fd613'), u'metrictypeid': 20}, {u'vals': [3, 4], u'start': 1020, u'_id': ObjectId('5150c9d264524c04169fd614'), u'metrictypeid': 20}, {u'vals': [3, 4], u'start': 1030, u'_id': ObjectId('5150c9d764524c04169fd615'), u'metrictypeid': 20}, {u'vals': [3, 4], u'start': 1030, u'_id': ObjectId('5150c9dd64524c04169fd616'), u'metrictypeid': 10}]
>>> import pprint
>>> pprint.pprint([x for x in mm.delme2.find()])
[{u'_id': ObjectId('5150c9bd64524c04169fd612'),
  u'metrictypeid': 20,
  u'start': 1000,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9cd64524c04169fd613'),
  u'metrictypeid': 20,
  u'start': 1010,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9d264524c04169fd614'),
  u'metrictypeid': 20,
  u'start': 1020,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9d764524c04169fd615'),
  u'metrictypeid': 20,
  u'start': 1030,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9dd64524c04169fd616'),
  u'metrictypeid': 10,
  u'start': 1030,
  u'vals': [3, 4]}]
>>> mm.runCommand( { 'findAndModify':'delme2', 'query': {'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1401, in __call__
    self.__name)
TypeError: 'Collection' object is not callable. If you meant to call the 'runCommand' method on a 'Database' object it is failing because no such method exists.
>>> mm.delme2.runCommand( { 'findAndModify':'delme2', 'query': {'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1405, in __call__
    self.__name.split(".")[-1])
TypeError: 'Collection' object is not callable. If you meant to call the 'runCommand' method on a 'Collection' object it is failing because no such method exists.
>>> mm.command( { 'findAndModify':'delme2', 'query': {'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/database.py", line 395, in command
    msg, allowable_errors)
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/helpers.py", line 144, in _check_command_response
    raise OperationFailure(msg % details["errmsg"])
pymongo.errors.OperationFailure: command {'sort': {'start': 1}, 'query': {'metrictypeid': 20}, 'findAndModify': 'delme2', 'update': {'$push': {'vals': 5}}, 'new': True} failed: no such cmd: sort
>>> mm.command( { 'findAndModify':'delme2', 'query': {'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/database.py", line 395, in command
    msg, allowable_errors)
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/helpers.py", line 144, in _check_command_response
    raise OperationFailure(msg % details["errmsg"])
pymongo.errors.OperationFailure: command {'sort': {'start': 1}, 'query': {'metrictypeid': 20}, 'findAndModify': 'delme2', 'update': {'$push': {'vals': 5}}, 'new': True} failed: no such cmd: sort
>>> mm.delme2.findAndModify({'query': 'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
  File "<stdin>", line 1
    mm.delme2.findAndModify({'query': 'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
                                                    ^
SyntaxError: invalid syntax
>>> mm.delme2.findAndModify({'query': {'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1405, in __call__
    self.__name.split(".")[-1])
TypeError: 'Collection' object is not callable. If you meant to call the 'findAndModify' method on a 'Collection' object it is failing because no such method exists.
>>> mm.delme2.findAndModify({query: {metrictypeid:20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'metrictypeid' is not defined
>>> mm.delme2.findAndModify({'query': {'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1405, in __call__
    self.__name.split(".")[-1])
TypeError: 'Collection' object is not callable. If you meant to call the 'findAndModify' method on a 'Collection' object it is failing because no such method exists.
>>> mm.delme2.findAndModify({'query': {'metrictypeid':20}, 'sort': { 'start' : 1}, 'update' : { '$push' : { 'vals' : 5 } }, 'new': True});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1405, in __call__
    self.__name.split(".")[-1])
TypeError: 'Collection' object is not callable. If you meant to call the 'findAndModify' method on a 'Collection' object it is failing because no such method exists.
>>> mm.delme2.find()

>>> pprint([x for x in mm.delme2.find()])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
>>> pprint.pprint([x for x in mm.delme2.find()])
[{u'_id': ObjectId('5150c9bd64524c04169fd612'),
  u'metrictypeid': 20,
  u'start': 1000,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9cd64524c04169fd613'),
  u'metrictypeid': 20,
  u'start': 1010,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9d264524c04169fd614'),
  u'metrictypeid': 20,
  u'start': 1020,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9d764524c04169fd615'),
  u'metrictypeid': 20,
  u'start': 1030,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9dd64524c04169fd616'),
  u'metrictypeid': 10,
  u'start': 1030,
  u'vals': [3, 4]}]
>>> mm.delme2.findAndModify({'metrictypeid':20}, { '$push' : { 'vals' : 5 }});
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1405, in __call__
    self.__name.split(".")[-1])
TypeError: 'Collection' object is not callable. If you meant to call the 'findAndModify' method on a 'Collection' object it is failing because no such method exists.
>>> mm.delme2.findAndUpdate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1405, in __call__
    self.__name.split(".")[-1])
TypeError: 'Collection' object is not callable. If you meant to call the 'findAndUpdate' method on a 'Collection' object it is failing because no such method exists.
>>> mm.findAndUpdate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1401, in __call__
    self.__name)
TypeError: 'Collection' object is not callable. If you meant to call the 'findAndUpdate' method on a 'Database' object it is failing because no such method exists.
>>> db
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'db' is not defined
>>> mm.delme2.find_and_modify({'metrictypeid':20}, { '$push' : { 'vals' : 5 }});
{u'vals': [3, 4], u'start': 1000, u'_id': ObjectId('5150c9bd64524c04169fd612'), u'metrictypeid': 20}
>>> pprint.pprint([x for x in mm.delme2.find()])
[{u'_id': ObjectId('5150c9cd64524c04169fd613'),
  u'metrictypeid': 20,
  u'start': 1010,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9d264524c04169fd614'),
  u'metrictypeid': 20,
  u'start': 1020,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9d764524c04169fd615'),
  u'metrictypeid': 20,
  u'start': 1030,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9dd64524c04169fd616'),
  u'metrictypeid': 10,
  u'start': 1030,
  u'vals': [3, 4]},
 {u'_id': ObjectId('5150c9bd64524c04169fd612'),
  u'metrictypeid': 20,
  u'start': 1000,
  u'vals': [3, 4, 5]}]
>>> mm.delme2.find_and_modify(query={'metrictypeid':20}, update={'$push':{'vals':6}}, sort={'start':1},new=True);
/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py:1364: DeprecationWarning: Passing mapping types for `sort` is deprecated, use a list of (key, direction) pairs instead
  DeprecationWarning)
{u'vals': [3, 4, 5, 6], u'start': 1000, u'_id': ObjectId('5150c9bd64524c04169fd612'), u'metrictypeid': 20}
>>> mm.delme2.find_and_modify(query={'metrictypeid':20}, update={'$push':{'vals':6}}, sort=['start',1],new=True);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 1357, in find_and_modify
    kwargs['sort'] = helpers._index_document(sort)
  File "/path/not/important/here/pylibs/lib/python2.6/site-packages/pymongo-2.4.1-py2.6-linux-x86_64.egg/pymongo/helpers.py", line 68, in _index_document
    for (key, value) in index_list:
ValueError: too many values to unpack
>>> mm.delme2.find_and_modify(query={'metrictypeid':20}, update={'$push':{'vals':6}}, sort=[['start',1]],new=True);
{u'vals': [3, 4, 5, 6, 6], u'start': 1000, u'_id': ObjectId('5150c9bd64524c04169fd612'), u'metrictypeid': 20}
>>> mm.delme2.find_and_modify(query={'metrictypeid':20}, update={'$push':{'vals':7}}, sort=[['start',1]],new=True);
{u'vals': [3, 4, 5, 6, 6, 7], u'start': 1000, u'_id': ObjectId('5150c9bd64524c04169fd612'), u'metrictypeid': 20}
>>> mm.delme2.find_and_modify(query={'metrictypeid':20}, update={'$push':{'vals':7}}, sort=[['start',-1]],new=True);
{u'vals': [3, 4, 7], u'start': 1030, u'_id': ObjectId('5150c9d764524c04169fd615'), u'metrictypeid': 20}
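
Boiled down, the last call in the session is the one that does the job: it pushes onto the latest doc, i.e. the one with the highest 'start'. Here is a sketch against pymongo 2.4.1, as used in the session. The sort argument is a list of (key, direction) pairs, as the DeprecationWarning asks, and new=True returns the doc as it looks after the update. The wrapper name push_to_latest and the FakeCollection class are made up for illustration so the snippet runs without a server. Newer pymongo releases replaced find_and_modify with find_one_and_update, so check your version.

```python
def push_to_latest(collection, metric_type_id, value):
    """Append value to 'vals' on the matching doc with the highest 'start'."""
    return collection.find_and_modify(
        query={'metrictypeid': metric_type_id},
        update={'$push': {'vals': value}},
        sort=[('start', -1)],   # -1 = descending, so the latest doc wins
        new=True)               # return the doc after the update


class FakeCollection(object):
    """Hypothetical stand-in that records the call instead of talking to MongoDB."""

    def __init__(self):
        self.calls = []

    def find_and_modify(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


fake = FakeCollection()
push_to_latest(fake, 20, 7)
print(fake.calls[0]['sort'])
```

With a real connection it would be called as push_to_latest(mm.delme2, 20, 7).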


Friday, March 08, 2013

Trying to build Python and a set of Python libraries on an AIX 6.1 box.

I'm starting with both the gcc and IBM xlc compilers handy.  This will be a set of notes, as opposed to a strict recipe.

Have in /opt/freeware/bin:  gcc, make, tar.

Python notes:

    AIX:    A complete overhaul of the shared library support is now in
            place.  See Misc/AIX-NOTES for some notes on how it's done.
            (The optimizer bug reported at this place in previous releases
            has been worked around by a minimal code change.) If you get
            errors about pthread_* functions, during compile or during
            testing, try setting CC to a thread-safe (reentrant) compiler,
            like "cc_r".  For full C++ module support, set CC="xlC_r" (or
            CC="xlC" without thread support).
   
    AIX 5.3: To build a 64-bit version with IBM's compiler, I used the
            following:
   
            export PATH=/usr/bin:/usr/vacpp/bin
            ./configure --with-gcc="xlc_r -q64" --with-cxx="xlC_r -q64" \
                        --disable-ipv6 AR="ar -X64"
            make
So, I know a couple of things from previous adventures here.  First, I need working libraries for a variety of things:
* readline
* sqlite3
* zlib
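
A quick way to see whether a finished build actually picked up those libraries is to try importing the matching modules with the new interpreter. This is a minimal sketch; the function name check_modules is made up, and the module list simply mirrors the bullets above.

```python
def check_modules(names=('readline', 'sqlite3', 'zlib')):
    """Return a dict mapping each module name to True if it imports."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            # The module was not built, usually because its library
            # or headers were not found at configure/make time.
            results[name] = False
    return results


for name, ok in sorted(check_modules().items()):
    print("%-10s %s" % (name, "ok" if ok else "MISSING"))
```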

LIBPATH=/opt/freeware/lib:/opt/freeware/lib64:/usr/lib:/usr/local/lib:/opt/recon/dcm/platforms/AIX/lib:/opt/recon/lib:

I've run into problems when compiling with xlc.  The tests don't pass, and I get an exit status 11.  Yuck.  More later.

MongoDB - How to Easily Pre-Split Chunks

Okay, so I'm doing some performance testing on MongoDB, spinning up a new configuration, and then throwing lots of data at it to see what the characteristics are.

Early on, I discovered that database locking was severely limiting performance.  MongoDB (as of 2.2.3) takes a database-level lock during every write operation.  However, the lock only applies to the specific shard being written to.  So, massive numbers of writes dictate large numbers of shards, to minimize the lock/unlock bottleneck.  That's the theory.

In practice, this indeed proved correct.  We changed from running 2 shards (on 4 boxes: 2 primaries and 2 replicas) to running 48 shards.  YES, they all magically cooperate nicely and share memory and CPU equally.  Actually, the boxes each had 196 GB of memory and 32 cores, so we weren't limited on machine capacity.  YMMV.

BTW, at peak operations, we're only seeing CPU of 50% per daemon, so we're not CPU limited. 

But, during my tests, I had to spin up first 4 shards, then 8, then 24, then 48.  Waiting for each test to settle into a steady state took time.  Each shard had to be active and equally used.  Likewise, there had to be enough data that each shard was reading and writing at the same time, since the data is read back out to send to the replicas.

The startup was handicapped because all the activity landed on 1 shard and none on the others.  Since I was maxing out the capacity, there wasn't any spare IO available to do the splits and balancing.

To fix this, I found that I could add chunk split-points ahead of time, a process called 'pre-splitting chunks'.  To do this:
  • Turn off the balancer.
  • Run the command 'split' with a middle value, multiple times according to your number of shards.  
  • Turn on the balancer again.

I had 48 shards on a key with range 000-999, so I pretended I had 50 shards and gave each shard a range of 20 key values (1000 / 50 = 20).

No, you don't have to stop the database from doing anything; there's no need to fail over to secondaries or anything like that.

So, my commands were:

$ mongo --port 99999999 --host myhost
> use config
> sh.stopBalancer();
> use admin
> db.runCommand({split: "myDb.myCollection", middle:{"key1":0,'key2':1}});

> db.runCommand({split: "myDb.myCollection", middle:{"key1":20,'key2':1}});
> ... (46+ more of these)
> use config
> sh.startBalancer();
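
Typing 48+ near-identical split commands is tedious, so they can be generated instead. Here is a minimal sketch in Python, assuming a numeric key1 and the same 20-wide ranges described above. split_commands is a hypothetical helper, and the namespace and key names are the placeholders from the commands above. It only prints lines to paste into the mongo shell between stopping and restarting the balancer; it does not talk to MongoDB itself.

```python
def split_commands(namespace, start=0, stop=1000, step=20):
    """Build one mongo shell split command per split point."""
    commands = []
    for point in range(start, stop, step):
        commands.append(
            'db.runCommand({split: "%s", middle: {"key1": %d, "key2": 1}});'
            % (namespace, point))
    return commands


for line in split_commands("myDb.myCollection"):
    print(line)
```

The split points come out as plain decimal numbers on purpose: the shell is JavaScript, which can read a number written with a leading zero, such as 020, as octal.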



NOTE:  Always leave the balancer off more than 1 minute or it doesn't count.  That is, if you're 'bouncing' the balancer, leave it off for a full 60+ seconds or the bounce doesn't do anything.