Tuesday, October 22, 2013

Optimizing Python - getting data out of memcache with struct.unpack

So, I have this Memcache data store that holds timestamps and values from a monitoring application. Since each memcache key corresponds to an hour's data, I only need to store 2 bytes for the number of seconds past the hour. I don't care about duplicate data being stored, but on retrieval I'd like to eliminate it if it exists.

Input data is: (ts,val), (ts, val), ... encoded using Python's struct.pack command. The ts (timestamp) is (as noted) packed with format h (unsigned int). The val (value) is a floating point number of 4 bytes, packed with format f.

The original version of this encoding was:

    def OLD_rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        while raw:
            ts, val, raw = raw[:2], raw[2:6], raw[6:]
            ts = timeOffset + struct.unpack('h', ts)[0]
            val = struct.unpack('f', val)[0]
            tsVals.append((ts, val))
        return tsVals
I found this version:
  • ran really slowly;
  • didn't eliminate duplicate values;
  • would really choke the longer the input data (as in 33,000 datapoints in an hour).

For the second version, I knew I had to stop with the copying of the data string over and over again, which I knew was eating major cycles.

Doing some math, I figured out I could iterate over the string, extracting each element and converting the two parts to python numbers.

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        for i in range(0, len(raw), 6):
            rawtime = raw[i:i+2]
            ts = timeOffset + struct.unpack('h', rawtime)[0]
            val = struct.unpack('f', raw[i+2:i+6])[0]
            tsVals.append((ts, val))
        return tsVals

This was better timewise, but didn't remove duplicate data. I made the 'seen it yet' test occur even before the conversion to (int, float), which saved a bit of time doing useless conversions.

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        seenTimes = set()
        for i in range(0, len(raw), 6):
            rawtime = raw[i:i+2]
            if rawtime in seenTimes:
                continue
            seenTimes.add(rawtime)
            ts = timeOffset + struct.unpack('h', rawtime)[0]
            val = struct.unpack('f', raw[i+2:i+6])[0]
            tsVals.append((ts, val))
        return tsVals

Yet, it was STILL TOO SLOW. Where was the time going? I timed the various parts and found the slow bit was the conversion to int/float. That unpack was happening a lot and the time added up.

I tried the following but FAILED.

        # BAD DON'T USE ** BAD DON'T USE **
        elems  = rawLen / 6.0  # 6 bytes per - 2=time + 4=data.
        intElems = int(elems)
        if (elems != intElems):
            self.log.warning("elems non-integer: len: %s" % (rawLen))
            return []
        unp = struct.unpack("hf"*intElems, raw)
        # BAD DON'T USE ** BAD DON'T USE **

The above fails because if we pack these things together, there's a word-alignment problem that unpack is unable to cope with. It would have to be something like (int, zeroes, float) to make the float align on a word boundary.

But, I couldn't give up, this had to work better. So, I extract all the ints, string those together and unpack them, then do the same thing with the floats.

HERE IS THE FINAL VERSION:

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        seenTimes = set()
        try:
            rawLen = len(raw)
            times = ""
            vals  = ""
            for i in range(0, rawLen, 6):
                rawtime = raw[i:i+2]
                if rawtime in seenTimes:
                    continue
                times += rawtime
                vals  += raw[i+2:i+6]
            timesList = struct.unpack('h'*(len(times)/2), times)
            valsList  = struct.unpack('f'*(len(vals)/4),  vals)
            assert len(timesList) == len(valsList), "Lens of times and vals unequal, t=%s, v=%s" % (len(timesList), len(valsList))
            for i in range(0, len(timesList)):
                tsVals.append((timeOffset+timesList[i], valsList[i]))
            #self.log.debug("unpacked %d vals, len ts %s." % (len(timesList), len(tsVals)))
        except:
            tb = traceback.format_exc()
            self.log.debug("tb in rawDataToTsVals(): rawSize: %s, %s" % (rawLen, tb))
            pass
        
        if 0:  # debugging
            self.log.debug("tsvals: %s" % ( tsVals))
        return tsVals

Unpacking all the h's at the same time, and likewise the floats, makes everything align, and since it's one function call to struct, is very fast.

Enjoy!

No comments: