2020-11-29

Improve query performance in large collection with large documents using indexes or in any other possible way

I am using PyMongo with Flask and I would like to know how to optimize a query, as I am filtering within a large collection (8793 documents) with large documents.

This is one of the document structures of the collections:

enter image description here

As you can see, it has 4 properties (simulationID, simulationPartID, timePass and status, which stores many arrays). This collection has a size of 824.4MB. The average size of the documents is 96.0KB.

enter image description here

Basically, I’m trying to find the documents that have simulationPartID 7 (1256 documents) and filter on them the array index equal to the nodeID value (which I receive as a parameter) within the status property, and take the fourth or fifth element of this array (depending on the case parameter), in addition to append the timePass.

def node_history(nodeID, case):
    coordinates = []
    node_data = db['node_data']
    db.node_data.create_index([('simulationPartID', 1), ('simulationID', 1)])
    if case == 'Temperature':
        for document in node_data.find({"simulationPartID": 7}):
            coordinates.append([document['timePass'], document['status'][int(nodeID)-1][3]])
    elif case == 'Stress':
        for document in node_data.find({"simulationPartID": 7}):
            coordinates.append([document['timePass'], document['status'][int(nodeID)-1][4]])
    else:
        pass
    coordinates.sort()
    return json.dumps(coordinates, default=json_util.default)

As I mentioned, the collection is very large, and the query takes about 30 - 60 seconds to be performed, depending on the machine, but I want it to run as quickly as possible because I want my application to be as interactive as possible. As you can seeI already tried to create a index in both simulationID and simulationPartID properties.

I never worked with large collections before, so I'm not into indexing. I don't even know if I did it properly in my code. So, I would like to know if there is a way to optimize my query using a different approach of indexes, or in any other possible way, and make it faster.

Data samples:

{
  "_id": {
    "$oid": "5f83f54d45104462898aba67"
  },
  "simulationID": "001",
  "simulationPartID": 7,
  "timePass": 0,
  "status": [
    [
      1,
      1.34022987724954e-40,
      0.00220799725502729,
      20,
      114.911392211914
    ],
    [
      2,
      0.00217749993316829,
      0.00220799725502729,
      20,
      -2.0458550453186
    ],
    [
      3,
      0.0020274999551475,
      0.00235799723304808,
      20,
      -1.33439755439758
    ],
    [
      4,
      3.36311631437956e-44,
      0.00235799723304808,
      20,
      148.233413696289
    ],
    [
      5,
      1.02169119449431e-38,
      0.000149997213156894,
      20,
      -25633.59765625
    ],
  ]
},

{  
  "_id": {
    "$oid": "5f83f54d45104462898aba68"
  },
  "simulationID": "001",
  "simulationPartID": 7,
  "timePass": 1,
  "status": [
    [
      1,
      1.34022987724954e-40,
      0.00220799725502729,
      20,
      114.911392211914
    ],
    [
      2,
      0.00217749993316829,
      0.00220799725502729,
      20,
      -2.0458550453186
    ],
    [
      3,
      0.0020274999551475,
      0.00235799723304808,
      20,
      -1.33439755439758
    ],
    [
      4,
      3.36311631437956e-44,
      0.00235799723304808,
      20,
      148.233413696289
    ],
    [
      5,
      1.02169119449431e-38,
      0.000149997213156894,
      20,
      -25633.59765625
    ],
  ]
},
{
"_id": {
    "$oid": "5f83f54d45104462898aba69"
  },
  "simulationID": "001",
  "simulationPartID": 7,
  "timePass": 2,
  "status": [
    [
      1,
      1.34022987724954e-40,
      0.00220799725502729,
      20,
      114.911392211914
    ],
    [
      2,
      0.00217749993316829,
      0.00220799725502729,
      20,
      -2.0458550453186
    ],
    [
      3,
      0.0020274999551475,
      0.00235799723304808,
      20,
      -1.33439755439758
    ],
    [
      4,
      3.36311631437956e-44,
      0.00235799723304808,
      20,
      148.233413696289
    ],
    [
      5,
      1.02169119449431e-38,
      0.000149997213156894,
      20,
      -25633.59765625
    ],
  ]
}

Thank you!



from Recent Questions - Stack Overflow https://ift.tt/33pbmVT
https://ift.tt/2Ji6tXN

No comments:

Post a Comment