2022-03-19

Parallelized loading of data into pandas DataFrames [duplicate]

I need to compare data from two Azure databases, each of which contains a fair amount of data. Given that the SQL queries are I/O bound, I am wondering whether I could kick off two queries concurrently (one to each database) and have the returned data deposited into two separate DataFrames.

I have read that pandas is not thread-safe (which is really unfortunate); otherwise, Python's ThreadPoolExecutor together with pd.read_sql could handle it. A few questions:

  1. Is it still true that pandas is not thread-safe?
  2. Is there a better library to use that is thread-safe?
  3. Any other thoughts on how to safely improve the performance of loading data into DataFrames on my PC (using Python)?
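
For concreteness, here is a minimal sketch of the pattern described above, under the assumption that each thread opens its own connection and builds its own DataFrame, so no pandas object is ever shared between threads. Two temporary SQLite files stand in for the Azure databases; `read_frame`, `load_frames`, `make_demo_db` and the `items` table are hypothetical names used only for illustration, not part of any library.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def read_frame(db_path, query):
    """Run one query on its own connection and return the result as a DataFrame."""
    # Each call opens a private connection, so neither the connection
    # nor the resulting DataFrame is shared between threads.
    conn = sqlite3.connect(db_path)
    try:
        return pd.read_sql(query, conn)
    finally:
        conn.close()


def load_frames(jobs):
    """Run every (db_path, query) job in its own thread; return {name: DataFrame}."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {
            name: pool.submit(read_frame, path, query)
            for name, (path, query) in jobs.items()
        }
        # result() blocks until that query finishes and re-raises any error.
        return {name: future.result() for name, future in futures.items()}


def make_demo_db(path, rows):
    """Create a small stand-in database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE items (id INTEGER, value TEXT)")
        conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.close()


# Stand-ins for the two databases being compared.
workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, "a.db")
path_b = os.path.join(workdir, "b.db")
make_demo_db(path_a, [(1, "x"), (2, "y")])
make_demo_db(path_b, [(1, "x"), (2, "z"), (3, "w")])

frames = load_frames({
    "a": (path_a, "SELECT id, value FROM items ORDER BY id"),
    "b": (path_b, "SELECT id, value FROM items ORDER BY id"),
})
```

With real databases, `read_frame` would presumably create a driver-specific connection (or SQLAlchemy engine) instead of calling `sqlite3.connect`; the threading part would stay the same. Whether the two queries actually overlap depends on the driver releasing the GIL while it waits on the network.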

