Parallelized loading of data into Pandas DataFrames [duplicate]
I need to compare data from two Azure databases, each of which contains a fair amount of data. Since the SQL queries are I/O-bound, I am wondering whether I could kick off the two queries (one to each database) concurrently and have the results deposited into two respective DataFrames.
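Once both result sets are in DataFrames, one common way to compare them is an outer merge on a key column with `indicator=True`, then flagging rows that exist on only one side or whose values differ. This is a minimal sketch, assuming both tables share a key column (called `key` here) and some same-named value columns; `diff_frames` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def diff_frames(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return rows present in only one frame, or whose shared columns differ."""
    # Outer merge keeps every key from both sides; `_merge` records where each row came from.
    merged = left.merge(right, on=key, how="outer", suffixes=("_a", "_b"), indicator=True)
    only_one_side = merged["_merge"] != "both"

    # Compare each non-key column that exists in both frames (these get the suffixes).
    value_cols = [c for c in left.columns if c != key and c in right.columns]
    differs = pd.Series(False, index=merged.index)
    for col in value_cols:
        a, b = merged[f"{col}_a"], merged[f"{col}_b"]
        # Treat NaN on both sides as equal; NaN on one side counts as a difference.
        differs |= ~((a == b) | (a.isna() & b.isna()))

    return merged[only_one_side | differs]
```

The `_merge` column in the result tells you whether each flagged row is `left_only`, `right_only`, or present in both with differing values.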
I have read that Pandas is not thread-safe (which is really unfortunate); otherwise, Python's ThreadPoolExecutor together with pd.read_sql could handle this. A few questions:
- Is it still true that Pandas is not thread-safe?
- Is there a better library to use that is thread-safe?
- Any other thoughts on how to safely speed up loading data into DataFrames on my PC (using Python)?
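A minimal sketch of the ThreadPoolExecutor + `pd.read_sql` approach, assuming each query gets its own connection opened inside its worker thread and produces its own DataFrame, so no pandas object or connection is shared between threads. `sqlite3` stands in for the real databases here so the example is self-contained; for Azure SQL you would swap in your own driver (for example a SQLAlchemy engine per database). `load_query` and `load_concurrently` are hypothetical helper names. Whether the two queries actually overlap depends on the database driver releasing the GIL while it waits on I/O, which is worth checking for the driver you use.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def load_query(db_path: str, sql: str) -> pd.DataFrame:
    # Open the connection inside the worker thread so each thread has its own
    # connection and builds its own DataFrame; nothing is shared across threads.
    conn = sqlite3.connect(db_path)
    try:
        return pd.read_sql(sql, conn)
    finally:
        conn.close()


def load_concurrently(jobs: list[tuple[str, str]]) -> list[pd.DataFrame]:
    """Run each (db_path, sql) job in its own thread; return DataFrames in job order."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(load_query, path, sql) for path, sql in jobs]
        # result() re-raises any exception from the worker thread.
        return [f.result() for f in futures]
```

Usage would look like `df_a, df_b = load_concurrently([(path_a, query_a), (path_b, query_b)])`, with the two DataFrames only touched by the main thread after both futures have completed.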