2020-11-27

Download intraday historical stock data

I want to download historical intraday stock data. I've found that Alpha Vantage offers two years of it, which is the longest free history I've been able to find.

I'm writing a script to download the full two years of data for every ticker symbol they offer, in all timeframes. They provide the data divided into 30-day slices counted back from the current day (or from the last trading day, I'm not sure), with rows ordered from newest to oldest datetime. I want to reverse that order and concatenate all the months, with the column headers appearing only once. That way I would have a single CSV file with two years of data for each stock and timeframe, with rows running from oldest to newest datetime.
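A minimal sketch of the reverse-and-concatenate step, using in-memory CSV text instead of live API calls. It assumes each slice has a `time` column (as in the Alpha Vantage extended intraday CSV) and that the slices are passed oldest slice first; `combine_slices` and the toy rows are hypothetical, for illustration only.

```python
import io
from typing import List

import pandas as pd


def combine_slices(slice_csvs: List[str]) -> pd.DataFrame:
    """Merge 30-day slices into one DataFrame ordered oldest to newest.

    Assumes `slice_csvs` is ordered oldest slice first and that the rows
    inside each CSV run newest to oldest.
    """
    # Reverse each slice so its rows run oldest to newest
    frames = [
        pd.read_csv(io.StringIO(text), parse_dates=['time']).iloc[::-1]
        for text in slice_csvs
    ]
    # pd.concat aligns on column names, so the result has a single header
    combined = pd.concat(frames, ignore_index=True)
    # Guard against the same bar appearing at the edge of two slices
    return combined.drop_duplicates(subset='time').reset_index(drop=True)


# Hypothetical toy slices, newest row first inside each one
older = ("time,open,high,low,close,volume\n"
         "2020-09-28 20:00:00,1,1,1,1,10\n"
         "2020-09-28 19:45:00,2,2,2,2,20\n")
newer = ("time,open,high,low,close,volume\n"
         "2020-10-26 20:00:00,3,3,3,3,30\n"
         "2020-10-26 19:45:00,4,4,4,4,40\n")

df = combine_slices([older, newer])
print(df['time'].astype(str).tolist())
```

Writing the result with `df.to_csv(path, index=False)` then produces one file with the header on the first line only.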

The problem is that I also want to use the script to update the data, and I don't know how to append only the rows that aren't already in my file. The data I've downloaded covers 2020-09-28 to 2020-10-26. Suppose the last datetime in the file is 2020-10-26 20:00:00. The next time I update, I'd like to drop the rows that are already there and append only the rest, so the file would continue with 2020-10-26 20:15:00 (I'm downloading the 15-minute timeframe in this case). How can I update the data correctly?
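One way to express "append only the rest" is to keep the rows of the new download whose timestamp is strictly later than the newest timestamp already stored. This is a sketch under the assumption that both frames have a parsed `time` column and that the existing data is not empty; `new_rows_only` and the toy frames are hypothetical.

```python
import pandas as pd


def new_rows_only(existing: pd.DataFrame, fresh: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `fresh` that are strictly newer than anything in `existing`.

    Assumes both frames have a datetime `time` column and `existing` is not empty.
    """
    last_seen = existing['time'].max()
    newer = fresh[fresh['time'] > last_seen]
    # Freshly downloaded slices run newest to oldest; flip to oldest first
    return newer.sort_values('time').reset_index(drop=True)


# Toy data mirroring the 15-minute example in the question
existing = pd.DataFrame({
    'time': pd.to_datetime(['2020-10-26 19:45:00', '2020-10-26 20:00:00']),
    'close': [1.0, 2.0],
})
fresh = pd.DataFrame({
    'time': pd.to_datetime(['2020-10-26 20:30:00', '2020-10-26 20:15:00',
                            '2020-10-26 20:00:00']),
    'close': [4.0, 3.0, 2.0],
})

to_append = new_rows_only(existing, fresh)
print(to_append)
```

With a file on disk, `existing` would come from something like `pd.read_csv(path, parse_dates=['time'])`. Comparing against the last timestamp only needs the maximum of one column; an alternative is `pd.concat([existing, fresh]).drop_duplicates(subset='time')`, which rewrites the whole file instead of appending.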

Also, when updating a file that already exists, the script writes the column headers again, which I don't want.
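The header problem can be handled with the `mode` and `header` parameters of `DataFrame.to_csv`: append with `mode='a'` and write the header only when the file does not exist yet. A minimal sketch; `append_csv` is a hypothetical helper and the file lives in a temporary directory so the example is self-contained.

```python
import os
import tempfile

import pandas as pd


def append_csv(df: pd.DataFrame, path: str) -> None:
    """Append `df` to the CSV at `path`, writing the header only if the file is new."""
    write_header = not os.path.isfile(path)
    df.to_csv(path, mode='a', header=write_header, index=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'IBM.csv')
    # First call creates the file with a header, second call only adds rows
    append_csv(pd.DataFrame({'time': ['2020-10-26 20:00:00'], 'close': [2.0]}), path)
    append_csv(pd.DataFrame({'time': ['2020-10-26 20:15:00'], 'close': [3.0]}), path)
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)
```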

import os
import sys
from pathlib import Path
from typing import List

import pandas as pd

BASE_URL = 'https://www.alphavantage.co/'


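def download_previous_data(
    file: str,
    ticker: str,
    timeframe: str,
    slices: List[str],
):
    # Download each 30-day slice (pass them oldest first), reverse each one so
    # its rows run oldest to newest, then concatenate so the header is written
    # only once. Note: this overwrites `file`; it does not append to it.
    frames = []
    for _slice in slices:
        url = (f'{BASE_URL}query?function=TIME_SERIES_INTRADAY_EXTENDED'
               f'&symbol={ticker}&interval={timeframe}&slice={_slice}'
               f'&apikey=demo&datatype=csv')
        frames.append(pd.read_csv(url).iloc[::-1])
    pd.concat(frames, ignore_index=True).to_csv(file, index=False, encoding='utf-8-sig')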


def main():

    # Get a list of all ticker symbols
    print('Downloading ticker symbols:')
    #df = pd.read_csv('https://www.alphavantage.co/query?function=LISTING_STATUS&apikey=demo')
    #tickers = df['symbol'].tolist()
    tickers = ['IBM']

    timeframes = ['1min', '5min', '15min', '30min', '60min']

    # To download the data in a subdirectory where the script is located
    modpath = os.path.dirname(os.path.abspath(sys.argv[0]))

    # Make sure the download folders exist
    for timeframe in timeframes:
        download_path = f'{modpath}/{timeframe}'
        #download_path = f'/media/user/Portable Drive/Trading/data/{timeframe}'
        Path(download_path).mkdir(parents=True, exist_ok=True)

    # For each ticker symbol, download all available data for each timeframe
    # except the current month, which would be incomplete.
    # Each download should be wrapped in a try/except in case the ticker symbol isn't available on Alpha Vantage
    for ticker in tickers:
        print(f'Downloading data for {ticker}...')
        for timeframe in timeframes:
            download_path = f'{modpath}/{timeframe}'
            filepath = f'{download_path}/{ticker}.csv'

            # NOTE:
            # To ensure optimal API response speed, the trailing 2 years of intraday data is evenly divided into 24 "slices" - year1month1, year1month2,
            # year1month3, ..., year1month11, year1month12, year2month1, year2month2, year2month3, ..., year2month11, year2month12.
            # Each slice is a 30-day window, with year1month1 being the most recent and year2month12 being the farthest from today.
            # By default, slice=year1month1

            if Path(filepath).is_file():  # if the file already exists
                # download only the second most recent month (year1month2)
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)
            else:  # if the file doesn't exist
                # download the two previous years
                #slices = ['year2month12', 'year2month11', 'year2month10', 'year2month9', 'year2month8', 'year2month7', 'year2month6', 'year2month5', 'year2month4', 'year2month3', 'year2month2', 'year2month1', 'year1month12', 'year1month11', 'year1month10', 'year1month9', 'year1month8', 'year1month7', 'year1month6', 'year1month5', 'year1month4', 'year1month3', 'year1month2']
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)


if __name__ == '__main__':
    main()
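The long commented-out slice list in `main()` follows directly from the naming scheme in the NOTE comment (year1month1 newest, year2month12 oldest), so it could be generated instead of typed out. A small sketch; `slice_names` and its `skip_latest` parameter are hypothetical, and the default of skipping one slice mirrors the question's choice to leave out the incomplete current month.

```python
from typing import List


def slice_names(skip_latest: int = 1) -> List[str]:
    """Return Alpha Vantage slice names ordered oldest to newest.

    `skip_latest` drops that many of the most recent slices
    (default 1: leave out year1month1, the current, incomplete month).
    """
    # year1month1 ... year1month12, year2month1 ... year2month12 (newest first)
    newest_first = [f'year{y}month{m}' for y in (1, 2) for m in range(1, 13)]
    kept = newest_first[skip_latest:]
    # Reverse so the oldest slice is downloaded (and written) first
    return kept[::-1]


slices = slice_names()
print(slices[0], slices[-1], len(slices))
```

Passing `slice_names()` as `slices` in the `else` branch gives the same 23 names, in the same oldest-first order, as the commented-out list.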


from Recent Questions - Stack Overflow https://ift.tt/3l86n1F