Purpose of participation
Stock price prediction is pretty distant from the field of my interest, which is NLP.
But there are several purpose that I spent time participating in no-reward competition.
- Data wrangling practice: whatever datatype it be, I think it is important to wrangle as many data as possible. It is for the numpy & pandas practice
- Refreshing: I love studying the field of machine learning and deep learning. Although it is important to study deep enough, I think some refreshment helps the continuity of studying.
-
Somehow connected!
- The usage of cross validation
- Markov-chain vs matrix approach
- usage of sine, cosine function for timeseries annotation
- Regularization at ElasticNet might be also regarded as weight decay.
Using ElasticNetCV for stock price prediction
Adding more data
- external data: WTI, NASDAQ, was tested, but it was no good
- derivative data: after @SangHoeKim constructed derivative data, I added VWAP and SMI but didn't increase the model's performance
Choosing Adaquate Model
- When choosing a model, it was important that the model 1) don't fluctuate its performance over periods of time, 2) has higher performance than Linear Regression.
-
The original plan was to ensemble 1) time-series forecast model with 2) regression model. Regression model ElasticNetCV turned out to be the best model, but ARIMA model's performance was too overfit for specific periods only to use. For example, ARIMA was good for predicting overall bear market / overall bull market but was bad at predicting fluctuation between the spand of 5 days.
- ElasticNetCV was the right choice. It was advanced version of Linear Regression, and it has provided K-fold validation structure so that it comparably had more robustness compared to ElasticNet.
- ARIMA(0,1,1) showed pretty good performance. This was out of expectation, since (p, q, d) = (0, 1, 1) is just moving average model with constant uprise/downfall of differentiation.
- RandomForestRegressor and pmdARIMA was selected since its criteria can be customized as MAE instead of MSE.
- pmdARIMA is implementation of AutoARIMA from RStudio, but its performance was too low to use.
- XGBoostRegressor turned out to be not good. Not only this was my team's conclusion, but the general consensus over the Dacon community was similar. However @tjy3090 pointed out that XGB captured the turning point and the pattern of the graph, even though its NMAE score was low.
- Compared between ffill vs dropna(), but concluded that ffill outperformed on more recend dates.
Model | NMAE (2021-09-06 ~ 2021-09-10) | NMAE (2021-09-15 ~ 2021-09-24) |
---|---|---|
ElasticNetCV | 3.02 | 2.93 |
ARIMA(0,1,1) | 3.03 | - |
ElasticNet | 3.12 | - |
XGBoost | 3.87 | - |
Linear Regression | 4.03 | - |
RFRegressor | 4.11 | - |
pmdARIMA | 8.81 | - |
- Experimented with ARIMA, pmdARIMA, ElasticNetCV, RandomForestRegressor and XGBoost.
- Competition's criteria is based on NMAE. Grading criteria function was constructed as following:
def NMAE(true_df, pred_df_input):
""" grading criteria for public leader board """
return (abs(true_df_copy - pred_df_input) / true_df_copy * 100).iloc[:5].values.mean()
- Thanks for setting up grading criteria, the team was able to experiment different kinds of models without being limited by submission quota.
Code and Outputs
# import python modules
import os
import copy
# import data wrangling modules
import pandas as pd
import numpy as np
# import visualization modules
from tqdm import tqdm
import matplotlib.pyplot as plt
import seaborn as sns
# import dataloader & stock data downloader
import FinanceDataReader as fdr
# import machine learning modules
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler
from sklearn.model_selection import train_test_split
# import machine learning models
from sklearn.linear_model import LinearRegression, ElasticNet, ElasticNetCV
from xgboost import XGBRegressor
# import cell output cleaner
from IPython.display import clear_output
# import stock data derivatives module
from talib import RSI, BBANDS, MACD, ATR
import warnings
warnings.filterwarnings('ignore') # remove warnings from pandas
# set options pandas: display more columns and reduced digits of float numbers
pd.set_option('display.max_columns', 100)
pd.options.display.float_format = '{:.4f}'.format
path = './open'
list_name = 'Stock_List.csv'
stock_list = pd.read_csv(os.path.join(path,list_name))
stock_list['종목코드'] = stock_list['종목코드'].apply(lambda x : str(x).zfill(6))
display(stock_list.head())
display(stock_list.tail())
종목명 | 종목코드 | 상장시장 | |
---|---|---|---|
0 | 삼성전자 | 005930 | KOSPI |
1 | SK하이닉스 | 000660 | KOSPI |
2 | NAVER | 035420 | KOSPI |
3 | 카카오 | 035720 | KOSPI |
4 | 삼성바이오로직스 | 207940 | KOSPI |
종목명 | 종목코드 | 상장시장 | |
---|---|---|---|
371 | 더네이쳐홀딩스 | 298540 | KOSDAQ |
372 | 코엔텍 | 029960 | KOSDAQ |
373 | 원익홀딩스 | 030530 | KOSDAQ |
374 | 웹케시 | 053580 | KOSDAQ |
375 | 신흥에스이씨 | 243840 | KOSDAQ |
# define dates
""" Make dataset for training & evaluation """
from datetime import datetime
import os
# determines whether it is public leaderboard period or new_public period
BOOL_PUBLIC_PERIOD = False
# Determines whether to use public way or private way to utilize data
BOOL_PUBLIC = False
# set data window range
start_date = None # if this was set to 2021-01-04, then the mae shoots up to from 3.0 to 4.7
today = datetime.now()
data_end_date = today.strftime('%Y-%m-%d')
# public_start_date = '2021-09-10'
public_start_4th = '2021-09-06'
public_end_date = '2021-09-10'
check_point = '2021-09-14'
recent_known_date = '2021-09-24'
private_start_4th = '2021-09-27'
private_end_4th = '2021-10-01'
if BOOL_PUBLIC_PERIOD:
sample_name = 'sample_submission.csv' # week 4 submission
pred_df = pd.read_csv(os.path.join(path,sample_name)).set_index('Day')
else:
new_public_name = "new_public.csv"
pred_df = pd.read_csv(os.path.join(path,new_public_name)).set_index('Day')
pred_df
000060 | 000080 | 000100 | 000120 | 000150 | 000240 | 000250 | 000270 | 000660 | 000670 | 000720 | 000810 | 000880 | 000990 | 001230 | 001440 | 001450 | 001740 | 002380 | 002790 | 003000 | 003090 | 003380 | 003410 | 003490 | 003670 | 003800 | 004000 | 004020 | 004170 | 004370 | 004490 | 004800 | 004990 | 005250 | 005290 | 005300 | 005380 | 005385 | 005387 | 005490 | 005830 | 005850 | 005930 | 005935 | 005940 | 006260 | 006280 | 006360 | 006400 | ... | 253450 | 256840 | 263720 | 263750 | 267250 | 267980 | 268600 | 271560 | 272210 | 272290 | 273130 | 278280 | 278530 | 282330 | 285130 | 287410 | 290510 | 290650 | 292150 | 293490 | 293780 | 294090 | 294870 | 298000 | 298020 | 298050 | 298380 | 298540 | 299030 | 299660 | 299900 | 307950 | 314130 | 316140 | 319400 | 319660 | 321550 | 323990 | 326030 | 330590 | 330860 | 336260 | 336370 | 347860 | 348150 | 348210 | 352820 | 357780 | 363280 | 950130 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Day | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2021-09-15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-09-30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2021-10-01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10 rows × 376 columns
""" fill out ground truth stock price of the companies for the public submission period """
if BOOL_PUBLIC_PERIOD:
blank_submission = pd.read_csv(os.path.join(path,sample_name)).set_index('Day')
true_df = blank_submission.copy()
true_df_first_day = public_start_4th
true_df_last_day = public_end_date
else:
df_new_public = pd.read_csv(os.path.join(path,new_public_name)).set_index('Day')
true_df = df_new_public.copy()
true_df_first_day = true_df.iloc[0].name
true_df_last_day = true_df.iloc[-1].name
print(true_df_first_day, true_df_last_day)
2021-09-15 2021-10-01
for company_code in tqdm(true_df.columns):
data_raw = fdr.DataReader(company_code, start = start_date, end = private_start_4th).reset_index()
data_raw = data_raw.drop(columns = ['Change'])
data_raw = data_raw.replace(0, np.nan).ffill()
data_raw.index = data_raw.Date
# print(data_raw.head())
public_true_closes = data_raw.loc[true_df_first_day:true_df_last_day].Close.iloc[:]
# print(public_true_closes)
true_df.loc[:,company_code] = public_true_closes.to_list() * 2
true_df.head()
100%|██████████| 376/376 [00:50<00:00, 7.50it/s]
000060 | 000080 | 000100 | 000120 | 000150 | 000240 | 000250 | 000270 | 000660 | 000670 | 000720 | 000810 | 000880 | 000990 | 001230 | 001440 | 001450 | 001740 | 002380 | 002790 | 003000 | 003090 | 003380 | 003410 | 003490 | 003670 | 003800 | 004000 | 004020 | 004170 | 004370 | 004490 | 004800 | 004990 | 005250 | 005290 | 005300 | 005380 | 005385 | 005387 | 005490 | 005830 | 005850 | 005930 | 005935 | 005940 | 006260 | 006280 | 006360 | 006400 | ... | 253450 | 256840 | 263720 | 263750 | 267250 | 267980 | 268600 | 271560 | 272210 | 272290 | 273130 | 278280 | 278530 | 282330 | 285130 | 287410 | 290510 | 290650 | 292150 | 293490 | 293780 | 294090 | 294870 | 298000 | 298020 | 298050 | 298380 | 298540 | 299030 | 299660 | 299900 | 307950 | 314130 | 316140 | 319400 | 319660 | 321550 | 323990 | 326030 | 330590 | 330860 | 336260 | 336370 | 347860 | 348150 | 348210 | 352820 | 357780 | 363280 | 950130 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Day | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2021-09-15 | 30400 | 34700 | 65400 | 170500 | 96600 | 16800 | 51800 | 83300 | 107500 | 728000 | 54600 | 232000 | 35400 | 58500 | 20050 | 2770 | 25300 | 5470 | 455500 | 53200 | 21550 | 36200 | 9630 | 8480 | 32550 | 161500 | 50900 | 90500 | 52000 | 275000 | 294000 | 90600 | 122500 | 35400 | 32800 | 28350 | 151500 | 209500 | 102500 | 99700 | 361500 | 61800 | 28500 | 77000 | 71800 | 13400 | 68000 | 363000 | 45000 | 750000 | ... | 83100 | 43100 | 40500 | 84300 | 68600 | 73900 | 64100 | 129500 | 18800 | 39000 | 109300 | 279500 | 13545 | 179500 | 298500 | 7900 | 7840 | 38350 | 13910 | 72400 | 51000 | 52900 | 28450 | 401500 | 748000 | 753000 | 19600 | 32950 | 62400 | 108000 | 16300 | 107500 | 43800 | 11000 | 3505 | 35750 | 20000 | 78300 | 113000 | 5570 | 47600 | 51000 | 65700 | 33550 | 29850 | 59600 | 279500 | 302700 | 28250 | 17450 |
2021-09-16 | 30600 | 34700 | 64800 | 168000 | 102000 | 16800 | 50900 | 84500 | 104000 | 718000 | 55700 | 227000 | 35300 | 56700 | 20500 | 2705 | 24900 | 5450 | 458000 | 52600 | 21200 | 34900 | 9540 | 8460 | 32100 | 163000 | 51000 | 91300 | 51600 | 271500 | 293000 | 90300 | 124500 | 35000 | 32650 | 29450 | 151500 | 208000 | 102500 | 99800 | 364500 | 61400 | 27800 | 76100 | 71500 | 13250 | 68500 | 365000 | 46100 | 725000 | ... | 82300 | 43600 | 40500 | 82000 | 69800 | 73300 | 60500 | 125000 | 18850 | 38900 | 109365 | 276300 | 13425 | 174500 | 297500 | 7760 | 7680 | 39700 | 13745 | 71300 | 49750 | 53000 | 28850 | 419500 | 756000 | 777000 | 19400 | 32600 | 61900 | 113500 | 16150 | 109000 | 46800 | 11000 | 3515 | 35000 | 19850 | 80300 | 114000 | 5530 | 47400 | 51100 | 66000 | 33200 | 30150 | 58400 | 276500 | 294700 | 28150 | 16850 |
2021-09-17 | 30350 | 34050 | 64500 | 169000 | 100000 | 16550 | 51700 | 84100 | 107000 | 718000 | 56200 | 227500 | 35050 | 57600 | 19950 | 2665 | 24750 | 5350 | 454000 | 53100 | 21100 | 34950 | 9570 | 8350 | 32950 | 161000 | 50900 | 91100 | 51500 | 271000 | 290000 | 89500 | 123000 | 34750 | 32950 | 30950 | 149000 | 209000 | 99600 | 97500 | 362500 | 60800 | 27600 | 77200 | 71800 | 13400 | 67400 | 358500 | 46150 | 725000 | ... | 82500 | 49750 | 39950 | 82400 | 65300 | 72400 | 61500 | 126000 | 19200 | 39450 | 109225 | 269400 | 13435 | 175000 | 288000 | 7700 | 7700 | 39550 | 13835 | 72200 | 50600 | 53900 | 28600 | 423000 | 733000 | 810000 | 19500 | 32450 | 61800 | 118000 | 16300 | 108000 | 46950 | 11050 | 3505 | 38300 | 19900 | 80900 | 113500 | 5560 | 47900 | 50500 | 65100 | 33050 | 30150 | 59500 | 276500 | 296900 | 28050 | 17300 |
2021-09-23 | 30100 | 33700 | 63700 | 167500 | 100500 | 16200 | 51600 | 84200 | 105500 | 699000 | 54600 | 227500 | 34950 | 56900 | 19000 | 2580 | 24850 | 5300 | 443500 | 51700 | 20800 | 34950 | 9560 | 8330 | 34200 | 158000 | 51100 | 94100 | 48750 | 269500 | 289000 | 87200 | 122000 | 34550 | 32750 | 30950 | 149000 | 208500 | 100000 | 98300 | 346500 | 63000 | 27150 | 77400 | 71800 | 13250 | 66300 | 354500 | 44950 | 731000 | ... | 84100 | 52800 | 38700 | 82200 | 64400 | 73100 | 59000 | 124000 | 19000 | 41150 | 109125 | 268000 | 13410 | 171500 | 289500 | 7270 | 7590 | 38500 | 13850 | 68900 | 45900 | 52600 | 28050 | 400500 | 712000 | 799000 | 20200 | 31450 | 60500 | 118000 | 16800 | 107000 | 44850 | 11100 | 3395 | 36800 | 19100 | 79700 | 109500 | 5550 | 46750 | 50100 | 65400 | 32000 | 29250 | 56800 | 270500 | 290100 | 28150 | 20450 |
2021-09-24 | 29600 | 33400 | 62900 | 166500 | 97000 | 16300 | 51100 | 82600 | 104000 | 696000 | 53200 | 228500 | 34650 | 56700 | 18700 | 2525 | 25000 | 5300 | 455000 | 50800 | 20450 | 34300 | 9540 | 8250 | 34250 | 162000 | 49350 | 93000 | 48650 | 272000 | 289500 | 85400 | 122500 | 34600 | 31950 | 31000 | 149000 | 207000 | 99500 | 98200 | 345000 | 62700 | 27700 | 77300 | 71700 | 13150 | 66100 | 344500 | 44400 | 726000 | ... | 85100 | 54500 | 40950 | 87400 | 64100 | 72900 | 59000 | 122000 | 18500 | 41300 | 108840 | 272900 | 13435 | 172500 | 283000 | 7780 | 7530 | 38950 | 13825 | 68300 | 45400 | 51500 | 28000 | 400500 | 713000 | 860000 | 19800 | 31500 | 60700 | 110900 | 16650 | 105500 | 45050 | 11150 | 3430 | 36900 | 18800 | 76700 | 107000 | 5530 | 46700 | 49050 | 67600 | 31700 | 29050 | 56900 | 268000 | 291000 | 28600 | 20050 |
5 rows × 376 columns
def makeData(data):
"""
- make derivative data from the fdr dataset
- not using external dataset which increases noise that harms the performance
"""
data = data.copy()
# Korean won volume
data['KRW_Vol'] = data[['Close', 'Volume']].prod(axis=1)
data['KRW_Vol_1m'] = data.KRW_Vol.rolling(21).mean()
# RSI
data['RSI'] = RSI(data.Close)
# Bollinger Bands
def compute_bb(close):
high, mid, low = BBANDS(close, timeperiod=20)
return pd.DataFrame({'bb_high': high, 'bb_low': low}, index=close.index)
data[['bb_high', 'bb_low']] = compute_bb(data.Close)
data['bb_high'] = data.bb_high.sub(data.Close).div(data.bb_high).apply(np.log1p)
data['bb_low'] = data.Close.sub(data.bb_low).div(data.Close).apply(np.log1p)
# ATR
def compute_atr(stock_data):
df = ATR(stock_data.High, stock_data.Low,
stock_data.Close, timeperiod=14)
return df.sub(df.mean()).div(df.std())
data['ATR'] = compute_atr(data)
#MACD
def compute_macd(close):
macd = MACD(close)[0]
return (macd - np.mean(macd))/np.std(macd)
data['MACD'] = compute_macd(data.Close)
# Lagged Returns
lags = [1, 2, 3, 4, 5, 10, 21, 42, 63]
for lag in lags:
data[f'return_{lag}d'] = data.Close.pct_change(lag).add(1).pow(1 / lag).sub(1)
for t in [1, 2, 3, 4, 5]:
for lag in [1, 5, 10, 21]:
data[f'return_{lag}d_lag{t}'] = data[f'return_{lag}d'].shift(t * lag)
# target return
for t in [1, 2, 3, 4, 5]:
data[f'target_{t}d'] = data[f'return_{t}d'].shift(-t)
# volume change
q = 0.01
data[data.filter(like='Vol').columns] = data.filter(like='Vol').pct_change().apply(lambda x: x.clip(lower=x.quantile(q),upper=x.quantile(1 - q)))
data = data.drop(['Date', 'Open', 'High', 'Low', 'Close'], axis= 1)
data = data.fillna(method="ffill")
data = data.fillna(method="bfill")
# display(data.tail())
return data
def mape(true_df, pred_df_input):
""" grading criteria for public leader board """
# extract columns from true_df same as pred_df_input
true_df_copy = true_df.copy()
true_df_copy = true_df_copy.loc[:, pred_df_input.columns]
true_df_copy = true_df_copy.iloc[:5]
pred_df_input = pred_df_input.iloc[:5]
return (abs(true_df_copy - pred_df_input) / true_df_copy * 100).iloc[:5].values.mean()
def sklearn_predict(
pred_df_input,
model,
bool_public,
start_date,
recent_known_date,
clip=True,
scaler=RobustScaler()
):
""" make prediction dataframe """
if bool_public:
# get first five rows of pred_df_input
pred_df_input = pred_df_input.iloc[:5]
else:
# dropping first 5 days of public dates from the submission dataframe
pred_df_input = pred_df_input.iloc[5:]
for company_code in tqdm(pred_df_input.columns):
# clear_output()
data_raw = fdr.DataReader(
company_code,
start = start_date,
end = recent_known_date
).reset_index()
data_raw = data_raw.drop(columns = 'Change')
data_raw = data_raw.replace(0, np.nan).ffill()
data_raw.index = data_raw.Date
# clear_output() # clear output for the jupyter notebook
if bool_public: # if public submission
# make necessary data for the prediction
data = makeData(data_raw)
# get the last date of the public submission period
public_last_close = data_raw.loc[recent_known_date].Close
public = data.loc[[recent_known_date]]
# loc[] is inclusive for the end slicing date, unlike list slicing
train_indv_comp = data.loc[:recent_known_date]
Ys = train_indv_comp.filter(like='target')
X = train_indv_comp.drop(Ys.columns, axis=1)
Ys_public = public.filter(like='target')
X_public = public.drop(Ys_public.columns, axis=1)
X = scaler.fit_transform(X)
X_public = scaler.transform(X_public)
pred_public = []
public_close = public_last_close
for y_col in Ys.columns:
model.fit(X, Ys[y_col])
r_pred_public = model.predict(X_public)
public_close = public_close * (1+r_pred_public[0])
pred_public.append(public_close)
# print(pred_df_input.shape)
# display(pred_df_input)
pred_df_input.loc[:,company_code] = pred_public
else: # if private submission
data = makeData(data_raw)
# display(pred_df_input)
private_last_close = data_raw.loc[recent_known_date].Close
# display(data)
private = data.loc[[recent_known_date]]
# display(private)
train_indv_comp = data.loc[:recent_known_date]
# display(train_indv_comp)
# make train_indv_comp data
Ys = train_indv_comp.filter(like='target') #Consisted of mon, tue, wed, thur, fri data columns
X = train_indv_comp.drop(Ys.columns, axis=1)
# make private data
Ys_private = private.filter(like='target')
X_private = private.drop(Ys_private.columns, axis=1)
# fit scaler
X = scaler.fit_transform(X)
X_private = scaler.transform(X_private)
pred_private = []
private_close = private_last_close
for y_col in Ys.columns:
model.fit(X, Ys[y_col])
r_pred_private = model.predict(X_private)
private_close = private_close * (1+r_pred_private[0])
pred_private.append(private_close)
# print(pred_private)
pred_df_input.loc[:,company_code] = pred_private
# display(pred_df_input)
return pred_df_input
Check the performance for sampled 20 companies
""" sample prediction """
# Utilizing ElasticnetCV to automatically tune the alpha hyperparameter with cross validation.
# (rho is left at default value 0.5)
# ElasticnetCV Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html#sklearn.linear_model.ElasticNetCV
# Cross Validation documentation: https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
# Builtin Cross Validation provided models: https://scikit-learn.org/stable/glossary.html#term-cross-validation-estimator
"""
https://stackoverflow.com/questions/12283184/how-is-elastic-net-used
To automatically tune the value of alpha it is indeed possible to use ElasticNetCV which will spare redundant computation as apposed to using GridSearchCV in the ElasticNet class for tuning alpha.
In complement, you can use a regular GridSearchCV for finding the optimal value of rho.
"""
# name = "ElasticNet_ALL_External_Dataset"
name = "ElaticNetCV_private"
# random sample 20 companies
PRED_NUM = 20
if PRED_NUM != None:
pred_df_sampled = pred_df.copy()
pred_df_sampled = pred_df_sampled.sample(PRED_NUM, axis=1, random_state=616)
scaler = StandardScaler()
model = ElasticNetCV(max_iter=1000000) # default K-Fold is 5 folds
df_result_sampled = sklearn_predict(
pred_df_input = pred_df_sampled,
model = model,
bool_public= True,
start_date=start_date,
recent_known_date = check_point,
clip=True,
scaler=scaler
)
df_result_sampled_copy = df_result_sampled.copy()
df_result_sampled_copy
185750 | 024110 | 206650 | 038500 | 067630 | 241560 | 298000 | 035900 | 005387 | 000990 | 293490 | 078130 | 035420 | 019170 | 217270 | 319660 | 161390 | 035600 | 014680 | 041510 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Day | ||||||||||||||||||||
2021-09-15 | 125064.7540 | 10254.9362 | 42618.1827 | 5372.9996 | 19360.0847 | 42520.2396 | 398721.4753 | 40413.6252 | 99356.3084 | 58953.4027 | 72497.5907 | 5427.5472 | 403990.9318 | 67500.6979 | 18626.1257 | 35912.5445 | 45053.3026 | 18723.2635 | 327323.1354 | 64794.6523 |
2021-09-16 | 125139.3222 | 10258.2773 | 42721.3990 | 5372.6779 | 19387.2423 | 42534.2163 | 399329.2842 | 40504.0768 | 99247.8148 | 58987.6293 | 72439.7577 | 5433.1925 | 405246.7272 | 67571.6898 | 18639.1341 | 35965.5198 | 45040.3133 | 18740.6921 | 327575.8054 | 64930.5748 |
2021-09-17 | 125205.0949 | 10261.0199 | 42818.9030 | 5371.4292 | 19410.6680 | 42545.0733 | 399883.2743 | 40574.4049 | 99185.6754 | 59015.3259 | 72452.6539 | 5438.2213 | 406134.5774 | 67632.3829 | 18649.8843 | 36014.5010 | 45030.5329 | 18759.7012 | 327804.9015 | 64994.3200 |
2021-09-23 | 125155.5177 | 10263.4246 | 42915.5822 | 5369.6404 | 19432.5767 | 42553.3841 | 400448.6410 | 40646.8703 | 99134.4965 | 59039.3481 | 72379.0831 | 5442.9652 | 406902.2789 | 67687.9173 | 18659.4037 | 36058.6550 | 45023.5971 | 18773.1005 | 328019.5180 | 65052.2335 |
2021-09-24 | 125093.3296 | 10265.6863 | 43001.4697 | 5367.4838 | 19453.5579 | 42561.5443 | 401019.1642 | 40716.1010 | 99099.2499 | 59061.2745 | 72294.9027 | 5447.5175 | 407592.0695 | 67740.2365 | 18667.9796 | 36102.0524 | 45011.9833 | 18785.5887 | 328226.2354 | 65100.9893 |
mape_value = mape(true_df, df_result_sampled_copy)
print(mape_value)
2.9318977420110794
전체
""" training time: 3444 second(or 57 minutes) """
# name = "ElasticNet_ALL_External_Dataset"
name = "ElasticNetCV_private"
# predict for all 376 companies
PRED_NUM = None
scaler = StandardScaler()
model = ElasticNetCV(max_iter=1000000) # default K-Fold is 5 folds
df_result_all = sklearn_predict(
pred_df,
model,
bool_public= BOOL_PUBLIC,
start_date=start_date,
recent_known_date = recent_known_date,
clip=True,
scaler=scaler
)
100%|██████████| 376/376 [1:02:57<00:00, 10.05s/it]
display(df_result_all)
000060 | 000080 | 000100 | 000120 | 000150 | 000240 | 000250 | 000270 | 000660 | 000670 | 000720 | 000810 | 000880 | 000990 | 001230 | 001440 | 001450 | 001740 | 002380 | 002790 | 003000 | 003090 | 003380 | 003410 | 003490 | 003670 | 003800 | 004000 | 004020 | 004170 | 004370 | 004490 | 004800 | 004990 | 005250 | 005290 | 005300 | 005380 | 005385 | 005387 | 005490 | 005830 | 005850 | 005930 | 005935 | 005940 | 006260 | 006280 | 006360 | 006400 | ... | 253450 | 256840 | 263720 | 263750 | 267250 | 267980 | 268600 | 271560 | 272210 | 272290 | 273130 | 278280 | 278530 | 282330 | 285130 | 287410 | 290510 | 290650 | 292150 | 293490 | 293780 | 294090 | 294870 | 298000 | 298020 | 298050 | 298380 | 298540 | 299030 | 299660 | 299900 | 307950 | 314130 | 316140 | 319400 | 319660 | 321550 | 323990 | 326030 | 330590 | 330860 | 336260 | 336370 | 347860 | 348150 | 348210 | 352820 | 357780 | 363280 | 950130 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Day | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2021-09-27 | 29619.9213 | 33404.8682 | 62934.1627 | 166627.0991 | 96951.3550 | 16318.2245 | 51157.0890 | 82640.1175 | 104056.0722 | 696633.1831 | 53210.2029 | 228543.4327 | 34616.2408 | 56750.9950 | 18713.0184 | 2525.4144 | 25020.4805 | 5300.8629 | 455412.5823 | 50759.8474 | 20476.7460 | 34335.2240 | 9536.7369 | 8255.9627 | 34271.9425 | 162211.5139 | 49392.2069 | 92991.9337 | 48689.8328 | 272236.9822 | 289666.8211 | 85478.4144 | 122615.8457 | 34643.0840 | 31984.6833 | 31032.8182 | 149020.7516 | 206984.5334 | 99366.3220 | 98193.8653 | 345410.9456 | 62765.9660 | 27726.3380 | 77250.9716 | 71617.0677 | 13158.5612 | 66146.4555 | 344804.7013 | 44426.1867 | 726008.7348 | ... | 85134.8412 | 55242.7675 | 41037.1551 | 87580.0575 | 64106.1314 | 72915.4644 | 59209.0412 | 122193.8499 | 18533.3206 | 41371.7568 | 108823.5392 | 273798.8944 | 13438.8918 | 172536.5767 | 282948.8369 | 7795.4170 | 7536.9262 | 39055.3084 | 13833.1392 | 68344.4815 | 45389.0326 | 51583.5746 | 27979.6905 | 400906.3756 | 710964.3401 | 871526.8897 | 19831.0911 | 31559.7435 | 60848.2452 | 111160.9148 | 16692.1713 | 105428.1348 | 45187.3560 | 11146.7696 | 3438.2884 | 36965.8713 | 18833.2831 | 76519.7910 | 106993.1205 | 5528.7665 | 46765.8625 | 49179.3201 | 67486.5771 | 31774.1941 | 29074.3546 | 56910.9233 | 268362.2621 | 290074.0721 | 28644.1991 | 20074.8843 |
2021-09-28 | 29641.2372 | 33410.9766 | 62996.9338 | 166704.4970 | 97004.0898 | 16335.1979 | 51198.9499 | 82659.9783 | 104066.5543 | 697048.4354 | 53154.2625 | 228638.8170 | 34595.6852 | 56783.5609 | 18720.7457 | 2524.9961 | 25035.1632 | 5299.6526 | 455698.5859 | 50696.7335 | 20497.6473 | 34358.2641 | 9531.8428 | 8259.1139 | 34295.1064 | 162433.0638 | 49423.3173 | 92950.0978 | 48739.4518 | 272558.1387 | 289818.6419 | 85665.7054 | 122699.8400 | 34683.6341 | 32012.5879 | 31013.3876 | 149093.8104 | 207123.2654 | 99285.9642 | 98222.7768 | 345734.1227 | 62823.5848 | 27745.6550 | 77270.0037 | 71642.3679 | 13163.4640 | 66179.3712 | 344998.0796 | 44450.9740 | 726398.5421 | ... | 85150.5989 | 55572.7020 | 41155.7774 | 87730.7834 | 64107.7192 | 72911.9722 | 59367.1055 | 122342.1253 | 18561.5405 | 41391.4693 | 108832.3748 | 274609.9104 | 13443.2515 | 172502.3226 | 282953.2805 | 7811.3651 | 7541.5022 | 39147.3212 | 13840.5457 | 68263.6724 | 45478.3515 | 51629.3426 | 27953.9254 | 400845.8235 | 710307.6560 | 880941.8171 | 19848.3055 | 31603.2093 | 60986.5374 | 111366.8870 | 16730.7055 | 105360.5544 | 45304.5754 | 11141.9899 | 3440.4603 | 37020.5905 | 18850.3581 | 76408.3022 | 106751.9680 | 5527.3493 | 46823.7669 | 49393.8403 | 67488.7739 | 31799.0865 | 29069.9619 | 56848.9364 | 269112.6391 | 289045.8422 | 28671.6626 | 20092.6880 |
2021-09-29 | 29659.9998 | 33414.0403 | 63060.8037 | 166766.6097 | 97047.5968 | 16350.5709 | 51235.1900 | 82672.4479 | 104029.4717 | 697398.8750 | 53127.4203 | 228808.0665 | 34591.6177 | 56809.9294 | 18726.6707 | 2524.2865 | 25047.6200 | 5297.8121 | 455953.6829 | 50647.3469 | 20511.6095 | 34379.0331 | 9526.7009 | 8261.3723 | 34311.4601 | 162599.8111 | 49450.8603 | 92915.7797 | 48838.7770 | 272890.9565 | 289973.1638 | 85718.1060 | 122772.9330 | 34729.9229 | 32037.7311 | 31013.6800 | 149205.9845 | 207299.9896 | 99307.0351 | 98257.5142 | 346058.4688 | 62874.2882 | 27762.7063 | 77304.3423 | 71660.5777 | 13166.8532 | 66207.6025 | 345139.2794 | 44471.1243 | 726786.1160 | ... | 85155.3182 | 55919.4128 | 41232.5346 | 87873.5729 | 64103.3814 | 72905.3485 | 59515.5655 | 122422.1061 | 18588.3547 | 41409.0996 | 108841.1698 | 275375.6071 | 13447.6955 | 172457.9531 | 282988.4423 | 7825.7740 | 7545.5702 | 39236.0164 | 13847.9131 | 68146.6331 | 45566.7920 | 51948.7495 | 27927.4594 | 401388.8262 | 710597.3525 | 891051.5782 | 19860.8061 | 31642.8484 | 61120.5119 | 111560.2977 | 16767.3728 | 105286.1974 | 45412.5893 | 11136.9467 | 3442.4249 | 37077.8781 | 18866.4524 | 75868.4705 | 106511.4224 | 5526.0227 | 46879.8267 | 49616.5520 | 67502.8980 | 31817.8737 | 29062.0097 | 56912.9122 | 269886.0036 | 289047.2415 | 28689.7233 | 20106.9512 |
2021-09-30 | 29677.6329 | 33418.5014 | 63124.2603 | 166819.7420 | 97087.3846 | 16363.0247 | 51268.0538 | 82681.2358 | 104028.8458 | 697716.3061 | 53120.1486 | 228978.3769 | 34592.4368 | 56832.7125 | 18735.2131 | 2523.4326 | 25058.8927 | 5295.6598 | 456193.9527 | 50600.5772 | 20524.5186 | 34398.6680 | 9521.3283 | 8263.1191 | 34326.0686 | 162741.3691 | 49476.8197 | 92945.8520 | 48980.0360 | 273205.5961 | 290114.5844 | 85767.4181 | 122840.7642 | 34760.9855 | 32059.8130 | 31001.9968 | 149288.2221 | 207442.3514 | 99319.4612 | 98298.1543 | 346370.0238 | 62921.4168 | 27792.9857 | 77346.4759 | 71697.8110 | 13161.0052 | 66233.6612 | 345259.8600 | 44489.0416 | 727210.4963 | ... | 85183.3230 | 56093.8298 | 41308.5059 | 88007.7671 | 64100.4969 | 72897.1765 | 59654.9000 | 122468.4388 | 18614.0005 | 41424.6338 | 108850.1303 | 276138.1048 | 13451.9330 | 172398.1197 | 283351.6078 | 7842.5450 | 7548.6130 | 39325.1257 | 13855.0212 | 68041.5214 | 45651.3523 | 52262.2849 | 27902.0970 | 401944.6413 | 711507.5729 | 899799.3356 | 19871.2709 | 31679.9238 | 61253.1297 | 111754.5069 | 16804.7915 | 105233.4033 | 45517.8564 | 11131.7003 | 3444.3205 | 37129.6121 | 18875.7521 | 75596.5935 | 106310.3201 | 5524.5951 | 46926.6664 | 49832.2755 | 67486.1650 | 31820.3359 | 29041.0651 | 57020.0201 | 271019.1920 | 287825.4264 | 28709.3131 | 20119.0428 |
2021-10-01 | 29694.6207 | 33423.9375 | 63193.3635 | 166868.1106 | 97124.4400 | 16345.2020 | 51298.7895 | 82687.5453 | 104023.7073 | 698015.0172 | 53134.0349 | 229134.0139 | 34598.0681 | 56853.3770 | 18742.5141 | 2522.5170 | 25069.4874 | 5293.3238 | 456423.1230 | 50563.0390 | 20536.8711 | 34415.6411 | 9515.8036 | 8264.5175 | 34326.4535 | 162881.1631 | 49475.1053 | 92973.7446 | 49028.6449 | 273507.2655 | 290222.1305 | 85814.4315 | 122867.5208 | 34787.7937 | 32081.1379 | 30982.2917 | 149367.6134 | 207596.9442 | 99316.1011 | 98349.8378 | 346657.8080 | 62966.5126 | 27821.7097 | 77389.4860 | 71756.7589 | 13156.2628 | 66258.1970 | 345369.7252 | 44505.3942 | 727541.8449 | ... | 85210.5932 | 56443.1028 | 41380.6160 | 88139.9194 | 64095.7902 | 72888.7033 | 59793.2597 | 122519.5538 | 18640.2923 | 41437.6867 | 108859.2157 | 276890.2609 | 13456.2170 | 172334.4949 | 283873.0039 | 7889.8827 | 7550.9179 | 39412.9016 | 13862.0739 | 67979.0034 | 45735.5675 | 52611.6164 | 27877.6216 | 402492.1156 | 712434.5186 | 908492.8089 | 19880.8972 | 31718.5363 | 61385.7797 | 111941.7628 | 16845.4909 | 105197.0065 | 45590.5963 | 11126.2165 | 3446.0998 | 37179.4420 | 18881.2332 | 75544.5761 | 106053.9055 | 5522.9875 | 46959.3655 | 50044.8621 | 67442.7575 | 31815.1584 | 29013.0873 | 57201.4429 | 272120.7188 | 287296.5950 | 28730.4632 | 20130.6205 |
5 rows × 376 columns
name = "ElasticNetCV_private_last"
df_result_all.to_csv(f'predict/{name}.csv')
# load result from csv
df_result_all = pd.read_csv(f'predict/{name}.csv')
# apply retina display for clearer visualization
%config InlineBackend.figure_format = 'retina'
# Korean font path designation
import matplotlib.font_manager as fm
fontpath = './font/NanumBarunGothic.ttf'
font = fm.FontProperties(fname=fontpath, size=14)
# sample 9 companies from stock_list
SAMPLE_NUM = 16
sampled_data = stock_list.sample(SAMPLE_NUM)
# visualize 16 companies subplots from df_result_all
fig, ax = plt.subplots(nrows=4, ncols=4, figsize=(16,16))
# make margin between figures
plt.subplots_adjust(hspace=0.5, wspace=0.5)
# visualize df_result_all
for i, code in enumerate(sampled_data["종목코드"]):
ax[i//4, i%4].plot(df_result_all.loc[:,code])
# find the matching row of sampled_data from code
company_name = stock_list[stock_list["종목코드"] == code]["종목명"].values[0]
ax[i//4, i%4].set_title(company_name, fontproperties=font)
ax[i//4, i%4].set_xlabel('Date')
ax[i//4, i%4].set_ylabel('Close Price')
ax[i//4, i%4].grid()
# check submission's validity
submission_last = pd.read_csv("./predict/ElasticNetCV_2021-09-26.csv")
# show information of the submission
submission_last.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Columns: 377 entries, Day to 950130
dtypes: float64(376), object(1)
memory usage: 29.6+ KB