The stock market is a complex and dynamic system where the prices of securities fluctuate based on various factors, such as economic conditions, political events, and investor sentiment. Accurately predicting stock prices has been a long-standing challenge for both individual investors and financial institutions. Traditional financial models often fail to capture the non-linear and complex relationships in the stock market, leading to the need for more advanced techniques.
In recent years, the field of machine learning has shown great potential in addressing the stock price prediction problem. Machine learning algorithms can analyze vast amounts of data, identify patterns, and make predictions with higher accuracy compared to traditional methods. By leveraging historical stock data, macroeconomic indicators, and other relevant features, machine learning models can be trained to forecast future stock prices, providing valuable insights for investment decisions.
The primary objective of this project is to develop a machine learning-based model that can accurately predict the future stock prices of a specific company, ITC Ltd. (ITC.NS), using a combination of historical stock data and other relevant features.
The scope of this project includes:
- Collecting and preprocessing the necessary data, including historical stock prices, trading volumes, and other relevant financial indicators.
- Exploring and selecting the most influential features for the stock price prediction task.
- Designing and implementing a machine learning model, specifically a Random Forest Classifier, to forecast the future stock prices.
- Evaluating the performance of the model using appropriate metrics, such as precision, recall, and F1-score.
- Analyzing the results and drawing insights from the model's predictions.
The main objectives of this project are:
- To develop a robust and accurate machine learning-based model for predicting the future stock prices of ITC Ltd.
- To explore the impact of various features, including technical indicators and macroeconomic factors, on the stock price prediction.
- To provide valuable insights and recommendations to investors and financial decision-makers based on the model's performance and predictions.
The rest of this report is organized as follows:
- Chapter 2: Methodology
- Chapter 3: Experimental Setup and Results
- Chapter 4: Discussion and Insights
- Chapter 5: Conclusion and Future Work
The dataset used in this project was obtained from Yahoo Finance, which provides historical stock data for various companies. The dataset includes the daily stock prices (open, high, low, close) and trading volumes for ITC Ltd. (ITC.NS) from January 2015 to October 2024.
The data was preprocessed by handling missing values, removing unnecessary columns (e.g., dividends, stock splits), and creating additional features, such as various technical indicators and moving averages. The "Tomorrow" column was created by shifting the "Close" column to represent the next day's closing price, which will be the target variable for the prediction task.
To enhance the model's predictive performance, several feature engineering techniques were employed. The following new features were created:
- Close Price Ratio: The ratio of the current close price to the rolling average close price over different time horizons (2, 5, 60, 250, and 1000 days).
- Trend: The sum of the "Target" variable (whether the next day's close price is higher or lower than the current day's close price) over different time horizons (2, 5, 60, 250, and 1000 days).
These features capture the relative changes in stock prices and the overall trend, which are expected to be informative for the prediction task.
For this project, a Random Forest Classifier was chosen as the machine learning model. Random Forest is an ensemble learning method that combines multiple decision trees to improve the overall prediction accuracy and robustness.
The model was trained using the preprocessed dataset, with the "Close", "Volume", "Open", "High", and "Low" columns as the predictor variables, and the "Target" column as the output variable. The model was trained using 500 estimators, and the minimum number of samples required to split an internal node was set to 3.
To evaluate the performance of the trained model, a backtest approach was used. The dataset was split into training and testing sets, with the training set growing in a sliding window fashion. The model was retrained at each step, and its predictions were recorded.
The primary evaluation metric used was the precision score, which measures the proportion of true positive predictions among all positive predictions. This metric was chosen because it is important to minimize the number of false positive predictions, as they can lead to suboptimal investment decisions.
The data preprocessing and feature engineering steps described in Section 2.1 and 2.2 were implemented using Python and the Pandas library. The resulting dataset had 2,652 rows and 11 columns, including the original features and the newly created ones.
The Random Forest Classifier model was implemented using the Scikit-learn library. The model was trained and evaluated using the backtest approach described in Section 2.4.
The precision score of the model's predictions on the test set was 0.78, indicating that 78% of the positive predictions (i.e., predictions of the stock price increasing) were accurate.
The breakdown of the model's predictions is as follows:
- Predicted stock price increase: 1,423
- Actual stock price increase: 1,152
- Correctly predicted stock price increase: 1,109
The results of the stock price prediction model demonstrate its ability to forecast the future stock price movements of ITC Ltd. with a reasonable level of accuracy. The precision score of 0.78 suggests that the model can be a valuable tool for investors and financial decision-makers.
Some key insights from the model's performance:
- The inclusion of technical indicators, such as the close price ratio and trend features, improved the model's predictive ability compared to using only the basic stock data (open, high, low, close, volume).
- The model was able to capture the non-linear relationships and complex patterns in the stock market data, which traditional financial models may have difficulty with.
- The backtest approach allowed for a more realistic evaluation of the model's performance, as it simulates the real-world scenario of making predictions based on the available data at each point in time.
While the developed model showed promising results, there are several limitations and challenges that should be acknowledged:
- The model's performance may be affected by external factors, such as macroeconomic conditions, geopolitical events, and industry-specific developments, which are not fully captured in the current set of features.
- The dataset used for this project is limited to the stock data of a single company, ITC Ltd. Expanding the analysis to a broader range of stocks or industries could provide a more comprehensive understanding of the model's performance.
- The backtest approach, while more realistic than a simple train-test split, still has limitations in simulating the real-world investment decision-making process, where factors such as transaction costs, market liquidity, and investor psychology play a significant role.
To enhance the model's performance and address the limitations, the following improvements and future work can be considered:
- Incorporate additional features, such as macroeconomic indicators, industry-specific data, and sentiment analysis from news articles or social media, to capture a broader range of factors influencing stock prices.
- Explore other machine learning algorithms, such as deep learning models, and compare their performance to the current Random Forest Classifier.
- Conduct a more comprehensive backtest, including simulated trading and portfolio optimization, to better understand the model's practical implications for investment strategies.
- Extend the analysis to a wider range of stocks, sectors, or markets to assess the model's generalizability and robustness.
- Investigate the interpretability of the model's predictions, providing investors with insights into the key drivers of stock price movements.
In this project, a machine learning-based model was developed to predict the future stock prices of ITC Ltd. (ITC.NS) using historical stock data and engineered features. The Random Forest Classifier model achieved a precision score of 0.78 on the backtest dataset, demonstrating its ability to forecast stock price movements with reasonable accuracy.
The project's key contributions include:
- Developing a robust and accurate stock price prediction model using a Random Forest Classifier.
- Exploring the impact of various technical indicators and trend features on the model's predictive performance.
- Providing insights and recommendations to investors and financial decision-makers based on the model's performance and predictions.
While the current model shows promising results, there are opportunities for further improvements and future work, such as incorporating additional features, exploring alternative machine learning algorithms, and conducting more comprehensive backtesting and portfolio optimization.
Overall, this project showcases the potential of machine learning in the domain of stock price prediction and highlights the importance of continuous research and development in this field to support informed investment decisions.