【Data Analysis(5)】XGBoost Algorithm Predicts Returns (Part 1)
Use an algorithm to learn investment factors and predict returns.
Highlights
- Difficulty:★★★☆☆
- Setting Up a Virtual Environment
- XGBoost Introduction and Installation
Preface
In recent years, many algorithms have emerged, and various mathematical models have been developed to solve problems. The classic model is regression. As technology has advanced, algorithms that can learn and improve by themselves have been developed (machine learning), and the field has since moved toward what is now its most popular form, neural network models (deep learning).
This article introduces the tree-based model XGBoost and is divided into two parts. Part 1 covers environment setup and module installation. Part 2 covers data preprocessing, training, prediction, and visualization.
XGBoost Introduction
First, let’s introduce the popular algorithm XGBoost. Boosting is a technique that combines many weak learners into a single, more powerful learner, which yields higher accuracy in the final prediction.
XGBoost (Extreme Gradient Boosting) is an implementation of gradient boosted decision trees (GBDT). Each learning step builds on the errors of the previous steps: the existing model is retained, and a new function (a tree) is added to correct the remaining error, so the final model is a collection of many weak learners. It is mainly applied to supervised learning and can handle both classification and regression problems.
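The additive idea described above can be sketched in a few lines of plain Python. This is only a toy illustration, not XGBoost's actual implementation: it uses one-split "stumps" as weak learners, squared error as the loss, and a small made-up data set (XS, YS). All function names here are hypothetical.

```python
# Toy gradient boosting with decision stumps (pure Python, no dependencies).
# Assumption: squared-error loss, so each new stump is fitted to the residuals
# (the errors left over by the model built so far).

XS = [1, 2, 3, 4, 5, 6, 7, 8]                    # made-up feature values
YS = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]    # made-up targets


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all x identical): fall back to a constant correction.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict(base, stumps, learning_rate, x):
    """Sum the base prediction and the scaled corrections of every stump."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


def boost(xs, ys, n_rounds, learning_rate=0.3):
    """Keep the existing model and add one new stump per round."""
    base = sum(ys) / len(ys)  # the initial model predicts the mean
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - predict(base, stumps, learning_rate, x)
                     for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps


def training_mse(xs, ys, n_rounds, learning_rate=0.3):
    """Mean squared error on the training data after n_rounds of boosting."""
    base, stumps = boost(xs, ys, n_rounds, learning_rate)
    return sum((y - predict(base, stumps, learning_rate, x)) ** 2
               for x, y in zip(xs, ys)) / len(ys)


print("MSE with 0 rounds :", training_mse(XS, YS, 0))
print("MSE with 20 rounds:", training_mse(XS, YS, 20))
```

Each round leaves the earlier stumps untouched and only adds a correction, so the training error shrinks as rounds accumulate. XGBoost builds on the same principle but adds regularization, deeper trees, and many engineering optimizations.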
Editing Environment and Required Modules
macOS and Jupyter Notebook
Virtual Environment
Because XGBoost depends on many modules, inconsistent versions can cause endless errors. We therefore create a new environment in which to install them. There are many ways to do this; this tutorial uses a relatively simple, easy-to-follow approach that minimizes errors.
Step 1. Install Anaconda
Anaconda is essentially an all-in-one bundle for beginners. It addresses the installation difficulties caused by inconsistencies between systems. It provides more than 1,000 installable packages for Windows, Linux, and macOS, and it also includes a virtual environment manager, which makes setting up and running a machine learning environment simple and fast.
Step 2. Open the terminal
On Windows, use Anaconda Prompt instead.
Enter the following command:
conda create -n new_env_name python=3.8
You will be asked whether to proceed with the installation. Type y and press Enter. The name of our new environment is test, but you can of course choose any name you like.
conda env list
This command lists all the environments we have created.
Step 3. Activate the environment
conda activate new_env_name
At this point, the (base) prefix at the start of the terminal prompt changes to the environment name, (test), which means the environment was activated successfully. If a later installation fails and we need to start over, we can remove the environment with the command below (run conda deactivate first if the environment is still active).
conda env remove -n new_env_name
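To confirm from inside Python that the intended environment is the one actually running, a short check along these lines may help. It is a minimal sketch: the function name describe_environment is hypothetical, and it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the interpreter that is currently running."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        # Set by `conda activate`; None when Python runs outside conda.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the environment was activated as described, conda_env should show the name you chose and python_version should start with 3.8.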
Install XGBoost
Step 1. Activate the environment
conda activate new_env_name
Step 2. Enter the command
conda install py-xgboost
As before, you will be asked whether to install these modules. Type y and press Enter to start the installation. Once it finishes running, XGBoost is installed. Simple, isn't it?
Install the XGBoost visualization module graphviz
Step 1. Install Homebrew (under our new environment)
Homebrew can be thought of as an installation tool, much like using pip to install Python modules. On macOS, Homebrew is the most widely used package management tool.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Enter the command above in the terminal to install Homebrew.
Step 2. Install graphviz
brew install graphviz
These are the main modules used in this article. However, the new environment does not yet include some other modules we need (pandas, matplotlib, tejapi), so we install them separately. Separate the module names in the command with spaces.
pip install pandas matplotlib tejapi
Install Jupyter Notebook
Step 1. Open Anaconda Navigator and select the environment we just created
Step 2. Under Jupyter Notebook, click Install
Final Result
Finally, check in Jupyter whether the installation was successful!
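One way to run this check in a notebook cell is sketched below. It assumes the Python import names match the package names installed above (xgboost, pandas, matplotlib, tejapi) and that Graphviz installed through Homebrew exposes its dot executable on the PATH. The function name check_installation is hypothetical.

```python
import importlib.util
import shutil

# Import names of the Python modules installed in this article (assumed).
REQUIRED_MODULES = ("xgboost", "pandas", "matplotlib", "tejapi")
# Graphviz is a system tool; `dot` is its main command-line executable.
REQUIRED_EXECUTABLES = ("dot",)


def check_installation(modules=REQUIRED_MODULES,
                       executables=REQUIRED_EXECUTABLES):
    """Map each module or executable name to True if it can be found."""
    report = {}
    for name in modules:
        # find_spec locates a module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        report[name] = shutil.which(name) is not None
    return report


for name, found in check_installation().items():
    print(f"{name}: {'OK' if found else 'MISSING'}")
```

Anything reported as MISSING can be reinstalled in the active environment using the commands from the earlier steps.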
Database
This article uses the TWN/AFF_RAW database, which provides trading factors for the algorithm to learn from. The factors are constructed with reference to Kenneth R. French and the top three finance journals (JF, RFS, JFE). The indicators are calculated from Taiwan market data, and all of them are compiled at a monthly frequency.
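import tejapi
tejapi.ApiConfig.api_key = 'your key'  # replace with your own TEJ API key
# Fetch the factor data for company 9921 from 2015 through 2020
df = tejapi.get('TWN/AFF_RAW',
                coid='9921',
                mdate={'gte': '2015-01-01', 'lte': '2020-12-31'},
                chinese_column_name=True,
                paginate=True)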
Conclusion
Part 1 of this article covers module installation. Most people run into many installation problems when they first start programming, and setting up the environment is a programmer's first lesson. Once everything is installed successfully, Part 2 will start using the database: we will process the data, feed it to the model, and predict returns as a reference for our investments.