Name: Xinde Zhang
Title: Associate Professor
Country: United States
Organization: University of Arkansas
A 'students-first' philosophy is a big reason why the University of Arkansas is consistently ranked among the nation's top public research universities and best values. We work hard to ensure a low student-to-faculty ratio that promotes plenty of personal attention and mentoring opportunities. The Carnegie Foundation classifies the university as having "the highest possible level of research," placing us among the top 3% of colleges and universities nationwide.
Machine learning (ML) and artificial intelligence (AI) have gained significant ground in analytical fields over the last few years. As a result, it has become critical for finance students to acquire this knowledge along with hands-on experience. That said, learning ML and AI can be challenging for finance students for several reasons:
- The financial industry is by nature a big-data and information industry, so finance practitioners need to know and understand the latest technologies.
- Data analytics skills are critical for finance professionals, yet most business students lack them.
- Teaching technical topics to non-technical students is challenging in any non-technical discipline.
One way to overcome these challenges is to condense the programming component into visual building blocks. That way, we can encourage students to equip themselves with ML and AI skills and advance their careers in the finance industry.
The Teaching Procedure
The course is project-oriented. Students are asked to use either ML or AI to predict and explain cryptocurrency returns. To complete the assignment, they follow a standard data analytics procedure:
- Ask the right question:
  - What are the possible driving factors for cryptocurrency returns?
- Feasibility analysis:
  - Data availability
  - Computational resources
- Data collection
- Data processing
- Model building and model selection
- Model interpretation and economic reasoning
- Refine and enrich the model
The software package and the procedure are laid out as follows.
Dataiku© is the package adopted for the class. Dataiku© provides visual recipes that help users understand and speed up their ML implementation. Almost all popular ML tasks can be done with visual recipes without writing Python code. The company offers free, detailed documentation of the package.
The procedure consists of a literature review, data collection, data processing, sample selection, machine learning model training, prediction, and verification. We test whether any factor has predictive power for cryptocurrency returns. In particular, we estimate the following model, using the Bitcoin return as an example:

$$R_t = f(X_{t-1}) + \varepsilon_t$$

where $R_t$ is the Bitcoin return in period $t$, $X_{t-1}$ is the lagged explanatory variable(s), and $\varepsilon_t$ is the error term. In the example, the main variables of interest are the changes in new/active addresses. The raw data is collected from Glassnode.com.
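As a simple linear benchmark for this kind of predictive model (our simplification for illustration, not necessarily what the course runs in Dataiku), one can regress the Bitcoin return $R_t$ on a single lagged explanatory variable $X_{t-1}$: $R_t = \alpha + \beta X_{t-1} + \varepsilon_t$. With one regressor, ordinary least squares has the closed form $\hat\beta = \mathrm{Cov}(X_{t-1}, R_t) / \mathrm{Var}(X_{t-1})$ and $\hat\alpha = \bar R - \hat\beta \bar X$. The function name and the numbers below are hypothetical, and no significance test is included.

```python
def fit_predictive_ols(lagged_x, returns):
    """OLS estimates (alpha, beta) for R_t = alpha + beta * X_{t-1} + e_t.

    lagged_x[i] must already be the explanatory variable from the period
    before returns[i]; this function does no lagging itself.
    """
    n = len(lagged_x)
    if n != len(returns) or n < 2:
        raise ValueError("need two equal-length sequences with at least two observations")
    x_bar = sum(lagged_x) / n
    r_bar = sum(returns) / n
    # Sum of squared deviations of X and cross-deviations of X and R.
    sxx = sum((x - x_bar) ** 2 for x in lagged_x)
    if sxx == 0:
        raise ValueError("the explanatory variable has no variation")
    sxr = sum((x - x_bar) * (r - r_bar) for x, r in zip(lagged_x, returns))
    beta = sxr / sxx
    alpha = r_bar - beta * x_bar
    return alpha, beta


# Made-up numbers that lie exactly on R_t = 0.01 + 0.5 * X_{t-1}.
lagged_change = [0.2, -0.25, 0.0, 0.1]
next_return = [0.11, -0.115, 0.01, 0.06]
alpha, beta = fit_predictive_ols(lagged_change, next_return)
print(alpha, beta)
```

A nonzero $\hat\beta$ in such a regression is only a starting point; the ML models in the course replace the linear form with a flexible one.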
The following diagram shows how we use Dataiku to prepare the data for the analysis. As demonstrated below, we employ Dataiku to convert the raw pricing data and new-address data into returns and changes, and then to merge the datasets. Not a single line of code is needed for the process. More importantly, the diagram clearly shows the procedure and the merging process visually.
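For readers who want to see what the visual recipes compute, here is a minimal sketch of the same preparation in plain Python. It assumes daily series keyed by date, computes simple changes $x_t / x_{t-1} - 1$ for prices and new-address counts, keeps only dates present in both series (an inner join), and pairs each return with the address change from the previous merged date so the explanatory variable enters with a one-period lag. The function names and all numbers are hypothetical; they are not Glassnode data.

```python
from datetime import date

# Hypothetical daily inputs keyed by date (made-up numbers, not Glassnode data).
prices = {
    date(2021, 1, 1): 100.0,
    date(2021, 1, 2): 110.0,
    date(2021, 1, 3): 99.0,
    date(2021, 1, 4): 99.0,
    date(2021, 1, 5): 118.8,
}
new_addresses = {
    date(2021, 1, 1): 1000,
    date(2021, 1, 2): 1200,
    date(2021, 1, 3): 900,
    date(2021, 1, 4): 900,
    date(2021, 1, 5): 1080,
    date(2021, 1, 6): 1200,  # no matching price, so the inner join drops it
}


def pct_change(series):
    """Simple change x_t / x_{t-1} - 1 between consecutive available dates."""
    days = sorted(series)
    return {curr: series[curr] / series[prev] - 1.0 for prev, curr in zip(days, days[1:])}


def merge_lagged(returns, changes):
    """Inner-join on date, then pair each R_t with X from the previous merged date."""
    common = sorted(returns.keys() & changes.keys())
    return [(curr, returns[curr], changes[prev]) for prev, curr in zip(common, common[1:])]


rows = merge_lagged(pct_change(prices), pct_change(new_addresses))
for day, r_t, x_lag in rows:
    print(day, round(r_t, 4), round(x_lag, 4))
```

Note that `pct_change` measures change between consecutive *available* dates, so gaps in a series would silently stretch the interval; a real pipeline would check for missing days first.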
After the heavy lifting of data processing, we move on to the fun part: letting Dataiku run the models and examining model performance, as shown below.
Taking the Random Forest model as an example, users can easily select and adjust the explanatory variables.
A visual decision tree is also presented on the platform.
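Dataiku draws the fitted trees for you. To make concrete what a single node in such a tree computes, the sketch below fits a depth-one regression tree (a "stump") on one explanatory variable: it tries every midpoint between consecutive distinct sorted feature values and keeps the threshold that minimizes the sum of squared errors around the two leaf means. A Random Forest averages many deeper trees, each grown on a bootstrap sample and considering a random subset of features at each split, so this is only the building block and not Dataiku's implementation. The function name and data are hypothetical.

```python
def best_split(x, y):
    """Depth-1 regression tree on one feature.

    Returns (threshold, left_mean, right_mean, sse) for the split
    x <= threshold vs. x > threshold with the lowest total squared error.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot place a threshold between equal feature values
        threshold = (lo + hi) / 2
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Each leaf predicts its mean; score the split by squared error.
        sse = sum((v - left_mean) ** 2 for v in left) + sum((v - right_mean) ** 2 for v in right)
        if best is None or sse < best[3]:
            best = (threshold, left_mean, right_mean, sse)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best


# Hypothetical lagged address changes and next-period returns.
lagged_change = [-0.3, -0.2, -0.1, 0.1, 0.2, 0.3]
next_return = [-0.02, -0.02, -0.02, 0.03, 0.03, 0.03]
threshold, left_mean, right_mean, sse = best_split(lagged_change, next_return)
print(threshold, left_mean, right_mean)
```

A full tree applies the same search recursively within each leaf, across all selected explanatory variables.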
The model is then deployed and evaluated on the testing data sample. Its performance is visualized as follows.
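Dataiku reports its own performance metrics. As a hedged sketch of what testing on a held-out sample involves for time-series data, the code below splits rows chronologically (no shuffling, so the model never trains on data from after the test period) and computes two measures chosen here for illustration: root-mean-square error and the hit rate, i.e. the share of periods where the predicted sign of the return matches the realized sign. The 80/20 split and the function names are assumptions.

```python
import math


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling; the test set follows the training set."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be in (0, 1)")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def evaluate(actual, predicted):
    """RMSE plus hit rate (positive vs. non-positive sign agreement)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    hits = sum((a > 0) == (p > 0) for a, p in zip(actual, predicted))
    return {"rmse": rmse, "hit_rate": hits / n}


# Hypothetical realized and predicted returns for a four-period test sample.
train, test = chronological_split(list(range(10)), train_fraction=0.8)
scores = evaluate([0.1, -0.2, 0.05, -0.1], [0.05, -0.1, -0.02, -0.05])
print(len(train), len(test), scores)
```

Keeping the split chronological matters for return prediction: a random split would let the model see future observations during training and overstate its out-of-sample performance.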
The technique introduces students to a rational, step-by-step approach to financial data analysis: researching and defining the question, collecting data, processing data, modeling, and testing. The visual, intuitive procedure allows students to apply and test their financial knowledge, such as market efficiency and factor models.
The method, procedure, and platform proved intuitive and helpful. Positive feedback from students includes:
- “Learning different types of machine learning techniques will help me in the future. I would not change much about this class overall it is a great class.”
- “I learned a lot about how to analyze and forecast data through advanced software techniques.”
However, there is still room to improve and develop the course further; one student reported that “...I find the ML component captivating, however, there needs to be more documentation of how to recreate the process in Dataiku….”
Value Brought by Dataiku:
Visuals from Dataiku are key for business and finance students to learn about data analytics
As mentioned previously, it’s extremely challenging for non-technical students to understand and implement data analytics tasks. ML and AI can be an even heavier burden for them. The visual blocks provided by Dataiku are the key to helping students go from zero to hero on their data analytic journey. They were able to grasp the thinking process and successfully implement data analytic tasks in a very short period of time. It was a blessing to us.
Ease of transfer to other instructors
Dataiku© is free for higher education institutions and for personal use. Financial data is readily available on many financial websites, such as Yahoo Finance, the Federal Reserve, FINRA, Glassnode, Kaggle, and more. The procedure is easy to follow and intuitive to learn for anyone with some experience handling financial data. A partial guide is available on my GitHub page.