Siti Sulaiha Binti Subiono
RiseHill Data Analysis Sdn. Bhd.
RiseHill Data Analysis Sdn. Bhd. (RDA) is a high-tech development and service company registered in Kuala Lumpur, Malaysia, that specializes in petroleum technical consulting, services, and data analytics. The company is committed to comprehensive technical research, development, and consultation based on the concept of ‘the integration of multiple sources of data’. Currently, the company holds several software copyrights and technical patents, as well as tailored workflows and solutions for particularly challenging problems. The company aims to be a world-class integrated service provider in data analytics and to be acknowledged as a state-of-the-art technology provider.
- Data Science for Good
- AI Democratization & Inclusivity
- Responsible AI
- Value at Scale
To detect fraudulent activity, most organizations used to rely on a rule-based approach, in which an algorithm checks each transaction against a set of predefined scenarios - and the workflow must be manually updated whenever new scenarios or trends emerge. As fraud tactics have become more advanced, this approach is now outdated.
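To make the limitation concrete, here is a minimal sketch of what a rule-based check can look like. The `Transaction` fields, the rule names, and the thresholds are all hypothetical and chosen only for illustration; they are not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    country: str
    failed_attempts: int


# Hypothetical hand-written rules; every threshold here is illustrative.
RULES = {
    "large_amount": lambda t: t.amount > 5000,
    "many_failed_attempts": lambda t: t.failed_attempts >= 3,
    "blocked_country": lambda t: t.country in {"XX"},
}


def triggered_rules(transaction):
    """Return the names of all rules that the transaction triggers."""
    return [name for name, rule in RULES.items() if rule(transaction)]
```

Every new fraud scenario means another entry added to `RULES` by hand, which is the maintenance burden described above.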
The vast number and size of datasets at hand also make fraud detection more challenging. According to Crime Statistics Malaysia 2020 by the Department of Statistics Malaysia, corporate fraud, which involves bribery, corruption, and asset misappropriation, increased from 2018 to 2019. The Covid-19 pandemic also contributed to the rising trend in fraud cases, as it accelerated the need for effective payment channels between consumers and companies - and faster payments can potentially mean faster crime.
In addition, Malaysian organizations are quite slow to adopt AI technologies for combating fraud, due to a number of factors. First, the growing volume of data is of questionable quality, which makes it harder to leverage. Second, corporations still do not trust technology as an effective tool for detecting fraud and tend to keep conventional investigation methods, which are time-consuming.
The last challenge lies in the shortage of local talent, which hinders progress in detecting fraudulent activities. As a Data Scientist, I also face challenges in building the whole workflow, which is a very lengthy process - from joining data from various sources and exploring it, through building machine learning models in Java or Python and fine-tuning them or optimizing computing time, to deployment.
We found that Dataiku enabled us to fill these gaps, so the RiseHill Data Analysis team now stands together in combating the rise in corporate fraud in Malaysia using AI and Data Analytics. We want many companies in Malaysia to open their eyes and use advanced technology and tools to combat this issue before it gets worse.
RiseHill Data Analysis Sdn. Bhd. leverages Dataiku to develop Machine Learning models as a more effective method in detecting fraudulent activities, as well as a more secure and efficient approach - moving past the old school “rule-based” approach. We are now able to centralize data exploration, wrangling, and the creation of machine learning models in one platform - hence Dataiku helps us save time in the development and deployment phases of the models.
Our favorite feature is Data Partitioning, which enables us to refresh the data on a daily basis, while Dataiku will only re-build the workflow with the partition that contains the new data. This is especially helpful to re-train models efficiently.
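The idea behind partition-based refreshes can be sketched in plain Python. This is not the Dataiku API; `partition_by_day`, `refresh`, and the record layout are hypothetical names used only to illustrate the concept, and the sketch assumes a daily partition does not change once it has been built.

```python
from collections import defaultdict


def partition_by_day(records):
    """Group records into one partition per transaction date."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record["date"]].append(record)
    return dict(partitions)


def refresh(partitions, built, build_fn):
    """Build only the partitions missing from `built`.

    `built` maps a day to its previously computed result and is updated
    in place. Returns the list of days that were (re)built in this run.
    """
    rebuilt = []
    for day in sorted(partitions):
        if day not in built:
            built[day] = build_fn(partitions[day])
            rebuilt.append(day)
    return rebuilt
```

When a new day of data arrives, calling `refresh` again processes only that day and leaves the results for earlier days untouched, which is what keeps daily refreshes and re-training cheap.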
Machine learning relies on pattern recognition and classification to distinguish legitimate transactions from fraudulent ones occurring through online payment channels. The types of classification we use are based on user identity, order history, location of the payment, time of transactions, and amount spent:
- In Identity Classification, we use the age of the customer, the number of characters in their email address, the fraud rate of their IP address, and the number of devices with which they access the organization’s site.
- In Order History Classification, we use when the orders were placed (the time period), the amount spent in each transaction, and how many orders were attempted and failed.
- In Location Classification, fraudulent activities can be detected through a mismatch between the billing and shipping addresses, or between the user's IP location and the shipping address.
- In Method of Payment Classification, the credit card details, the name of the customer, and the shipping information must reference the same country, and the credit card used by the customer must not be issued by a bank with a reputation for fraudulent transactions.
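Several of the signals listed above can be turned into numeric model inputs with very little code. The sketch below is illustrative only: the `Order` fields and the `extract_features` helper are hypothetical, and address mismatches are simplified to country-level comparisons.

```python
from dataclasses import dataclass


@dataclass
class Order:
    customer_age: int
    email: str
    billing_country: str
    shipping_country: str
    ip_country: str
    card_country: str
    amount: float
    failed_attempts: int


def extract_features(order):
    """Map one order to a flat dictionary of numeric features."""
    return {
        # Identity signals
        "customer_age": order.customer_age,
        "email_length": len(order.email),
        # Order history signals
        "amount": order.amount,
        "failed_attempts": order.failed_attempts,
        # Location and payment signals (1 = mismatch, 0 = match)
        "billing_shipping_mismatch": int(order.billing_country != order.shipping_country),
        "ip_shipping_mismatch": int(order.ip_country != order.shipping_country),
        "card_shipping_mismatch": int(order.card_country != order.shipping_country),
    }
```

A table of such feature dictionaries, together with known fraud labels, is the kind of input a classification model would be trained on.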
Machine learning is particularly helpful to organizations that implement these models for the long run, as they are able to filter out illegitimate transactions and streamline the acquisition of new, reliable customers.
It also enables risk mitigation, as these techniques detect more advanced fraud than the traditional rule-based approach. Our approach has the potential to be generalized across Malaysian organizations to fight fraud.
Dataiku has benefited both our organization and our customers through fraud detection - and enabled us to save time on the development, execution, and deployment of models. No more hard-coding!