Mark Subryan, Director, Data Engineering
Amanda DeSouza, Senior Data Engineer
Christopher Matthews, Director, Data Products
Rahul Sirimanna, Senior Manager, Strategy and Operations
Masood Ali, Senior Director, Data Strategy and Governance
Royal Bank of Canada (RY on TSX and NYSE) and its subsidiaries operate under the master brand name RBC.
We are one of Canada's biggest banks, and among the largest in the world based on market capitalization.
We are one of North America's leading diversified financial services companies, and provide personal and commercial banking, wealth management, insurance, investor services and capital markets products and services on a global basis.
At RBC Internal Audit, we’ve begun a transformational journey that adopts a data-driven approach to provide assurance to our clients. To address the challenge of identifying insufficient controls that could result in financial loss, we use analytics to uncover exceptions or outliers. These in-house analytics range from advanced ML models to simple rules-based checks. Given the complexity of RBC’s business, the design and execution of control tests also require specialized domain knowledge. Previously, the control testing process was manually intensive and performed periodically. The Audit team would:
1. Select the control tests
2. Design the test procedures
3. Take samples of the resulting dataset/transactions
4. Check samples for adherence to criteria
This process would be repeated anywhere from annually to once every two years. The audit team, burdened by the administrative overhead, had less time to review and revise the outliers. The process was also difficult to scale: each platform worked in a silo, building and managing its own control testing process. This duplicated effort and made consolidation into a holistic Internal Audit (IA) view a cumbersome, manual process.
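To make the contrast between sample-based and continuous testing concrete, here is a minimal sketch of a simple rules-based check. The transaction fields, the approval threshold, and the rule itself are hypothetical and chosen only for illustration; they are not taken from RBC's actual control tests.

```python
import random

# Hypothetical transactions; field names are illustrative only.
transactions = [
    {"id": 1, "amount": 500.0, "approved_by": "mgr_a"},
    {"id": 2, "amount": 12000.0, "approved_by": None},
    {"id": 3, "amount": 9000.0, "approved_by": None},
    {"id": 4, "amount": 15000.0, "approved_by": "mgr_b"},
    {"id": 5, "amount": 20000.0, "approved_by": None},
]

APPROVAL_THRESHOLD = 10000.0  # assumed limit above which approval is required


def is_exception(txn):
    """Rule: a transaction above the threshold must have an approver."""
    return txn["amount"] > APPROVAL_THRESHOLD and txn["approved_by"] is None


def check_sample(txns, sample_size, seed=0):
    """Periodic approach: apply the rule to a random sample only."""
    rng = random.Random(seed)
    sample = rng.sample(txns, min(sample_size, len(txns)))
    return [t["id"] for t in sample if is_exception(t)]


def check_population(txns):
    """Continuous approach: apply the rule to every transaction."""
    return [t["id"] for t in txns if is_exception(t)]


sampled_exceptions = check_sample(transactions, sample_size=2)
all_exceptions = check_population(transactions)
print("sample:", sampled_exceptions, "full population:", all_exceptions)
```

A sample can only surface the exceptions that happen to fall inside it, whereas a check over the full population flags every transaction that breaks the rule.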
There were numerous barriers to building a unified platform. Platform analysts would need the freedom to onboard and update their models in production. Any solution would need to support the variability of different models and outlier schemas. Semantically, each control test needed categorization to fit within the broader universe of IA analytics. Additionally, managing data governance requirements across permissions, datasets and definitions would be very costly for a custom application. Adoption posed its own challenges:
1. For new control tests, IA needed a mindset shift, with auditors thinking about what they could test continuously as opposed to periodically
2. There were existing control tests we could reuse from prior audits, but they needed to be updated and onboarded
3. Incentive for adoption
Dataiku provides an all-in-one platform for Control Testing at RBC.
Beginning our journey, we realized the largest barrier to adoption was the onboarding process. IA already had a gold mine of control tests created from past audits. In the future, each platform would be rapidly creating new control tests. We needed a self-driven way for Data Scientists/Analysts to onboard. Additionally, we didn’t want the overhead of Engineers acting as gatekeepers for each update. Working with Emma of the Dataiku team, we became aware of three key out-of-the-box features that would aid us:
• Tags
• Dataiku API
• Editable Datasets
Leveraging a self-service process, Data Scientists/Analysts focus on building the control test in Dataiku. Once ready to onboard, they simply “tag” the dataset that contains the final list of outliers. For rollup to organizational attributes (e.g., Control Test Name, Description and Audit ID), they fill in an editable dataset.
Using the Dataiku API, a background process loops through each project containing the metadata and tagged dataset. The outlier dataset’s schema varies for each use case. Dataiku’s SQL API allows a schema-less approach to dump each final dataset to a centralized database. The metadata, on the other hand, has a deterministic set of columns and is imported into a summary table.
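The aggregation step described above can be sketched as follows. This is not Dataiku's API: the projects are plain Python dictionaries standing in for what a background job might read from each project, and SQLite stands in for the centralized database. The tag name, table names, column names and the `outlier_count` column are all assumptions made for illustration. Storing each outlier row as a JSON string is one possible way to handle schemas that differ per control test.

```python
import json
import sqlite3

ONBOARD_TAG = "control-test-final"  # hypothetical tag name

# Stand-in for the projects a background job would loop through.
projects = [
    {
        "key": "AUDIT_A",
        "metadata": {"control_test_name": "Unapproved payments",
                     "description": "Payments above limit without approver",
                     "audit_id": "A-001"},
        "datasets": [
            {"name": "raw_payments", "tags": [], "rows": [{"id": 1}]},
            {"name": "outliers", "tags": [ONBOARD_TAG],
             "rows": [{"txn_id": 2, "amount": 12000.0},
                      {"txn_id": 5, "amount": 20000.0}]},
        ],
    },
    {
        "key": "AUDIT_B",
        "metadata": {"control_test_name": "Dormant accounts",
                     "description": "Activity on long-inactive accounts",
                     "audit_id": "B-007"},
        "datasets": [
            {"name": "dormant_flags", "tags": [ONBOARD_TAG],
             "rows": [{"account": "X9", "days_inactive": 400}]},
        ],
    },
]


def aggregate(projects, conn):
    """Copy tagged outlier datasets and fixed-column metadata into central tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS control_test_summary ("
                 "project_key TEXT PRIMARY KEY, control_test_name TEXT, "
                 "description TEXT, audit_id TEXT, outlier_count INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS outliers ("
                 "project_key TEXT, dataset_name TEXT, row_json TEXT)")
    for project in projects:
        tagged = [d for d in project["datasets"] if ONBOARD_TAG in d["tags"]]
        if not tagged:
            continue  # nothing tagged, so the project is not onboarded yet
        # Replace earlier results so repeated runs do not duplicate rows.
        conn.execute("DELETE FROM outliers WHERE project_key = ?",
                     (project["key"],))
        count = 0
        for dataset in tagged:
            for row in dataset["rows"]:
                # Each row is stored as JSON, so column sets may differ.
                conn.execute("INSERT INTO outliers VALUES (?, ?, ?)",
                             (project["key"], dataset["name"],
                              json.dumps(row, sort_keys=True)))
                count += 1
        meta = project["metadata"]
        # outlier_count is an illustrative extra column, not from the source.
        conn.execute(
            "INSERT OR REPLACE INTO control_test_summary VALUES (?, ?, ?, ?, ?)",
            (project["key"], meta["control_test_name"], meta["description"],
             meta["audit_id"], count))
    conn.commit()


conn = sqlite3.connect(":memory:")
aggregate(projects, conn)
```

Because untagged datasets are skipped, a Data Scientist/Analyst controls onboarding purely by tagging, and the background job needs no per-project configuration.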
Scalability: Data Scientists/Analysts are able to build control tests and onboard with minimal effort through out-of-the-box Dataiku features.
Flexibility: With this approach, each control test can be updated iteratively as auditors review and refine the logic for outliers.
Data Governance: Through Dataiku, we have existing pipelines that automate the capture of Data Quality and Lineage metadata. The framework fits nicely with existing processes and automates publishing to Collibra, RBC’s data governance platform.
Leveraging the Control Test Framework process saves 20-25% of time for a given audit. Rather than incurring the overhead of testing controls during an audit, an automated process is run continuously. This gives the auditor insight into the strength of the control environment throughout the year, not simply at a single point in time.
This new process creates greater certainty about the status of the control, giving auditors room to explore ideas. Beyond the significant time savings, the quality of assurance is increased. As the 3rd line of defense, our core responsibility is providing assurance to RBC.
We’ve moved from a periodic, manual process to an iterative “test and learn” approach for auditing. This increases velocity and favors a more accurate view of business health at any given time, which gives decision-makers more time to react if needed.
This triggered a change in mindset around how analytics and data science can support the entire IA department, making it a pioneer in enabling new ways for RBC employees to leverage data to enhance their day-to-day activities.
The ease of onboarding, accessing the data, exploring previous work, and collaborating with teammates of all profiles paves the way for organizational transformation around what data can help us achieve.
Value Brought by Dataiku:
• Allows easy collaboration across teams.
o Out-of-the-box features such as tagging, where one team can tag a dataset and have it consumed by another team or process downstream
o Data Scientists/Analysts can focus on solving business problems. Access to sensitive information can be easily restricted on multiple levels, e.g., Data Connections and Projects
o Engineers have a robust API within Dataiku that enables aggregation. They can programmatically access metadata and datasets for a given project. The SQL API gives Data Engineers the flexibility to programmatically transfer data from one database to another for aggregation purposes.
• Speed of change: Auditors can have a near real-time view of feedback they give as projects are decoupled from one another. A Data Scientist/Analyst has the power to update their flow in production.
• Integration with source systems: In RBC IA’s case, the amount the Data Scientist/Analyst needs to enter is reduced because a background process in Dataiku can connect to RBC Source systems and get any data the user would otherwise need to fill out.
• Data Governance on Control Test results forms the key input to identify, define and raise Audit Issues to RBC Business Units (BUs) on their risk assurance practices. Having a consistent framework to execute control tests and store results with explainable metadata provides the necessary underpinning for IA to explain results to BUs and Regulators in a consistent and transparent way.
o Applying a higher standard of care to the Critical Data Elements (CDEs) that inform control tests, and keeping a continuous health check on these CDEs, enables auditors and analysts to trust the data and builds confidence in decision making
o This scalable approach, powered by the Dataiku platform, will help IA build a data-driven perspective of the Audit universe and answer questions on automated coverage of assurance work with traceable data quality and lineage.
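As an illustration of what a continuous health check on CDEs might look like, the sketch below computes the share of missing values for each critical column and flags columns that exceed a tolerance. The function name, column names and the 5% tolerance are hypothetical; the source does not describe how RBC's data quality metrics are actually defined.

```python
def cde_health(rows, critical_elements, max_null_rate=0.05):
    """Report the missing-value rate for each critical data element."""
    report = {}
    total = len(rows)
    for column in critical_elements:
        # Treat None and empty strings as missing values.
        missing = sum(1 for r in rows if r.get(column) in (None, ""))
        null_rate = missing / total if total else 0.0
        report[column] = {
            "null_rate": null_rate,
            "healthy": null_rate <= max_null_rate,
        }
    return report


# Hypothetical records feeding a control test.
rows = [
    {"account_id": "A1", "branch": "001", "amount": 10.0},
    {"account_id": "A2", "branch": None, "amount": 5.0},
    {"account_id": "A3", "branch": "002", "amount": 7.5},
    {"account_id": "A4", "branch": "", "amount": 1.0},
]

report = cde_health(rows, ["account_id", "branch"])
for column, result in report.items():
    print(column, result)
```

Running a check like this on every refresh, rather than once per audit, is what would let a drop in data quality surface before it affects control test results.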