Mark Subryan, Director, Data Engineering
Amanda DeSouza, Senior Data Engineer
Christopher Matthews, Director, Data Products
Rahul Sirimanna, Senior Manager, Strategy and Operations
Masood Ali, Senior Director, Data Strategy and Governance
Royal Bank of Canada (RY on TSX and NYSE) and its subsidiaries operate under the master brand name RBC.
We are one of Canada's biggest banks, and among the largest in the world based on market capitalization.
We are one of North America's leading diversified financial services companies, and provide personal and commercial banking, wealth management, insurance, investor services and capital markets products and services on a global basis.
At RBC Internal Audit, we’ve begun a transformational journey that adopts a data-driven approach to provide assurance to our clients. To address the challenge of identifying insufficient controls that could result in financial loss, we use analytics to uncover exceptions or outliers. These in-house analytics range from advanced ML models to simple rules-based checks. Given the complexity of RBC’s business, the design and execution of control tests also require specialized domain knowledge. Previously, the control testing process was manual, labor-intensive, and performed only periodically. The Audit team would:
1. Select the control tests
2. Design the test procedures
3. Take samples of the resulting dataset/transactions
4. Check samples for adherence to criteria
This process would be repeated anywhere from annually to once every two years. The audit team, burdened by the administrative overhead, had less time to review and revise the outliers. The process was also difficult to scale because each platform worked in a silo, building and managing its own control testing process. This duplicated effort and made consolidation into a holistic Internal Audit (IA) view a cumbersome, manual process.
There are numerous barriers to building a unified platform. Platform analysts need the freedom to onboard and update their models in production. Any solution needs to support the variability of different models and outlier schemas. Semantically, each control test needs categorization to fit within the broader universe of IA analytics. Additionally, managing data governance requirements across permissions, datasets and definitions would be very costly for a custom application.
1. For new control tests, IA needed a mindset shift in which auditors thought about what they could test continuously as opposed to periodically
2. There were existing control tests we could reuse from prior audits, but they needed to be updated and onboarded
3. Incentive for adoption
Dataiku provides an all-in-one platform for Control Testing at RBC.
Beginning our journey, we realized the largest barrier to adoption was the onboarding process. IA already had a gold mine of control tests created from past audits. In the future, each platform would be rapidly creating new control tests. We needed a self-driven way for Data Scientists/Analysts to onboard. Additionally, we didn’t want the overhead of Engineers acting as gatekeepers for each update. Working with Emma of the Dataiku team, we became aware of three key out-of-the-box features that would aid us:
• Tags
• Dataiku API
• Editable Datasets
With this self-service process, Data Scientists/Analysts focus on building the control test in Dataiku. Once ready to onboard, they simply “tag” the final dataset, which contains the final list of outliers. To roll the test up to organizational attributes (e.g., Control Test Name, Description and Audit ID), they fill in an editable dataset.
Using the Dataiku API, a background process loops through each project containing the metadata and a tagged dataset. The outlier dataset’s schema varies for each use case. Dataiku’s SQL API allows a schema-less approach to dump each final dataset to a centralized database. The metadata, on the other hand, has a fixed set of columns and is imported into a summary table.
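As an illustration only, the onboarding flow described above could be sketched as follows. This is not our production code and it deliberately does not call the Dataiku API: the `PROJECTS` list stands in for whatever the background process would read from each project, SQLite stands in for the centralized database, and the tag name `control-test-output`, the table names, and the sample rows are all hypothetical. The sketch shows the two storage paths: outlier rows with varying columns are stored as JSON text (one possible schema-less approach), while the metadata goes into a summary table with fixed columns.

```python
import json
import sqlite3

# Hypothetical tag that analysts would put on the final outlier dataset.
ONBOARD_TAG = "control-test-output"

# Hypothetical stand-in for what a background job would read from each project.
PROJECTS = [
    {
        "project_key": "CT_WIRE_LIMITS",
        "metadata": {"control_test_name": "Wire limit check",
                     "description": "Wires above the approved limit",
                     "audit_id": "AUD-001"},
        "datasets": [
            {"name": "raw_wires", "tags": [], "rows": []},
            {"name": "wire_outliers", "tags": [ONBOARD_TAG],
             "rows": [{"txn_id": 1, "amount": 125000.0},
                      {"txn_id": 7, "amount": 300000.0}]},
        ],
    },
    {
        "project_key": "CT_ACCESS_REVIEW",
        "metadata": {"control_test_name": "Access review",
                     "description": "Users with unexpected access levels",
                     "audit_id": "AUD-002"},
        "datasets": [
            {"name": "access_outliers", "tags": [ONBOARD_TAG],
             "rows": [{"employee_id": "E42", "access_level": "admin"}]},
        ],
    },
]


def find_tagged_datasets(project, tag=ONBOARD_TAG):
    """Return the datasets in a project that carry the onboarding tag."""
    return [d for d in project["datasets"] if tag in d["tags"]]


def create_tables(conn):
    """Create the fixed-column summary table and the schema-less outlier table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS control_test_summary ("
        "project_key TEXT PRIMARY KEY, control_test_name TEXT, "
        "description TEXT, audit_id TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS control_test_outliers ("
        "project_key TEXT, dataset_name TEXT, row_json TEXT)"
    )


def sync_project(conn, project):
    """Load one project's metadata and tagged outliers; return rows loaded."""
    key, meta = project["project_key"], project["metadata"]
    conn.execute(
        "INSERT OR REPLACE INTO control_test_summary VALUES (?, ?, ?, ?)",
        (key, meta["control_test_name"], meta["description"], meta["audit_id"]),
    )
    loaded = 0
    for ds in find_tagged_datasets(project):
        # Replace the previous load so reruns reflect the latest test logic.
        conn.execute(
            "DELETE FROM control_test_outliers "
            "WHERE project_key = ? AND dataset_name = ?",
            (key, ds["name"]),
        )
        # Each row is serialized to JSON, so column sets may differ per test.
        conn.executemany(
            "INSERT INTO control_test_outliers VALUES (?, ?, ?)",
            [(key, ds["name"], json.dumps(row, sort_keys=True))
             for row in ds["rows"]],
        )
        loaded += len(ds["rows"])
    return loaded


def sync_all(conn, projects):
    """Loop through every project and return the total outlier rows loaded."""
    create_tables(conn)
    total = sum(sync_project(conn, p) for p in projects)
    conn.commit()
    return total
```

Calling `sync_all(sqlite3.connect(":memory:"), PROJECTS)` loads both hypothetical projects. Because each run deletes and reloads a dataset's rows, an analyst can change a control test and have the next run pick up the new outliers without an engineer touching the pipeline.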
Scalability: Data Scientists/Analysts are able to build control tests and onboard them with minimal effort through out-of-the-box Dataiku features.
Flexibility: With this approach, each control test can be updated iteratively as auditors review and refine the logic for outliers.
Data Governance: Through Dataiku, we have existing pipelines that automate the capture of Data Quality and Lineage metadata. The framework fits nicely with existing processes and automates publishing to Collibra, RBC’s data governance platform.
The Control Testing Framework has transformed the mindset of auditors. Previously, control testing was considered a reactive, one-time analytic performed during an audit. By making it a continuous process, we’ve instilled a more iterative and analytical mindset, enhancing each control’s efficiency and design.
With an analytical mindset, auditors now prioritize what can be achieved given the limitations of data sources, data attributes, and critical data elements. They need to consider the forward-looking picture: whether a control test can provide early detection of issues. Prioritization has also improved, as control tests with more tangible requirements are moved to the front of the line. This was achieved through closer collaboration between auditors and analysts, the latter of whom are well versed in the limitations of RBC systems and data.
As control test analytics materialized, auditors developed a greater line of sight into whether each control is working. Observing exceptions throughout the year, they noticed significant increases in some quarters. Previously, these increases were found only at the time of an audit, which could leave a long gap between an outlier’s occurrence and its detection. With the new process, these cases are uncovered in the following quarter. The faster turnaround not only gives greater assurance to our clients, but also provides auditors with a self-serve workflow to examine the root cause. Auditors can then assess whether the control needs to be redesigned or tweaked. Dataiku’s version control and ease of productionization provide structured iteration to evolve the control test logic over time.
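A minimal sketch of how quarter-over-quarter increases in exceptions might be flagged is shown below. It is an illustration under stated assumptions, not our actual logic: it uses calendar quarters, a hypothetical 50% increase threshold, and made-up exception dates.

```python
from collections import Counter
from datetime import date


def quarter_label(d):
    """Map a date to a calendar-quarter label such as '2023-Q3' (assumption)."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def exceptions_per_quarter(exception_dates):
    """Count exceptions per quarter, ordered chronologically by label."""
    counts = Counter(quarter_label(d) for d in exception_dates)
    return dict(sorted(counts.items()))


def flag_increases(counts, threshold=0.5):
    """Return quarters whose count grew by more than `threshold` (a fraction)
    over the previous quarter present in `counts`.

    Quarters with no exceptions are absent from `counts`, so this compares
    each quarter with the previous quarter that had at least one exception.
    """
    quarters = list(counts)
    flagged = []
    for prev, cur in zip(quarters, quarters[1:]):
        growth = (counts[cur] - counts[prev]) / counts[prev]
        if growth > threshold:
            flagged.append(cur)
    return flagged


# Made-up exception dates for one control test.
SAMPLE_DATES = [
    date(2023, 1, 15), date(2023, 3, 2),                      # Q1: 2
    date(2023, 4, 10), date(2023, 6, 30),                     # Q2: 2
    date(2023, 7, 1), date(2023, 8, 12), date(2023, 8, 20),
    date(2023, 9, 5), date(2023, 9, 28),                      # Q3: 5
]

counts = exceptions_per_quarter(SAMPLE_DATES)
print(counts)                  # {'2023-Q1': 2, '2023-Q2': 2, '2023-Q3': 5}
print(flag_increases(counts))  # ['2023-Q3']
```

A relative threshold is only one possible choice; an absolute count or a statistical baseline could suit a given control better.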
The results from the control tests are presented in a holistic, centralized view which is easier to digest for the users. The data is also stored in a centralized area which simplifies governance.
• Allows easy collaboration across teams.
o Out-of-the-box features such as tagging, where one team can tag a dataset and have it consumed by another team or process downstream
o Data Scientists/Analysts can focus on solving business problems. Access to sensitive information can be easily restricted at multiple levels, e.g., Data Connections and Projects
o Engineers have a robust API within Dataiku that enables aggregation. They can programmatically access metadata and datasets for a given project. The SQL API gives Data Engineers the flexibility to programmatically transfer data from one database to another for aggregation purposes.
• Speed of change: Auditors can have a near real-time view of feedback they give as projects are decoupled from one another. A Data Scientist/Analyst has the power to update their flow in production.
• Integration with source systems: In RBC IA’s case, the amount the Data Scientist/Analyst needs to enter is reduced because a background process in Dataiku can connect to RBC Source systems and get any data the user would otherwise need to fill out.
• Data Governance
o Control test results form the key input to identify, define and raise Audit Issues to RBC Business Units (BUs) on their risk assurance practices. Having a consistent framework to execute control tests and store results with explainable metadata provides the necessary underpinning for IA to explain results to BUs and Regulators in a consistent and transparent way.
o Applying a higher standard of care to the Critical Data Elements (CDEs) that inform control tests, and keeping a continuous health check on these CDEs, enables auditors and analysts to trust the data and builds confidence in decision making
o This scalable approach, powered by the Dataiku platform, will help IA build a data-driven perspective of the Audit universe and answer questions on automated coverage of assurance work with traceable data quality and lineage.