Hospital de Clínicas de Porto Alegre - Streamlining Data Workflows for Clinical Research

tiagoandresvaz (Dataiku Frontrunner Awards 2021 Finalist)

Tiago Andres Vaz

Head of A.I. (From Research-to-Production) | IT Advisor in Healthcare


Hospital de Clínicas de Porto Alegre

Hospital de Clínicas de Porto Alegre is a large teaching hospital located in Porto Alegre, Brazil. Affiliated with the Federal University of Rio Grande do Sul, it was inaugurated in 1970 and gradually became a reference center for the state of Rio Grande do Sul and southern Brazil. It provides care in about 60 specialties, from the simplest procedures to the most complex, with priority given to patients of Brazil's Unified Health System (SUS).

Awards Categories:

  • Excellence in Research


Hospital de Clínicas de Porto Alegre is a general, public, tertiary health care institution that partners with the medical, nursing, pharmacy and dental schools of the public university UFRGS, in Porto Alegre, Brazil. We develop our own Electronic Health Record system, AGHUse, which is open source and the most widely adopted university hospital information system in Brazil.

We faced multiple challenges:

  • Data acquisition and preparation were time consuming and involved many transformations, which reduced data quality.
  • Large datasets took hours to open and to process even simple modifications, and querying such complex databases usually required more than one systems analyst plus business experts.
  • We had to query each dataset multiple times to understand it, and building manual pipelines for machine learning was complex and confusing, with no clear graphical view of what was going on.
  • Each change in methods or statistical analyses meant creating a new branch in a single centralized repository that versioned only syntax and code. Data had to be regenerated every time we rolled back, which sometimes made results impossible to reproduce even in our own laboratory.
  • Comments on our data were kept in documents with no link or meaningful integration to our code or data.
  • We had hit limits on the number of columns in traditional relational databases, and switching databases was almost out of the question.
  • The same was true of switching machine learning pipelines from R to Python, and vice versa.
  • Most recently, before Dataiku, we had started to feel pain points in broader data governance: tracing access to data, defining user profiles, and logging every aspect of our research projects with data.


We started using Dataiku after a Tableau representative sent me a comparison between Dataiku and Databricks. We analyzed both platforms, comparing the features that mattered to us, and I remember the moment when our research team voted unanimously to start our research with Dataiku.

We then contacted the company's Academic and Education relations team. After a fast response, we received a donated license and installed Dataiku on premises. With a few configuration and installation steps done, and almost no need for support from the IT department, we worked through the following steps:

  • Statistical description of all our incoming data
  • De-identification
  • Cleaning and formatting
  • Interpretation and curation strategy
  • Definition of roles and tasks planning
  • Notes and codes standardization
  • Graphical pipeline definition
  • Pipeline execution
  • Statistical description of processed data
  • Machine learning modeling
  • Parameter tuning
  • Artificial intelligence deployment
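
As a rough illustration of two of the steps above (de-identification and statistical description), here is a minimal Python sketch using only the standard library. It is not the pipeline we built in Dataiku: the column names, the secret key and the sample rows are all hypothetical, and keyed hashing (HMAC-SHA256) is just one common way to pseudonymize an identifier.

```python
import hashlib
import hmac
import statistics

# Hypothetical secret key; in practice it would be stored outside the code.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical column names, not taken from AGHUse.
DIRECT_IDENTIFIERS = {"name", "phone"}


def pseudonymize(patient_id: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and swap the patient ID for a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize(str(record["patient_id"]))
    return cleaned


def describe(values: list) -> dict:
    """Basic statistical description of a numeric column, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


# Made-up rows for illustration only.
raw_rows = [
    {"patient_id": "1001", "name": "A", "phone": "000", "age": 34},
    {"patient_id": "1002", "name": "B", "phone": "111", "age": None},
    {"patient_id": "1001", "name": "A", "phone": "000", "age": 36},
]

clean_rows = [deidentify(row) for row in raw_rows]
age_summary = describe([row["age"] for row in clean_rows])
```

Because the same key always maps the same patient ID to the same pseudonym, records for one patient can still be linked after de-identification, while the original ID and the direct identifiers no longer appear in the processed rows.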

On this journey, we learned that innovative tools like Dataiku will revamp clinical research, and that there is therefore a need to formally define ontologies and new methods for healthcare research using large hospital datasets. This is the motivation for our current work!


The time saved using Dataiku is remarkable. I teach at the medical school, where the students who use Dataiku as a substitute for MS Excel and Power BI are leading the process.

I think this is due to the innovative “first layer” that the Dataiku interface brings to old spreadsheet concepts, combined with solid backend processing (auditable and automatic), which enables us to move to a DataOps culture.
