ALMA Observatory - Rising Data Science & Analytics to Meet Operational and Astronomical Demands with

Ignacio_Toledo

Team members:

Ignacio Toledo, Data Science and Analytics Lead, with:

  • Sergio Pavez
  • Tomás Staig
  • José Lobos
  • Gastón Velez
  • Jose Luis Ortiz
  • Rosita Hormann
  • Jorge García
  • Departments of Science Operations and Computing

Country: Chile

Organization: ALMA

On the Chajnantor Plateau in the Atacama Desert, one of the highest and driest places on Earth, a gentle “rain” is falling. It is light from space at millimeter and submillimeter wavelengths, a natural, scarce and precious resource. These waves are full of information about our cosmic origins, which is why people thirsty for this knowledge have gathered here to collect, channel and analyze it.

This is what gives rise to the Atacama Large Millimeter/submillimeter Array (ALMA), currently the largest radio telescope in the world. This achievement is the result of an international association between Europe (ESO), North America (NRAO) and East Asia (NAOJ), in collaboration with the Republic of Chile, to build the observatory of the “Dark Universe”.

Awards Categories:

  • Most Impactful Ikigai Story
  • Best Acceleration Use Case
  • Best Positive Impact Use Case
  • Best Data Democratization Program

Business Challenge:

As one of the world's premier terrestrial observatories, ALMA stands at the forefront of astronomical exploration. Our legacy has been built upon cutting-edge software and hardware development, enabling us to peer into the vastness of space with unparalleled precision. However, a new challenge looms on our horizon. While our expertise in observatory technology is mature, our adoption of data science and analytics, particularly in managing intricate operations, has lagged.

The operational complexity of ALMA has surged, driven by evolving demands from the global astronomy community [Figure 1] and the difficulty of maintaining 66 telescopes and their instruments at 5,000 meters above sea level. These researchers, our primary clientele, exert growing pressure for more refined data, quicker observational cycles, and streamlined processes. They seek insights into the universe's deepest secrets, and their expectations of us have never been higher. Yet, ironically, our challenges are not just celestial but deeply terrestrial.

Our operational budgets, a mere fraction compared to the colossal investments made during the facility's construction phase, have placed us in a tight spot. How do we elevate our operational efficiency, cater to the intensified demands, and manage the vast influx of data, all while keeping expenditures in check? The answer, it seems, lies in harnessing the power of data science and analytics.

However, our journey into this realm is fraught with challenges. Diving into data science without strong foundational knowledge risks missteps. We face the dual challenge of determining which tools fit our unique needs and ensuring they integrate seamlessly with our existing systems, all while having limited capacity to recruit or to fund new initiatives. The stakes are high: a misstep in this domain might lead to inefficiencies, potentially compromising the quality of our observations and the trust of the astronomers we serve.

Figure 1: Time requested per cycle by the scientific community, from Cycle 0 (2011) to Cycle 10 (2023). ALMA can observe a maximum of around 4,000 hours each cycle with the 12-m Array.

In summary, ALMA stands at a crossroads. The demands of the present call for a shift toward a more data-driven operational approach. But the journey to integrate data science and analytics into our workflow, with our budgetary constraints and the immense responsibility we hold to the astronomical community, is a challenge of cosmic proportions.

Business Solution:

In 2018, our path converged with Dataiku, a collaboration that marked the beginning of our data-driven metamorphosis. Thanks to our research and non-profit stance geared towards enhancing scientific acumen, Dataiku generously provided ALMA with a free license to their Data Science Studio (Dataiku DSS).

The immediate impact of Dataiku was transformative. It wasn't just a platform; it was a holistic environment where our team could dive deep into the data's depths. With its robust suite of tools for data access, preparation, cleaning, and analysis, Dataiku streamlined our analytical process. What set Dataiku apart was its "enforced" data science and analytical workflow. This structured pathway enabled our team to grasp the intricacies of data science, illuminating the kind of team dynamics and workflows essential for our evolution.

The versatility of Dataiku, particularly its adaptability to diverse data storage technologies and database preferences, coupled with the feasibility of on-premises deployment, was invaluable. Its ease of deployment and maintenance meant that, even with limited resources and manpower, we swiftly established a formidable data stack.

The impact was palpable. Our analysts, engineers, and scientists rapidly embraced Dataiku, and within a few weeks we had nearly 20 people (10% of the observatory staff) doing analytical work on the platform, and up to 100 consumers connecting. We swiftly progressed from rudimentary data analytics to crafting insightful dashboards and visualizations, giving us a real-time pulse of our operations. This was just the beginning. As we delved deeper, we defined critical KPIs, and soon, machine learning was no longer a futuristic concept but a tangible tool in our arsenal. One of our pioneering endeavors involved harnessing ML for preemptive maintenance. Another milestone was deploying Natural Language Processing (NLP) to automatically classify project proposals and assign them to reviewers experienced in each proposal's theme. This innovation drastically improved the efficiency of our review process.
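To give a feel for how proposal-to-reviewer matching can work, here is a minimal sketch. It is not ALMA's production method, which this post does not describe. It assumes a simple TF-IDF representation with cosine similarity, and every name in it (`best_reviewer`, `reviewer_a`, the sample texts) is hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms found everywhere.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_reviewer(proposal, reviewer_profiles):
    """Return the reviewer whose profile text is most similar to the proposal."""
    names = list(reviewer_profiles)
    vectors = tfidf_vectors([proposal] + [reviewer_profiles[n] for n in names])
    proposal_vec, reviewer_vecs = vectors[0], vectors[1:]
    scores = {n: cosine(proposal_vec, v) for n, v in zip(names, reviewer_vecs)}
    return max(scores, key=scores.get), scores


# Hypothetical reviewer expertise summaries and proposal abstract.
reviewers = {
    "reviewer_a": "protoplanetary disks dust grains planet formation",
    "reviewer_b": "high redshift galaxies star formation molecular gas",
}
proposal = "Dust gaps in protoplanetary disks as signposts of planet formation"

chosen, scores = best_reviewer(proposal, reviewers)
print(chosen, scores)
```

A real deployment would likely use richer text features and balance reviewer workload, but the principle of scoring each proposal against each reviewer's expertise stays the same.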

Our journey with Dataiku wasn't just about technological advancement; it was a story of empowerment, growth, and maturity. The successes we achieved with our experimental deployment resonated profoundly, garnering the unequivocal support of ALMA's management. It paved the way for transitioning from an experimental to a production-level deployment, solidifying our commitment to data-driven excellence.

Business Area Enhanced: Internal Operations

Use Case Stage: In Production

Value Generated:

Over the span of our collaboration, the tangible value that Dataiku has delivered transcends mere numbers. Here's a dive into the multifaceted benefits we've derived:

1. Cost-effective Scalability

Building a robust data stack often requires significant investment, both in terms of finance and human resources. However, with Dataiku's architecture, we achieved a feat many would deem impossible. For an expenditure of nearly $100,000 spread over five years and the involvement of no more than 1 Full-Time Equivalent (FTE) per month, we've launched a production-ready data stack. Split between a system administrator, a Dataiku administrator, and a software engineer specializing in data engineering, this minimal team catered to 20 analytic users and over 100 data consumers. Dataiku’s inherent design ensures that scaling our operations is only tethered to funding, not to the intricacies of deployment.
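As a rough back-of-the-envelope check of these figures, the short calculation below breaks the spend down per year and per user. It assumes the quoted amounts are round upper bounds ($100,000, 20 analytic users, 100 consumers) and that staff time is not included in the expenditure. The variable names are illustrative only.

```python
# Figures quoted in the text, treated here as round numbers.
total_cost_usd = 100_000   # "nearly $100,000" over the whole period
years = 5
analytic_users = 20
data_consumers = 100       # "over 100", so per-person cost is an upper bound

cost_per_year = total_cost_usd / years
cost_per_analyst_year = cost_per_year / analytic_users
cost_per_person_year = cost_per_year / (analytic_users + data_consumers)

print(f"Per year:             ${cost_per_year:,.2f}")          # $20,000.00
print(f"Per analyst per year: ${cost_per_analyst_year:,.2f}")  # $1,000.00
print(f"Per person per year:  ${cost_per_person_year:,.2f}")   # $166.67
```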

2. Standardized Reporting & Enhanced Decision-making

The transition to Dataiku catalyzed the creation and automation of over 50 reports and dashboards. Where once analysts toiled in isolation on their personal computers, leading to duplicated efforts and inconsistent findings, Dataiku offers a unified platform. This harmonization eliminates inconsistencies and fosters collaboration. Moreover, this centralized repository of insights has empowered decision-makers. With real-time metrics and a comprehensive overview of the observatory's operations, pinpointing areas for efficiency enhancement is now systematic and evidence-based.

3. A Paradigm Shift in Culture

Perhaps the most profound impact of Dataiku has been on ALMA's organizational culture. The introduction of a unified data stack, accessible through Dataiku, has instilled a newfound respect for data integrity. Staff members now recognize the indispensability of quality data. They understand that each report or data product is crafted with a specific intent, and thus, conclusions shouldn't be hastily drawn. Importantly, there’s a growing recognition that data science isn’t a siloed endeavor for the tech-savvy few. Instead, it’s a collective pursuit. The narrative has shifted from an outsourced task to a collaborative team sport, knitting together diverse professionals in a shared mission.

Value Brought by Dataiku:

1. The 'Project' Paradigm & The Flow Advantage

One of Dataiku's most transformative features is its 'project' concept. By streamlining data sources, preparation sequences, analysis, and final output into cohesive units, our staff has accelerated their analytical processes, delivering valuable insights in significantly reduced time. The embedded 'flow' system takes transparency to the next level. It's a visual representation, laying out the crux of any project's methodology. This has fostered a collaborative ethos. Staff can effortlessly share the intricacies of their work, and with Dataiku’s integrated version control, collaboration is free from the constant dread of overwriting or losing vital work.

2. Seamless DataOps Implementation through Diverse Instance Types

Dataiku’s architecture is tailor-made for a seamless DataOps lifecycle, and our experience stands testament:

  • Design Instance: Ideation is a critical phase, and Dataiku's 'design' instance serves as the perfect sandbox. It's where creativity meets data, allowing users to sketch out the initial contours of their projects.
  • Govern Instance: As projects evolve and mature, governance becomes pivotal. The 'govern' instance acts as a checkpoint, allowing stakeholders to review, refine, and decide on the subsequent deployment pathways.
  • Deployer & Bundle Creation: Dataiku’s 'deployer' is a boon for operational fluidity. Projects metamorphose into deployable 'bundles', marking their journey from ideation to execution. These bundles, akin to software releases, navigate from testing phases to full-fledged production deployment.
  • Automation Instances for Environment Mastery: Dataiku's 'automation' instances are the linchpins of environment management. They form the bedrock of our various stages - be it testing, integration, or production. Beyond mere creation, they provide monitoring capabilities, ensuring each environment functions optimally.
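The lifecycle described in the list above can be pictured as a small state machine. The sketch below is purely conceptual: it does not use the Dataiku API, and the `Bundle` class, the stage names, and the `approve` and `promote` functions are hypothetical names invented for illustration. It assumes that a project is designed, reviewed on the govern side, and then promoted as a bundle through testing, integration, and production.

```python
from dataclasses import dataclass, field

# Hypothetical stage names mirroring the instance types described above.
STAGES = ("design", "govern_review", "testing", "integration", "production")


@dataclass
class Bundle:
    """A versioned snapshot of a project moving through the lifecycle."""
    project: str
    version: str
    stage: str = "design"
    approved: bool = False
    history: list = field(default_factory=lambda: ["design"])


def approve(bundle):
    """Record stakeholder sign-off; only valid during the governance review."""
    if bundle.stage != "govern_review":
        raise ValueError("approval is only possible during govern_review")
    bundle.approved = True


def promote(bundle):
    """Move the bundle to the next stage, enforcing the governance checkpoint."""
    index = STAGES.index(bundle.stage)
    if index == len(STAGES) - 1:
        raise ValueError("bundle is already in production")
    if bundle.stage == "govern_review" and not bundle.approved:
        raise PermissionError("bundle must be approved before deployment")
    bundle.stage = STAGES[index + 1]
    bundle.history.append(bundle.stage)
    return bundle.stage


# Example: a hypothetical project bundle travelling from design to production.
bundle = Bundle(project="weather_forecast", version="v1")
promote(bundle)            # design -> govern_review
approve(bundle)            # stakeholders sign off
while bundle.stage != "production":
    promote(bundle)
print(bundle.history)
```

The point of the checkpoint in `promote` is that nothing reaches the automation environments without an explicit review, which is the role the text attributes to the govern instance.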

Figure 2: The data stack architecture built around Dataiku's different kinds of instances. This demonstrates both the value added to the DataOps/MLOps processes by the different kinds of nodes/instances and the easy-to-scale architecture.

Value Type:

  • Improve customer/employee satisfaction
  • Reduce cost
  • Save time
  • Increase trust