Run book for DSS installation

gnaldi62
gnaldi62 Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 79 Neuron

Hi all,

we're going to install DSS v10 (or 11) in a rather complex environment (LDAP, SSO, UIF, Cloudera 6.3.3, Kerberos, Sentry, Spark, ...). Is there a run book that helps sequence the activities that need to be performed? The documentation has all the details, but we need to better understand the best order in which to perform them. Any help is appreciated.

Regards.

Giuseppe

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @gnaldi62

    That’s a super interesting question. Over the next few months, I may be doing an implementation, and would love to hear more about your results and sequence.

    I don’t know where you are in the project. Based on other projects I’ve done over the years, I would think that SSO and directory integration should come early; they are important and, frankly, prone to post-implementation regrets.

    Based on your question, I suspect the following is already on your radar. If I did not have a good design that I knew others were using successfully (and could not get one from the vendor), I’d likely focus on a quick-ish end-to-end proof-of-concept implementation, creating the run book during this process. I’d implement all the pieces, getting one or two of everything set up, and then find a few “guinea pig” users to help try to break the initial environment, with the understanding that we might throw it all out before building the actual deployment. The key points are to cover all pieces end to end and then find ways to abuse the system; after that I’d be looking for all sorts of scalability and usability feedback. It is easier to adapt at this stage than when there are hundreds of users on the system.

    Good luck, I look forward to hearing more.

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hello @gnaldi62,

    I'm not sure there is a recommended run book (or playbook) for DSS installation, especially for on-premises installations, where each organization has its own architecture and environment.

    We created our own "run book", or installation sequence, that we follow every time we update a DSS instance:

    • Stop running instances
    • Install/update Design node:
      • Run update
      • Install R integration
      • Install graphics export
      • Load Hadoop and Spark environment variables
      • Install Hadoop integration
      • Install Spark integration
      • Install nb widget support
      • Install Jupyter widgets
    • Repeat this process for other instances (deployer/auto/api)
    • Update systemctl scripts
    • Start DSS
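
The sequence above can be captured as a small dry-run script, so the run book lives in version control and prints exactly what it would do. This is only a sketch under assumptions: the paths are hypothetical, the `dssadmin` and `installer.sh -u` invocations reflect my reading of the Dataiku documentation and should be checked against your DSS version, and the steps whose exact commands depend on the site are left as manual entries.

```python
import shlex
import subprocess
from pathlib import Path


def build_runbook(data_dir, kit_dir):
    """Return the upgrade steps in order as (label, command-or-None) pairs."""
    data_dir, kit_dir = Path(data_dir), Path(kit_dir)
    dss = str(data_dir / "bin" / "dss")
    dssadmin = str(data_dir / "bin" / "dssadmin")
    return [
        ("Stop the running instance", [dss, "stop"]),
        ("Run update", [str(kit_dir / "installer.sh"), "-d", str(data_dir), "-u"]),
        ("Install R integration", [dssadmin, "install-R-integration"]),
        ("Install graphics export", [dssadmin, "install-graphics-export"]),
        # Site-specific steps carry no command and are reported as manual.
        ("Load Hadoop and Spark environment variables", None),
        ("Install Hadoop integration", [dssadmin, "install-hadoop-integration"]),
        ("Install Spark integration", [dssadmin, "install-spark-integration"]),
        ("Install notebook/Jupyter widget support", None),
        ("Update systemctl scripts", None),
        ("Start DSS", [dss, "start"]),
    ]


def execute(steps, dry_run=True):
    """Print each step in order; run its command only when dry_run is False."""
    log = []
    for number, (label, command) in enumerate(steps, start=1):
        if command is None:
            line = f"{number:02d}. [manual] {label}"
        else:
            line = f"{number:02d}. {label}: {shlex.join(command)}"
        print(line)
        log.append(line)
        if command is not None and not dry_run:
            # check=True aborts the run book at the first failing step.
            subprocess.run(command, check=True)
    return log


if __name__ == "__main__":
    # Hypothetical paths; this is a dry run, so nothing is executed.
    execute(build_runbook("/data/dss_design", "/opt/dataiku-dss-NEWVERSION"))
```

Calling `build_runbook` once per node (design, deployer, automation, API) with that node's data directory mirrors the "repeat this process for other instances" step.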

    From experience, I doubt that the order of the steps after the upgrade and before starting DSS matters. However, if you are installing everything from scratch on a new server, we first check that all the packages related to LDAP, SSO, Spark, Hadoop, etc. are installed and up to date before running the installation script.
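
A pre-flight check of this kind could look like the sketch below. The tool names are an assumed list for a Hadoop/Kerberos setup, not something taken from the DSS documentation, and the check only confirms that each binary is on the `PATH`; it does not verify versions.

```python
import shutil

# Assumed binary names for this kind of environment; adjust to your site.
REQUIRED_TOOLS = [
    "java", "python3", "nginx", "git", "unzip",
    "kinit", "hadoop", "spark-submit",
]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Install these before running the DSS installer:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is what you want.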

    However, I'd recommend you first get in touch with DSS support in case they can give you recommendations for your particular setup. Also, a great tool for automating the process in the future is Ansible (https://github.com/dataiku/dataiku-ansible-modules).

    One extra comment (an opinion): installing a fully operational infrastructure is much simpler if your organization moves to the cloud. We ran a proof of concept last year, and Dataiku Fleet Manager (FM) is a remarkable tool for performing all of these operations automatically, with significant savings for IT groups and administrators.

    Hope this helps a bit Giuseppe!

  • gnaldi62

    Hi Ignacio,

    thanks for the comments. Yes, the sequence you indicated is probably a sensible one. Our customers are mainly banks and they don't want to move to the cloud, at least in the short term, mainly due to legal restrictions (consider that AWS added an Italian region a few months ago and GCP's first such region will be inaugurated this June).

    In my experience, anything related to security (SSO, Kerberos, ...) is the main source of problems.

    Cheers.

    Giuseppe

  • gnaldi62

    Hi Tom,

    you are right, it will be a rather complex installation with Hadoop and Spark integration (and this is OK) and also LDAP integration, SSO, Kerberos security and UIF.
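
One way to turn a component list like this into a sequence is to write down which activity depends on which and let a topological sort produce the order. The dependencies below are purely illustrative assumptions (for example, that UIF is configured after LDAP and the Hadoop integration), not a sequence confirmed by Dataiku; the point of the sketch is only that the order is derived from declared prerequisites.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each activity maps to the set of activities it depends on.
# Hypothetical dependencies; replace them with what your design requires.
DEPENDS_ON = {
    "os_packages": set(),
    "dss_base_install": {"os_packages"},
    "ldap": {"dss_base_install"},
    "sso": {"ldap"},
    "kerberos": {"dss_base_install"},
    "hadoop_integration": {"kerberos"},
    "spark_integration": {"hadoop_integration"},
    "uif": {"ldap", "hadoop_integration"},
    "sentry_permissions": {"uif"},
}


def installation_order(depends_on):
    """Return one valid order in which every prerequisite comes first.

    Raises graphlib.CycleError if the dependencies contradict each other.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for position, activity in enumerate(installation_order(DEPENDS_ON), start=1):
        print(f"{position}. {activity}")
```

Several orders can satisfy the same dependencies, so the output is one acceptable sequence rather than the only one.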

    I've been told that one can fill out a questionnaire from Dataiku and a .md document with all the steps in sequence will be produced, but I cannot find any further reference.

    Thanks. Regards.

    Giuseppe

  • tgb417

    @gnaldi62

    I’d reach out to your customer success manager or support to see if you can get access to that tool.

  • matthieu
    matthieu Registered Posts: 11

    Hi @gnaldi62 & @Ignacio_Toledo,

    I would like to share the https://galaxy.ansible.com/datarsense/dataikudss Ansible role, which was created to fully automate DSS installation with LDAP configuration and standalone Spark support.

    The role's pre-install tasks prepare the server environment (DSS service user creation, system package installation, sysctl tuning, ...) before downloading and installing DSS and the DSS standalone Spark package.

    The role has been designed to be compatible with restricted network environments in which security devices might deny direct access to the Dataiku CDN. This feature might be required for your bank customers.

    The https://github.com/dataiku/dataiku-ansible-modules module is used by the role to configure LDAP settings in an enterprise environment (successfully tested with Active Directory). The role could probably be extended, following the same logic, to configure SSO settings.

    Feel free to provide any feedback on this thread or to open an issue in the project repository (https://github.com/datarsense/ansible-role-dataikudss/issues)

    Cheers,

    Matthieu

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron

    I am working on something similar but have taken a different approach. For me, there isn't much value in automating the configuration of the DSS server, since that can be restored from a working backup. The biggest issue is getting a VM set up with the required OS-level packages and nginx configured as a reverse proxy. So I am working on a single command-line script that will create a GCP VM and install a DSS instance on it. It's already working; I now need to automate the HTTPS certificate creation and deployment, and then I will share it with the community.

  • matthieu

    I agree that the toughest part was finding the required OS packages for each OS version. Example for a Debian 10 environment:

    - name: Disable SELinux
      become: true
      ansible.posix.selinux:
        state: disabled
      register: result
      failed_when: result.msg | default('ok', True) is not search('(^ok$|libselinux-python|(SELinux state changed))')
      tags: [setup]

    - name: Increase system limits as required by DSS
      become: true
      pam_limits:
        domain: "{{ dss_service_user }}"
        limit_item: "{{ item }}"
        limit_type: "-"
        value: 65536
      loop:
        - nofile
        - nproc
      tags: [setup]

    - name: Create service user
      become: true
      ansible.builtin.user:
        name: "{{ dss_service_user }}"
        home: "{{ dss_service_user_home_basedir }}/{{ dss_service_user }}"
        shell: "{{ dss_service_user_shell }}"
        state: present
      tags: [setup, dss-setup]

    - name: Allow admin group executing playbook to run any commands as DSS service user
      become: true
      community.general.sudoers:
        name: enable-sudoer-admin-runas-dss-account
        state: present
        user: "{{ ansible_user_id }}"
        runas: "{{ dss_service_user }}"
        commands: ALL

    - name: Create dss install directory
      become: true
      ansible.builtin.file:
        path: "{{ dss_install_dir_location }}"
        state: directory
        owner: "{{ dss_service_user }}"
        mode: "u=rwx,g=rx,o=rx"
      tags: [setup, dss-setup]

    - name: Create dss data directory
      become: true
      ansible.builtin.file:
        path: "{{ dss_service_user_home_basedir }}/{{ dss_service_user }}/{{ datadir }}"
        state: directory
        owner: "{{ dss_service_user }}"
        mode: "u=rwx,g=,o="
      tags: [setup, dss-setup]

    - name: Installing sudo package
      ansible.builtin.apt:
        name: "{{ packages }}"
        state: present
      vars:
        packages:
          - sudo

    - name: Installing dependencies
      become: true
      ansible.builtin.apt:
        name: "{{ packages }}"
        state: present
        update_cache: yes
      vars:
        packages:
          - acl
          - curl
          - git
          - libexpat1
          - libncurses5
          - nginx
          - unzip
          - zip
          - default-jre-headless
          - python2.7
          - libpython2.7
          - libfreetype6
          - libgomp1

    - name: Installing Debian 10 specific packages
      become: true
      ansible.builtin.apt:
        name: "{{ packages }}"
        state: present
      vars:
        packages:
          - python3.7
          - python3-distutils
          - python3-pip
          - python3-setuptools
          - python2-minimal
          - python-six
      when: ansible_facts['os_family'] == 'Debian' and ansible_facts['distribution_major_version'] == '10'

    The Ansible role sets up the dependencies required for the DSS install, and then installs and configures the DSS server.

    It can be used in a playbook in which a second role configures nginx as an SSL reverse proxy and obtains the SSL certificate from a provider of your choice. Some people might choose another web gateway to secure their DSS deployment (Citrix ADC, F5, ...).

    As these two parts of the deployment are not DSS-specific tasks, I chose to keep them outside the DSS role so that it suits different deployment environments.
