
Run book for DSS installation

gnaldi62

Hi all,

We're going to install DSS v10 (or 11) in a rather complex environment (LDAP, SSO, UIF, Cloudera 6.3.3, Kerberos, Sentry, Spark, ...). I wonder whether there is some kind of run book that helps organize, in sequence, the activities that need to be performed. The documentation has all the details, but we need to better understand the best order in which to perform them. Any help is appreciated.

 Regards.

Giuseppe

tgb417

@gnaldi62 

That’s a super interesting question. Over the next few months I may be doing an implementation myself, and I would love to hear more about your results and sequence.

I don’t know where you are on the project. Based on other projects I’ve done over the years, I would think that SSO and directory integration should come early: they are important and, frankly, prone to post-implementation regrets.

Based on your question, I suspect the following is already on your radar. If I did not have a proven design that I knew others were using successfully (and could not get one from the vendor), I’d focus on a quick-ish end-to-end proof-of-concept implementation, writing the run book during that process. I’d implement all the pieces, setting up one or two of everything, and then find a few “guinea pig” users to help crash-test the start-up environment, with the understanding that we might throw it all out before building the actual deployment. The key points are: all pieces end to end, then find ways to abuse the system, then gather all sorts of scalability and usability feedback. It is easier to adapt at this stage than when there are hundreds of users on the system.

Good luck,  I look forward to hearing more.

--Tom
gnaldi62
Author

Hi Tom,

  you are right, it will be a rather complex installation with Hadoop and Spark integration (and this is OK) and also LDAP integration, SSO, Kerberos security and UIF.

I've been told that one can fill out a questionnaire from Dataiku and a .md document with all the steps in sequence will be produced, but I cannot find any further reference to it.

Thanks. Regards.

Giuseppe

tgb417

@gnaldi62 

I’d reach out to your customer success manager or support to see if you can get access to that tool.

 

--Tom
Ignacio_Toledo

Hello @gnaldi62,

I'm not sure there is a recommended run book (or playbook) for DSS installation, especially for on-premises installations, where each organization has its own architecture and environment.

We created our own "run book", or installation sequence, that we follow every time we update a DSS instance:

  • Stop running instances
  • Install/update Design node:
    • Run update
    • Install R integration
    • Install graphics export
    • Load Hadoop and Spark environment variables
    • Install hadoop integration
    • Install spark integration
    • Install nb widget support
    • Install jupyter widgets
  • Repeat this process for other instances (deployer/auto/api)
  • Update systemctl scripts
  • Start DSS
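For anyone who wants to script the sequence above, here is a minimal Ansible sketch for a single node. It is not Ignacio's actual procedure: it assumes facts are gathered, and that the hypothetical variables `dss_datadir` (DSS data directory), `dss_kit_dir` (the unpacked new `dataiku-dss-VERSION` kit), `dss_service_user` and `hadoop_bin_dir` (directory holding the `hadoop`/`spark-submit` clients) are defined. The `installer.sh -u` upgrade flag, the `dssadmin` subcommands and `install-boot.sh` come from the DSS documentation, but please double-check them against the reference for your exact version; the two widget steps are omitted here because I am not sure of their exact command names.

```yaml
# Sketch only: upgrade one DSS node, following the run book order above.
# Hypothetical variables: dss_datadir, dss_kit_dir, dss_service_user, hadoop_bin_dir
- name: Stop DSS
  become: true
  become_user: "{{ dss_service_user }}"
  ansible.builtin.command: "{{ dss_datadir }}/bin/dss stop"

- name: Upgrade the data directory with the new kit
  become: true
  become_user: "{{ dss_service_user }}"
  ansible.builtin.command: "{{ dss_kit_dir }}/installer.sh -d {{ dss_datadir }} -u"

- name: Re-run the integrations (Hadoop/Spark clients must be on PATH)
  become: true
  become_user: "{{ dss_service_user }}"
  ansible.builtin.command: "{{ dss_datadir }}/bin/dssadmin {{ item }}"
  environment:
    # e.g. /opt/cloudera/parcels/CDH/bin on a parcel-based CDH 6 gateway (assumption)
    PATH: "{{ hadoop_bin_dir }}:{{ ansible_env.PATH }}"
  loop:
    - install-R-integration
    - install-graphics-export
    - install-hadoop-integration
    - install-spark-integration

- name: Reinstall the boot-time (systemd) service, which needs root
  become: true
  ansible.builtin.command: "{{ dss_kit_dir }}/scripts/install/install-boot.sh {{ dss_datadir }} {{ dss_service_user }}"

- name: Start DSS
  become: true
  become_user: "{{ dss_service_user }}"
  ansible.builtin.command: "{{ dss_datadir }}/bin/dss start"
```

Running the same tasks against the deployer, automation and API hosts covers the "repeat for other instances" step.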

From experience, I doubt that the order of the steps between the upgrade and starting DSS matters. However, if you are installing everything from scratch on a new server, we would first make sure that all the packages related to LDAP, SSO, Spark, Hadoop, etc. are installed and up to date before running the installation script.
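That "check the packages first" step can be made to fail fast with Ansible's `package_facts` and `assert` modules. A small sketch, where the package names are only examples (Debian/Ubuntu names) and should be replaced with whatever your LDAP/SSO/Hadoop/Spark stack actually needs:

```yaml
# Sketch: stop before running the DSS installer if prerequisites are missing.
# The package list is illustrative only; adapt names to your distribution.
- name: Gather installed package facts
  ansible.builtin.package_facts:
    manager: auto

- name: Check prerequisite packages
  ansible.builtin.assert:
    that:
      - item in ansible_facts.packages
    fail_msg: "Missing prerequisite package: {{ item }}"
  loop:
    - krb5-user
    - default-jre-headless
    - nginx
```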

However, I'd recommend getting in touch with DSS support first, in case they can give you recommendations for your particular setup. Also, a great tool for automating the process in the future is Ansible (https://github.com/dataiku/dataiku-ansible-modules).

One extra comment (opinion): setting up a fully operational infrastructure is much simpler if your organization moves to the cloud. We did a test (a proof of concept) last year, and Dataiku Fleet Manager (FM) is a remarkable tool for performing all of these operations automatically, with significant savings for IT groups and administrators.

Hope this helps a bit Giuseppe!

gnaldi62
Author

Hi Ignacio,

  thanks for the comments. Yes, the sequence you indicated is probably a sensible one. The customers are mainly banks, and they don't want to move to the cloud, at least in the short term, mainly due to legal restrictions (consider that AWS added an Italian region only a few months ago, and GCP's first such region will be inaugurated this June).

In my experience, anything related to security (SSO, Kerberos, ...) is the main source of problems.
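Since Kerberos is such a common source of trouble, a cheap smoke test before running the Hadoop integration can save time: obtain a ticket from the service keytab as the DSS user, then touch HDFS. A sketch with hypothetical variables `dss_keytab`, `dss_principal` and `dss_service_user`, assuming the `kinit` and `hdfs` clients are on the DSS user's PATH:

```yaml
# Sketch: verify Kerberos + HDFS access as the DSS service user.
# dss_keytab, dss_principal and dss_service_user are hypothetical variables.
- name: Obtain a ticket from the DSS keytab
  become: true
  become_user: "{{ dss_service_user }}"
  ansible.builtin.command: "kinit -kt {{ dss_keytab }} {{ dss_principal }}"
  changed_when: false

- name: Check that HDFS is reachable with that ticket
  become: true
  become_user: "{{ dss_service_user }}"
  ansible.builtin.command: hdfs dfs -ls /
  changed_when: false
```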

Cheers.

Giuseppe

matthieu
Level 2

Hi @gnaldi62 & @Ignacio_Toledo,

I would like to share the https://galaxy.ansible.com/datarsense/dataikudss Ansible role, which has been created to fully automate DSS installation with LDAP configuration and standalone Spark support.

The role pre-install tasks take care of preparing the server environment (dss service user creation, installing system packages, tuning sysctl, ...) before downloading and installing DSS & DSS standalone spark package.
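To try the role, a `requirements.yml` along these lines should pull it from Galaxy together with the collections its tasks rely on (`ansible.posix` and `community.general` appear in the task excerpt later in this thread). This is a sketch; pin versions as you see fit and install with `ansible-galaxy install -r requirements.yml`.

```yaml
# requirements.yml (sketch): role from Ansible Galaxy plus the collections
# used by its tasks (selinux, pam_limits, sudoers).
roles:
  - name: datarsense.dataikudss
collections:
  - name: ansible.posix
  - name: community.general
```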

The role has been designed to be compatible with restricted network environments, in which security devices might deny direct access to the Dataiku CDN. This feature might be required for your bank customers.
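For readers wondering what "no direct CDN access" looks like in practice, here is a generic sketch (not necessarily how the role itself implements it) that fetches the DSS kit from an internal mirror and unpacks it. `dss_mirror_url`, `dss_version` and `dss_kit_sha256` are hypothetical variables; `dss_service_user` and `dss_install_dir_location` are the variable names used in the task excerpt later in this thread.

```yaml
# Generic sketch: download the DSS kit from an internal mirror instead of
# the Dataiku CDN. dss_mirror_url, dss_version, dss_kit_sha256 are hypothetical.
- name: Download the DSS kit from an internal mirror
  ansible.builtin.get_url:
    url: "{{ dss_mirror_url }}/dataiku-dss-{{ dss_version }}.tar.gz"
    dest: "/tmp/dataiku-dss-{{ dss_version }}.tar.gz"
    checksum: "sha256:{{ dss_kit_sha256 }}"
    mode: "0644"

- name: Unpack the kit into the install directory
  become: true
  become_user: "{{ dss_service_user }}"
  ansible.builtin.unarchive:
    src: "/tmp/dataiku-dss-{{ dss_version }}.tar.gz"
    dest: "{{ dss_install_dir_location }}"
    remote_src: true
    creates: "{{ dss_install_dir_location }}/dataiku-dss-{{ dss_version }}"
```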

The https://github.com/dataiku/dataiku-ansible-modules module is used by the role to configure LDAP settings in an enterprise environment (successfully tested with Active Directory). The role could probably be extended, following the same logic, to configure SSO settings.

Feel free to provide any feedback on this thread or to open an issue in the project repository (https://github.com/datarsense/ansible-role-dataikudss/issues)

Cheers,

Matthieu

Turribeach
Level 6

I am working on something similar, but I have taken a different approach. For me, there isn't much value in configuring the DSS server, since that can be restored from a working backup. The biggest issue is getting a VM set up with the required OS-level packages and nginx configured as a reverse proxy. So I am working on a single command-line script that will create a GCP VM and install a DSS instance on it. It's already working; I now need to automate HTTPS certificate creation and deployment, and then I will share it with the community.
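For the nginx part, a minimal Ansible sketch of a plain-HTTP reverse proxy in front of DSS follows. It assumes DSS listens on 127.0.0.1:11000 (the default port), a Debian-style `sites-available`/`sites-enabled` layout, and a hypothetical `dss_server_name` variable; the WebSocket `Upgrade`/`Connection` headers and the long read timeout matter for DSS. TLS would be layered on top once the certificate automation is in place, and the distro's default site may need disabling if it also claims port 80.

```yaml
# Sketch: nginx reverse proxy for DSS (HTTP only). dss_server_name is hypothetical.
- name: Write the DSS nginx site
  become: true
  ansible.builtin.copy:
    dest: /etc/nginx/sites-available/dss.conf
    mode: "0644"
    content: |
      server {
          listen 80;
          server_name {{ dss_server_name }};
          client_max_body_size 0;
          location / {
              proxy_pass http://127.0.0.1:11000/;
              proxy_http_version 1.1;
              proxy_set_header Host $http_host;
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection "upgrade";
              proxy_read_timeout 3600;
          }
      }

- name: Enable the DSS site
  become: true
  ansible.builtin.file:
    src: /etc/nginx/sites-available/dss.conf
    dest: /etc/nginx/sites-enabled/dss.conf
    state: link

- name: Reload nginx
  become: true
  ansible.builtin.service:
    name: nginx
    state: reloaded
```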

matthieu
Level 2

I agree that the toughest part was finding the required OS packages for each OS version. Example for a Debian 10 environment:

```yaml
# Runs first and without become: sudo may not be installed yet, so the
# connecting user must already be root for this task.
- name: Installing sudo package
  ansible.builtin.apt:
    name: sudo
    state: present

# On hosts without SELinux tooling the module reports a libselinux error,
# which failed_when tolerates.
- name: Disable SELinux
  become: true
  ansible.posix.selinux:
    state: disabled
  register: result
  failed_when: result.msg | default('ok', True) is not search('(^ok$|libselinux-python|(SELinux state changed))')
  tags: [setup]

- name: Increase system limits as required by DSS
  become: true
  community.general.pam_limits:
    domain: "{{ dss_service_user }}"
    limit_item: "{{ item }}"
    limit_type: "-"
    value: 65536
  loop:
    - nofile
    - nproc
  tags: [setup]

- name: Create service user
  become: true
  ansible.builtin.user:
    name: "{{ dss_service_user }}"
    home: "{{ dss_service_user_home_basedir }}/{{ dss_service_user }}"
    shell: "{{ dss_service_user_shell }}"
    state: present
  tags: [setup, dss-setup]

# Requires sudo, hence the sudo package task at the top.
- name: Allow admin group executing playbook to run any commands as DSS service user
  become: true
  community.general.sudoers:
    name: enable-sudoer-admin-runas-dss-account
    state: present
    user: "{{ ansible_user_id }}"
    runas: "{{ dss_service_user }}"
    commands: ALL

- name: Create dss install directory
  become: true
  ansible.builtin.file:
    path: "{{ dss_install_dir_location }}"
    state: directory
    owner: "{{ dss_service_user }}"
    mode: "u=rwx,g=rx,o=rx"
  tags: [setup, dss-setup]

- name: Create dss data directory
  become: true
  ansible.builtin.file:
    path: "{{ dss_service_user_home_basedir }}/{{ dss_service_user }}/{{ datadir }}"
    state: directory
    owner: "{{ dss_service_user }}"
    mode: "u=rwx,g=,o="
  tags: [setup, dss-setup]

- name: Installing dependencies
  become: true
  ansible.builtin.apt:
    name: "{{ packages }}"
    state: present
    update_cache: true
  vars:
    packages:
      - acl
      - curl
      - git
      - libexpat1
      - libncurses5
      - nginx
      - unzip
      - zip
      - default-jre-headless
      - python2.7
      - libpython2.7
      - libfreetype6
      - libgomp1

- name: Installing Debian 10 specific packages
  become: true
  ansible.builtin.apt:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
      - python3.7
      - python3-distutils
      - python3-pip
      - python3-setuptools
      - python2-minimal
      - python-six
  when: ansible_facts['os_family'] == 'Debian' and ansible_facts['distribution_major_version'] == '10'
```

The Ansible role sets up the required dependencies for the DSS install, and then installs and configures the DSS server.

It can be used in a playbook in which a second role configures nginx as an SSL reverse proxy and obtains the SSL certificate from a provider of your choice: some people might choose to use another web gateway to secure their DSS deployment (Citrix ADC, F5, ...).

As these two parts of the deployment are not DSS-specific tasks, I chose to keep them outside of the DSS role so that it suits different deployment environments.
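A playbook combining the two roles might look like the sketch below. The variable names are the ones used in the task excerpt above, but the values (and the host group `dss_design`) are only illustrative assumptions; `my_org.nginx_ssl_proxy` is a hypothetical placeholder for whatever nginx/TLS role you use, and can simply be dropped when an external gateway such as Citrix ADC or F5 terminates TLS.

```yaml
# site.yml (sketch): DSS role plus a separate, hypothetical reverse-proxy role.
- hosts: dss_design
  vars:
    dss_service_user: dataiku
    dss_service_user_home_basedir: /home
    dss_service_user_shell: /bin/bash
    dss_install_dir_location: /opt/dataiku
    datadir: dss_data
  roles:
    - role: datarsense.dataikudss
    # Hypothetical placeholder: replace with your own nginx/TLS role.
    - role: my_org.nginx_ssl_proxy
```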
