
Run book for DSS installation

gnaldi62
Neuron

Hi all,

  we're going to install DSS v10 (or 11) in a rather complex environment (LDAP, SSO, UIF, Cloudera 6.3.3, Kerberos, Sentry, Spark, ...). I wonder whether there is some sort of run book that helps organize the required activities in sequence. The documentation has all the details, but we need to better understand the best order in which to perform them. Any help is appreciated.

 Regards.

Giuseppe

5 Replies
tgb417
Neuron

@gnaldi62 

That’s a super interesting question.  Over the next few months, I may be doing an implementation, and would love to hear more about your results and sequence.  

I don’t know where you are on the project. Based on other projects I’ve done over the years, I would think that SSO and directory integration would be early and important, and frankly prone to post-implementation regrets.

Based on your question, I suspect the following is already on your radar. If I did not have a good design that I knew others were using successfully (and could not get one from the vendor), I’d likely focus on a quick-ish end-to-end proof-of-concept implementation, creating the run book during this process. I’d implement all the pieces, getting one or two of everything set up, and then find a few “guinea pig” users to help try to break the start-up environment, with the understanding that we might throw it all out before building the actual deployment. The key points are to cover all pieces end to end and then find ways to abuse the system; after that I’d be looking for all sorts of scalability and usability feedback. It is easier to adapt at this stage than when I have hundreds of users on the system.

Good luck,  I look forward to hearing more.

--Tom
gnaldi62
Neuron
Author

Hi Tom,

  you are right, it will be a rather complex installation with Hadoop and Spark integration (and this is OK) and also LDAP integration, SSO, Kerberos security and UIF.

I've been told that one can fill out a questionnaire from Dataiku and a .md document with all the steps in sequence will be produced, but I cannot find any further reference to it.

Thanks. Regards.

Giuseppe

tgb417
Neuron

@gnaldi62 

I’d reach out to your customer success manager or support to see if you can get access to that tool.

 

--Tom
Ignacio_Toledo

Hello @gnaldi62,

I'm not sure there is a recommended run book (or playbook) for the DSS installation, especially when you install on premises, where each organization has its own architecture and environment.

We created our own "run book", or installation sequence, that we follow every time we update a DSS instance:

  • Stop running instances
  • Install/update Design node:
    • Run update
    • Install R integration
    • Install graphics export
    • Load Hadoop and Spark environment variables
    • Install Hadoop integration
    • Install Spark integration
    • Install nb widget support
    • Install Jupyter widgets
  • Repeat this process for other instances (deployer/auto/api)
  • Update systemctl scripts
  • Start DSS
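
As a rough illustration only, the sequence above could be captured as an ordered plan that is printed and reviewed before anything is run. This is a minimal sketch, not a Dataiku tool: the directory paths and node names are hypothetical, and the `installer.sh` and `dssadmin` command strings are assumptions that should be checked against the documentation for your DSS version. Steps for which no command is given in this thread are left as manual steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Step:
    label: str
    command: Optional[str] = None  # None means a manual step; no command is assumed


def node_steps(name: str, kit_dir: str, data_dir: str) -> List[Step]:
    """Per-node steps, in the order of the list above (commands are assumptions)."""
    admin = f"{data_dir}/bin/dssadmin"
    return [
        Step(f"[{name}] Run update", f"{kit_dir}/installer.sh -d {data_dir} -u"),
        Step(f"[{name}] Install R integration", f"{admin} install-R-integration"),
        Step(f"[{name}] Install graphics export", f"{admin} install-graphics-export"),
        Step(f"[{name}] Load Hadoop and Spark environment variables"),
        Step(f"[{name}] Install Hadoop integration", f"{admin} install-hadoop-integration"),
        Step(f"[{name}] Install Spark integration", f"{admin} install-spark-integration"),
        Step(f"[{name}] Install nb widget support"),
        Step(f"[{name}] Install Jupyter widgets"),
    ]


def build_runbook(nodes: Dict[str, str], kit_dir: str) -> List[Step]:
    """Stop every instance, update each node, update service scripts, start again."""
    plan = [Step(f"[{n}] Stop instance", f"{d}/bin/dss stop") for n, d in nodes.items()]
    for n, d in nodes.items():
        plan.extend(node_steps(n, kit_dir, d))
    plan.append(Step("Update systemctl scripts"))
    plan.extend(Step(f"[{n}] Start DSS", f"{d}/bin/dss start") for n, d in nodes.items())
    return plan


def render(plan: List[Step]) -> str:
    """Numbered, human-readable view of the plan (a dry run; nothing is executed)."""
    return "\n".join(
        f"{i:2d}. {s.label}: {s.command or '(manual step)'}"
        for i, s in enumerate(plan, 1)
    )


if __name__ == "__main__":
    # Hypothetical node names and data directories.
    nodes = {"design": "/data/dss_design", "automation": "/data/dss_auto"}
    print(render(build_runbook(nodes, "/opt/dataiku-dss-kit")))
```

Keeping the plan as data rather than as a script makes it easy to review the order with the team first and to reuse the same list for the deployer, automation and API nodes.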

In my experience, I doubt that the order of the steps between the upgrade and starting DSS matters. However, if you are installing everything from scratch on a new server, we would first check that all the packages related to LDAP, SSO, Spark, Hadoop, etc. are installed and up to date before running the installation script.
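
A pre-flight check of that kind can be sketched in a few lines. This is only an illustration under assumptions: the component-to-executable mapping below is hypothetical and is not an official list of DSS prerequisites, and it only checks that the executables are on the `PATH`, not their versions.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Hypothetical mapping of components to executables expected on the PATH.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "Java": ["java"],
    "Kerberos": ["kinit", "klist"],
    "Hadoop": ["hadoop"],
    "Spark": ["spark-submit"],
}


def missing_tools(
    required: Dict[str, List[str]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, List[str]]:
    """Return, per component, the executables that cannot be found."""
    missing: Dict[str, List[str]] = {}
    for component, tools in required.items():
        absent = [tool for tool in tools if which(tool) is None]
        if absent:
            missing[component] = absent
    return missing


if __name__ == "__main__":
    problems = missing_tools(REQUIRED_TOOLS)
    if problems:
        for component, tools in problems.items():
            print(f"{component}: missing {', '.join(tools)}")
    else:
        print("All listed tools were found on the PATH.")
```

The `which` parameter is injectable so the check can be tested without touching the real system.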

However, I'd recommend you first get in touch with DSS support in case they can give you recommendations for your particular setup. Also, a great tool for automating the process in the future is Ansible (https://github.com/dataiku/dataiku-ansible-modules).

One extra comment and opinion: installing a fully operational infrastructure is much simpler if your organization moves to the cloud. We ran a test (a proof of concept) last year, and Dataiku Fleet Manager (FM) is a remarkable tool for performing all of these operations automatically, with considerable savings for the IT groups and administrators.

Hope this helps a bit Giuseppe!

gnaldi62
Neuron
Author

Hi Ignacio,

  thanks for the comments. Yes, the sequence you indicated is probably a sensible one. Our customers are mainly banks and they don't want to move to the cloud, at least in the short term, mainly due to legal restrictions (consider that AWS added an Italian region a few months ago and the first such GCP region will be inaugurated this June).

In my experience, anything related to security (SSO, Kerberos, ...) is the main source of problems.

Cheers.

Giuseppe