Going from Prototype to Production (SSL & HTTPS)

tgb417
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

At this time our small prototype DSS server is working on HTTP, not HTTPS. As a solo practitioner, I feel like I need some help.

I want to (well, maybe not *want* to... but I really should) move all connections from HTTP to HTTPS.

However, whenever I've done certificate work in the past, there always seems to be some sort of "GOTCHA..." or "I'm sorry, but..." that comes up.

I've found the very brief instructions here on setting up DSS with an HTTPS certificate. Is it really that easy?

OK, I'll need some kind of certificate. (I'll get a real one if I need to.) For the prototype, however, I'm hoping to start with a self-signed certificate created with OpenSSL. I think I've got valid .crt and .key files.

Where in the DSS environment should these files be placed? In the DSS home directory? Outside the DSS home directory? Thoughts? The instructions above give no guidance.

There is some discussion of using a "reverse proxy" (which I understand in principle). It would let users reach the SSL-protected HTTPS connection with just the hostname. (That sounds lovely.) How hard is this going to be?

Second, the server also has PostgreSQL running on it. I'm guessing I should protect that as well. I found these basic instructions.


Best Answers

  • Omar
    Omar Dataiker Posts: 30 Dataiker

    Hi Tom,

    It is indeed easy to use HTTPS with DSS. You essentially have two options:

    1. Provide a certificate to the embedded nginx used by DSS;
    2. Use a reverse proxy in front of DSS and have it terminate TLS with your certificate.

    In terms of security there is no difference; however, the second option is usually preferable, since your users won't need to append a port number to the address they use to connect to DSS.

    As you already saw, our documentation explains how to set up the first option here, while here you can find instructions for setting up the reverse proxy, along with examples for nginx and Apache.

    Since you are talking about a prototype environment, a self-signed certificate will work (although your browser will complain that it cannot verify it, because it is not signed by a trusted Certificate Authority).

    You can generate a certificate on your Linux box with the following command:

    openssl req -newkey rsa:2048 -nodes -keyout dss-cert.key -x509 -days 365 -out dss-cert.pem

    I recommend placing the key and certificate in the dss_data_dir/config folder, although there is no special requirement for that, since you will provide their exact location in the nginx or Apache config file.

    TIP: When you generate the certificate you'll be asked for a bunch of information. The field you care about most is the Common Name, which needs to be the exact name of your Linux host (as output by the hostname -f command). It is also the name you will provide as the server name in the nginx or Apache configuration, and the address you will type in your browser to reach the DSS web UI.
    Here you can find a very useful OpenSSL cheat sheet.
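    A small Python sketch of that Common Name check follows, assuming Python 3 on the DSS host. It compares a planned Common Name with socket.getfqdn(), which usually, but not always, agrees with hostname -f (it depends on the resolver configuration). The names normalize and cn_matches_host are illustrative only. One caveat worth knowing: current versions of Chrome validate the hostname against the certificate's subjectAltName extension rather than the Common Name, so a self-signed certificate may also need a SAN entry carrying the same name.

```python
import socket
from typing import Optional


def normalize(name: str) -> str:
    """Lower-case a DNS name and drop any trailing dot (DNS names are case-insensitive)."""
    return name.strip().rstrip(".").lower()


def cn_matches_host(common_name: str, fqdn: Optional[str] = None) -> bool:
    """Return True if common_name is exactly this host's fully qualified name.

    fqdn defaults to socket.getfqdn(); pass it explicitly to check another host.
    Wildcard names such as *.mycorp.local are deliberately not handled here.
    """
    if fqdn is None:
        fqdn = socket.getfqdn()
    return normalize(common_name) == normalize(fqdn)


if __name__ == "__main__":
    # Print the name this host reports, to compare with the Common Name you entered.
    print("Fully qualified name:", socket.getfqdn())
```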

    Regarding PostgreSQL, the link you provided is good. However, if it is installed on the same box as DSS and you only use it from DSS (meaning you don't connect to it from outside DSS), you probably don't need to, since connections from other hosts are already disabled by default in Postgres.

    Take care,

    Omar
    Architect @ Dataiku

  • Omar
    edited July 17

    Hi Tom,

    Please try removing the two lines ssl_protocols and ssl_prefer_server_ciphers from the http stanza and moving them to the server stanza. I don't know why you would want to remove TLS 1.3 support, however.

    Please also add, after the ssl_prefer_server_ciphers line, this one:

    ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256;
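    If you want to confirm that the cipher names in that line are ones your OpenSSL build actually knows, here is a rough Python sketch. It relies on Python's ssl module wrapping OpenSSL and assumes (this is not guaranteed) that Python and nginx on the server are linked against the same or a similar OpenSSL version. Note that ssl_ciphers only governs TLS 1.2 and earlier; TLS 1.3 cipher suites are configured separately, which is why the sketch filters them out. SUGGESTED_CIPHERS and enabled_suites are hypothetical names.

```python
import ssl
from typing import List

# The ssl_ciphers value from the reply above, one suite per line for readability.
SUGGESTED_CIPHERS = ":".join([
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-SHA384",
    "ECDHE-RSA-AES256-SHA384",
    "ECDHE-ECDSA-AES128-SHA256",
    "ECDHE-RSA-AES128-SHA256",
])


def enabled_suites(cipher_string: str) -> List[str]:
    """Return the requested TLS <= 1.2 suites that the local OpenSSL enables.

    Raises ssl.SSLError if OpenSSL recognises none of the names.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.set_ciphers(cipher_string)
    requested = set(cipher_string.split(":"))
    # get_ciphers() also reports TLS 1.3 suites, which set_ciphers() does not
    # control, so keep only the names that were explicitly requested.
    return [c["name"] for c in ctx.get_ciphers() if c["name"] in requested]


if __name__ == "__main__":
    enabled = enabled_suites(SUGGESTED_CIPHERS)
    missing = sorted(set(SUGGESTED_CIPHERS.split(":")) - set(enabled))
    print(f"{len(enabled)} suites enabled; unavailable here: {missing or 'none'}")
```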

    Hope it works, cheers.

    Omar
    Architect @ Dataiku

  • tgb417

    Thanks @AlexK for sending me a link to some UFW commands.

    So, after lots of help...

    Thanks to both @AlexK & @Omar.

    I have my DSS instance locked down so that HTTPS (without a port number) works as expected.

    Next step: a hostname change and a real certificate from a trusted source, so that users do not have to change anything on their side to get things working with the local cert.

Answers

  • tgb417

    Thanks for the help.

    My first attempt failed.

    I'm wondering about the permissions on the dss-cert.pem and dss-cert.key files. I've seen some discussions about permissions when it comes to certs.

    I'm also wondering whether the problem is that my local IT team has set a somewhat funky hostname, myservername.orgname.local (e.g. mybigserver.ibm.local), rather than myservername.orgname.com (e.g. mybigserver.ibm.com). (Names disguised to protect the innocent.)

    --Tom

  • tgb417

    The problem does not seem to be the odd ______._____.local hostname that was given to this computer.

    As I worked through this, I found that my problem was that the Dataiku account did not have access to the certificate. Once I gave the Dataiku account read access to the certificate, things began to work. Does anyone know the best practices for certificate file-system permissions for DSS? I'm sure they should be fairly tight; I just don't know how tight I can go, both for owner and group and for the rwx bits.

    On macOS Catalina, it was a challenge to get Google Chrome to accept the self-signed certificate, and the instructions seem to differ depending on your operating system and browser. But I did get that working.

    Now to sort out the reverse proxy...

  • Omar

    Hi Tom,

    I'm glad it's working now, welcome to the club!

    DSS doesn't require particular settings or specifications when it comes to certificates. It just needs to be able to read it.

    In big companies, certificates (especially purchased ones) are usually managed by a dedicated team. Certificates represent the company, so they need to be protected from theft. For this reason, big companies usually go down the reverse proxy road: that team manages the reverse proxy along with the certificate, so you (as user/DSS admin) don't have to handle them.

    For your particular setup (using the certificate on the embedded nginx of DSS), the recommendation, as said before, is to store the certificate in the dss_data_dir/config folder. It goes without saying, however, that anyone with SSH access to that folder can read the certificate; but again, this is a self-signed certificate anyway.
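    On the permissions question, a minimal Python sketch for auditing a private key's mode bits is below. The policy it applies (the key readable and writable by its owner only, i.e. 0o600, with the owner being the account DSS runs as so that DSS can still read it) is a common convention and an assumption here, not a documented DSS requirement; the certificate itself is public and can stay world-readable. If the key is instead used by a system nginx acting as reverse proxy, that nginx is typically started as root and loads the key in its master process, so root ownership is the usual choice there. The function names are illustrative.

```python
import os
import stat
import sys
from typing import List


def audit_key_file(path: str) -> List[str]:
    """Return warnings about a private key's permission bits (an empty list means OK).

    Assumed policy (a common convention, not a DSS requirement): only the owner,
    i.e. the account that must read the key, may read or write it.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    warnings = []
    if mode & stat.S_IRGRP:
        warnings.append("readable by group")
    if mode & stat.S_IROTH:
        warnings.append("readable by others")
    if mode & (stat.S_IWGRP | stat.S_IWOTH):
        warnings.append("writable by group or others")
    if mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        warnings.append("has an execute bit set")
    return warnings


def tighten_key_file(path: str) -> None:
    """Set the key to owner read/write only (0o600); ownership is left unchanged."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Usage: python audit_keys.py /path/to/dss-cert.key [...]
    for key_path in sys.argv[1:]:
        problems = audit_key_file(key_path)
        print(key_path, "OK" if not problems else "; ".join(problems))
```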

    See you around, take care!

    Omar
    Architect @ Dataiku

  • tgb417

    @Omar
    @AlexK

    OK, I'm trying the reverse proxy approach on a single host. I'd like to get https://hostname.domain.org to work rather than https://hostname.domain.org:10000. The latter I have already worked out with straight HTTPS.

    In the /etc/nginx/nginx.conf file, does the "server_name" line need to be different from the actual hostname? Say, dss.domain.org rather than the real hostname, hostname.domain.org?

  • Omar

    Hi again,

    Let's put it simply: the server_name directive in your nginx file is the name nginx is listening for.

    Let's pretend your server is named something like dss1.mycorp.local. There is a DNS server, somewhere in your network, that knows that dss1.mycorp.local maps to an IP address, let's say 10.11.12.1.

    Again, putting it simply:
    1. You type dss1.mycorp.local in your browser
    2. Your browser asks the DNS server for the IP address associated with dss1.mycorp.local
    3. The DNS server answers with the IP address, and the browser contacts the server directly
    4. nginx is at the door (literally), and your browser asks: "I'm looking for dss1.mycorp.local, is it here?"
    5. nginx looks in its configuration. Since there is a server_name directive matching the request (dss1.mycorp.local), it serves the requested content (in this case it simply passes the request on to the DSS port)

    Hence, your server_name directive needs to match the request coming from your browser (again, I'm keeping it very simple here).

    Now the question: so I can pretend to be whatever server I want, as long as the server_name directive matches a domain address? Yes, and that is the reason SSL certificates exist.
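    The steps above can be modelled in a few lines of Python. This is a deliberately simplified sketch of how nginx chooses a server block: it only handles exact server_name matches (nginx also supports wildcard and regular-expression names) and otherwise falls back to the port's default server, which is the block marked default_server or, failing that, the first one defined for that port. For HTTPS, nginx makes the same choice from the SNI name the browser sends during the TLS handshake. ServerBlock and pick_server are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerBlock:
    """One nginx server { } block, reduced to what matters for name matching."""
    listen_port: int
    server_names: List[str]
    default_server: bool = False
    label: str = ""


def pick_server(blocks: List[ServerBlock], port: int,
                host_header: Optional[str]) -> Optional[ServerBlock]:
    """Simplified model of how nginx picks the server block for a request.

    Only exact, case-insensitive server_name matches are modelled; nginx also
    supports wildcard and regular-expression names, ignored here.
    """
    candidates = [b for b in blocks if b.listen_port == port]
    if not candidates:
        return None  # nothing listens on this port: the connection is refused
    if host_header:
        # Drop an optional ":port" suffix (IPv6 literals are not handled).
        name = host_header.split(":", 1)[0].rstrip(".").lower()
        for block in candidates:
            if name in (n.lower() for n in block.server_names):
                return block
    # No match, or no Host header: use the port's default server, i.e. the
    # block marked default_server, otherwise the first one defined.
    for block in candidates:
        if block.default_server:
            return block
    return candidates[0]
```

    For example, with one block listening on port 80 for some other application and one on port 443 for dss1.mycorp.local, a plain-HTTP request for dss1.mycorp.local still lands on the other application, because it is the only (and therefore default) server on port 80.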

    Hope it helps.

    Take care,

    Omar
    Architect @ Dataiku

  • tgb417

    @Omar

    Thanks, your feedback is by and large what I'd thought. However, I still haven't gotten this to work, so I'm clearly missing something.

    When I started trying to make this work, I thought that the content here (https://doc.dataiku.com/dss/latest/installation/proxies.html#reverse-proxy) went into the file /etc/nginx/nginx.conf.

    However, I think I now understand that this content should somehow be in the directory /etc/nginx/sites-available or /etc/nginx/sites-enabled. The documentation linked above does not make it very clear where the chunk of nginx configuration should be placed.

    This is the first time I'm working with nginx. Any help would be appreciated.

  • Omar

    Hi,

    I'm sorry you're struggling so much with this setup; perhaps you can ask someone in your IT department to look into it?

    Separating config files is useful when the same machine serves more than one piece of content (websites or applications, as in this case), or when you want your config to be tidy and well separated for convenience.

    Usually DSS is the only thing running on the Linux box (which is also recommended), so you don't really need separate files, since the configuration doesn't grow big.

    A single /etc/nginx/nginx.conf file containing the server stanza for DSS will work just fine.

    Take care,

    Omar
    Architect @ Dataiku

  • tgb417
    edited July 17

    No one else in the IT department is able to help.

    I got some help from another colleague.

    We did find that something else was running on the box that was badly configured. That has been removed.

    nginx -t 

    comes back OK.

    systemctl restart nginx

    comes back without errors. (This is how we found the other application.)

    I have the Dataiku provided server stanza installed in the /etc/nginx/nginx.conf file.

    I can get the Reverse Proxy working on HTTP (port 80) through nginx.

    However, when I switch to HTTPS, the browser now tells me:

    ERR_SSL_VERSION_OR_CIPHER_MISMATCH

    Unsupported protocol
    The client and server don't support a common SSL protocol version or cipher suite.

    This had not been a problem when I put the certificate on DSS directly, but then we had to include the port in the URL.
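    One way to see what the server actually negotiates, independent of the browser, is a small handshake probe. The Python sketch below is a diagnostic under stated assumptions, not a fix: it disables certificate verification so that the self-signed certificate does not mask the real error, and it reports either the negotiated protocol and cipher or the handshake error. The server_hostname argument is sent as SNI, which nginx uses to pick the server block on port 443. probe_tls is a hypothetical helper name.

```python
import socket
import ssl
import sys
from typing import Tuple


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a TLS handshake with host:port and describe the outcome.

    Certificate checks are switched off on purpose, so that a self-signed
    certificate does not hide a protocol/cipher problem. Diagnosis only.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname is sent as SNI; nginx uses it to pick the server block.
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cipher_name, _protocol, secret_bits = tls.cipher()
                return True, f"{tls.version()} {cipher_name} ({secret_bits}-bit)"
    except ssl.SSLError as exc:  # checked first: SSLError is a subclass of OSError
        return False, f"handshake failed: {exc}"
    except OSError as exc:  # refused, timed out, unreachable, unknown host
        return False, f"network error: {exc}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe_tls.py <dss-host> [port]
    print(probe_tls(sys.argv[1], int(sys.argv[2]) if len(sys.argv) > 2 else 443))
```

    Once the handshake works, the probe returns True plus the protocol version and cipher; otherwise it returns False plus OpenSSL's error text, which is often more specific than the browser's ERR_SSL_VERSION_OR_CIPHER_MISMATCH page.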

    The nginx.conf in /etc/nginx is configured as follows (I've left out detail where "..." is shown):

    http {
        ...
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
        ssl_prefer_server_ciphers on;
        ...
        # nginx SSL reverse proxy configuration for Dataiku Data Science Studio
        # requires nginx version 1.4 or above
        server {
            # Host/port on which to expose Data Science Studio to users
            listen 443 ssl;
            server_name XXXXXXXX.xxx.local;
            ssl_certificate /etc/nginx/ssl/dss-cert.pem;
            ssl_certificate_key /etc/nginx/ssl/dss-cert.key;
            location / {
                # Base url of the Data Science Studio installation
                proxy_pass http://localhost:11000/;
                proxy_redirect off;
                # Allow long queries
                proxy_read_timeout 3600;
                proxy_send_timeout 600;
                # Allow large uploads
                client_max_body_size 0;
                # Allow protocol upgrade to websocket
                proxy_http_version 1.1;
                proxy_set_header Host $http_host;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
            }
        }
    }

    @Omar, any further thoughts?

  • tgb417

    @Omar

    The missing TLS 1.3 came with the default nginx.conf "out of the box".

    Will give this a try.

    Thanks.

  • tgb417

    @Omar

    That worked!!

    • Built a self-signed certificate.
    • Updated /etc/nginx/nginx.conf:
      • Added the server stanza provided in the documentation
      • Moved the ssl_protocols & ssl_prefer_server_ciphers lines to the server stanza
      • Added the ssl_ciphers line to the server stanza
    • Checked the nginx configuration with
      • nginx -t
    • Restarted nginx with
      • systemctl restart nginx
    • Then convinced the browser that the self-signed certificate was OK, using these instructions

    However, now I have to convince DSS not to respond to plain HTTP at http://XXXXXXX.lsc.local:11000/home/.

    How do I get DSS to respond over HTTP only to processes on localhost?
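    Whichever route is taken (for example, UFW rules like the ones mentioned in this thread), it is worth confirming the result from another machine. The Python sketch below simply tries a TCP connection to each port; the script name and ports in the usage comment are assumptions based on this setup. Run it from a different machine: connections the DSS host makes to its own address travel over the loopback interface and typically bypass host firewall rules, so a result obtained on the server itself says little about exposure.

```python
import socket
import sys


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolvable
        return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_ports.py <dss-host>   (run from another machine)
    target = sys.argv[1]
    for port in (80, 443, 11000):
        state = "open" if port_accepts_connections(target, port) else "closed/filtered"
        print(f"{target}:{port} -> {state}")
```

    With the reverse proxy in place, the goal would be 443 open and 11000 closed/filtered; whether 80 should be open depends on whether you keep a plain-HTTP listener (for instance, one that only redirects to HTTPS).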
