At this time our small prototype DSS server is working on HTTP, not HTTPS. As a solo practitioner, I feel like I need some help.
I want to (maybe not want to... but I really should) move my configuration from HTTP for all connections to HTTPS for all connections.
However, when I've dealt with certificates in the past, there always seems to be some sort of "GOT YOU..." or "I'm sorry but..." that comes up.
I've found the very brief instructions here on setting up DSS with an HTTPS certificate. Is it really that easy?
OK, I'll need some kind of certificate (I'll get a real one if I need it). However, for the prototype, I'm hoping to start with a self-signed certificate made with OpenSSL. I think I've got valid .crt and .key files.
Where in the DSS environment is it suggested that these files be placed? In the DSS home directory? Outside the DSS home directory? Thoughts? The instructions above give no guidance.
There is some discussion of having to use a "reverse proxy" (which I understand in principle). This would let users reach DSS over the SSL-protected HTTPS connection using just the hostname. (That sounds lovely.) How hard is this going to be?
Second, the server also has a PostgreSQL server running on it. I'm guessing that I should protect that as well. I found these basic instructions.
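It is indeed easy to use HTTPS with DSS. You essentially have two options:
1. Enable HTTPS directly on DSS (on its embedded nginx), so users connect to https://yourhost:DSS_PORT.
2. Put a reverse proxy (for example nginx or Apache) in front of DSS and let it handle HTTPS on the standard port 443.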
In terms of security there is no difference; however, the second option is usually preferable, since your users won't need to append a port number to the address they use to connect to DSS.
As you saw already, our documentation explains how to set up the first option here, while here you can find instructions on how to set up the reverse proxy, along with examples for nginx and Apache.
Since you are talking about a prototype environment, a self-signed certificate will work (although your browser will complain that it cannot verify it, because it is not signed by a trusted Certificate Authority).
You can generate a certificate on your Linux box with the following command:
openssl req -newkey rsa:2048 -nodes -keyout dss-cert.key -x509 -days 365 -out dss-cert.pem
I recommend placing the key and certificate in the dss_data_dir/config folder, although there is no strict requirement for that, since you will provide their exact paths in the nginx or Apache configuration file.
TIP: When you generate the certificate you'll be asked for several pieces of information. The one you care about most is the Common Name, which needs to be the exact fully qualified name of your Linux host (as per the output of the hostname -f command). It is the name you will provide as the server name in the nginx or Apache configuration, and it's also the address you will type in your browser to reach the DSS web UI.
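Answering the prompts by hand is where typos creep in, so here is a sketch of the same command run non-interactively, with the Common Name taken from hostname -f. It also adds a subjectAltName entry: current versions of Chrome ignore the Common Name and only check the Subject Alternative Name, which is one common reason a CN-only self-signed certificate gets rejected. The sketch assumes OpenSSL 1.1.1 or later (needed for -addext) and writes the files to the current directory.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive self-signed certificate whose CN and SAN are this host's FQDN.
# Assumes OpenSSL 1.1.1+ (for -addext); files are written to the current directory.
set -euo pipefail

# Fully qualified host name as reported by `hostname -f` (falls back to `uname -n`).
FQDN="$(hostname -f 2>/dev/null || uname -n)"

openssl req -newkey rsa:2048 -nodes -x509 -days 365 \
  -keyout dss-cert.key -out dss-cert.pem \
  -subj "/CN=${FQDN}" \
  -addext "subjectAltName=DNS:${FQDN}"

# Show what actually ended up in the certificate.
openssl x509 -in dss-cert.pem -noout -subject
openssl x509 -in dss-cert.pem -noout -text | grep -A1 "Subject Alternative Name"
```

Afterwards, move the two files to wherever you decided to keep them and point your configuration at those paths.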
Here you can find a very useful OpenSSL cheat sheet.
Regarding PostgreSQL, the link you provided is good. However, if it is installed on the same box as DSS and you only use it from DSS (meaning you don't connect to it from outside DSS), you probably don't need to protect it with SSL: connections from other hosts are already disabled by default in PostgreSQL.
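If you want to confirm that default on your own box, the sketch below reads a postgresql.conf file and reports its listen_addresses value, falling back to PostgreSQL's default of localhost when the setting is commented out. It is only a rough check: the function name pg_listen_addresses is hypothetical, you supply the path to postgresql.conf (its location varies by distribution and version), and it does not see include files, postgresql.auto.conf (written by ALTER SYSTEM) or command-line overrides. On a running server, SHOW listen_addresses; in psql is the authoritative answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the listen_addresses value set in one postgresql.conf file.
# Only that file is inspected (no include files, no postgresql.auto.conf).
pg_listen_addresses() {
  local conf="$1" value
  # Keep the last uncommented listen_addresses line (later settings override earlier ones),
  # then drop any trailing comment, the "key =" part, quotes and whitespace.
  value="$(grep -E '^[[:space:]]*listen_addresses[[:space:]]*=' "$conf" \
             | tail -n 1 \
             | sed -E "s/#.*//; s/^[^=]*=[[:space:]]*//; s/['[:space:]]//g")"
  # Absent or commented out: PostgreSQL uses its default, 'localhost'.
  echo "${value:-localhost}"
}
```

Call it with your own file, for example pg_listen_addresses /etc/postgresql/14/main/postgresql.conf on a Debian/Ubuntu layout (the version number is only an example). Anything other than localhost, such as *, means the server accepts TCP connections on other interfaces, subject to the rules in pg_hba.conf.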
Architect @ Dataiku
Thanks for the help.
I've failed on the first attempt.
I'm wondering about permissions on the dss-cert.pem and dss-cert.key files. I've seen some conversations about permissions when it comes to certs.
I'm also wondering whether the problem is that my local IT team has set a somewhat funky hostname, myservername.orgname.local (e.g. mybigserver.ibm.local), rather than myservername.orgname.com (e.g. mybigserver.ibm.com). (Names disguised to protect the innocent.)
The problem does not seem to be the odd ______._____.local hostname that was given to this computer.
As I worked through this, I found that my problem was that the Dataiku account did not have access to the certificate. Once I gave the Dataiku account read access to the certificate, things began to work. Does anyone know the best practices for file system rights on certificates for DSS? I'm sure these should be fairly tight; I just don't know how tight I can go, both on owner and group and on rwx.
With a self-signed certificate on macOS Catalina, it was a challenge to get Google Chrome to accept the certificate, and the instructions seem to differ depending on your operating system and browser. But I did get it working.
Now to sort out the reverse proxy...
I'm glad it's working now, welcome to the club!
DSS doesn't require particular settings or specifications when it comes to certificates. It just needs to be able to read it.
In big companies, certificates are usually managed by a dedicated team, especially purchased ones. Certificates represent the company, so they need to be protected against theft. For this reason, big companies usually go down the reverse proxy road: that team manages the reverse proxy along with the certificate, so that you (as user/DSS admin) don't have to handle them.
For your particular setup (using the certificate on the embedded nginx of DSS), the recommendation, as said before, is to store the certificate in the dss_data_dir/config folder. It goes without saying, however, that anyone with SSH access to that folder can see the certificate; but again, this is a self-signed certificate anyway.
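Since DSS only needs to read the files, a common Unix convention (a suggestion, not a DSS requirement) is: both files owned by the account DSS runs as, the private key readable only by that account (600), and the certificate world-readable (644), since it is public anyway. The sketch below applies that. DSS_USER and DSS_DATA_DIR are hypothetical placeholders for your install; they default to the current user and ./dss_data_dir so the sketch runs as-is, and it creates a throwaway key pair only if none exists yet. Changing ownership to another account requires sudo.

```shell
#!/usr/bin/env bash
# Sketch: tight permissions for the DSS certificate files.
set -euo pipefail

# Hypothetical placeholders: set these to your real DSS account and data directory.
DSS_USER="${DSS_USER:-$(id -un)}"
DSS_DATA_DIR="${DSS_DATA_DIR:-./dss_data_dir}"
CONFIG_DIR="$DSS_DATA_DIR/config"
KEY="$CONFIG_DIR/dss-cert.key"
CERT="$CONFIG_DIR/dss-cert.pem"

mkdir -p "$CONFIG_DIR"

# Demo only: create a throwaway self-signed pair if the files are not there yet.
if [[ ! -f "$KEY" || ! -f "$CERT" ]]; then
  openssl req -newkey rsa:2048 -nodes -x509 -days 1 \
    -keyout "$KEY" -out "$CERT" -subj "/CN=$(uname -n)" 2>/dev/null
fi

# Owner: the account DSS runs as, so DSS can read both files.
chown "$DSS_USER" "$KEY" "$CERT"
# Private key: read/write for the owner only (rw-------).
chmod 600 "$KEY"
# Certificate: public by nature, so world-readable is acceptable (rw-r--r--).
chmod 644 "$CERT"

ls -l "$KEY" "$CERT"
```

If the key has to be owned by someone else (for example a team that manages certificates), the usual alternative is to give a group containing the DSS account read access with mode 640 on the key.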
See you around, take care!
Architect @ Dataiku
OK, I'm trying the reverse proxy approach on a single host. I'd like to get https://hostname.domain.org to work rather than https://hostname.domain.org:10000. The latter I have working with straight HTTPS. 🙂
In the /etc/nginx/nginx.conf file, does the "server_name" line need to be different from the actual hostname? Say, dss.domain.org rather than the real hostname, hostname.domain.org?
Let's put it simply: the server_name directive in your nginx file is the name nginx is listening for.
Let's pretend your server is named something like dss1.mycorp.local. There is a DNS server, somewhere in your network, that knows that dss1.mycorp.local is attached to an IP address, let's say, 10.11.12.1.
Again, putting it simply:
1. You type dss1.mycorp.local in your browser
2. Your browser asks the DNS server for the IP address associated with dss1.mycorp.local
3. Your DNS server answers with the IP address, and the browser contacts the server directly
4. nginx is at the door (literally), and your browser asks: "I'm looking for dss1.mycorp.local, is it here?"
5. nginx looks in its configuration. Since there is a server_name directive matching the request (dss1.mycorp.local), it serves the requested content (in this case it just forwards the request to the DSS port)
Hence, your server_name directive needs to match the request coming from your browser (again, I'm keeping it very simple here).
Now the obvious question: can I then pretend to be whatever server I want, as long as the server_name directive matches the domain name? Yes, and that's the reason SSL certificates exist.
Hope it helps.
Architect @ Dataiku
Thanks, your feedback is by and large what I thought. However, I still haven't gotten this to work, so I'm clearly missing something.
When I started trying to make this work, I thought that the content here, https://doc.dataiku.com/dss/latest/installation/proxies.html#reverse-proxy, went into the file /etc/nginx/nginx.conf.
However, I now think I understand that this content should somehow go in the directory /etc/nginx/sites-available or /etc/nginx/sites-enabled. The documentation linked above does not make it very clear where that chunk of configuration should be placed.
This is the first time I'm working with nginx. Any help would be appreciated.
I'm sorry you're struggling so much with this setup: perhaps you could ask someone in your IT department to look into it?
Separating config files is useful when the same machine serves more than one piece of content (websites or applications, as in this case), or if you want your configuration to be tidy and well separated for convenience.
Usually, DSS is the only thing running on the Linux box (which is also recommended), so you don't really need to separate files, since they don't grow big.
A single /etc/nginx/nginx.conf file containing the server stanza for DSS will work just fine.
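As a sketch of that single-file approach, the script below writes a DSS server stanza loosely modeled on the nginx example in the DSS reverse-proxy documentation, with HTTPS added. Two points worth knowing: a server block is only valid inside the http { } block of nginx.conf (or in a file included from there, such as conf.d/*.conf or sites-enabled/* on many distributions), and the stanza assumes DSS itself listens on plain HTTP on port 10000, i.e. with its own HTTPS turned off so that nginx handles TLS. SERVER_NAME, DSS_PORT, CERT, KEY and OUT are placeholders; the script writes to ./dss-proxy.conf so it never overwrites your live configuration.

```shell
#!/usr/bin/env bash
# Sketch: write an nginx stanza that serves HTTPS on 443 and proxies to DSS.
set -euo pipefail

# Hypothetical placeholders: adjust to your setup.
SERVER_NAME="${SERVER_NAME:-$(hostname -f 2>/dev/null || uname -n)}"
DSS_PORT="${DSS_PORT:-10000}"                       # DSS on plain HTTP locally
CERT="${CERT:-/path/to/dss_data_dir/config/dss-cert.pem}"
KEY="${KEY:-/path/to/dss_data_dir/config/dss-cert.key}"
OUT="${OUT:-./dss-proxy.conf}"                      # e.g. /etc/nginx/conf.d/dss.conf

# Unquoted heredoc: ${...} is expanded now, \$ keeps nginx variables literal.
cat > "$OUT" <<EOF
# Redirect plain HTTP to HTTPS.
server {
    listen 80;
    server_name ${SERVER_NAME};
    return 301 https://\$host\$request_uri;
}

server {
    listen 443 ssl;
    server_name ${SERVER_NAME};

    ssl_certificate     ${CERT};
    ssl_certificate_key ${KEY};

    # Allow large uploads
    client_max_body_size 0;

    location / {
        proxy_pass http://127.0.0.1:${DSS_PORT}/;
        proxy_redirect off;
        # Allow long-running queries
        proxy_read_timeout 3600;
        proxy_send_timeout 600;
        # Allow protocol upgrade to WebSocket
        proxy_http_version 1.1;
        proxy_set_header Host \$http_host;
        proxy_set_header Upgrade \$http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF

echo "Wrote $OUT"
```

After reviewing the file, copy it into place (or paste its contents inside http { }), then run sudo nginx -t to validate the syntax before reloading nginx.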
Architect @ Dataiku