Using Selenium Chrome Driver in a Python recipe
Hi, I have been trying to create a Python recipe that scrapes the web with Selenium, but I keep running into this error:
unknown error: Chrome failed to start: exited abnormally. (unknown error: DevToolsActivePort file doesn't exist) (The process started from chrome location .../managed_folders/TEST_3/a4dI41U2/chromedriver is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
I have added the chromedriver as an input to the code and these are the libraries in my environment:
selenium
chromedriver-py
webdriver-manager
chromedriver-binary-auto
chromedriver-binary
This is the code I have been using:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
## Will throw a need permissions error
#from webdriver_manager.chrome import ChromeDriverManager
#driver = webdriver.Chrome(ChromeDriverManager().install())
# Read recipe inputs
chrome_driver = dataiku.Folder("a4dI41U2")
chrome_driver_info = chrome_driver.get_info()
driver_path = chrome_driver.get_path() + '/chromedriver'
# Compute recipe outputs
chrome_options = Options()
chrome_options.binary_location = driver_path
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-extensions")
driver = webdriver.Chrome(executable_path=driver_path, chrome_options=chrome_options)
driver.quit()
What should I do to solve the error?
Thanks in advance!
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,238 Dataiker
Hi,
Can you confirm your OS and the versions of google-chrome and chrome-driver that are installed?
google-chrome --version && which google-chrome
cat /etc/*release
Most likely the chrome-driver and google-chrome versions are not compatible, or google-chrome was not installed correctly. The error indicates a problem starting google-chrome in headless mode.
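As a rough illustration of the compatibility check, the sketch below compares the major versions reported by the two executables. It assumes both print a dotted version string for --version (for example "Google Chrome 114.0.5735.90" and "ChromeDriver 114.0.5735.90 (...)"), and that a matching major version is what matters. The helper names are hypothetical and not part of Selenium or Dataiku.

```python
import re
import shutil
import subprocess


def major_version(version_output):
    """Extract the major version from text such as 'Google Chrome 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_output)
    return int(match.group(1)) if match else None


def read_version(executable):
    """Run `<executable> --version` and return its output, or None if it is not found."""
    path = shutil.which(executable)  # accepts a bare name (PATH lookup) or a full path
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def versions_match(chrome_output, driver_output):
    """True only when both major versions could be parsed and are equal."""
    chrome_major = major_version(chrome_output or "")
    driver_major = major_version(driver_output or "")
    return chrome_major is not None and chrome_major == driver_major
```

For example, versions_match(read_version("google-chrome"), read_version(driver_path)) returns False when Chrome is missing from the PATH, which would also point to a missing installation.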
Your code seems fine and is similar to the example I've tested before here:
https://community.dataiku.com/t5/Using-Dataiku/Using-selenium-with-a-python-recipe/m-p/19661
Thanks
-
I think I probably did not install Google Chrome! I am looking to deploy the web scraper using Dataiku, so where should I install Google Chrome for Dataiku to use it?
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,238 Dataiker
Hi,
You can install google-chrome directly on the DSS instance. The only real requirement is that it is available in the PATH for DSS.
Note that if you plan to use containerized execution, you will also need to add google-chrome to the base image: https://doc.dataiku.com/dss/latest/containers/custom-base-images.html#add-a-dockerfile-fragment
The following steps should work (for CentOS/Red Hat):
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum install ./google-chrome-stable_current_*.rpm
Check the installation by running google-chrome --version as the DSS user.
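Once Chrome is installed, a recipe could locate it on the PATH and pass the chromedriver path separately. The error in the question reports Chrome being started from the chromedriver path, which is consistent with binary_location being set to the driver path; binary_location is meant for the Chrome browser binary. The following is only a sketch: it assumes Selenium 4 (the Service class and the options keyword), and the helper names are hypothetical.

```python
import shutil


def find_chrome_binary(candidates=("google-chrome", "google-chrome-stable")):
    """Return the first Chrome executable found on the PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_headless_driver(chromedriver_path):
    """Start headless Chrome. Assumes Selenium 4 and Chrome available on the PATH."""
    # Imported here so the PATH check above can run even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    chrome_path = find_chrome_binary()
    if chrome_path is None:
        raise RuntimeError("google-chrome was not found on the PATH")

    options = Options()
    options.binary_location = chrome_path  # the browser binary, not chromedriver
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")

    # The chromedriver path goes to the Service object, separate from the browser.
    service = Service(executable_path=chromedriver_path)
    return webdriver.Chrome(service=service, options=options)
```

In the recipe from the question, this could be called as build_headless_driver(driver_path), where driver_path is the chromedriver location inside the managed folder.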
Thanks