Using Selenium Chrome Driver in python recipe

sarahtan
Level 1
Hi, I have been trying to create a Python recipe that scrapes the web with Selenium, but I keep running into this error:

unknown error: Chrome failed to start: exited abnormally. (unknown error: DevToolsActivePort file doesn't exist) (The process started from chrome location .../managed_folders/TEST_3/a4dI41U2/chromedriver is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

I have added chromedriver as an input (a managed folder) to the recipe, and these are the libraries in my code environment:

selenium
chromedriver-py
webdriver-manager
chromedriver-binary-auto
chromedriver-binary

This is the code I have been using:

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# The two lines below raise a permissions error, so they are commented out
#from webdriver_manager.chrome import ChromeDriverManager
#driver = webdriver.Chrome(ChromeDriverManager().install())

# Read recipe inputs
chrome_driver = dataiku.Folder("a4dI41U2")
chrome_driver_info = chrome_driver.get_info()
driver_path = chrome_driver.get_path() + '/chromedriver'


# Compute recipe outputs
chrome_options = Options()
chrome_options.binary_location = driver_path
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-extensions")
driver = webdriver.Chrome(executable_path=driver_path, chrome_options=chrome_options)
driver.quit()

What should I do to solve the error?

Thanks in advance!

 

3 Replies
AlexT
Dataiker

Hi,

Can you confirm which version of google-chrome is installed, along with your OS and chromedriver version?

google-chrome --version && which google-chrome

cat /etc/*release 

Most likely the chromedriver and google-chrome versions are not compatible, or google-chrome was not installed correctly. The error indicates a problem starting google-chrome in headless mode.
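If it helps, the same version check can be scripted from Python. This is only a rough sketch: the helper names (major_version, version_of, versions_compatible) are made up for illustration, and it assumes both tools print a dotted version number when called with --version and that matching major versions are what matters.

```python
import re
import shutil
import subprocess


def major_version(version_output):
    """Return the major version found in a `--version` output line, or None."""
    match = re.search(r"(\d+)\.\d+", version_output)
    return int(match.group(1)) if match else None


def version_of(command):
    """Run `<command> --version`; return None if the command cannot be found."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()


def versions_compatible(chrome_output, driver_output):
    """Assumption: the two are compatible when their major versions match."""
    chrome = major_version(chrome_output)
    driver = major_version(driver_output)
    return chrome is not None and chrome == driver


if __name__ == "__main__":
    # "chromedriver" may need to be the full path inside the managed folder.
    chrome = version_of("google-chrome")
    driver = version_of("chromedriver")
    if chrome is None or driver is None:
        print("Missing binary:", "google-chrome" if chrome is None else "chromedriver")
    else:
        print("Compatible:", versions_compatible(chrome, driver))
```

If version_of("google-chrome") returns None, the browser is not on the PATH of the user running the script, which would fit the error above.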

Your code seems fine and is similar to the example I've tested before here: 

https://community.dataiku.com/t5/Using-Dataiku/Using-selenium-with-a-python-recipe/m-p/19661

Thanks

 

sarahtan
Level 1
Author

I think I probably never installed Google Chrome! I want to deploy the web scraper with Dataiku, so where should I install Google Chrome on Dataiku?

AlexT
Dataiker

Hi,

You can install google-chrome directly on the DSS instance. The only real requirement is that it is available on the PATH for DSS.

Note that if you plan to use containerized execution, you will need to add google-chrome to the base image: https://doc.dataiku.com/dss/latest/containers/custom-base-images.html#add-a-dockerfile-fragment

The following steps should work (for CentOS/Red Hat):

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum install ./google-chrome-stable_current_*.rpm

Check the installation by running google-chrome --version as the DSS user.
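Once Chrome is installed, a recipe could confirm it is on the PATH before starting the driver. The sketch below is an illustration only: find_chrome and make_driver are hypothetical helpers, the candidate binary names are assumptions, and the driver construction uses the Selenium 4 style (Service plus options) rather than the executable_path/chrome_options arguments from the original post. One thing that may also be worth checking in the original code is that binary_location is pointed at the chromedriver file; that option is meant to hold the path of the Chrome browser itself.

```python
import shutil

# Assumed binary names; adjust to whatever the install actually provides.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable")


def find_chrome(candidates=CHROME_CANDIDATES):
    """Return the full path of the first Chrome binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def make_driver(chromedriver_path, candidates=CHROME_CANDIDATES):
    """Start headless Chrome, failing early if the browser itself is missing."""
    chrome_path = find_chrome(candidates)
    if chrome_path is None:
        raise RuntimeError("Chrome not found on PATH; install google-chrome first")

    # Imported here so the PATH check above works even without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = chrome_path  # the browser, not chromedriver
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")

    # Assumes Selenium 4: the chromedriver path is passed through Service.
    service = Service(executable_path=chromedriver_path)
    return webdriver.Chrome(service=service, options=options)
```

In the recipe from the question this might be called as make_driver(driver_path), followed by driver.quit() when the scraping is done.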

Thanks
