How to use chrome driver non headless in dataiku

Mitra · January 2024

I'm trying to use chrome driver headless in DSS to parse data from a website.

Is there a way to run it non-headless so I can watch the parsing activity on screen?

Turribeach · January 2024

This is something you would develop and test locally on your own machine, then deploy the automation to run somewhere else. Is there a reason why you can't take this approach?

Vitaliy · January 2024

Hi,
If by running Chrome driver headful you mean being able to see the browser window in the DSS UI, then there is no way to do that. However, if you want to parse the page the same way a browser renders it (as you would inspect it with developer tools), you can do that in a notebook with Selenium. The prerequisites are downloading the Chrome driver and either adding it to PATH on the DSS server or specifying the path to the driver directly in the code. You may also get an error about a missing "Xvfb" (I got that error in my testing). In that case, you can fix the issue by installing the OS package below:

sudo yum install xorg-x11-server-Xvfb
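Before going further, it can help to confirm from a notebook that both prerequisites are visible to the DSS user. The following is only a minimal sketch using the Python standard library; the helper names `find_executable` and `check_prerequisites` are made up for illustration, and it assumes the binaries are named `chromedriver` and `Xvfb`.

```python
import os
import shutil


def find_executable(name, explicit_path=None):
    """Return a usable path for `name`, or None if it cannot be found."""
    # An explicit path wins, but only if it points at an executable file.
    if explicit_path:
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        return None
    # Otherwise fall back to a PATH lookup, like the shell would do.
    return shutil.which(name)


def check_prerequisites(chromedriver_path=None):
    """Report where chromedriver and Xvfb were found (None means missing)."""
    return {
        "chromedriver": find_executable("chromedriver", chromedriver_path),
        "Xvfb": find_executable("Xvfb"),
    }


if __name__ == "__main__":
    # Hypothetical usage: pass your own driver path, or nothing to search PATH.
    for tool, location in check_prerequisites().items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```

If `chromedriver` shows as not found, pass the full path explicitly, for example `check_prerequisites("/path/to/chromedriver")`, and use that same path in your Selenium code.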

Then create a code env with the packages below (I used Python 3.9 in my test):

pyvirtualdisplay
selenium

Then try the code below to grab a specific element from a website:

from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By

# Start virtual display using Xvfb
display = Display(visible=0, size=(800, 600))
display.start()

# Path to your ChromeDriver executable.
chromedriver_path = "/usr/bin/chrome/chrome-linux64/chromedriver" # change to your path
# Set up Chrome service with executable_path.
chrome_service = ChromeService(executable_path=chromedriver_path)

# Set up Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--no-sandbox')
# chrome_options.add_argument("--headless=new") # enable headless for Chrome >= 109

# Create a Chrome driver instance
driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

try:
    # Example: Navigate to a website
    driver.get("https://www.selenium.dev/")

    # Example: Extract data from the website
    # ... Your scraping code here ...

    element = driver.find_element(By.CSS_SELECTOR, "body > div.container-fluid.td-default.td-outer > main > section.row.td-box.td-box--gradient.-bg-selenium-green.p-2 > div > div > div > h1")
    print(element.text)
finally:
    # Close the driver even if the scraping code raises an error
    driver.quit()

    # Stop the virtual display
    display.stop()
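Since the browser window itself cannot be shown, one way to approximate "seeing" what the parser does is to save a screenshot at each step and open the images afterwards. Below is a minimal sketch, not something I tested in DSS: the helper name `save_step_screenshot` and the output directory are assumptions, and the only Selenium call it relies on is the driver's `save_screenshot(filename)` method. It has to be called while the driver is still open, that is, before `driver.quit()`.

```python
import os
import re


def save_step_screenshot(driver, out_dir, label):
    """Save a PNG of the current page and return its path.

    `driver` is any object exposing Selenium's `save_screenshot(filename)`.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Keep the file name filesystem-safe regardless of what the label contains.
    safe_label = re.sub(r"[^A-Za-z0-9_.-]+", "_", label).strip("_") or "step"
    path = os.path.join(out_dir, f"{safe_label}.png")
    # save_screenshot returns False when the file could not be written.
    if not driver.save_screenshot(path):
        raise RuntimeError(f"Could not save screenshot to {path}")
    return path
```

For example, right after `driver.get(...)` you could call `save_step_screenshot(driver, "/tmp/selenium_shots", "home page")` (the directory is just a placeholder) and then display the file in the notebook with `IPython.display.Image(filename=...)`.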


Hope this helps.

Best.
