How to use ChromeDriver in non-headless mode in Dataiku
I'm using ChromeDriver in headless mode in DSS to parse data from a website.
Is there a way to run it in non-headless (headful) mode so I can watch the parsing activity on screen?
Answers
-
Turribeach
This is something you would develop and test locally on your machine, then deploy the automation to run somewhere else. Is there a reason you can't take this approach?
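A minimal sketch of that workflow, assuming a hypothetical `HEADLESS` environment variable that is set to `1` on the DSS server and left unset on your machine, and assuming chromedriver can be found on PATH. Run locally, Chrome opens a visible window so you can watch the parsing; once deployed, the same script runs headless. The helper names are illustrative, not part of any library.

```python
import os


def headless_requested(env=None) -> bool:
    """True if the (hypothetical) HEADLESS variable is set to "1"."""
    env = os.environ if env is None else env
    return env.get("HEADLESS", "0") == "1"


def chrome_arguments(headless: bool) -> list:
    """Chrome command-line flags for the requested mode."""
    args = ["--disable-extensions", "--no-sandbox"]
    if headless:
        # New headless mode, available in recent Chrome versions
        args.append("--headless=new")
    return args


def make_driver():
    """Create a Chrome driver: visible locally, headless where HEADLESS=1."""
    # Imported here so the helpers above work even without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless_requested()):
        options.add_argument(arg)
    # Assumes chromedriver is on PATH
    return webdriver.Chrome(options=options)
```

Keeping the mode switch outside the scraping code means nothing has to be edited between the local run and the deployed one.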
-
Hi,
If by running ChromeDriver headful you mean being able to see the browser page in the DSS UI, there is no way to do that. However, if you want to parse the page the same way you would in a browser with developer tools, you can do that in a notebook with Selenium. The prerequisites are downloading ChromeDriver (matching the Chrome version installed on the DSS server) and either adding it to PATH on the DSS server or specifying the path to the driver directly in the code. You may also get an error about a missing "Xvfb", the X virtual framebuffer that pyvirtualdisplay uses (I got this error in my testing). In that case, you can fix the issue by installing the following OS package:
sudo yum install xorg-x11-server-Xvfb
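A minimal preflight sketch for those prerequisites, using only the standard library. `find_chromedriver` and `has_xvfb` are hypothetical helper names, and it assumes the executables are named `chromedriver` and `Xvfb` on the server. It covers both options for the driver: a path given directly in code, or a driver found on PATH.

```python
import os
import shutil
from typing import Optional


def find_chromedriver(explicit_path: Optional[str] = None) -> str:
    """Return a usable chromedriver path: the explicit one if given, else the one on PATH."""
    if explicit_path is not None:
        # Option 1: the path to the driver is specified directly in the code
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        raise FileNotFoundError(f"{explicit_path} is not an executable file")
    # Option 2: chromedriver was added to PATH on the DSS server
    on_path = shutil.which("chromedriver")
    if on_path is None:
        raise FileNotFoundError("chromedriver not found on PATH")
    return on_path


def has_xvfb() -> bool:
    """True if the Xvfb binary that pyvirtualdisplay relies on is on PATH."""
    return shutil.which("Xvfb") is not None
```

Calling both at the top of the notebook reports a missing driver or a missing Xvfb with a clear message before Chrome is ever launched.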
Then create a code environment with the following packages (I used Python 3.9 in my test):
pyvirtualdisplay
selenium
Then try the code below to grab a specific element from a website:
```python
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By

# Start a virtual display using Xvfb
display = Display(visible=0, size=(800, 600))
display.start()

# Path to your ChromeDriver executable
chromedriver_path = "/usr/bin/chrome/chrome-linux64/chromedriver"  # change to your path

# Set up the Chrome service with the driver's executable path
chrome_service = ChromeService(executable_path=chromedriver_path)

# Set up Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--no-sandbox')
# chrome_options.add_argument("--headless=new")  # enable headless for Chrome >= 109

try:
    # Create a Chrome driver instance
    driver = webdriver.Chrome(service=chrome_service, options=chrome_options)
    try:
        # Example: navigate to a website
        driver.get("https://www.selenium.dev/")

        # Example: extract data from the website
        # ... your scraping code here ...
        element = driver.find_element(By.CSS_SELECTOR, "body > div.container-fluid.td-default.td-outer > main > section.row.td-box.td-box--gradient.-bg-selenium-green.p-2 > div > div > div > h1")
        print(element.text)
    finally:
        # Close the driver even if the scraping code fails
        driver.quit()
finally:
    # Stop the virtual display so no Xvfb process is left running
    display.stop()
```
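The virtual display is never shown on screen, so one possible way to still review what the browser rendered is to save a screenshot after each step with Selenium's `WebDriver.save_screenshot`, which writes a PNG and returns `False` on failure. The helper name `screenshot_step` and the output folder are hypothetical; you would still need to open or download the PNGs from the DSS server yourself.

```python
import os


def screenshot_step(driver, folder: str, step: int, label: str) -> str:
    """Save a PNG of the current page as <folder>/<step>_<label>.png and return its path."""
    os.makedirs(folder, exist_ok=True)
    # Zero-padded step numbers keep the files in execution order when sorted
    path = os.path.join(folder, f"{step:03d}_{label}.png")
    # save_screenshot writes the current viewport to `path` and returns False on failure
    if not driver.save_screenshot(path):
        raise RuntimeError(f"could not save screenshot to {path}")
    return path
```

For example, calling `screenshot_step(driver, "/tmp/scrape_shots", 1, "home")` right after `driver.get(...)` leaves `001_home.png` showing what the page looked like before the element was extracted.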
Hope this helps.
Best.