We’re pleased to share that Dataiku has published an OnCrawl plugin.
At OnCrawl, we are convinced that data science, like technical SEO, is essential to strategic decision-making in forward-looking companies today. The complexity of today's markets, the sheer volume of SEO-relevant data, the growing opacity of search engine ranking algorithms, and the ability to easily manipulate and analyze data now make the difference between SEO as a marketing tool and SEO as an executive-level strategy.
The search market is increasingly competitive. It is therefore important to optimize your crawl budget, the resources Google allocates to crawling your website, in order to focus on the right SEO projects and increase conversions. In this article, we explain how to easily deploy a method to predict your crawl budget.
OnCrawl is an award-winning technical SEO platform that helps you open Google's black box to make smarter decisions. The solution helps 1000+ companies improve their traffic, rankings, and revenue by supporting their search engine optimization plans with reliable data and actionable dashboards. The platform offers:
Some features of Dataiku DSS are delivered as separate plugins. A DSS plugin is a zip file.
Inside DSS, click the top right gear → Administration → Plugins → Store
In the search engine, type “OnCrawl” and click on “Install”:
The plugin is now ready to use.
You will need a subscription that includes the OnCrawl API option. If you don't have API access yet, please reach out to our team using the blue Intercom chat button or by sending an email to sales@oncrawl.com.
From your account settings page, scroll down to the "API Access Tokens" section. Click on "View API Access Tokens" to see the list of tokens you have generated or click on “Add API Access Token” if you want to create a new one.
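Once you have a token, it is typically passed as a Bearer header on each API call. The sketch below, using only the Python standard library, shows how such a request could be built; the base URL and the `/projects` endpoint are assumptions for illustration, so check the OnCrawl API documentation for the exact paths.

```python
from urllib import request

API_TOKEN = "your-oncrawl-api-token"  # paste a token from the API Access Tokens page
BASE_URL = "https://app.oncrawl.com/api/v2"  # assumed base URL; verify in the API docs

# Build an authenticated request; the token travels in an Authorization header.
req = request.Request(
    f"{BASE_URL}/projects",  # hypothetical endpoint listing your projects
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
# request.urlopen(req) would actually send it; omitted here to avoid a network call.
```

From the JSON responses of calls like this one, you can export your crawl data into Dataiku DSS datasets.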
Every day, Google sets the amount of resources dedicated to crawling your website. This is what is called the "crawl budget". The mission of any SEO team is to drive Google's crawl budget to the pages that matter. This article explains a method to predict your crawl budget, and we will share the full project as a zip file. Several use cases can be addressed:
This plugin provides 3 visual recipes to train and deploy R forecasting models on yearly to hourly time series.
It covers the full cycle of data cleaning, model training, evaluation and prediction.
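To make the prediction step concrete, here is a deliberately minimal sketch in Python (not the plugin's R models): a seasonal-naive forecast that repeats the last week of a daily crawl series. The series values are hypothetical pages-crawled-per-day figures.

```python
# Illustrative only - the plugin trains proper R forecasting models.
def seasonal_naive_forecast(history, season=7, horizon=7):
    """Predict the next `horizon` points by repeating the last full season."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]

# Hypothetical daily crawl-budget series (pages crawled by Googlebot per day)
history = [120, 130, 90, 80, 140, 150, 60] * 4  # four weeks of data
print(seasonal_naive_forecast(history, season=7, horizon=3))  # [120, 130, 90]
```

The plugin's visual recipes replace this naive baseline with real model selection and evaluation, but the input and output shapes are the same: a historical series in, a forecast horizon out.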
With the two Dataiku DSS plugins, the workflow is simple to use:
You can download the project zip file here
You can display the result directly in Dataiku DSS in order to compare the forecast with historical data.
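Beyond the visual comparison, you may want a single number summarizing how close the forecast tracks history over an evaluation window. A common choice is the mean absolute percentage error (MAPE); the values below are illustrative, not taken from the project.

```python
# Mean absolute percentage error between actual and predicted series.
def mape(actual, predicted):
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual) * 100

actual = [100, 110, 120]     # hypothetical historical crawl counts
predicted = [90, 115, 126]   # hypothetical forecast over the same window
print(round(mape(actual, predicted), 1))  # 6.5
```

A low MAPE on a held-out window gives you more confidence in the forecast you chart in DSS.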
Two important notes:
Looking at the graph, you can see that the crawl budget will increase in the upcoming weeks. This means that your website is SEO compliant and optimized for:
If you wish to enhance your crawl budget, please read the following blog post to take advantage of advanced OnCrawl features to improve your efficiency during daily SEO monitoring.
You can now improve your data workflow and monitor your crawl budget by category or subcategory in order to detect SEO issues or spot the best new products for the coming weeks. It might also be interesting to monitor your crawl budget based on the different Google bots (google_image, google_smartphone, google_web_search…).
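Splitting the series by bot is a simple group-and-sum before forecasting. The sketch below uses hypothetical crawl-log rows (date, bot, pages crawled) to show the aggregation; in practice you would do this in a DSS prepare or group recipe on your exported data.

```python
from collections import defaultdict

# Hypothetical crawl-log rows: (date, bot, pages_crawled)
rows = [
    ("2021-03-01", "google_web_search", 1200),
    ("2021-03-01", "google_smartphone", 900),
    ("2021-03-01", "google_image", 150),
    ("2021-03-02", "google_web_search", 1100),
]

# Total pages crawled per bot; feed each per-bot series to its own forecast.
per_bot = defaultdict(int)
for _date, bot, pages in rows:
    per_bot[bot] += pages
print(dict(per_bot))
```

One forecast per bot lets you see, for example, whether google_smartphone activity is growing faster than google_web_search after a mobile-oriented optimization.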
If you don’t have a Dataiku DSS license, you can test this project with the free edition.
You can try OnCrawl with the 14-day free trial and request API access by reaching out using the blue Intercom chat button.
About the author
Vincent Terrasi has been Product Director at OnCrawl since 2019, after working as Data Marketing Manager at OVH. He is also the co-founder of dataseolabs.com, where he offers training on data science and SEO.
Vincent has a varied background: seven years of entrepreneurship running his own sites, then three years at M6Web and three years at OVH as Data Marketing Manager. He is a pioneer of data science and machine learning for SEO in France.