The order of rows in a dataset
Howdy folks,
I am building a data pipeline in DSS that creates a table in Google BigQuery with a SQL recipe , then uploads this table to Google Analytics via a custom plugin.
In order for the upload to succeed I need the first rows of my data table to be specific values, which are then parsed in my plugin.
I have used this several times without issue, but I have one particular dataset where the headers are showing at the bottom of the dataset, and this is breaking my pipeline.
Here is the top of my dataset, where no header row is present:
If I load all the rows and scroll to the bottom, I see my header row here:
When looking in the source table in BigQuery, the header row is the first row:
Here is the SQL I am running in the previous recipe, to build this dataset:
CREATE OR REPLACE TABLE `project.Customers.ie_segments_to_ga` AS
SELECT
'ga:dimension1' AS UserID,
'ga:dimension6' AS customer_group,
'ga:dimension7' AS recency_group,
'ga:dimension8' AS aov_group;
INSERT `project.Customers.ie_segments_to_ga`
SELECT
UserID,
customer_group,
recency_group,
aov_group
FROM `project.Customers.ie_user_facts_${yd_date}`
WHERE orders >0;
SELECT * FROM `project.Customers.ie_segments_to_ga`
I am running almost identical code for two other jobs and the header row is appearing where at should, at the top of the dataset.
The dataset is using default settings, with no ordering or filtering.
I have tried recreating the dataset from scratch, but the error persists.
If I create a new dataset and point it to the table in BigQuery the row appears on the top, but I'd rather not have to break my flow into extra sections as a workaround.
What could be causing the row to drop to the bottom in this case?
Answers
-
Hi @ben_p
This may be a long shot but have you tried adding something like:
order by WHERE UserID LIKE '%ga:%' DESC
Mateusz
-
ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭
Hi @emate,
Good call, but unfortunately even when I specify an order, for example:
SELECT * FROM `project.Customers.ie_segments_to_ga`
ORDER BY UserID DESC;I am still not seeing the header row at the top - even though it is when I run the query direct in the source database.
How odd! I've got no idea what is changing the row order on this dataset.
-
@ben_p
Last try....SELECT * FROM NAMES
ORDER BY CASE WHEN Name LIKE "%Lucy%" THEN 1 ELSE 2 END; Maybe try this one?So in your case I would go like
SELECT * FROM `project.Customers.ie_segments_to_ga`
ORDER BY CASE WHEN UserID LIKE "%ga:%" THEN 1 ELSE 2 END; -
ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭
Thanks for your efforts @emate
but unfortunately, even though the SQL does work in the source DB, it won;t reorder the DSS dataset!I have worked around this by switching the data source from my SQL recipe from BigQuery to Google Cloud Storage, which is now working.
Ben