The order of rows in a dataset

ben_p · February 2021

Howdy folks,

I am building a data pipeline in DSS that creates a table in Google BigQuery with a SQL recipe , then uploads this table to Google Analytics via a custom plugin.

In order for the upload to succeed I need the first rows of my data table to be specific values, which are then parsed in my plugin.

I have used this several times without issue, but I have one particular dataset where the headers are showing at the bottom of the dataset, and this is breaking my pipeline.

Here is the top of my dataset, where no header row is present:

If I load all the rows and scroll to the bottom, I see my header row here:

When looking in the source table in BigQuery, the header row is the first row:

Here is the SQL I am running in the previous recipe, to build this dataset:

CREATE OR REPLACE TABLE `project.Customers.ie_segments_to_ga` AS
SELECT
'ga:dimension1' AS UserID,
'ga:dimension6' AS customer_group,
'ga:dimension7' AS recency_group,
'ga:dimension8' AS aov_group;

INSERT `project.Customers.ie_segments_to_ga`

SELECT 
UserID,
customer_group,
recency_group,
aov_group
FROM `project.Customers.ie_user_facts_${yd_date}`
WHERE orders >0;

SELECT * FROM `project.Customers.ie_segments_to_ga`

I am running almost identical code for two other jobs and the header row is appearing where at should, at the top of the dataset.

The dataset is using default settings, with no ordering or filtering.

I have tried recreating the dataset from scratch, but the error persists.

If I create a new dataset and point it to the table in BigQuery the row appears on the top, but I'd rather not have to break my flow into extra sections as a workaround.

What could be causing the row to drop to the bottom in this case?

emate · February 2021

Hi @ben_p

This may be a long shot but have you tried adding something like:

order by WHERE UserID LIKE '%ga:%' DESC

Mateusz

ben_p · February 2021

Hi @emate,

Good call, but unfortunately even when I specify an order, for example:

SELECT * FROM `project.Customers.ie_segments_to_ga`
ORDER BY UserID DESC;

I am still not seeing the header row at the top - even though it is when I run the query direct in the source database.

How odd! I've got no idea what is changing the row order on this dataset.

emate · February 2021

@ben_p
Last try....

SELECT * FROM NAMES
ORDER BY CASE WHEN Name LIKE "%Lucy%" THEN 1 ELSE 2 END; Maybe try this one?

So in your case I would go like

SELECT * FROM `project.Customers.ie_segments_to_ga`
ORDER BY CASE WHEN UserID LIKE "%ga:%" THEN 1 ELSE 2 END;

ben_p · February 2021

Thanks for your efforts @emate
but unfortunately, even though the SQL does work in the source DB, it won;t reorder the DSS dataset!

I have worked around this by switching the data source from my SQL recipe from BigQuery to Google Cloud Storage, which is now working.

Ben

The order of rows in a dataset

Answers

Categories

Setup Info

Tags