Do you know the History of Data Science? READ MORE

The order of rows in a dataset

ben_p
Neuron
Neuron
The order of rows in a dataset

Howdy folks,

I am building a data pipeline in DSS that creates a table in Google BigQuery with a SQL recipe , then uploads this table to Google Analytics via a custom plugin.

In order for the upload to succeed I need the first rows of my data table to be specific values, which are then parsed in my plugin.

I have used this several times without issue, but I have one particular dataset where the headers are showing at the bottom of the dataset, and this is breaking my pipeline.

Here is the top of my dataset, where no header row is present:

top.PNG

If I load all the rows and scroll to the bottom, I see my header row here:

bottom.PNG

When looking in the source table in BigQuery, the header row is the first row:

bq.PNG

Here is the SQL I am running in the previous recipe, to build this dataset:

CREATE OR REPLACE TABLE `project.Customers.ie_segments_to_ga` AS
SELECT
'ga:dimension1' AS UserID,
'ga:dimension6' AS customer_group,
'ga:dimension7' AS recency_group,
'ga:dimension8' AS aov_group;

INSERT `project.Customers.ie_segments_to_ga`

SELECT
UserID,
customer_group,
recency_group,
aov_group
FROM `project.Customers.ie_user_facts_${yd_date}`
WHERE orders >0;

SELECT * FROM `project.Customers.ie_segments_to_ga`

I am running almost identical code for two other jobs and the header row is appearing where at should, at the top of the dataset.

The dataset is using default settings, with no ordering or filtering.

I have tried recreating the dataset from scratch, but the error persists.

If I create a new dataset and point it to the table in BigQuery the row appears on the top, but I'd rather not have to break my flow into extra sections as a workaround.

What could be causing the row to drop to the bottom in this case?

0 Kudos
4 Replies
emate
Neuron
Neuron

Hi @ben_p

This may be a long shot but have you tried adding something like:

order by WHERE UserID LIKE '%ga:%' DESC

Mateusz

0 Kudos
ben_p
Neuron
Neuron
Author

Hi @emate,

Good call, but unfortunately even when I specify an order, for example:

SELECT * FROM `project.Customers.ie_segments_to_ga`
ORDER BY UserID DESC;

I am still not seeing the header row at the top - even though it is when I run the query direct in the source database.

How odd! I've got no idea what is changing the row order on this dataset.

0 Kudos
emate
Neuron
Neuron

@ben_p  Last try.... 😛

 

SQL.png

SELECT * FROM NAMES
ORDER BY CASE WHEN Name LIKE "%Lucy%" THEN 1 ELSE 2 END; Maybe try this one?

So in your case I would go like

SELECT * FROM `project.Customers.ie_segments_to_ga`
ORDER BY CASE WHEN UserID LIKE "%ga:%" THEN 1 ELSE 2 END;

0 Kudos
ben_p
Neuron
Neuron
Author

Thanks for your efforts @emate but unfortunately, even though the SQL does work in the source DB, it won;t reorder the DSS dataset!

I have worked around this by switching the data source from my SQL recipe from BigQuery to Google Cloud Storage, which is now working.

Ben

A banner prompting to get Dataiku DSS