The order of rows in a dataset

Options
ben_p
ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭
edited July 16 in Using Dataiku

Howdy folks,

I am building a data pipeline in DSS that creates a table in Google BigQuery with a SQL recipe , then uploads this table to Google Analytics via a custom plugin.

In order for the upload to succeed I need the first rows of my data table to be specific values, which are then parsed in my plugin.

I have used this several times without issue, but I have one particular dataset where the headers are showing at the bottom of the dataset, and this is breaking my pipeline.

Here is the top of my dataset, where no header row is present:

top.PNG

If I load all the rows and scroll to the bottom, I see my header row here:

bottom.PNG

When looking in the source table in BigQuery, the header row is the first row:

bq.PNG

Here is the SQL I am running in the previous recipe, to build this dataset:

CREATE OR REPLACE TABLE `project.Customers.ie_segments_to_ga` AS
SELECT
'ga:dimension1' AS UserID,
'ga:dimension6' AS customer_group,
'ga:dimension7' AS recency_group,
'ga:dimension8' AS aov_group;

INSERT `project.Customers.ie_segments_to_ga`

SELECT
UserID,
customer_group,
recency_group,
aov_group
FROM `project.Customers.ie_user_facts_${yd_date}`
WHERE orders >0;

SELECT * FROM `project.Customers.ie_segments_to_ga`

I am running almost identical code for two other jobs and the header row is appearing where at should, at the top of the dataset.

The dataset is using default settings, with no ordering or filtering.

I have tried recreating the dataset from scratch, but the error persists.

If I create a new dataset and point it to the table in BigQuery the row appears on the top, but I'd rather not have to break my flow into extra sections as a workaround.

What could be causing the row to drop to the bottom in this case?

Answers

  • emate
    emate Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭
    Options

    Hi @ben_p

    This may be a long shot but have you tried adding something like:

    order by WHERE UserID LIKE '%ga:%' DESC

    Mateusz

  • ben_p
    ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭
    Options

    Hi @emate,

    Good call, but unfortunately even when I specify an order, for example:

    SELECT * FROM `project.Customers.ie_segments_to_ga`
    ORDER BY UserID DESC;

    I am still not seeing the header row at the top - even though it is when I run the query direct in the source database.

    How odd! I've got no idea what is changing the row order on this dataset.

  • emate
    emate Dataiku DSS Core Designer, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 91 ✭✭✭✭✭✭
    Options

    @ben_p
    Last try....

    SQL.png

    SELECT * FROM NAMES
    ORDER BY CASE WHEN Name LIKE "%Lucy%" THEN 1 ELSE 2 END; Maybe try this one?

    So in your case I would go like

    SELECT * FROM `project.Customers.ie_segments_to_ga`
    ORDER BY CASE WHEN UserID LIKE "%ga:%" THEN 1 ELSE 2 END;

  • ben_p
    ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭
    Options

    Thanks for your efforts @emate
    but unfortunately, even though the SQL does work in the source DB, it won;t reorder the DSS dataset!

    I have worked around this by switching the data source from my SQL recipe from BigQuery to Google Cloud Storage, which is now working.

    Ben

Setup Info
    Tags
      Help me…