Using Dataiku

  • Hello Everyone, I created a connection with my Azure SQL DB using the MS SQL Server connector, the connection went well, but when I clicked on get table list, I got the following error message: Oops: …
    Answered
    Started by samuel_acr_96
    Most recent by Alexandru
    Last answer by Alexandru

    Hi @samuel_acr_96,

    If the issue still persists, can you please open a support ticket along with instance diagnostics taken immediately after your tests? The logs may provide more information on the exact exception.

    Thanks

    Alexandru

  • I have a dataset which has Jan, Jan_1, Feb, Feb_1... I want to use Column index to pick the last column. Can you help using Column index without Python?
    Answered
    Started by Poornima
    Most recent by Yasmine_T
    Last answer by Yasmine_T

    Hi again!

    This use case would be better handled by our team in a support ticket, if you'd like to create one and follow up there:

    https://support.dataiku.com/support/tickets/new

    We would be happy to provide support and help you with your use case :)

    Best,

    Yasmine


  • I found this article but I have some questions, hope someone can help me. I created a standard webapp with a Python server, and I'm trying to access the endpoints from Postman. I'm sending my project apikey …
    Answered
    Started by pPGrillo
    Most recent by Turribeach
    Last answer by Turribeach

    Well, the post you linked says you should pass the key as an HTTP header ("X-DKU-APIKey"), not using basic authentication. So change your Postman request to pass the key via that header. And the API Designer is something completely different from Webapps, so it's not relevant to your issue.
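
    For illustration, a minimal Python sketch of such a request (the URL, backend path and key below are placeholders, not values from this thread):

    import requests

    DSS_URL = "https://dss.example.com"                            # hypothetical instance URL
    ENDPOINT = "/web-apps-backends/PROJECT/webAppId/my-endpoint"   # hypothetical backend path
    API_KEY = "your-project-api-key"                               # hypothetical key

    # Pass the key as the X-DKU-APIKey header, not via basic auth
    response = requests.get(DSS_URL + ENDPOINT, headers={"X-DKU-APIKey": API_KEY})
    response.raise_for_status()
    print(response.json())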

  • hello everyone, I am currently working on optimizing my DSS flow. I have a scenario that currently takes 20 minutes to execute, and I am looking to reduce this time to just 5 minutes. I would greatly …
    Answered
    Started by HAFEDH
    Most recent by Turribeach
    Last answer by Turribeach

    So the first thing you need to realise is that while Dataiku allows you to build a complex data pipeline in a visual way without writing any code, this is never going to be the most optimal way of loading/preparing large datasets as fast as possible. The fact that DSS persists all the intermediate datasets is both a big advantage (explainability, debugging, etc.) and a big disadvantage too (lots of redundant data, lots of reads and writes). Depending on the recipes and connections that you use, you may be able to enable SQL pipelines in part of your flow, which should make those recipes run faster.

    You should change the flow view to Recipe engines. Any recipe showing as DSS engine should be reviewed, because this means the data will have to be moved to the DSS server for processing and back to the database for writing the output. This tends to be slower than the SQL engine, where execution happens entirely in the database without data moving to Dataiku.

    Finally, you should review your SQL database and make sure it's sized and tuned accordingly. Once you get into millions of rows, traditional RDBMSs start to struggle, so moving to technologies that can handle billions of rows at speed will help (like Databricks, Snowflake, BigQuery, etc.).
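
    If it helps, a sketch of enabling that per dataset through the Python API client. Note the flowOptions/virtualizable raw keys are an assumption about where the "Virtualizable in build" checkbox is stored, so verify them against your DSS version:

    import dataikuapi

    # Hypothetical host, key and dataset names
    client = dataikuapi.DSSClient("https://dss.example.com", "admin-api-key")
    project = client.get_project("MY_PROJECT")

    # Assumption: the "Virtualizable in build" flag that SQL pipelines rely on
    # is exposed in the raw settings under flowOptions.virtualizable
    for name in ["intermediate_ds_1", "intermediate_ds_2"]:
        settings = project.get_dataset(name).get_settings()
        settings.get_raw()["flowOptions"]["virtualizable"] = True
        settings.save()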

  • Hello, I am interested in understanding how to configure Spark settings to ensure optimal resource allocation. Specifically, I am looking for guidance on configuring parameters like spark.driver.cores…
    Answered
    Started by HAFEDH
    Most recent by Grixis
    Last answer by Grixis

    Hi HAFEDH,

    If you are working in a large enterprise, I suppose your Dataiku instance is managed by an IT or dedicated infrastructure service? In that case, it's not your responsibility to change these settings if you have not been informed of their effects. And if changing the params has an impact on performance, that means you can affect the availability of the Spark queue. It's up to you to decide whether you want to take the risk, given that you say demand for Spark jobs in your enterprise is significant and that you have no prior knowledge of Spark sessions.

    Nonetheless, technically you can just create a PySpark recipe to script whatever you want to benchmark and try tuning a batch of different configs empirically. But the result will depend on your task's resource needs, as each type of job you want to optimize performs differently depending on caching, shuffle, memory and parallelization.
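
    For what it's worth, a standalone PySpark sketch of that empirical benchmarking (the config values are made up; in DSS you would normally select a named Spark configuration on the recipe's Advanced tab, and some settings such as driver cores/memory only take effect when the session is launched):

    import time
    from pyspark.sql import SparkSession

    # Hypothetical parameter grid to benchmark; adjust to your cluster
    configs = [
        {"spark.executor.memory": "4g", "spark.sql.shuffle.partitions": "64"},
        {"spark.executor.memory": "8g", "spark.sql.shuffle.partitions": "200"},
    ]

    for conf in configs:
        builder = SparkSession.builder.appName("tuning-benchmark")
        for key, value in conf.items():
            builder = builder.config(key, value)
        spark = builder.getOrCreate()

        # Stand-in workload: replace with the job you actually want to optimize
        start = time.time()
        df = spark.range(10_000_000).selectExpr("id", "id % 100 AS bucket")
        df.groupBy("bucket").count().collect()
        print(conf, "->", round(time.time() - start, 1), "s")

        spark.stop()  # fresh session so the next config is applied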

  • Is there any tool available to calculate pairwise distance? I have 2 different geo points available in a dataframe.
    Answered
    Started by Deep
    Most recent by Yasmine_T
    Last answer by Yasmine_T

    Hi,

    I hope that you are doing well.

    From my understanding, you have two columns (col_1 and col_2), both with geopoint data, and you would like to calculate the distance between these two points as output in a third column (col_3).

    If that is the case, we do have a processor called Geo distance that computes the geodesic distance between a geospatial column and another geospatial object. The computation outputs the distance, in a chosen unit (kilometers, miles), in another column.

    In the case of two geometries, the distance is the shortest distance between them.

    (see: https://doc.dataiku.com/dss/latest/preparation/processors/geo-distance.html).

    We have a step by step tutorial available here: https://knowledge.dataiku.com/latest/data-preparation/prepare-recipe/tutorial-geo-processing.html#compute-the-distance-between-two-geopoints

    If this does not cover your use case/your request is different, please let me know!

    Best,

    Yasmine
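
    If a code-based route is ever preferable, a minimal pandas + geopy sketch of the same computation (column names follow the example above; the coordinates are made up):

    import pandas as pd
    from geopy.distance import geodesic

    # Hypothetical frame: two geopoint columns held as (lat, lon) tuples
    df = pd.DataFrame({
        "col_1": [(48.8566, 2.3522), (40.7128, -74.0060)],    # Paris, New York
        "col_2": [(51.5074, -0.1278), (34.0522, -118.2437)],  # London, Los Angeles
    })

    # Row-wise geodesic distance, written to a third column in kilometers
    df["col_3"] = df.apply(lambda r: geodesic(r["col_1"], r["col_2"]).km, axis=1)
    print(df)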

  • I have an Excel input file. Col A to Col T till row T26, I have data where Col T has the latest month's data. Col V to Col AO have a second set of data till AO50. Now, it's dynamic data: every month, a new co…
    Answered
    Started by Poornima
    Most recent by Turribeach
    Last answer by Turribeach

    No. This has nothing to do with Regex. File-based Dataiku datasets use fixed data type schemas. If the file changes, you have to manually update the schema. Only Python recipes can write dynamic schemas as their output. And even doing so will complicate your flow, so your best option is to pivot the data so that months are rows, not columns.
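
    A sketch of that pivot in a Python recipe using pandas (the input frame below is invented for illustration):

    import pandas as pd

    # Hypothetical wide frame: one column per month, new columns appear over time
    df = pd.DataFrame({
        "account": ["A", "B"],
        "Jan": [100, 200],
        "Feb": [110, 190],
        "Mar": [120, 205],
    })

    # Pivot months from columns to rows: the output schema stays fixed at
    # (account, month, value) no matter how many month columns arrive
    long_df = df.melt(id_vars=["account"], var_name="month", value_name="value")
    print(long_df)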

  • Newbie here. Trying to convert SQL from Hive that pulls records partly based on several JOIN conditions but limits those records based on other JOIN conditions. In SQL it is a "WHERE NOT EXISTS" cond…
    Answered
    Started by Dbase3tate
    Most recent by Turribeach
    Last answer by Turribeach

    Use a SQL recipe and you can copy / paste your SQL.

  • Hello everyone, I would like to prevent Python from inferring the data types of my dataframe during a Python recipe. For example, I would like an id column to remain in string type rather than dataiku…
    Answered
    Started by Natpap
    Most recent by Turribeach
    Last answer by Turribeach
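
    One common approach, sketched here under the assumption of a standard Python recipe (not necessarily what was suggested in this thread): dataiku.Dataset.get_dataframe() accepts infer_with_pandas=False, which makes dtypes follow the dataset schema instead of pandas inference:

    import dataiku

    # With infer_with_pandas=False, column types follow the dataset schema,
    # so a string-typed id column stays a string instead of becoming numeric
    ds = dataiku.Dataset("my_input")   # hypothetical dataset name
    df = ds.get_dataframe(infer_with_pandas=False)

    print(df.dtypes)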
  • Hello everyone! I have a dataset with empty values in one of the columns (col1) and I use a group by recipe on another column (col2) without empty values, with col1_distinct as aggregation. I get a v…
    Answered ✓
    Started by Max334
    Most recent by LucOBrien
    Solution by Max334

    OK, I found the solution: when I run

    if(isNonBlank(col1) && col1 > col3, '', col1)

    that works because the '' value isn't the default value.
