SharePoint Plugin and DataType issue

indy2005

Hi,

I am reading in data from SharePoint. Some numbers come over as percentage-formatted decimals (e.g. 94.5%), and others come over formatted with thousands separators.

No matter what types I set in the dataset schema, when I read it in as a dataframe almost all values (in fact, all of them) come back as NaN, which suggests the formatted values from SharePoint Online (SPOL) cannot be converted to decimals. The dataframe's dtypes show float64, but the formatted values are evidently not being parsed, so everything ends up as NaN.
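The symptom can be reproduced in plain pandas. This is only an illustration, assuming the conversion behaves like a coercing numeric parse; it does not show what DSS does internally, and the sample values are made up.

```python
import pandas as pd

# Values as they might arrive from SharePoint: display-formatted text, not numbers.
raw = pd.Series(["94.5%", "1,234,567", "12.0"])

# A coercing numeric parse cannot read the "%" sign or the "," separators,
# so those entries become NaN even though the resulting dtype is float64.
parsed = pd.to_numeric(raw, errors="coerce")
print(parsed.dtype)     # float64
print(parsed.tolist())  # [nan, nan, 12.0]
```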


Best Answer

  • Alexandru (Dataiker)

    Hi @indy2005,

    In this case, you can manually set the column types in the schema of your SharePoint input dataset to string.

    Then pass infer_with_pandas=False to get_dataframe() when reading the dataset in Python, and convert the columns to numbers yourself in your Python code. https://www.w3schools.com/python/ref_string_format.asp
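    A minimal sketch of the reading step. Inside DSS the call would be `dataiku.Dataset("sharepoint_input").get_dataframe(infer_with_pandas=False)`, where `sharepoint_input` is a hypothetical dataset name. So that the sketch runs outside DSS, an in-memory CSV with made-up values stands in for the dataset, and `dtype=str` plays the role of the all-string schema.

```python
import io

import pandas as pd

# Inside DSS (dataset name is hypothetical):
#   import dataiku
#   df = dataiku.Dataset("sharepoint_input").get_dataframe(infer_with_pandas=False)
# Outside DSS, an in-memory CSV with the same kind of formatted values stands in.
csv_text = 'score,amount\n94.5%,"1,234,567"\n30%,"2,500"\n'

# dtype=str keeps every column as text, like a schema where all columns are strings,
# so the formatted values survive the read instead of turning into NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df["score"].tolist())   # ['94.5%', '30%']
print(df["amount"].tolist())  # ['1,234,567', '2,500']
```

    With the raw strings intact, the conversion to numbers is done explicitly in your own code rather than left to type inference.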

    For example, to convert 30% to 0.3:

    df['col'] = df['col'].str.rstrip('%').astype('float') / 100.0
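    The one-liner above handles percentages only; the question also mentions thousands separators. A minimal sketch covering both, assuming en-US formatting (',' for thousands, '.' for decimals). The helper name `parse_formatted_number` and the sample columns are hypothetical.

```python
import pandas as pd


def parse_formatted_number(series: pd.Series) -> pd.Series:
    """Convert display-formatted strings such as '94.5%' or '1,234,567' to floats.

    Assumes ',' is the thousands separator and '.' the decimal point (en-US style).
    Percentages are scaled to fractions, so '30%' becomes 0.3.
    """
    text = series.astype(str).str.strip()
    # Remember which entries were percentages before the sign is stripped.
    is_percent = text.str.endswith("%", na=False)
    # Drop the percent sign and the thousands separators, then parse.
    cleaned = text.str.rstrip("%").str.replace(",", "", regex=False)
    # errors="coerce" turns anything still unparseable into NaN instead of raising.
    numbers = pd.to_numeric(cleaned, errors="coerce").astype(float)
    # Keep plain numbers as-is; divide percentages by 100.
    return numbers.where(~is_percent, numbers / 100.0)


# Hypothetical string-typed columns, as read with infer_with_pandas=False.
df = pd.DataFrame({"score": ["94.5%", "30%"], "amount": ["1,234,567", "2,500"]})
df["score"] = parse_formatted_number(df["score"])
df["amount"] = parse_formatted_number(df["amount"])
print(df["score"].tolist())   # [0.945, 0.3]
print(df["amount"].tolist())  # [1234567.0, 2500.0]
```

    If your SharePoint site uses another locale (e.g. '.' for thousands and ',' for decimals), the replacement rules would need to change accordingly.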


    Let me know if that helps!
