SharePoint Plugin and DataType issue
Hi,
I am reading in data from SharePoint. One column comes over as a percent-formatted decimal (e.g. 94.5%), and other columns come over as numbers formatted with thousand separators.
No matter what I set in the schema of the dataset, when I read it in as a dataframe I get a lot of (i.e. all) NaNs, indicating that it isn't able to convert the values from SPOL to decimals. The dataframe's dtypes report float64, but it evidently cannot use the dataset definition to parse the formatted values and convert them, so everything ends up as NaN.
Best Answer
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @indy2005,
In this case, you can manually set the schema to string in your SharePoint input dataset.
Then pass infer_with_pandas=False to get_dataframe() when reading the dataframe in Python, so the columns arrive as strings, and convert them in your Python code. https://www.w3schools.com/python/ref_string_format.asp
So to convert 30% to 0.3, you can do:
df['col'] = df['col'].str.rstrip('%').astype('float') / 100.0
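The same idea should extend to the thousand-separated columns mentioned in the question. Here is a minimal sketch, assuming the columns arrive as strings and that "," is the thousands separator. The column names, sample values and the two helper functions are made up for illustration; they are not part of the Dataiku or SharePoint plugin API.

```python
import pandas as pd

# Stand-in for the all-string dataframe you would get from the dataset;
# column names and values are hypothetical.
df = pd.DataFrame({
    "pct": ["94.5%", "30%", None],
    "amount": ["1,234.50", "12,000", ""],
})


def percent_to_fraction(s: pd.Series) -> pd.Series:
    """Turn strings like '94.5%' into fractions like 0.945."""
    cleaned = s.str.strip().str.rstrip("%")
    # errors="coerce" turns blanks and unparseable values into NaN
    return pd.to_numeric(cleaned, errors="coerce") / 100.0


def strip_thousands(s: pd.Series, sep: str = ",") -> pd.Series:
    """Turn strings like '1,234.50' into floats like 1234.5."""
    cleaned = s.str.strip().str.replace(sep, "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


df["pct"] = percent_to_fraction(df["pct"])
df["amount"] = strip_thousands(df["amount"])
```

Using pd.to_numeric with errors="coerce" rather than astype('float') means a stray blank or text value becomes NaN instead of raising, which may or may not be what you want for your data.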
Let me know if that helps!