Added on November 16, 2017 5:13PM
Hello,
I'm new to both Dataiku and ML, so please excuse my (maybe) simple question.
I have a dataset on internet companies. Originally, the target-market information (column: markets) came as a single field separated by ">" and ",". To the right there are some extra columns, e.g. number of employees, financing, etc.
My goal is to build a model with "activity" as the target variable (it has 3 values: operating, acquired and non-operating), e.g. to identify the markets where companies are most likely to "survive", or the most dangerous ones (leading to "non-operating").
My original file had 1 record per company (approx. 1,000 companies), with a single "markets" column. I started by splitting it, first on ">" and then on ",". Finally (after some cleaning and merging) I got a dataset with many records per company, as displayed below, with distinct "market__" features.
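To make the splitting step concrete, here is a rough sketch in plain Python of what I mean. The sample rows, the field names other than "markets" and "activity", and the helper functions `split_markets` and `to_long_format` are made up for illustration; my actual preparation was done with Dataiku processors, and here I split on both separators in one pass rather than in two steps.

```python
import re

# Hypothetical sample rows: one record per company, as in the original file.
companies = [
    {"company": "A", "activity": "operating",
     "markets": "Software > Analytics, Big Data"},
    {"company": "B", "activity": "acquired",
     "markets": "E-Commerce"},
]


def split_markets(raw):
    """Split a raw markets string on '>' and ',' and trim whitespace."""
    return [part.strip() for part in re.split(r"[>,]", raw) if part.strip()]


def to_long_format(rows):
    """Return one record per (company, market) pair."""
    long_rows = []
    for row in rows:
        for market in split_markets(row["markets"]):
            long_rows.append({
                "company": row["company"],
                "activity": row["activity"],
                "market": market,
            })
    return long_rows


long_rows = to_long_format(companies)
for record in long_rows:
    print(record)
```

With this layout a company that lists three markets ends up as three records, each repeating the same "activity" value, which is the situation my questions are about.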
My questions:
1. Is it OK for an ML model if the data on a single company is spread over many records (see picture below)?
2. Is there any other data preparation procedure (folding, splitting, transformation, etc.) you would recommend?
I would greatly appreciate your help.
Many thanks in advance,
Andy