Added on August 18, 2015 6:48PM
Hi again,
I'm playing with the geozip processor together with Elasticsearch / Kibana. My sample contains two columns, "zipcode,country", and the geozip processor generates a third column with a GeoPoint value (that processor is perfect!). I also added another string column which is the concatenation of latitude and longitude, separated by a comma (this is a valid geo_point string format in ES). Example:
42800;POINT (4.6657 45.5598);4.6657,45.5598;France;
69480;POINT (4.7028 45.9126);4.7028,45.9126;France;
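As an aside, here is a small sketch (plain Python; the function name is mine) of how the WKT column relates to the "lat,lon" string format. The coordinate order is worth double-checking: WKT POINT puts longitude first, while the Elasticsearch geo_point string format expects latitude first:

```python
import re

def wkt_point_to_es_geopoint(wkt):
    """Convert 'POINT (lon lat)' (WKT order: x=longitude first) to the
    'lat,lon' string format Elasticsearch accepts for geo_point fields."""
    m = re.match(r"POINT\s*\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)", wkt)
    if not m:
        raise ValueError("not a WKT point: %r" % wkt)
    lon, lat = m.groups()
    return "%s,%s" % (lat, lon)

print(wkt_point_to_es_geopoint("POINT (4.6657 45.5598)"))  # 45.5598,4.6657
```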
I synchronise this data to Elasticsearch. Unfortunately, the automatically inferred schema does not detect geo_point types:
{
  "test" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "Country" : {
            "type" : "string",
            "fields" : {
              "Country_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "PostCodeZip" : {
            "type" : "long",
            "store" : true
          },
          "geo" : {
            "type" : "string",
            "fields" : {
              "geo_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          },
          "geopoint" : {
            "type" : "string",
            "fields" : {
              "geopoint_facet" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          }
        }
      }
    }
  }
}
So I tried deleting the ES index and applying my own schema, with the correct field types, through a mapping:
curl -XPUT ../test/test/_mapping -d '{
  "test" : {
    "properties" : {
      "Country" : {
        "type" : "string",
        "fields" : {
          "Country_facet" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "PostCodeZip" : {
        "type" : "long",
        "store" : true
      },
      "geo" : {
        "type" : "string",
        "fields" : {
          "geo_facet" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      },
      "geopoint" : {
        "type" : "geo_point"
      }
    }
  }
}'
This mapping is erased by DSS before the data is uploaded, and a new schema (with the wrong types) is auto-inferred.
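For context, even outside DSS, Elasticsearch will not change the type of a field that already exists in an index: the geo_point mapping has to be in place when the index is created, before any documents are indexed. Here is a sketch of the create-index body (plain Python, stdlib json only; the index/type name "test" comes from the post above, and the curl URL in the comment is just an example host):

```python
import json

# Mapping body to supply at index-creation time, so the geopoint field
# is geo_point from the start (an existing field's type cannot be
# changed by a later _mapping call).
mapping = {
    "mappings": {
        "test": {
            "properties": {
                "PostCodeZip": {"type": "long", "store": True},
                "geopoint": {"type": "geo_point"}
            }
        }
    }
}

# This body would be sent when creating the index, e.g.:
#   curl -XPUT 'http://localhost:9200/test' -d '<body>'
print(json.dumps(mapping, indent=2))
```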
I then found the DSS configuration file for this "sync" dataset and edited it to change the type from "string" to "geo_point" (a valid ES type), like this:
File $DSS_FOLDER/projects/<project_name>/datasets/<sync_name>.json
.....
{
"name": "geopoint",
"type": "geo_point",
"maxLength": -1
},
.....
That generates the following error:
[12:12:09] [ERROR] [dku.flow.jobrunner] running sync_test_NP - Activity unexpectedly failed
java.lang.IllegalArgumentException: in running sync_test_NP: Type not found: geo_point
at com.dataiku.dip.utils.ErrorContext.iae(ErrorContext.java:82)
at com.dataiku.dip.datasets.Type.forName(Type.java:97)
at com.dataiku.dip.coremodel.SchemaColumn.getType(SchemaColumn.java:82)
at com.dataiku.dip.datasets.elasticsearch.ElasticSearchUtils.getElasticSearchType(ElasticSearchUtils.java:55)
at com.dataiku.dip.datasets.elasticsearch.ElasticSearchUtils.getMappingDefinition(ElasticSearchUtils.java:125)
at com.dataiku.dip.datasets.elasticsearch.ElasticSearchOutput$ElasticSearchOutputWriter.init(ElasticSearchOutput.java:147)
at com.dataiku.dip.dataflow.exec.stream.ToDatasetStreamSplitRunner.init(ToDatasetStreamSplitRunner.java:55)
at com.dataiku.dip.dataflow.exec.sync.FSToAny.init(FSToAny.java:67)
at com.dataiku.dip.dataflow.exec.SyncRecipeRunner.init(SyncRecipeRunner.java:110)
at com.dataiku.dip.dataflow.jobrunner.ExecutionRunnablesBuilder.getRunnables(ExecutionRunnablesBuilder.java:49)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner.runActivity(ActivityRunner.java:383)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.runActivity(JobRunner.java:102)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.access$700(JobRunner.java:27)
at com.dataiku.dip.dataflow.jobrunner.JobRunner$ActivityExecutorThread.run(JobRunner.java:263)
Any ideas?