Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
The coverage of plug-in is identified as follows:
"With this plugin, you will be able to:
Can anyone share an idea about where to find which language is supported for which functionality?
Thanks in advance
AS
Hi @ahmet_selcuk!
Welcome to Dataiku community!
The supported langages can be found in the source code of the plugin, or by installing it. Here is the list of supported langages for each category:
Langage detection:
{
"af": "Afrikaans",
"sq": "Albanian",
"am": "Amharic",
"ar": "Arabic",
"an": "Aragonese",
"hy": "Armenian",
"as": "Assamese",
"az": "Azerbaijani",
"eu": "Basque",
"be": "Belarusian",
"bn": "Bengali",
"bs": "Bosnian",
"br": "Breton",
"bg": "Bulgarian",
"my": "Burmese",
"ca": "Catalan",
"km": "Central Khmer",
"zh": "Chinese",
"hr": "Croatian",
"cs": "Czech",
"da": "Danish",
"nl": "Dutch",
"dz": "Dzongkha",
"en": "English",
"eo": "Esperanto",
"et": "Estonian",
"fo": "Faroese",
"fi": "Finnish",
"fr": "French",
"gl": "Galician",
"ka": "Georgian",
"de": "German",
"el": "Greek",
"gu": "Gujarati",
"ht": "Haitian",
"ha": "Hausa",
"he": "Hebrew",
"hi": "Hindi",
"hu": "Hungarian",
"is": "Icelandic",
"ig": "Igbo",
"id": "Indonesian",
"ga": "Irish",
"it": "Italian",
"ja": "Japanese",
"jv": "Javanese",
"kn": "Kannada",
"kk": "Kazakh",
"rw": "Kinyarwanda",
"ky": "Kirghiz",
"ko": "Korean",
"ku": "Kurdish",
"lo": "Lao",
"la": "Latin",
"lv": "Latvian",
"lt": "Lithuanian",
"lb": "Luxembourgish",
"mk": "Macedonian",
"mg": "Malagasy",
"ms": "Malay",
"ml": "Malayalam",
"mt": "Maltese",
"mi": "Maori",
"mr": "Marathi",
"mn": "Mongolian",
"ne": "Nepali",
"se": "Northern Sami",
"nb": "Norwegian Bokmรฅl",
"nn": "Norwegian Nynorsk",
"no": "Norwegian",
"ny": "Nyanja",
"oc": "Occitan",
"or": "Oriya",
"pa": "Panjabi",
"fa": "Persian",
"pl": "Polish",
"pt": "Portuguese",
"ps": "Pushto",
"qu": "Quechua",
"ro": "Romanian",
"ru": "Russian",
"sm": "Samoan",
"gd": "Scottish Gaelic",
"sr": "Serbian",
"sn": "Shona",
"sd": "Sindhi",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"so": "Somali",
"st": "Southern Sotho",
"es": "Spanish",
"su": "Sundanese",
"sw": "Swahili",
"sv": "Swedish",
"tl": "Tagalog",
"tg": "Tajik",
"ta": "Tamil",
"te": "Telugu",
"th": "Thai",
"tr": "Turkish",
"ug": "Uighur",
"uk": "Ukrainian",
"ur": "Urdu",
"uz": "Uzbek",
"vi": "Vietnamese",
"vo": "Volapรผk",
"wa": "Walloon",
"cy": "Welsh",
"fy": "Western Frisian",
"xh": "Xhosa",
"yi": "Yiddish",
"yo": "Yoruba",
"zu": "Zulu",
}
Langages misspellings:
{
"ar": "Arabic",
"bg": "Bulgarian",
"ca": "Catalan",
"cs": "Czech",
"da": "Danish",
"de": "German",
"el": "Greek",
"en": "English",
"es": "Spanish",
"et": "Estonian",
"fa": "Persian",
"fi": "Finnish",
"fr": "French",
"he": "Hebrew",
"hr": "Croatian",
"hu": "Hungarian",
"id": "Indonesian",
"is": "Icelandic",
"it": "Italian",
"ja": "Japanese",
"ko": "Korean",
"lt": "Lithuanian",
"lv": "Latvian",
"nl": "Dutch",
"pl": "Polish",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sk": "Slovak",
"sl": "Slovenian",
"sq": "Albanian",
"sr": "Serbian",
"sv": "Swedish",
"th": "Thai",
"tr": "Turkish",
"uk": "Ukrainian",
"vi": "Vietnamese",
"zh": "Chinese (simplified)",
}
Langages for tokenization, filtering and lemmatization:
{
"af": "Afrikaans",
"ar": "Arabic",
"bg": "Bulgarian",
"bn": "Bengali",
"ca": "Catalan",
"cs": "Czech",
"da": "Danish",
"de": "German",
"el": "Greek",
"en": "English",
"es": "Spanish",
"et": "Estonian",
"eu": "Basque",
"fa": "Persian",
"fi": "Finnish",
"fr": "French",
"ga": "Irish",
"gu": "Gujarati",
"he": "Hebrew",
"hi": "Hindi",
"hr": "Croatian",
"hu": "Hungarian",
"hy": "Armenian",
"id": "Indonesian",
"is": "Icelandic",
"it": "Italian",
"kn": "Kannada",
"lb": "Luxembourgish",
"lt": "Lithuanian",
"lv": "Latvian",
"mk": "Macedonian",
"ml": "Malayalam",
"mr": "Marathi",
"nb": "Norwegian Bokmรฅl",
"ne": "Nepali",
"nl": "Dutch",
"pl": "Polish",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sa": "Sanskrit",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"sq": "Albanian",
"sr": "Serbian",
"sv": "Swedish",
"ta": "Tamil",
"te": "Telugu",
"th": "Thai",
"tl": "Tagalog",
"tr": "Turkish",
"tt": "Tatar",
"uk": "Ukrainian",
"ur": "Urdu",
"vi": "Vietnamese",
"yo": "Yoruba",
"zh": "Chinese (simplified)",
}
Hope this helps ๐
Hi @ahmet_selcuk!
Welcome to Dataiku community!
The supported langages can be found in the source code of the plugin, or by installing it. Here is the list of supported langages for each category:
Langage detection:
{
"af": "Afrikaans",
"sq": "Albanian",
"am": "Amharic",
"ar": "Arabic",
"an": "Aragonese",
"hy": "Armenian",
"as": "Assamese",
"az": "Azerbaijani",
"eu": "Basque",
"be": "Belarusian",
"bn": "Bengali",
"bs": "Bosnian",
"br": "Breton",
"bg": "Bulgarian",
"my": "Burmese",
"ca": "Catalan",
"km": "Central Khmer",
"zh": "Chinese",
"hr": "Croatian",
"cs": "Czech",
"da": "Danish",
"nl": "Dutch",
"dz": "Dzongkha",
"en": "English",
"eo": "Esperanto",
"et": "Estonian",
"fo": "Faroese",
"fi": "Finnish",
"fr": "French",
"gl": "Galician",
"ka": "Georgian",
"de": "German",
"el": "Greek",
"gu": "Gujarati",
"ht": "Haitian",
"ha": "Hausa",
"he": "Hebrew",
"hi": "Hindi",
"hu": "Hungarian",
"is": "Icelandic",
"ig": "Igbo",
"id": "Indonesian",
"ga": "Irish",
"it": "Italian",
"ja": "Japanese",
"jv": "Javanese",
"kn": "Kannada",
"kk": "Kazakh",
"rw": "Kinyarwanda",
"ky": "Kirghiz",
"ko": "Korean",
"ku": "Kurdish",
"lo": "Lao",
"la": "Latin",
"lv": "Latvian",
"lt": "Lithuanian",
"lb": "Luxembourgish",
"mk": "Macedonian",
"mg": "Malagasy",
"ms": "Malay",
"ml": "Malayalam",
"mt": "Maltese",
"mi": "Maori",
"mr": "Marathi",
"mn": "Mongolian",
"ne": "Nepali",
"se": "Northern Sami",
"nb": "Norwegian Bokmรฅl",
"nn": "Norwegian Nynorsk",
"no": "Norwegian",
"ny": "Nyanja",
"oc": "Occitan",
"or": "Oriya",
"pa": "Panjabi",
"fa": "Persian",
"pl": "Polish",
"pt": "Portuguese",
"ps": "Pushto",
"qu": "Quechua",
"ro": "Romanian",
"ru": "Russian",
"sm": "Samoan",
"gd": "Scottish Gaelic",
"sr": "Serbian",
"sn": "Shona",
"sd": "Sindhi",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"so": "Somali",
"st": "Southern Sotho",
"es": "Spanish",
"su": "Sundanese",
"sw": "Swahili",
"sv": "Swedish",
"tl": "Tagalog",
"tg": "Tajik",
"ta": "Tamil",
"te": "Telugu",
"th": "Thai",
"tr": "Turkish",
"ug": "Uighur",
"uk": "Ukrainian",
"ur": "Urdu",
"uz": "Uzbek",
"vi": "Vietnamese",
"vo": "Volapรผk",
"wa": "Walloon",
"cy": "Welsh",
"fy": "Western Frisian",
"xh": "Xhosa",
"yi": "Yiddish",
"yo": "Yoruba",
"zu": "Zulu",
}
Langages misspellings:
{
"ar": "Arabic",
"bg": "Bulgarian",
"ca": "Catalan",
"cs": "Czech",
"da": "Danish",
"de": "German",
"el": "Greek",
"en": "English",
"es": "Spanish",
"et": "Estonian",
"fa": "Persian",
"fi": "Finnish",
"fr": "French",
"he": "Hebrew",
"hr": "Croatian",
"hu": "Hungarian",
"id": "Indonesian",
"is": "Icelandic",
"it": "Italian",
"ja": "Japanese",
"ko": "Korean",
"lt": "Lithuanian",
"lv": "Latvian",
"nl": "Dutch",
"pl": "Polish",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sk": "Slovak",
"sl": "Slovenian",
"sq": "Albanian",
"sr": "Serbian",
"sv": "Swedish",
"th": "Thai",
"tr": "Turkish",
"uk": "Ukrainian",
"vi": "Vietnamese",
"zh": "Chinese (simplified)",
}
Langages for tokenization, filtering and lemmatization:
{
"af": "Afrikaans",
"ar": "Arabic",
"bg": "Bulgarian",
"bn": "Bengali",
"ca": "Catalan",
"cs": "Czech",
"da": "Danish",
"de": "German",
"el": "Greek",
"en": "English",
"es": "Spanish",
"et": "Estonian",
"eu": "Basque",
"fa": "Persian",
"fi": "Finnish",
"fr": "French",
"ga": "Irish",
"gu": "Gujarati",
"he": "Hebrew",
"hi": "Hindi",
"hr": "Croatian",
"hu": "Hungarian",
"hy": "Armenian",
"id": "Indonesian",
"is": "Icelandic",
"it": "Italian",
"kn": "Kannada",
"lb": "Luxembourgish",
"lt": "Lithuanian",
"lv": "Latvian",
"mk": "Macedonian",
"ml": "Malayalam",
"mr": "Marathi",
"nb": "Norwegian Bokmรฅl",
"ne": "Nepali",
"nl": "Dutch",
"pl": "Polish",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sa": "Sanskrit",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"sq": "Albanian",
"sr": "Serbian",
"sv": "Swedish",
"ta": "Tamil",
"te": "Telugu",
"th": "Thai",
"tl": "Tagalog",
"tr": "Turkish",
"tt": "Tatar",
"uk": "Ukrainian",
"ur": "Urdu",
"vi": "Vietnamese",
"yo": "Yoruba",
"zh": "Chinese (simplified)",
}
Hope this helps ๐
Great help, thanks ๐