skip first row in file using regex

smp Registered Posts: 4 ✭✭✭

Hi community, I am trying to parse a badly shaped file so I need to parse it with Regex. I am currently looking at my dataset and setting the configuration in the Format / Preview tab, using "type : regular expression".

I am currently struggling to skip the first line, which I know some regex flavours can do by prefixing the pattern with something like .*\n\K


The rest of the regex is still crude for now (I am still working on it).


The problem is that if I launch the regex Pattern, I get this error:

Tried format regexp but configuration is not OK: Illegal/unsupported escape sequence near index 5
.*\n\K"(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*)"
     ^

I have tried escaping the backslash, changing \K to \\K. The error goes away, but no data is parsed either.

Can you please suggest how to translate \K into the regex syntax Dataiku uses?
Or even just point me to the official documentation of the regex flavour in use?

thanks a lot
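
A note on the \\K attempt: the error text has the same wording as the exception raised by Java's regex engine, which has no \K escape, so it seems likely (an assumption, not confirmed in this thread) that the dataset format uses Java regex syntax. Doubling the backslash does not enable \K; it asks for a literal backslash followed by the letter K, which would explain why nothing is parsed. Python's built-in `re` module behaves the same way on both points, so the sketch below uses it as a stand-in. The sample strings are made up for illustration.

```python
import re

# An unescaped \K is not a valid escape in Python's built-in `re` module.
try:
    re.compile(r'.*\n\K"(.*)"')
    k_supported = True
except re.error:
    k_supported = False

# Doubling the backslash compiles, but the pattern now demands a literal
# backslash followed by the letter K in the text being parsed.
doubled = re.compile(r'.*\n\\K"(.*)"')

sample = 'header\n"1;Ann;Rome"\n'            # hypothetical data, no literal \K in it
with_literal = 'header\n\\K"1;Ann;Rome"\n'   # same data with a literal \K inserted

print(k_supported)
print(doubled.search(sample))        # no match on ordinary data
print(doubled.search(with_literal))  # matches only when \K is literally present
```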


  • VitaliyD
    VitaliyD Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker
    edited July 17

    Hi, it is hard to suggest anything specific without sample data. A good place to start is to test your regex in one of the online regex playgrounds. If you want to use Python to parse the file, you could try something like the below:

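    import re

    # `string` is assumed to hold the full text of the file.
    # Every match must begin with a newline, so the first row is never captured.
    regex = re.compile(r'\n(^.*);', re.MULTILINE)
    matches = [m.groups() for m in regex.finditer(string)]
    for m in matches:
        print(m)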


    If the above is not what you were looking for, could you provide a file with the sample data?
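
In engines without \K, the usual workaround is to let the pattern consume the unwanted prefix and put only the parts to keep inside capture groups. A minimal sketch of that idea, assuming a made-up three-column sample in the same quoted, semicolon-separated shape as the pattern in the question (`sample` and `row_pattern` are illustrative names, not from the original file):

```python
import re

# Hypothetical sample: a header row followed by quoted, semicolon-separated rows.
sample = '"id;name;city"\n"1;Ann;Rome"\n"2;Bob;Oslo"\n'

# Require a preceding newline (so the first row can never match) and
# capture only the fields; the newline itself stays outside the groups.
row_pattern = re.compile(r'\n"([^;"]*);([^;"]*);([^;"]*)"')

rows = [m.groups() for m in row_pattern.finditer(sample)]
for row in rows:
    print(row)
```

Using `[^;"]*` instead of `.*` for each field keeps a greedy group from swallowing the separators, which tends to matter once the number of columns grows.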

  • smp
    smp Registered Posts: 4 ✭✭✭

    Hi, thanks for the support!

    I had tried regex101 before, but I don't know which regex syntax the Dataiku dataset node uses. With the default configuration in regex101, I was able to use the special escape \K, which is not available in Dataiku.

    I have found a different solution anyway: I made my pattern more restrictive, for example by forcing the first column to be numeric. This made the parser skip any row that contains text instead of a number in the first column. In practice, the first row is skipped because it contains the column names rather than numeric values.
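
The effect described above can be sketched in Python. This assumes the pattern is applied to one line at a time and that non-matching lines are dropped, which is how the behaviour is reported here; the sample lines and the name `numeric_first` are hypothetical.

```python
import re

# Hypothetical sample lines: the header has text where data rows have a number.
lines = ['"id;name;city"', '"1;Ann;Rome"', '"2;Bob;Oslo"']

# The first column is restricted to digits; the other columns accept
# any text that contains neither a semicolon nor a quote.
numeric_first = re.compile(r'"(\d+);([^;"]*);([^;"]*)"')

parsed = []
for line in lines:
    m = numeric_first.fullmatch(line)
    if m:  # lines that do not fit the pattern (such as the header) are skipped
        parsed.append(m.groups())

print(parsed)
```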

