Evaluate isDelimiter once in nextToken whitespace skip#618
Conversation
garydgregory
left a comment
There was a problem hiding this comment.
Hello @rootvector2
Thank you for the PR.
Would you please add a test that demonstrates why this is an issue at public API level?
TY!
exercises the empty-field-dropped regression through CSVParser, not just the lexer.
|
Added |
|
Merged, TY @rootvector2. Do you see any other issues with whitespace in any other configuration parameters? |
|
Took a look. The double-consume was specific to I also ran a whitespace-prefixed multi-char delimiter through the other whitespace-sensitive configs: |
nextTokenevaluates the side-effectingisDelimiter(c)twice on the same character whenignoreSurroundingSpacesis on: once in the leading-whitespace skip loop and again in the start-of-token check.isDelimiterconsumes the trailing characters of a multi-character delimiter as it matches, so for a delimiter whose first character is whitespace the loop eats the delimiter and the second call then peeks past it, mismatches, and the empty field at that boundary is dropped. With delimiter" |"andignoreSurroundingSpacesenabled,a | |bparsed to two fields (a," b") instead of three (a, empty,b), and|ato one field (" a") instead of two (empty,a). Fixed by evaluatingisDelimiteronce in the whitespace loop and reusing that result for the start-of-token decision.Found while auditing the multi-character delimiter look-ahead against
ignoreSurroundingSpaces.mvn; that'smvnon the command line by itself.