Channel: SCN: Message List

Regex - to identify non printable characters


In an online regex tester, I can see that the pattern [^[:print:]] correctly identifies TAB as a non-printable character.

 

2016-04-14 09_25_19-Online regex tester and debugger_ JavaScript, Python, PHP, and PCRE.jpg

 

But when I use the same regex in ABAP, it doesn't treat TAB as a non-printable character. I wrote this simple program to loop through all Unicode characters from U+0000 to U+FFFF and list the ones that ABAP identifies as "printable".

    constants hex type char16 value '0123456789ABCDEF'.

    data: p   type i,
          q   type i,
          r   type i,
          s   type i,
          str type char4,
          val type string,
          rpl type string.

    " Build every 4-digit hex code point, one hex digit per nesting level.
    do 16 times.
      p = sy-index - 1.
      do 16 times.
        q = sy-index - 1.
        do 16 times.
          r = sy-index - 1.
          do 16 times.
            s = sy-index - 1.
            str = |{ hex+p(1) }{ hex+q(1) }{ hex+r(1) }{ hex+s(1) }|.
            rpl = val = cl_abap_conv_in_ce=>uccp( str ).
            replace all occurrences of regex '[^[:print:]]' in rpl with ` `.
            " Unchanged after the replace: the regex treated the character as printable.
            if rpl = val.
              write: / str, val.
            endif.
          enddo.
        enddo.
      enddo.
    enddo.

 

The output listed many characters that the regex did not match, which means ABAP considers all of them printable.

 

TAB:

2016-04-14 09_28_37-1.jpg

Many other non-printable characters:

2016-04-14 09_29_20-1.jpg

2016-04-14 09_29_30-1.jpg

I understand that some of these characters may appear as [] on my system because of a missing font. Am I right?


But what about the TAB character? Is this a bug? ABAP correctly identifies newline characters as non-printable.


I also tried the regex [:cntrl:], and the result was worse, as shown below: it caught neither TAB nor NEWLINE.


2016-04-14 09_37_28-1.jpg


Inviting Harald Boeing, Michael Kozlowski, and Clemens Li to this discussion, as I got this idea from their posts.


BTW, I already know that I can replace the TAB character by specifying it explicitly in the REPLACE statement.
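
For reference, the explicit workaround mentioned above could look roughly like this. It is only a minimal sketch: the variable txt and its sample value are made up for illustration, and it assumes the standard constant cl_abap_char_utilities=>horizontal_tab is used to address the TAB character.

```abap
data txt type string.

" Hypothetical sample text with a TAB between A and B.
txt = |A{ cl_abap_char_utilities=>horizontal_tab }B|.

" Replace every TAB explicitly, independent of how the regex
" character classes classify it.
replace all occurrences of cl_abap_char_utilities=>horizontal_tab
        in txt with ` `.

write: / txt.
```

This sidesteps the character-class question entirely, but it only covers the characters that are listed by hand, which is why a working [^[:print:]] would be preferable.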

 

Thanks,

Juwin

