Unicode assigns every character to a "category". These categories can be referenced with "\p{Code}" (any character in the category) and "\P{Code}" (any character not in the category), where Code is a one- or two-letter code: a one-letter code selects a whole group of categories, and a two-letter code selects a single category within that group.
For details of which characters are matched, consult the documentation for java.lang.Character or the Unicode category details.

Code | Description |
---|---|
C | all "other" characters (control, format, unassigned, private use, surrogate) |
Cc | control |
Cf | format |
Cn | unassigned |
Co | private use |
Cs | surrogate |
L | all letters |
L1 | Latin-1 |
LD | letter or digit |
Ll | lowercase letter |
Lm | modifier letter |
Lo | other letter |
Lt | titlecase letter |
Lu | uppercase letter |
M | all marks |
Mc | combining spacing mark |
Me | enclosing mark |
Mn | non-spacing mark |
N | all numbers |
Nd | decimal digit number |
Nl | letter number |
No | other number |
P | all punctuation |
Pc | connector punctuation |
Pd | dash punctuation |
Pe | end punctuation |
Po | other punctuation |
Ps | start punctuation |
S | all symbols |
Sc | currency symbol |
Sk | modifier symbol |
Sm | math symbol |
So | other symbol |
Z | all separators |
Zl | line separator |
Zp | paragraph separator |
Zs | space separator |
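
As a minimal sketch of how these codes behave in practice, the following example uses java.util.regex with a few of the categories from the table. The class name `UnicodeCategoryDemo` and the helper methods `matchesAll` and `keepLettersAndDigits` are hypothetical names chosen for illustration; only `Pattern.matches`, `String.replaceAll` and `Character.getType` are standard library calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class UnicodeCategoryDemo {

    // Returns true when the entire input matches the given regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Removes every character that is neither a letter (L) nor a decimal digit (Nd).
    static String keepLettersAndDigits(String input) {
        return input.replaceAll("[^\\p{L}\\p{Nd}]", "");
    }

    public static void main(String[] args) {
        // Two-letter code: Lu matches a single uppercase letter.
        System.out.println(matchesAll("\\p{Lu}", "A"));             // true
        System.out.println(matchesAll("\\p{Lu}", "a"));             // false

        // One-letter code: L covers every letter category (Lu, Ll, Lt, Lm, Lo).
        System.out.println(matchesAll("\\p{L}+", "Stra\u00DFe"));   // true

        // Upper-case P negates the category: one or more non-letters.
        System.out.println(matchesAll("\\P{L}+", "123-456"));       // true

        // Sc is a currency symbol (here the euro sign), Nd a decimal digit.
        System.out.println(matchesAll("\\p{Sc}\\p{Nd}+", "\u20AC42")); // true

        // Categories can also be used inside a character class.
        System.out.println(keepLettersAndDigits("Hello, World! 42")); // HelloWorld42

        // java.lang.Character reports the category that Lu corresponds to.
        System.out.println(Character.getType('A') == Character.UPPERCASE_LETTER); // true
    }
}
```

Note that in Java source code the backslash must itself be escaped, so the pattern "\p{Lu}" is written as the string literal `"\\p{Lu}"`.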