You can try splitting with this regex:
([\d,]+|[a-zA-Z]+ *[a-zA-Z]*) // note the space between + and *
- [0-9,]+ // matches one or more digits and commas ([0-9] is equivalent to \d here)
- [a-zA-Z]+ *[a-zA-Z]* // matches a word, followed by zero or more spaces, followed by another word (if any)
String regEx = "[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*";
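To get a feel for what each half accepts, you can test the two alternatives separately against whole strings. This is only a minimal sketch; the class name `AlternativeProbe` and the helpers `isNumber` and `isWords` are made up for illustration and are not part of the answer.

```java
import java.util.regex.Pattern;

public class AlternativeProbe {
    // The two halves of the pattern, checked against entire strings.
    static final String NUMBER = "[0-9,]+";
    static final String WORDS = "[a-zA-Z]+ *[a-zA-Z]*";

    // Hypothetical helper: true if the whole string is digits and commas.
    static boolean isNumber(String s) {
        return Pattern.matches(NUMBER, s);
    }

    // Hypothetical helper: true if the whole string is one word,
    // optionally followed by spaces and a second word.
    static boolean isWords(String s) {
        return Pattern.matches(WORDS, s);
    }

    public static void main(String[] args) {
        System.out.println(isNumber("14,642"));         // true
        System.out.println(isNumber(","));              // true: the class does not require a digit
        System.out.println(isWords("Marine Cargo"));    // true
        System.out.println(isWords("Marine   Cargo"));  // true: " *" allows several spaces
        System.out.println(isWords("argA"));            // true: the second word is optional
        System.out.println(isWords("argA2"));           // false: digits are not in [a-zA-Z]
    }
}
```

Note that a lone comma counts as a "number" here, so you may want a stricter first alternative if your input can contain stray commas.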
You can use it like this:
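import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitDemo { // class name is illustrative
    public static void main(String[] args) {
        String input = "2 Marine Cargo 14,642 10,528 16,016 more text 8,609 argA 2,106 argB";
        System.out.println("Return Value :");
        // Digits and commas, or a word optionally followed by spaces and a second word
        Pattern pattern = Pattern.compile("[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*");
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            // The > and < delimiters make any whitespace inside a match visible
            System.out.println(">" + m.group(0) + "<");
            result.add(m.group(0));
        }
    }
}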
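One thing to watch for with the loop above: ` *` can consume a space even when no second word follows, so a word that sits directly before a number (such as `argA` in the sample input) comes back with a trailing space. Here is a minimal sketch of one way to deal with that, assuming you want the tokens without surrounding whitespace. The class name `TokenSplitter` and the method `tokenize` are illustrative names, not something from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSplitter {
    // Same pattern as in the answer, compiled once.
    private static final Pattern TOKEN =
            Pattern.compile("[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*");

    // Collects every match; trim() drops a space that " *" may have consumed.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group().trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "2 Marine Cargo 14,642 10,528 16,016 more text 8,609 argA 2,106 argB";
        for (String token : tokenize(input)) {
            System.out.println(">" + token + "<");
        }
    }
}
```

With the sample input, the untrimmed loop prints `>argA <`, while this version prints `>argA<`. All other tokens are the same in both.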
The following is a detailed explanation of the regex, autogenerated from https://regex101.com:
1st Alternative [0-9,]+
  Match a single character present in the list below [0-9,]+
    + Quantifier: Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
    0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
    , matches the character , literally (case sensitive)
2nd Alternative [a-zA-Z]+ *[a-zA-Z]*
  Match a single character present in the list below [a-zA-Z]+
    + Quantifier: Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
    a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
    A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)
  " *" matches the space character literally (case sensitive)
    * Quantifier: Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  Match a single character present in the list below [a-zA-Z]*
    * Quantifier: Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
    A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)
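As the breakdown shows, the second alternative contains only one optional trailing word, so a run of three or more words gets split into separate matches. If you need longer groups kept together, one possible variation is to repeat the "spaces plus word" part. This is a sketch under that assumption; the `(?: +[a-zA-Z]+)*` variant, the class name `WordRunDemo` and the method `findAll` are not from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordRunDemo {
    // Pattern from the answer: at most two words per match.
    static final Pattern TWO_WORDS =
            Pattern.compile("[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*");
    // Assumed variant: a word followed by any number of " word" repetitions.
    static final Pattern ANY_WORDS =
            Pattern.compile("[0-9,]+|[a-zA-Z]+(?: +[a-zA-Z]+)*");

    // Hypothetical helper that returns every match of the pattern in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "7 Marine Cargo Insurance 14,642";
        // Four matches: the three-word run is split after "Marine Cargo"
        System.out.println(String.join("|", findAll(TWO_WORDS, input)));
        // Three matches: 7|Marine Cargo Insurance|14,642
        System.out.println(String.join("|", findAll(ANY_WORDS, input)));
    }
}
```

Because the variant only consumes spaces when another word follows them, its matches also never end in a space.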
solved How to Split text by Numbers and Group of words