[Solved] Truncate text preserving keywords


JavaScript: truncate words like Google

// Escape characters that have a special meaning in a regex so the query is matched literally
const regEsc = (str) => str.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");

const string = "Lorem Ipsum is simply dummy book text of the printing and text book typesetting industry. Dummy Lorem Ipsum has been the industry's standard dummy Ipsum text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.";
const queryString = "lorem";

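// Up to 10 characters of context on each side, extended to whole words by \S*
const rgxp = new RegExp("(\\S*.{0,10})?("+ regEsc(queryString) +")(.{0,10}\\S*)?", "ig");
const results = [];

// Collect one truncated snippet per match; replace() is used here only to iterate
string.replace(rgxp, function(m, $1, $2, $3){
  results.push(`${$1?"…"+$1:""}<b>${$2}</b>${$3?$3+"…":""}`);
});

// Highlight every match (with its context) inside the full text
document.body.innerHTML = string.replace(rgxp, "<span>$1<b>$2</b>$3</span>");

/* CSS */
span{background:yellow;}
b{color:red}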

The RegExp:

Let’s say we have a long string and want to match every appearance of the word book or Book.
This regex would do it:

/book/ig  

(i and g are the case-Insensitive and Global flags)

But we need to get not only book but also a truncated portion of the text before and after each match. Let’s say 10 characters before and 10 characters after:

/.{0,10}book.{0,10}/ig

. means any character except a line break, and {min,max} is a quantifier stating how many such characters we want to match.

To differentiate the prefix chunk, the match, and the suffix chunk so we can use them separately (e.g. for wrapping the match in <b> bold tags), let’s use capturing groups ():
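
As a quick check of that pattern, here is a minimal sketch. The sentence and the variable names are made up for illustration and are not part of the original demo.

```javascript
// Illustrative sentence and names (assumptions, not from the demo above).
const sampleText = "The quick brown fox read a book about regular expressions";

// Up to 10 characters of context on each side of "book", case-insensitive.
const contextMatches = sampleText.match(/.{0,10}book.{0,10}/ig);

console.log(contextMatches); // [ 'ox read a book about reg' ]
```

Note how the context is cut in the middle of words ("ox", "reg"), which is exactly what the later steps deal with.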

/(.{0,10})(book)(.{0,10})/ig

The above will match both Book and book in

“Book an apartment and read a book of nice little fluffy animals”

To know when to add an ellipsis, we need to make those chunks “optional”. Let’s apply the optional quantifier ? (zero or one) to both groups:

/(.{0,10})?(book)(.{0,10})?/ig

Now a capturing group may come back empty (or undefined). Used as a boolean in a conditional operator ?:, it tells you whether to add an ellipsis: ($1 ? "…"+$1 : "")

What we capture now looks like:
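
A minimal sketch of that ellipsis check, run against the sample sentence. The names `phrase`, `optionalRgxp` and `snippets` are illustrative, and `matchAll` is used here only as one convenient way to loop over the matches.

```javascript
const phrase = "Book an apartment and read a book of nice little fluffy animals";
const optionalRgxp = /(.{0,10})?(book)(.{0,10})?/ig;

const snippets = [];
for (const [, before, keyword, after] of phrase.matchAll(optionalRgxp)) {
  // An empty or missing chunk is falsy, so no ellipsis is added on that side.
  snippets.push(`${before ? "…" + before : ""}<b>${keyword}</b>${after ? after + "…" : ""}`);
}

console.log(snippets);
// [ '<b>Book</b> an apartm…', '…nd read a <b>book</b> of nice l…' ]
```

The first snippet gets no leading ellipsis because nothing precedes "Book" in the sentence.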

Book an apartm
nd read a book of nice l

(I’ve bolded the queryString just for visuals)

To fix those awkwardly cut-off words, let’s prepend (and append) any number * of non-whitespace characters \S:

/(\S*.{0,10})?(book)(.{0,10}\S*)?/ig

The result is now:

Book an apartment
and read a book of nice little
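
The result can be reproduced with a short sketch like this one (variable names are illustrative):

```javascript
const text = "Book an apartment and read a book of nice little fluffy animals";

// \S* on the outer edge of each chunk extends the context to whole words.
const wholeWordRgxp = /(\S*.{0,10})?(book)(.{0,10}\S*)?/ig;
const wholeWordMatches = text.match(wholeWordRgxp);

console.log(wholeWordMatches);
// [ 'Book an apartment', 'and read a book of nice little' ]
```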

(See the details of the above regex at regex101)

Let’s now convert the regex literal to a RegExp pattern string (escaping the backslash characters and putting our ig flags in the second argument):

new RegExp("(\\S*.{0,10})?(book)(.{0,10}\\S*)?", "ig");

Thanks to the RegExp constructor, we can now pass variables into the pattern (escape user input first, as regEsc does in the demo above):

var queryString = "book";
var rgxp = new RegExp("(\\S*.{0,10})?("+ queryString +")(.{0,10}\\S*)?", "ig");
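
When the query comes from user input it may contain regex metacharacters. The sketch below wraps the pattern in a helper and reuses the `regEsc` function from the demo at the top. `buildSnippetRgxp` is a hypothetical name chosen for this example. Without the escaping, a query such as "c++" would make `new RegExp` throw a SyntaxError.

```javascript
// Same escaping helper as in the demo at the top of this answer.
const regEsc = (str) => str.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");

// Hypothetical helper: builds the snippet regex for any literal query string.
const buildSnippetRgxp = (query) =>
  new RegExp("(\\S*.{0,10})?(" + regEsc(query) + ")(.{0,10}\\S*)?", "ig");

const found = "I learned C++ before JavaScript".match(buildSnippetRgxp("c++"));

console.log(found); // [ 'I learned C++ before JavaScript' ]
```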

Finally, to retrieve and use our three captured groups, we can access them inside the .replace() replacement string using "$1", "$2" and "$3" (see demos).
For more freedom, we can pass a callback function instead of a replacement string; it receives the groups as arguments: .replace(rgxp, function(match, $1, $2, $3){ /* ... */ })
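
Both variants side by side, as a sketch that works on plain strings (no DOM). The variable names and the uppercase transformation in the callback are only for illustration.

```javascript
const source = "Book an apartment and read a book of nice little fluffy animals";
const rgxp = /(\S*.{0,10})?(book)(.{0,10}\S*)?/ig;

// String parameter: replace() substitutes $1, $2 and $3 itself
// (a group that did not match becomes an empty string).
const highlighted = source.replace(rgxp, "<span>$1<b>$2</b>$3</span>");

// Callback: the groups arrive as arguments and may be undefined,
// hence the default values for the optional chunks.
const shouted = source.replace(
  rgxp,
  (match, $1 = "", $2, $3 = "") => `${$1}[${$2.toUpperCase()}]${$3}`
);

console.log(highlighted);
console.log(shouted);
```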

Note:

This code will not return overlapping matches. If two occurrences of the query sit close to each other, only one match is returned for both, because the regex has already consumed the neighbouring characters through the up-to-10 context in .{0,10} (plus the \S* word completion).

If the source string has HTML tags in it, make sure (for simplicity’s sake) to search through the text content only (not the HTML string); otherwise a more complicated approach would be necessary.
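
A small sketch of the non-overlapping behaviour, using a made-up string in which two occurrences of "book" sit only a few characters apart:

```javascript
const nearby = "a book, a Book";
const snippetRgxp = /(\S*.{0,10})?(book)(.{0,10}\S*)?/ig;

// The bare keyword occurs twice...
const plainCount = nearby.match(/book/ig).length;

// ...but the snippet regex reports a single match: the greedy prefix
// swallows the first "book" as context for the second one.
const snippetMatches = [...nearby.matchAll(snippetRgxp)];

console.log(plainCount);            // 2
console.log(snippetMatches.length); // 1
console.log(snippetMatches[0][1]);  // 'a book, a '
console.log(snippetMatches[0][2]);  // 'Book'
```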

Useful resources:

https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/RegExp
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/replace
http://www.rexegg.com/regex-quickstart.html
