Search PDF & extract line of text

I need to search a PDF for one or more keywords and then extract every line of text on which a keyword appears. Has anyone done this, or does anyone know of a component in the Forge that does it?

Hi Jason,

Have you seen the iTextSharp library?

Try https://www.outsystems.com/forge/466/

Pedro - yes, I have a solution working with this, except that with the documents I am scanning it doesn't seem able to separate the lines of text correctly. This is the split I am using:

string[] result = currentText.Split("\r\r\n".ToCharArray(), StringSplitOptions.None);

I have tried several different combinations of \r\n, \n\r, etc., but haven't found one that works. The result still includes the line I want plus either the line above it or the line below it. I think this is the last little part I need to get it working with iText, but I can't come up with the right separator so that I get only the line I want.
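
One thing that may be worth checking in the call above: "\r\r\n".ToCharArray() hands Split the individual characters \r, \r and \n, so the text is cut at every single carriage return or line feed rather than at the three-character sequence, and StringSplitOptions.None keeps the empty strings in between. A minimal sketch of the difference, using a made-up sample string (not real PDF output) and a hypothetical helper named SplitLines:

```csharp
using System;

public static class LineSplitDemo
{
    // Hypothetical helper: splits on any common newline convention
    // and drops the empty entries left between separators.
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Made-up sample; text extracted from a real PDF may break differently.
        string currentText = "Invoice 42\r\nTotal due: 100\r\nThank you";

        // Char-array split: '\r' and '\n' are each their own separator,
        // so every "\r\n" leaves an empty entry between two lines.
        string[] byChars = currentText.Split("\r\r\n".ToCharArray(), StringSplitOptions.None);
        Console.WriteLine(byChars.Length);  // 5

        string[] lines = SplitLines(currentText);
        Console.WriteLine(lines.Length);    // 3
    }
}
```

If two lines still come back merged after a split like this, the extracted text may simply contain no line break at that point, in which case no separator will help and the extraction step itself is the place to look.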


Solution

Hi Jason,

Try iterating line by line, and replace the line when you find the word you're looking for:

https://stackoverflow.com/a/27742836
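
A rough sketch of that per-line approach, assuming iTextSharp 5.x. PdfReader, PdfTextExtractor.GetTextFromPage and LocationTextExtractionStrategy come from the library; KeywordLineFinder, MatchingLines and FindInPdf are hypothetical names used only for illustration, and it is not the code from the linked answer. It reads each page's text, walks it one line at a time, and keeps only the lines containing a keyword (case-insensitive here, which is an assumption about what is wanted):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class KeywordLineFinder
{
    // Returns every line of `text` that contains at least one keyword.
    public static List<string> MatchingLines(string text, IEnumerable<string> keywords)
    {
        var matches = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine treats "\r\n", "\r" and "\n" as line breaks.
            while ((line = reader.ReadLine()) != null)
            {
                foreach (string keyword in keywords)
                {
                    if (line.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                    {
                        matches.Add(line.Trim());
                        break; // add each line once, even if several keywords match
                    }
                }
            }
        }
        return matches;
    }

    // Extracts text page by page and collects the matching lines.
    public static List<string> FindInPdf(string path, IEnumerable<string> keywords)
    {
        var results = new List<string>();
        var pdf = new PdfReader(path);
        try
        {
            for (int page = 1; page <= pdf.NumberOfPages; page++) // pages are 1-based
            {
                string pageText = PdfTextExtractor.GetTextFromPage(
                    pdf, page, new LocationTextExtractionStrategy());
                results.AddRange(MatchingLines(pageText, keywords));
            }
        }
        finally
        {
            pdf.Close();
        }
        return results;
    }
}
```

A call would look something like KeywordLineFinder.FindInPdf("report.pdf", new[] { "total", "invoice" }), where the file name and keywords are placeholders. Where the line breaks land depends on the extraction strategy, so if lines come out merged it may be worth comparing LocationTextExtractionStrategy with SimpleTextExtractionStrategy on the same document.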


Solution

Pedro, I found the problem and it's working great now. iText was definitely the way to go.