Search PDF & extract line of text

Jason Herrington

Solved

Question

I need to search a PDF for one or more keywords and then extract any line of text that a keyword appears on. Has anyone done this, or does anyone know of a component in the Forge that does it?

27 Aug 2018

Pedro Costa

Solution

Pedro Costa wrote (27 Aug 2018):

Hi Jason,

Have you seen the iTextSharp library?

Try https://www.outsystems.com/forge/466/

Jason Herrington wrote (28 Aug 2018):

Pedro - yes, I have a solution working with this, except that with the documents I am scanning, the extracted text doesn't seem to separate into lines correctly. When I do

string[] result = currentText.Split("\r\r\n".ToCharArray(), StringSplitOptions.None);

it still takes the line I want and also includes either the line above it or the line below it. And yes, I have tried several different combinations of \r\n, \n\r, etc., but haven't found one that works. I think this is the last little part to get it working with iText, but I can't come up with the correct separator so that I only get the line I want.
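One thing that may contribute to the confusion here: `Split(char[], ...)` treats every character in the array as its own delimiter, so `"\r\r\n".ToCharArray()` splits on any single `\r` or `\n` rather than on the sequence as a whole, and `StringSplitOptions.None` keeps the empty entries in between. A minimal sketch of splitting on whole line-ending sequences instead, assuming the extracted text uses ordinary `\r\n`, `\n`, or `\r` line endings (the `LineSplitter` class and `SplitLines` method are hypothetical names, not from the thread):

```csharp
using System;

public static class LineSplitter
{
    // Hypothetical helper: splits text on whole line-ending sequences.
    // "\r\n" is listed first so a Windows line ending counts as one break.
    // Assumption: blank lines are not needed, so empty entries are dropped.
    public static string[] SplitLines(string text)
    {
        return text.Split(
            new[] { "\r\n", "\n", "\r" },
            StringSplitOptions.RemoveEmptyEntries);
    }
}
```

If two visual lines still come back as one entry, the text probably contains no line break between them at all, in which case no separator will help and the extraction step is the place to look.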

Hi Jason,

Try iterating line by line, and replace the line when you find the word you're looking for:

https://stackoverflow.com/a/27742836
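The line-by-line idea could be sketched roughly as follows. This is not the code from the linked answer; it is a minimal illustration that assumes the page text has already been extracted into a string (for example with iTextSharp's `PdfTextExtractor.GetTextFromPage`) and collects the matching lines rather than replacing them. `KeywordLineFinder` and `FindLines` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class KeywordLineFinder
{
    // Hypothetical helper: returns every line of pageText that contains
    // at least one of the keywords (case-insensitive).
    public static List<string> FindLines(string pageText, IEnumerable<string> keywords)
    {
        var matches = new List<string>();
        using (var reader = new StringReader(pageText))
        {
            string line;
            // ReadLine treats "\r\n", "\n", and "\r" each as one line break.
            while ((line = reader.ReadLine()) != null)
            {
                foreach (var keyword in keywords)
                {
                    if (line.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                    {
                        matches.Add(line);
                        break; // record each line once, even if several keywords match
                    }
                }
            }
        }
        return matches;
    }
}
```

Using `StringReader.ReadLine` avoids having to guess the separator. Whether the extracted lines correspond to the visual lines in the PDF still depends on the document and on the extraction strategy used.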

28 Aug 2018

Jason Herrington

Pedro, found the problem and it's working great now. iText was definitely the way to go.

28 Aug 2018
