Extract PDF Pages Based on Content

How would we identify pages in a PDF document that contain a certain word and extract those pages into a new document? This can be done with a few lines of JavaScript, and there are different ways to run them: we can create a folder level JavaScript and install it in one of Acrobat’s JavaScript folders (see here for more information about how to identify the folder where to install such a script), or we can create an Action that executes the JavaScript. In the past I’ve written about how to create folder level scripts (e.g. here), so let’s create an Action today.

Here is the script that we will be using:

// Iterates over all pages, looks for a given string, and extracts all
// pages on which that string is found to a new file.

var pageArray = [];

var stringToSearchFor = "Total";

for (var p = 0; p < this.numPages; p++) {
	// iterate over all words
	for (var n = 0; n < this.getPageNumWords(p); n++) {
		if (this.getPageNthWord(p, n) == stringToSearchFor) {
			pageArray.push(p);
			break;
		}
	}
}

if (pageArray.length > 0) {
	// extract all pages that contain the string into a new document
	var d = app.newDoc();    // this will add a blank page - we need to remove that once we are done
	for (var n = 0; n < pageArray.length; n++) {
		d.insertPages( {
			nPage: d.numPages-1,
			cPath: this.path,
			nStart: pageArray[n],
			nEnd: pageArray[n],
		} );
	}

	// remove the initial blank page
	d.deletePages(0);
}

The script is pretty straightforward: We are iterating over all pages, and on each page, we are looping over all words until we find the word that we are looking for. When we find it, we add the page number to an array of page numbers and move on to the next page.

If, after all this looping, we have information in this array of page numbers, we process that list by creating a new document (which will add a blank page – a PDF document always has to have at least one page), and then we add each page from the original document that we find in the array. All that’s left now is to remove that initial blank page.
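
The search step can also be wrapped in a function, which makes it easier to reuse in other scripts. This is only a minimal sketch and not part of the Action itself: the name findPagesWithWord is made up for illustration, and the document is passed in as a parameter (inside Acrobat you would pass `this`).

```javascript
// Hypothetical helper: returns the zero-based numbers of all pages
// on which the word finder reports `word`.
function findPagesWithWord(doc, word) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        // ask for the word count once per page, not on every loop pass
        var numWords = doc.getPageNumWords(p);
        for (var n = 0; n < numWords; n++) {
            if (doc.getPageNthWord(p, n) == word) {
                pages.push(p);
                break;    // one hit is enough, continue with the next page
            }
        }
    }
    return pages;
}

// In Acrobat: var pageArray = findPagesWithWord(this, "Total");
```

Because the function only relies on numPages, getPageNumWords() and getPageNthWord(), the rest of the script (creating the new document and inserting the pages) stays unchanged.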

So, let’s convert this into an Action. In Acrobat XI Pro (this will not work in Standard, which does not support Actions), select “Tools>Action Wizard>Create New Action”. This will create an empty Action. To add JavaScript to our Action, select the “Execute JavaScript” option under “More Tools” and move it to the right side (e.g. by clicking on the arrow button).

Once the “Execute JavaScript” step is on the right side, click on the “Specify Settings” button and paste the script from above into the editor. Once the script is part of the Action, you can prevent the editor from popping up every time you run the Action by deselecting “Prompt User” for this action step.

Save the action, give it a meaningful name and you are ready to execute it.

You can download the action here: ExtractPagesWithString.sequ. Once downloaded, just double-click on it to install it in Acrobat Pro. Again, this will not work with Acrobat Standard or the free Adobe Reader.

75 Responses to Extract PDF Pages Based on Content

  1. Elizabeth Celuck says:

    Your script is exactly what I have been searching for, so thank you for sharing it! I am getting an error message saying it is corrupt when I click on it from the download location. I also tried copying the code and pasting into notepad, saving it as an sequ, and then opening it, but still get a corrupt code error. I would appreciate any assistance you can offer. Thanks!

  2. Joe Barry says:

    Hello,

    Good code snippet.

    How might we then save down the .tmp files that pop up? We’d like this to be more of an operating system script that saves a new file with a name of “filename+new”, suppresses any preview and commits the files to the operating system as files.

  3. Karl Heinz Kremer says:

    Elizabeth, you should be able to create a new action based on the instructions I’ve provided. You cannot just save the code snippet as a SEQU file, you will have to create a new Action, add a JavaScript step and then use the code from above for that JavaScript processing step.

  4. Karl Heinz Kremer says:

    Joe, that’s not what an Acrobat Action is about: An Action will always run in Acrobat and will display the processed file. If you want to do this from outside of Acrobat, you will have to write an application that “remote controls” Acrobat e.g. via the IAC interface using VB. Take a look at my VBA and VBScript related posts for more information. You would have to use the JSObject to use the JavaScript interface from VB or VBScript.

  5. JohnR says:

    Great idea on the posted code. I have implemented per your instructions, the code runs and says that it has executed successfully, but no document is created. The search words are correct and are simply replaced in the ‘Total’ text from the script, but nothing appears to happen. The debugger was no help either. Suggestions?

  6. Nicola F. says:

    Thank you so much for this, it opened me a whole new world!

    I got a question: is there a simple command to highlight somehow the word after the script finds it!?
    Something like:
    this.highlightPageNthWord(p, n) !?

    I just want my eyes to find it quickly when I look at the pdf after the script is executed.
    Thanks in advance!

  7. Karl Heinz Kremer says:

    Nicola, look at the Doc.selectPageNthWord() method in the API documentation.

  8. Nicola F. says:

    Karl, thanks for the quick answer!

    I checked the doc, but what you suggested seems no good for me. Or, I’m doing it wrong.

    While reading the manual I found the addAnnot command to add a Highlight, so, I did my own script to do this:
    1- Look for several words
    2-When found, highlight them
    3-Delete pages where there are no matching words
    4-Save the modifed doc with another name

    And, it works! But, it’s very slow.
    A 10 page pdf where the script finds 12 matching words takes 180 sec to process, while it takes only 2 sec if I skip step 2! And I have hundreds pages to process 🙁

    Could these few lines
    this.addAnnot({
        page: nth_page,
        type: "Highlight",
        quads: this.getPageNthWordQuads(nth_page, nth_word)
    });
    repeated 12 times make such a huge difference!?

    Thanks again

  9. Nicola F. says:

    Nevermind, for some reason I can’t understand, it didn’t like to “addNote” during the search, so, I stored the pages and quads into 2 vectors. At the end of the search, I did all the necessary addNotes together.
    Now I process approx 2000 pages in 8 minutes. Sounds good enough to me! Thanks!
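
The two-pass approach described in this comment could look roughly like the following. It is a sketch under assumptions: the function name highlightWord is invented here, the document is passed in as `doc` (use `this` in Acrobat), and the speed-up is the commenter's observation, not something this sketch guarantees.

```javascript
// Hypothetical helper: collect all matches first, add the highlights afterwards.
function highlightWord(doc, word) {
    var hits = [];
    for (var p = 0; p < doc.numPages; p++) {
        var numWords = doc.getPageNumWords(p);
        for (var n = 0; n < numWords; n++) {
            if (doc.getPageNthWord(p, n) == word) {
                // remember where the word is, but do not annotate yet
                hits.push({ page: p, quads: doc.getPageNthWordQuads(p, n) });
            }
        }
    }
    // second pass: no word finder calls happen while annotations are added
    for (var i = 0; i < hits.length; i++) {
        doc.addAnnot({
            page: hits[i].page,
            type: "Highlight",
            quads: hits[i].quads
        });
    }
    return hits.length;    // number of highlights that were added
}
```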

  10. Stephanie A says:

    You are quite literally my favorite person today. You have taken hours off my work week. Thank you!!!!!

  11. Jason Pretorius says:

    I’m not a developer/coder at all, and this literally saved my life today.

    If I could, I would be buying you a beer right now.

    Thanks.

  12. Karl Heinz Kremer says:

    Jason, just keep me in mind for any professional needs around PDF you may come across in the future. I can only write this blog because nice people are hiring me for PDF related consulting jobs 🙂

  13. adrian says:

    Is there any way I can edit this so that it deletes the pages with the specified string?

  14. Jeff B says:

    Hi Karl,

    Your script is exactly what we were looking for but for some reason I can not get it to work. We have a 1622 page document in Acrobat Pro. Each page has either “page 1 of 1” or “page 1 of 2” or “page 2 of 2” at the bottom. We need to extract all the “page 1 of 1” pages from the document into a new document. I have copy and pasted your script and replaced where you have “title” with “page 1 of 1” . The script seems to run fine but the newly made document is the same as the previous document. Any ideas? Thanks.

  15. Karl Heinz Kremer says:

    Jeff, the “word finder” does just that: it returns one word at a time. You will have to do a bit more to get the full string containing all four parts (“page”, “1”, “of”, and “1”). There is a method to get the location of the “words”; you may have to use that to get things into the correct order.
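
As a starting point, a phrase can be matched as a run of consecutive words. This minimal sketch assumes that the word finder returns the words of the phrase in reading order, which is not guaranteed for every PDF (that is where the word locations would come in). The function name pageContainsPhrase is made up for illustration, and `doc` stands for `this` in Acrobat.

```javascript
// Hypothetical helper: true if the words in `phraseWords` appear on the page
// as consecutive words, in the order the word finder returns them.
function pageContainsPhrase(doc, pageNum, phraseWords) {
    if (phraseWords.length == 0) {
        return false;
    }
    var numWords = doc.getPageNumWords(pageNum);
    for (var n = 0; n + phraseWords.length <= numWords; n++) {
        var matched = true;
        for (var k = 0; k < phraseWords.length; k++) {
            if (doc.getPageNthWord(pageNum, n + k) != phraseWords[k]) {
                matched = false;
                break;
            }
        }
        if (matched) {
            return true;
        }
    }
    return false;
}

// In Acrobat: pageContainsPhrase(this, p, ["page", "1", "of", "1"])
```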

  16. Jeff B says:

    Thanks for the quick reply. Unfortunately your answer is out of my expertise. Unless you have a webpage to point me to. Thanks.

  17. Karl Heinz Kremer says:

    Jeff, no, I don’t have any instructions that would cover that. However, if you need help implementing this, I am available. This is actually something I’ve done a few times for my customers. You can find my email address on the “About” page.

  18. Praj says:

    Hello. 🙂
    I was searching the method to extract pdf pages having same words.
    This method is very helpful.
    Thank you very much. 🙂

  19. Michael Harp says:

    Is it possible to DELETE all of the pages from a PDF document that includes a specific string of text? I have a 900+ page document that I don’t really need to extract every page that includes certain text, I need to delete the 200+ pages that includes one specific string of text that doesn’t appear anywhere else in the document. Any thoughts?

  20. Karl Heinz Kremer says:

    Yes, it’s certainly possible. I would start to process the document from the last page to the first, and then whenever you find the string, you call Doc.deletePages(). See here for documentation for this API function: http://help.adobe.com/livedocs/acrobat_sdk/11/Acrobat11_HTMLHelp/wwhelp/wwhimpl/common/html/wwhelp.htm?context=Acrobat11_HTMLHelp&file=JS_API_AcroJS.89.458.html

  21. Cassia says:

    Hello Karl,
    Thank you for this script! It is fantastic.
    Could you please post the full script to save the extracted documents, with new filenames, in new folders? I see that there is some reference made to this above. However, I am not a programmer, and cannot figure out how to implement it.

  22. Karl Heinz Kremer says:

    Cassia, that’s a bit too much to share in a free blog post. If you do need help implementing such a script, that’s what I do for a living 🙂 If you need professional help, feel free to get in touch with me via email. My email address is on my “About” page.

  23. T. says:

    How would I search for all forms of “total” (e.g., “total” and “totaling”)?

    Or, how would I search for two words (if easier than than the above), such as “total” and “totaling”?

    Thank you!

  24. Malcolm says:

    thanks for creating this script – has saved me a few hours work.

    just one question – is it possible to ignore case in the search ??

  25. Jason says:

    Hi! Awesome script. In case anyone wants it, I adjusted the script as follows to prompt the user for the desired search term instead of it being hard-coded into the script:

    Changed the below line
    var stringToSearchFor = "Total";

    To this
    var stringToSearchFor = app.response("Enter search term");

    I also noticed that this search ignores characters like the $ character. I also figured out that it ‘IS’ case sensitive, and doesn’t work on strings inside or bumped up against other words without a space in between.

    Is there a complete list somewhere that shows what combinations, characters, etc. it will or will not find? Or is there a bit of code that would adjust what it will or will not find?

    Thanks! ~JTC~

  26. Karl Heinz Kremer says:

    Jason, I am not aware of such a list. Take a look at the documentation: http://help.adobe.com/livedocs/acrobat_sdk/11/Acrobat11_HTMLHelp/wwhelp/wwhimpl/common/html/wwhelp.htm?context=Acrobat11_HTMLHelp&file=JS_API_AcroJS.89.492.html

    There is an option to not strip out punctuation marks and whitespace. That may give you what you want.
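
One way to use that option is to rebuild the raw text of a page and search that string instead of comparing single words. This is a sketch under assumptions: the function names are invented for illustration, `doc` stands for `this` in Acrobat, and how Acrobat splits text around characters like “$” depends on the PDF, so the result needs to be checked against the actual document.

```javascript
// Hypothetical helper: concatenates all words of a page with stripping
// turned off, so punctuation and trailing whitespace are kept.
function getRawPageText(doc, pageNum) {
    var text = "";
    var numWords = doc.getPageNumWords(pageNum);
    for (var n = 0; n < numWords; n++) {
        // third argument (bStrip) set to false: do not strip the word
        text += doc.getPageNthWord(pageNum, n, false);
    }
    return text;
}

// Hypothetical helper: plain substring search in the raw page text.
function pageContainsText(doc, pageNum, searchText) {
    return getRawPageText(doc, pageNum).indexOf(searchText) != -1;
}
```

Because this is a substring search, it also finds text that sits in the middle of a longer word, which may or may not be what you want.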

  27. Dr. Stephanie Rollins says:

    This script is awesome! How can I delete the pages instead of extracting them. I’ve searched everywhere and hoping you can help. Is there a delete command I can insert in this script?

    When I run this script, the pages extract, however, the pages with the searched word still remain in the original document (no blank pages).

    Hoping you can help!! I work for the government, so don’t have $$ to hire a consultant. Trying really hard to figure this out on my own, but I’m stumped!

  28. Jason says:

    Karl,

    Thanks for your response. I’ll keep experimenting and post back if I figure anything additional out. I really appreciate the information you provide. Helps a lot of people out!

    ~JTC~

  29. Brian Borgstrøm says:

    Hi Karl,

    Thank you so much for this script, it does almost everything I need it to.
    Is there a way to get the script search for more than just one word? I specifically want it to search for two-word phrases but I can’t get the script to do that for me. I think it’s because it doesn’t include blank spaces in the search.

    Thanks again,
    Brian

  30. Karl Heinz Kremer says:

    Brian, the “word finder” can only search for one word at a time. You would have to implement your own method of searching for longer strings.

  31. Sean Osterhout says:

    Could this be modified to extract groups of pages? I have an 800 page report that we want separated into 3-page documents, every third page. I have a script that inserts blank pages for when we print:

    /* Add blank pages every 3 */
    /* To change number of pages between blank, change all “3” to the desired increment */

    for (var i=this.numPages-3; i>=0; i-=3) {
        var Rect = this.getPageBox("Crop", i);
        this.newPage(i+3, Rect[2], Rect[1]);
    }

    Now I need to get the printed bills separated so I save them digitally to our customer’s files. Can you point me in the right direction?

  32. Karl Heinz Kremer says:

    Sean, to extract pages you would use a different approach. The loop could be the same, but in the loop, you would use the “Doc.extractPages()” method. See here for more information: http://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/index.html#t=Acro12_MasterBook%2FJS_API_AcroJS%2FJavaScript_API.htm%23TOC_extractPagesbc-423&rhtocid=_6_1_8_39_32
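
A minimal sketch of that approach: first compute the page ranges, then call extractPages() once per range. The helper name makePageRanges and the output file names are assumptions for illustration, and the Acrobat part is shown as untested comments because saving to a path via cPath may be restricted by Acrobat's security settings, depending on where the script runs.

```javascript
// Hypothetical helper: splits numPages pages into consecutive groups of
// groupSize pages and returns the zero-based start and end of each group.
function makePageRanges(numPages, groupSize) {
    var ranges = [];
    for (var start = 0; start < numPages; start += groupSize) {
        // the last group may be shorter than groupSize
        var end = Math.min(start + groupSize, numPages) - 1;
        ranges.push({ nStart: start, nEnd: end });
    }
    return ranges;
}

// In Acrobat (untested sketch, file names are placeholders):
// var ranges = makePageRanges(this.numPages, 3);
// for (var i = 0; i < ranges.length; i++) {
//     this.extractPages({
//         nStart: ranges[i].nStart,
//         nEnd: ranges[i].nEnd,
//         cPath: "bill_" + (i + 1) + ".pdf"
//     });
// }
```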

  33. Maria Majka says:

    This article has been an incredibly helpful tool for me! Thanks so very much for sharing your knowledge in a clear, concise manner (I know nothing about scripts and you made this simple). This is saving me hours of work in extracting multiple pages.

  34. Duncan Marr says:

    I have an 8000 page PDF. Every even page is addressed and every odd page is not addressed (These need to be kept together). I need to extract all those pages where there are multiples that share the same name and address (including the corresponding non-addressed page) into one file, in order, and all those that only appear once into another file. Is this possible via a script?

  35. Karl Heinz Kremer says:

    Duncan, it may be possible to do this using a script, but it depends on the actual PDF file and how it was generated. Even if it is possible, it requires quite a bit of scripting. I’ve done similar projects where pages needed to be bundled and extracted as individual documents, so if you need professional help, feel free to contact me via email to ask about my consulting services. My email address is on my “About” page: http://khkonsulting.com/about

  36. ash says:

    page extracted but in different file. how can i combine pages. i selected different pdfs i want the result to be combined in one pdf.

  37. Karl Heinz Kremer says:

    ash, you cannot do this in one operation, you need to extract first and then assemble – or, you can keep track of which pages you want, and then remove all pages from your document that you do not need.
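
The second option (removing everything that is not needed) can be sketched like this. The helper name pagesToDelete is made up for illustration. It returns the page numbers in descending order so that deleting one page does not shift the numbers of the pages that still have to be deleted.

```javascript
// Hypothetical helper: returns all page numbers that are NOT in pagesToKeep,
// highest page number first.
function pagesToDelete(numPages, pagesToKeep) {
    var keep = {};
    for (var i = 0; i < pagesToKeep.length; i++) {
        keep[pagesToKeep[i]] = true;
    }
    var result = [];
    for (var p = numPages - 1; p >= 0; p--) {
        if (!keep[p]) {
            result.push(p);
        }
    }
    return result;
}

// In Acrobat (untested sketch; a PDF must keep at least one page,
// so pageArray must not be empty):
// var doomed = pagesToDelete(this.numPages, pageArray);
// for (var i = 0; i < doomed.length; i++) {
//     this.deletePages(doomed[i]);
// }
```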

  38. Forrest says:

    Hi Karl

    Thanks for this posting – although I’m having a very odd issue. I have Adobe Acrobat Pro XI, and for some reason when I use your script the “stringToSearchFor” must start with the letter V. If I try any other word, it does not work. Any ideas?

    Thanks!

  39. Karl Heinz Kremer says:

    Forrest, sorry, I don’t have any ideas why that would be. Did you make changes to my code?

  40. Josh says:

    Karl, is there a way to modify this code to search for a partial word?

  41. Kendra says:

    Is it possible to look to a certain location on the PDF for a word (in my case a loan number) and include that word in the filename when extracting/splitting?

  42. Karl Heinz Kremer says:

    Kendra, you can try to extract a small portion of the document by cropping the page first to your target area, then getting all words in that target area while assembling e.g. your loan number, and then undoing the crop again to go back to your original page. This page has information about how to do that: https://answers.acrobatusers.com/Reverse-Crop-With-Javascript-q299707.aspx

    You can only use that information as part of a filename if you are saving the document (or splitting it) via JavaScript.

  43. Karl Heinz Kremer says:

    Josh, to match a partial word, you would need to provide your own matching algorithm. You can e.g. use regular expressions to do that. The word finder will always return one word, and you would have to implement the logic to match your partial word.
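
A minimal sketch of such a matcher, using a regular expression built from the search text. The function names are invented for illustration. The search text is escaped first so that characters like “(” or “.” are matched literally, and the optional flag makes the comparison ignore upper and lower case.

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeForRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns a function that tests whether a word
// contains `fragment` anywhere, optionally ignoring case.
function makeWordMatcher(fragment, ignoreCase) {
    var re = new RegExp(escapeForRegExp(fragment), ignoreCase ? "i" : "");
    return function (word) {
        return re.test(word);
    };
}

// In the script, create the matcher once before the loops:
//     var matches = makeWordMatcher("total", true);
// and replace the comparison with:
//     if (matches(this.getPageNthWord(p, n))) { ... }
```

With ignoreCase set to true, the fragment "total" would match “Total”, “totaling” and “Subtotal” alike.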

  44. Vanessa says:

    Brilliant! But how do I get this to search for more than one string at a time and output all the pages in one shot?

  45. Karl Heinz Kremer says:

    Vanessa, that’s just standard JavaScript programming. You need to use an “or” construct to search for one or another string (or a third a fourth or a fifth and so on):


    var nthWord = this.getPageNthWord(p, n);
    if (nthWord == stringToSearchFor_1 || nthWord == stringToSearchFor_2 || nthWord == stringToSearchFor_3) {
    // ...

    I’ve pulled out getting the nth word from the if statement so that I don’t have to call it multiple times. I assign it to a variable, and then just compare that variable to all the words I am looking for.
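
When the list of words gets longer, a lookup object can replace the chain of “or” comparisons. This is a sketch with made-up helper names and example words; the lookup is still case sensitive and still compares whole words.

```javascript
// Hypothetical helper: turns an array of words into a lookup object.
function makeWordSet(words) {
    var set = {};
    for (var i = 0; i < words.length; i++) {
        set[words[i]] = true;
    }
    return set;
}

// Hypothetical helper: true if `word` is one of the words in the set.
// hasOwnProperty avoids false hits on names like "constructor".
function isSearchWord(wordSet, word) {
    return Object.prototype.hasOwnProperty.call(wordSet, word);
}

// In the script (the three words are only examples):
//     var searchWords = makeWordSet(["Total", "Subtotal", "Balance"]);
//     ...
//     if (isSearchWord(searchWords, this.getPageNthWord(p, n))) { ... }
```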

  46. Vanessa says:

    YOU ARE AMAZING. THANK YOU!!!!!!!!!!!! you just saved me hours and hours of work <3

  47. Brandon says:

    Hi. I had to delete the Actions (Find and Highlight, Extract Highlighted) from my Adobe, but now I’m getting an error message stating “Unable to Import the Action “ExtractPagesWithString’. The file is either invalid or corrupt. I have a huge project that will require 4400 pages to be marked and extracted out of 14000. I can’t figure this out. Thank you in advance!

  48. Karl Heinz Kremer says:

    Brandon, which version of Adobe Acrobat are you using? This should work without problems in any recent version.

  49. srihari says:

    Hi. I had to delete folios in PDF. I am currently using edit document text in Adobe X pro. If few pages I can do it manually but for more pages its tough. So can I have a script to remove the folios in pdf

  50. Karl Heinz Kremer says:

    Srihari, with the information provided here and some basic JavaScript knowledge, you should be able to create this script yourself. If this is not enough, I can certainly help you via my professional consulting services. If you are interested in that, feel free to get in touch with me via email.

  51. srihari says:

    Thank you Karl. If you could provide me basic script for folio I can develop it and use it.

  52. Stanley J says:

    Hi, this has really helped. Thank you so much. How would the script look to extract content using a date format (eg. 06JUL17)? Im having difficulty with this. Thanks

  53. Karl Heinz Kremer says:

    Stanley, if you already know the exact string, you can just adjust this one line:


    var stringToSearchFor = "06JUL17";

    This should do the job. If the date is not fixed, you need to use the util.printd() method to create the string to search for. E.g. something like this for today’s date:


    var today = new Date();
    var stringToSearchFor = util.printd("ddmmmyy", today).toUpperCase();

  54. Stanley J says:

    Karl,

    When I use :

    var stringToSearchFor = "06JUL17";

    The JavaScript runs and states “completed” but does not create a new (temp) file with the “06JUL17” pages. I’ve tried this several times. It seems the only time the new extracted pages are created is if I use a purely alphabetical search string and not an alphanumeric one like 06JUL17. Your thoughts?

    Thanks

  55. Hermie says:

    Hi Carl, Running the script on Acrobat Pro DC. It says completed, but where is the extracted file saved? What’s the default location? Thanks!

  56. Rafique Khan says:

    Karl, how can I modify this file to use it on a folder and use the same file name with just appending an extra string. I would really appreciate your help.

  57. Karl Heinz Kremer says:

    Rafique, can you please elaborate on what it is you want to do. From your short description, it’s not clear to me.

  58. Karl Heinz Kremer says:

    Hermie, the file does not get saved, you need to do that. It will be open in Acrobat (you should have two files open after the script runs: The original PDF file and the one with the extracted pages).
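
If the new document should be saved by the script as well, the output name has to be built first. The following sketch derives it from the path of the original file. The helper name buildOutputPath and the suffix are assumptions for illustration, and the saveAs() call is shown only as an untested comment because it may be blocked by Acrobat's security restrictions depending on where the script runs.

```javascript
// Hypothetical helper: inserts `suffix` in front of a trailing ".pdf"
// (upper or lower case); if there is no such extension, appends both.
function buildOutputPath(originalPath, suffix) {
    var m = /\.pdf$/i.exec(originalPath);
    if (m) {
        return originalPath.slice(0, m.index) + suffix + m[0];
    }
    return originalPath + suffix + ".pdf";
}

// In Acrobat (untested sketch, `d` is the new document from the script):
// d.saveAs({ cPath: buildOutputPath(this.path, "_extracted") });
```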

  59. elcartu says:

    Hello, is there any possibility to extract pages based on variable content read from an external csv/txt file or similar?

    For example, extract the pages that have the ref “X” from this csv/txt file… and inside the txt file there is…
    "f34";"r45";"k43"

  60. Karl Heinz Kremer says:

    elcartu, you can certainly do that. It’s just a matter of reading the text file into Acrobat via util.readFileIntoStream(), then processing the stream and parsing out your CSV data. Other than that, it’s just plain JavaScript. The actual implementation is a bit outside of the scope of what I can do here on my blog, so if you need help with this, you can contact me via email for my consulting services.
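
The parsing part is plain JavaScript and could be sketched as follows for a simple file with semicolon separated, optionally quoted terms. The function name parseSearchTerms and the file path are assumptions for illustration, it does not handle quotes or semicolons inside a term, and the file reading is shown only as an untested comment.

```javascript
// Hypothetical helper: splits text like "f34";"r45";"k43" into an array
// of terms. Semicolons and line breaks both act as separators.
function parseSearchTerms(text) {
    var terms = [];
    var parts = text.split(/[;\r\n]+/);
    for (var i = 0; i < parts.length; i++) {
        // remove surrounding whitespace and one pair of double quotes
        var term = parts[i].replace(/^\s*"?/, "").replace(/"?\s*$/, "");
        if (term.length > 0) {
            terms.push(term);
        }
    }
    return terms;
}

// In Acrobat (untested sketch, the path is a placeholder):
// var stream = util.readFileIntoStream("/c/temp/terms.txt");
// var terms = parseSearchTerms(util.stringFromStream(stream, "utf-8"));
```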

  61. elcartu says:

    Thanks, I nearly have it working… with the action “Find, Highlight, and Extract Words” from https://acrobatusers.com/actions-exchange… I need to do some extra processing, but it is enough for me for now.

  62. Adam says:

    Karl, thanks for keeping up with these replies years after the original post. Let’s say I want to delete (instead of extract) all pages from a PDF that contain a certain string. Would it be easy for you to throw something together to do that?

  63. Karl Heinz Kremer says:

    Adam, the key for deleting pages is that you need to process the file in reverse page order (starting with the last page and ending with the first). Something like this should work (disclaimer: I did not try this, I just modified the original version of the script in a way I think will work):


    var stringToSearchFor = "Total";

    for (var p = this.numPages-1; p >= 0; p--) {
        // iterate over all words
        for (var n = 0; n < this.getPageNumWords(p); n++) {
            if (this.getPageNthWord(p, n) == stringToSearchFor) {
                if (this.numPages > 1) {
                    this.deletePages(p);
                }
                else {
                    app.alert("Cannot delete last remaining page in document");
                }
                break;
            }
        }
    }

  64. Jo-an says:

    Hi Karl,

    Thanks for the post. It’s very useful. I took the liberty to modify the script to search for 5 strings. In my case, the 4th and 5th strings could have multiple occurrences in the PDF, and I only need one of the pages that contain those strings. I also know that the 1st, 2nd, and 3rd string always come before 4th and 5th. I am wondering, is there a way to modify the script that it only outputs 1 result when it comes to multiple pages containing the same string?

    Thank you in advance.

    The script I modified is as follows:

    // Iterates over all pages and find a given string and extracts all
    // pages on which that string is found to a new file.

    var pageArray = [];

    var stringToSearchFor1 = "A";
    var stringToSearchFor11 = "B";
    var stringToSearchFor12 = "C";

    var stringToSearchFor2 = "D";
    var stringToSearchFor21 = "E";

    var stringToSearchFor3 = "A";
    var stringToSearchFor31 = "B";
    var stringToSearchFor32 = "C";
    var stringToSearchFor33 = "D";

    var stringToSearchFor4 = "E";
    var stringToSearchFor41 = "F";
    var stringToSearchFor42 = "G";
    var stringToSearchFor43 = "H";

    var stringToSearchFor5 = "I";
    var stringToSearchFor51 = "J";
    var stringToSearchFor52 = "K";
    var stringToSearchFor53 = "L";

    for (var p = 0; p < this.numPages; p++)
    {
    // iterate over all words
    for (var n = 0; n 0) {
    // extract all pages that contain the string into a new document
    var d = app.newDoc(); // this will add a blank page – we need to remove that once we are done
    for (var n = 0; n < pageArray.length; n++)
    {
    d.insertPages( {
    nPage: d.numPages-1,
    cPath: this.path,
    nStart: pageArray[n],
    nEnd: pageArray[n],
    } );
    }
    // remove the first page
    d.deletePages(0);
    }

  65. Karl Heinz Kremer says:

    Jo-an, you need to use a logical “AND” operation. This is standard JavaScript, and has nothing to do with Acrobat specifically. Look up logical operations in the JavaScript documentation (e.g. here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Logical_Operators)

  66. Soren Eustis says:

    Wonderful script! Is there an easy way to reference a text file (.csv) with a list of search words. I have a large PDF that is a group of class assignments. I mail-merged the assignment so that each student has their username on the top right of each page. I would like to have Acrobat find each of these usernames, extract the pages with the current index search term, and name the new PDF username.pdf. Thoughts?

  67. Jhuanderson Maci says:

    Hey Karl,

    This is great! I modified the code so I can add multiple string in an array. I was wondering if you could give advise to be able to extract based on two strings. Ex: instead of searching “Total”, it would search “Total Diff”.

  68. Julian says:

    Hi Karl,

    Love many of your posts, and thanks so much for posting this script – it’s getting me 99% of the way to where I want to be. My current challenge is saving the resulting file once it’s produced.

    I’ve modified your script such that it looks through an array of unique IDs, extracting just those pages on which the various unique IDs appear. This works like a charm and the 4,000 page (20,405 KB) PDF I have is being filtered to exactly the 1,280 pages I need.

    The challenge is that the .tmp file that is created won’t save. When I try to save or save as, I eventually get a pop up error message that says “Out of memory.” When I located the .tmp file in the local temp folder on my windows machine, I found that in addition to a file of the same name as what was appearing in the new file your script creates “A9RE459.tmp” (4 KB) there was another file with the same creation time with the name “A9RE45A.tmp” (2,534,408 KB).

    Any thoughts on how to troubleshoot this would be greatly appreciated.

  69. Karl Heinz Kremer says:

    Julian, this is a hard one… Because we don’t know why exactly Acrobat is failing (the error message may not even be accurate), I would try to process the 4000 page document in multiple batches (e.g. 1000 pages at a time), and then concatenate the output files.

  70. Karl Heinz Kremer says:

    Jhuanderson, you will have to manually find the two strings that are in the same area of the page. If both are on the same line, it’s pretty straightforward, but when there is a line break between the two words, it gets very complex. Take a look at the Doc.getPageNthWordQuads() function: http://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/index.html#t=Acro12_MasterBook%2FJS_API_AcroJS%2FDoc_methods.htm%23TOC_getPageNthWordQuadsbc-55&rhtocid=_6_1_8_23_1_54

  71. Karl Heinz Kremer says:

    Soren, there is no easy way to do that. You would have to open the CSV file as a text stream, convert that stream to a string and then parse the CSV data. This is how you would start: http://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/index.html#t=Acro12_MasterBook%2FJS_API_AcroJS%2Futil_methods.htm%23TOC_readFileIntoStreambc-6&rhtocid=_6_1_8_78_0_5

  72. Prakash Dara says:

    Hi,
    I Want to Extract Particular String from pdf and also particular Page

    Prakash Dara

  73. Karl Heinz Kremer says:

    Prakash, if you need to search for a string containing multiple words, you need to do a lot more in order to test for proximity of words. That’s a much larger and much more complex project. If you need help with that, and you are interested in my professional services, feel free to get in touch with me via email. My email address is on the “About” page.

  74. Adriana Rojas says:

    Hi Karl, thank you for posting this info. I’m not a coder or programmer, just a heads up. I’m close to getting the results I want (searching for pages with specific text and deleting them from PDF) – 1) regarding the string I’m searching for, can I use symbols? the exact term I want to search for is “(0)” (paren and number zero) and 2) I modified your script from page extraction to deletion as you recommended above however, it’s either not working or working VERY slowly (PDF is only 189 pgs). Any ideas what I can do to reduce the time?

  75. Karl Heinz Kremer says:

    Adriana, this process is very slow. There is no way around it. It’s not using the “normal” search/find function, so you cannot compare the execution speed with how fast search/find would find something. Take a look at the “bStrip” parameter that controls if punctuation marks will be removed or not: https://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/index.html#t=Acro12_MasterBook%2FJS_API_AcroJS%2FDoc_methods.htm%23TOC_getPageNthWordbc-54&rhtocid=_6_1_8_23_1_53
