30 Mar 2009

Splitting PDF Pages

Posted by khk

Update:

Please visit the same post on my business site. The comments are closed here, so if you want to comment, you have to head over to http://khkonsulting.com/2009/03/splitting-pdf-pages/

No, this is not about my patent pending idea of a sheet splitter that turns duplex documents into simplex documents… This post is about a problem that comes up every now and then: When you scan a book or a magazine, chances are that you end up with two physical pages on your scanned image, and your document looks something like this:

JoinedPages.png

Pages one and two are on the same scan, as are three and four, five and six, and so on. How can we split such a combined page into its two parts? There are of course different solutions to this problem, some more complicated than others, some producing better results than others. The most straightforward approach would be to write an Acrobat plug-in or a standalone application (e.g. using the iText library) that takes the source page, determines what needs to be copied to the new page that should represent the left half of the original page, and then copies just those page elements. With a scanned source document, this would potentially mean that the scanned image needs to be cropped and placed on the target page. Sounds complicated, and it is complicated. Is there an easier way to accomplish the same result?

If we have access to Acrobat, we can use JavaScript to mimic the behavior of the application just described. With Acrobat’s JavaScript we of course do not have access to the page content elements, so we need to find a different way to end up with the same result. JavaScript gives us access to the crop box of a PDF page (if you don’t know what that is, read up on page boxes in the PDF Reference). This means we can find out how big a page is, and we can also set the crop box to configure which part of the page should be displayed in the viewer or printed. So, if our combined page is 11×17″ and we want to extract two 8 1/2×11″ pages, we first need to “select” the left half of the page, and then the right half of the same page.

Here is the script that will do just that: SplitPages.js

Copy that file to the Acrobat JavaScript directory; for Acrobat 9 on a Windows machine that would be

c:\Program Files\Adobe\Acrobat 9.0\Acrobat\JavaScript
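Page boxes are measured in PDF user-space units, so the 11×17″ example works out in points rather than inches. The following is a small sketch of that arithmetic, assuming the default user-space unit of 1/72 inch (a page's UserUnit entry can change that) and a spread lying in landscape orientation with the two pages side by side. The helper name halfOfSpread is hypothetical and not part of SplitPages.js.

```javascript
// Default PDF user space: 72 units per inch (assumes no UserUnit override).
const POINTS_PER_INCH = 72;

// Hypothetical helper: size of one half of a landscape two-page spread, in points.
function halfOfSpread(spreadWidthIn, spreadHeightIn) {
  const widthPt = spreadWidthIn * POINTS_PER_INCH;
  const heightPt = spreadHeightIn * POINTS_PER_INCH;
  // splitting happens along the width only; the height is unchanged
  return { width: widthPt / 2, height: heightPt };
}

const half = halfOfSpread(17, 11);
console.log(`each half: ${half.width} x ${half.height} pt = ` +
  `${half.width / POINTS_PER_INCH} x ${half.height / POINTS_PER_INCH} in`);
// each half: 612 x 792 pt = 8.5 x 11 in
```

Halving the width of the crop box is all the script has to do; the height of each half stays the same as the spread's.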

Now let’s take a look at the different parts of the script:

ProcessDocument = app.trustedFunction(function() {
  // create a new document
  app.beginPriv();
  var newDoc = app.newDoc();
  app.endPriv();

This snippet shows that we are declaring a trusted function. This is necessary because we need to execute the app.newDoc() method, which (since Acrobat 7) requires a privileged context. The first thing we do in this script is to create that new document; the call is wrapped with the beginPriv() and endPriv() calls. When a new document is created in JavaScript, Acrobat not only creates the document, it also adds a page to it. We don’t need that page, but every PDF document that is displayed in Acrobat needs at least one page. We will deal with that extra page later.

var i = 0;
while (i < this.numPages)
{
  newDoc.insertPages( {
    nPage: newDoc.numPages-1,
    cPath: this.path,
    nStart: i
  });
  newDoc.insertPages( {
    nPage: newDoc.numPages-1,
    cPath: this.path,
    nStart: i
  });
  // we did this twice so that we can then split each copy of the page into a left
  // and right half.
  i++;
}

In these few lines we copy every page from the source document (this.path is the path to the active document) to our newly created document, and we do that twice. This is necessary because we need to crop out the left half of the page for the first copy, and the right half of the page for the second copy. After this loop, our new document has twice as many pages as the source document, plus the blank page that app.newDoc() created.
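To see why the even/odd test later in the script lines up with left and right halves, here is a sketch that models the new document's page list as a plain array. It only imitates the bookkeeping, not Acrobat's API: it assumes, as the nPage: newDoc.numPages-1 argument implies, that each insertPages() call appends after the current last page, and the name simulateSplitLayout is hypothetical.

```javascript
// Models the new document's pages as an array. "blank" stands in for the page
// that app.newDoc() adds; each number is the source page index copied there.
function simulateSplitLayout(sourcePageCount) {
  const pages = ["blank"];
  for (let i = 0; i < sourcePageCount; i++) {
    // inserting after the last page is an append; the script does it twice
    pages.push(i);
    pages.push(i);
  }
  // mirrors: if (newDoc.numPages > 1) newDoc.deletePages(0);
  if (pages.length > 1) {
    pages.shift();
  }
  return pages;
}

console.log(JSON.stringify(simulateSplitLayout(3))); // [0,0,1,1,2,2]
```

Once the blank page is gone, every source page sits at an even index immediately followed by its copy at the next odd index.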
if (newDoc.numPages > 1)
{
  newDoc.deletePages(0);  // this gets rid of the page that was created with the newDoc call.
}

Now that we have all the pages in our new document, we no longer need the blank page we got when we created the document. So, we delete it.

// at this point we have a document with every page from the source document
// copied twice
for (i=0; i<newDoc.numPages; i++)
{
  // determine the crop box of the page
  var cropRect = newDoc.getPageBox("Crop", i);
  var halfWidth = (cropRect[2]-cropRect[0])/2;

We loop over all pages in our new document (that is, twice the number of pages in the original document), and for every page we get the crop box. We also calculate a value that we will need later: half the width of the page. That is the location where the page will get split.

  var cropLeft = new Array();
  cropLeft[0] = cropRect[0];
  cropLeft[1] = cropRect[1];
  cropLeft[2] = cropRect[0] + halfWidth;
  cropLeft[3] = cropRect[3];

The previous few lines create a new array. A page box is represented as an array of four values, and we assign partially modified values to the new page box. As you can see, three of the four values are just copies of the original crop box. Array element 2, the right edge, however gets a new value: we take the X coordinate of the original crop box’s left edge (element 0) and add half of the page width to it. This means the right edge of the modified page box is now at the halfway point between the two sides.

  var cropRight = new Array();
  cropRight[0] = cropRect[2] - halfWidth;
  cropRight[1] = cropRect[1];
  cropRight[2] = cropRect[2];
  cropRight[3] = cropRect[3];

We do something similar for the new crop box for the right side of the page; this time element 0, the left edge, is moved to the halfway point.

  if (i%2 == 0)
  {
    newDoc.setPageBoxes( {
      cBox: "Crop",
      nStart: i,
      rBox: cropLeft
    });
  }
  else
  {
    newDoc.setPageBoxes( {
      cBox: "Crop",
      nStart: i,
      rBox: cropRight
    });
  }
}
})

This is a bit tricky… Acrobat starts to count pages with page 0. The first page of the document is on the left half of our first sheet, so all of a sudden our first page has an even page number (0), and not an odd one (1) as in “normal” books. This means that for all even page numbers we need to crop the left half, and for all odd page numbers we need to crop the right half. We test for “evenness” by performing the modulo operation on the page number. If the result is 0, we have an even number and use the left side crop box; if the operation returns 1, we are dealing with an odd page number and use the right side crop box.
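The crop-box arithmetic and the even/odd choice can be checked outside Acrobat with plain JavaScript. This sketch is a hypothetical helper (splitCropBox is not part of the script) that returns the crop box the loop would assign to page i. Like the script, it changes only elements 0 and 2 (the left and right X coordinates) and copies elements 1 and 3 unchanged. The sample rectangle assumes a 17×11″ spread (1224×792 points) whose crop box starts at the origin, written in Acrobat’s [left, top, right, bottom] rectangle order.

```javascript
// Hypothetical helper mirroring the loop body of ProcessDocument().
// Only elements 0 and 2 (left and right x) change; 1 and 3 are copied as-is.
function splitCropBox(cropRect, pageIndex) {
  const halfWidth = (cropRect[2] - cropRect[0]) / 2;
  if (pageIndex % 2 === 0) {
    // even index: first copy of a sheet, keep the left half
    return [cropRect[0], cropRect[1], cropRect[0] + halfWidth, cropRect[3]];
  }
  // odd index: second copy of a sheet, keep the right half
  return [cropRect[2] - halfWidth, cropRect[1], cropRect[2], cropRect[3]];
}

// A 17 x 11 inch spread (1224 x 792 points), assumed to start at the origin.
const spread = [0, 792, 1224, 0];
console.log(JSON.stringify(splitCropBox(spread, 0))); // [0,792,612,0]
console.log(JSON.stringify(splitCropBox(spread, 1))); // [612,792,1224,0]
```

Because both halves are computed from the same halfWidth, the two crop boxes meet exactly at the midpoint even when the original crop box does not start at 0.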
The new crop box gets applied with the doc.setPageBoxes() method.

// add the menu item
app.addMenuItem({
  cName: "splitPagesJS", // this is the internal name used for this menu item
  cUser: "Split Pages", // this is the label that is used to display the menu item
  cParent: "Document", // this is the parent menu. The file menu would use "File"
  cExec: "ProcessDocument()", // this is the JavaScript code to execute when this menu item is selected
  cEnable: "event.rc = (event.target != null);", // when should this menu item be active?
  nPos: 0
});

Almost done… We’ve implemented the functionality to split the pages; now we just need a mechanism to actually start our little program. I’ve chosen to create a menu item under the “Document” menu. That menu item is only available (not grayed out) if we have an active document.

Disclaimer: Of course, there is no such thing as a sheet splitter, and therefore there is no patent application. It’s a joke.
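Several commenters below report that Acrobat X no longer has a “Document” menu, so the menu item does not show up there. One way to cover both older and newer versions is to pick the parent menu from app.viewerVersion. This is a sketch under that assumption, not part of the original script: the helper name splitMenuItemSpec is hypothetical, the version cut-off of 10 (Acrobat X) is an assumption, and “Edit” is the parent menu khk suggests in the comments (others used “File”). It only builds the options object, so it can be tried in plain JavaScript.

```javascript
// Hypothetical helper: builds the options object for app.addMenuItem().
// Assumption: Acrobat X (viewer version 10) and later have no "Document" menu,
// so the item is placed under "Edit" there.
function splitMenuItemSpec(viewerVersion) {
  return {
    cName: "splitPagesJS",
    cUser: "Split Pages",
    cParent: viewerVersion >= 10 ? "Edit" : "Document",
    cExec: "ProcessDocument()",
    cEnable: "event.rc = (event.target != null);",
    nPos: 0
  };
}

// Inside Acrobat this would be used as:
//   app.addMenuItem(splitMenuItemSpec(app.viewerVersion));
console.log(splitMenuItemSpec(9).cParent);  // Document
console.log(splitMenuItemSpec(10).cParent); // Edit
```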


32 Responses to “Splitting PDF Pages”

  1. Brilliant, Thank you very much. Now I only have to learn how to use it.

     

    David French

  2. Wow, thanks! Works great! (I had to adapt the script to Acrobat 6).

     

    Beat Döbeli Honegger

  3. Hi, that’s amazing, exactly what I was searching for. Based on this neat little snippet I added a reorder JavaScript which re-sequences booklet pages, i.e. booklets or brochures scanned double-sided with two pages on each side. If you apply Document/Reorder after Document/Split, the PDF pages are in sequential order again. Here is the code:

    ReorderDocument = app.trustedFunction(function()
    {
    // create a new document
    app.beginPriv();
    var newDoc = app.newDoc();
    app.endPriv();

    // reorder pages in booklet order into new document in sequential order

    var i = 0;

    var n = this.numPages;

    for (var i = 1; i <= n / 4; i++)
    {
    newDoc.insertPages( {
    nPage: newDoc.numPages-1,
    cPath: this.path,
    nStart: i * 4 - 3,
    nEnd: i * 4 - 2
    });
    }

    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - 1
    });

    for (var i = 1; i <= n / 4 - 1; i++)
    {
    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - i * 4
    });
    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - i * 4 - 1
    });
    }

    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: 0
    });

    if (newDoc.numPages > 1)

    {
    newDoc.deletePages(0); // this gets rid of the page that was created with the newDoc call.
    }

    })

    // add the menu item
    app.addMenuItem({
    cName: "reorderPagesJS", // this is the internal name used for this menu item
    cUser: "Reorder Pages", // this is the label that is used to display the menu item
    cParent: "Document", // this is the parent menu. The file menu would use "File"
    cExec: "ReorderDocument()", // this is the JavaScript code to execute when this menu item is selected
    cEnable: "event.rc = (event.target != null);", // when should this menu item be active?
    nPos: 1
    });

     

    Dirk Rother

  4. GREAT!!!! Exactly what I was looking for! Great how easily pages can be split!

    @Dirk:
    I tried your js as well, but I always receive an error message:
    “syntax error
    14:Folder-Level:App:2_sortpages.js”
    I use A9Pro Extended and copied the source 1:1. Can you give me a hint what to change?

    Thanks
    Fritz

     

    Fritz Bischof

  5. The script is wonderful. Thank you.

    Remember to enable the setting that allows JavaScript execution from menu items in Preferences.

     

    Thank You

  6. Thanks for the reply. Yes, you need to make sure that the menu setting is enabled (which is the problem that Fritz ran into earlier).

     

    khk

  7. If you are still around.
    The scripts are awesome. Exactly what we need.
    I would like to use the re-order script as well.

    I keep getting the same error as Dirk Rother.

    “syntax error
    14:Folder-Level:reorderPages.js”.

    I’ve enabled “Enable menu items JavaScript execution privileges” in the Preferences menu. Do you have any further suggestions, or am I looking in the wrong spot?
    Using Adobe 9 Standard.

    Thanks guys, great stuff.

     

    Jrb

  8. There is a space between "<" and "=" on line 14 in Dirk’s script. When you remove that, it should work.

     

    khk

  9. Thanks Kirk.

    Using what Kirk sent, I removed that space from Dirk’s script.
    The next error was on line 63. I took the part of the original script that inserts the buttons, pasted it into this one, and just changed the names accordingly.
    Works great now.

    I could not find the difference, but when I loaded both into a JavaScript editor I had downloaded, it showed the last portion in different colors. I figured something was coming up incorrectly.

    Thanks again guys, works great now.

    ReorderDocument = app.trustedFunction(function()
    {
    // create a new document
    app.beginPriv();
    var newDoc = app.newDoc();
    app.endPriv();

    // reorder pages in booklet order into new document in sequential order

    var i = 0;

    var n = this.numPages;

    for (var i = 1; i <= n / 4; i++)

    {
    newDoc.insertPages( {
    nPage: newDoc.numPages-1,
    cPath: this.path,
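    nStart: i * 4 - 3,
    nEnd: i * 4 - 2
    });
    }

    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - 1
    });

    for (var i = 1; i <= n / 4 - 1; i++)
    {
    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - i * 4
    });
    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - i * 4 - 1
    });
    }

    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: 0
    });

    if (newDoc.numPages > 1)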

    {
    newDoc.deletePages(0); // this gets rid of the page that was created with the newDoc call.
    }

    })

    // add the menu item
    app.addMenuItem({
    cName: "reorderPagesJS", // this is the internal name used for this menu item
    cUser: "Reorder Pages", // this is the label that is used to display the menu item
    cParent: "Document", // this is the parent menu. The file menu would use "File"
    cExec: "ReorderDocument()", // this is the JavaScript code to execute when this menu item is selected
    cEnable: "event.rc = (event.target != null);", // when should this menu item be active?
    nPos: 1
    });

     

    Jrb

  10. This is a wonderful script. I’m struggling (along with a few million other InDesign CS5 users) with InDesign CS5’s new interactive PDF export. It exports every PDF as spreads: the cover page (and usually the last page too) is a single page, and the rest of the pages come as pairs… so page 1 is a single page, but pages 2 and 3 are on the PDF’s page 2, pages 4 and 5 on the PDF’s page 3, and so on… so your script is very useful. There are still a few things I would like to ask.

    1) Would it be hard to change it so that it doesn’t split the first page, because that is practically always a single page? The last page is also very often a single page.

    2) Now it seems to leave some interactive objects “floating” outside of the page after splitting. For instance, if I have a button on page 3 (page 2 in the original PDF), after splitting that button is in exactly the right place on page 3, but it’s also on page 2, outside of the page, in the grey area.

    I’d like to recommend this script to InDesign users anyway… really great job…

     

    petteri paananen

  11. Petteri, try this file instead:
    http://www.khk.net/wordpress/wp-content/uploads/2010/09/splitpages.js
    It does not process the first page, and only processes the last page when its size is twice that of the first page. It also hides the form fields that show up floating outside of the crop box.

     

    khk

  12. You sir, are my hero. This script rocks my face off.

    Fantastic work!

     

    DJ Lein

  13. […] but what if you have a PDF that is spreads and you need to split it into two individual pages? Here’s an Acrobat script from Karl Heinz […]

     
  14. It’s not working for me, sadly :( I keep getting “An internal error occurred.”

    Working on Acrobat 9.0 on a Mac (OS X 10.6.4). Acrobat also has the Enfocus PitStop plug-in installed.

     

    colin flashman

  15. @Colin: Go to Preferences – JavaScript and enable JS from menu items (or similar, don’t have the exact wordings…). Worked well for me 🙂

     

    Stefan S

  16. that did the trick. thank you so much for that 😀

     

    colin flashman

  17. Works for me on Acrobat 10.0 on PC. Amazingly useful, can’t thank you enough.

     

    Jason V

  18. Possibly the greatest thing since sliced bread! Thank you all so much!

     

    Philip

  19. Superb! Doesn’t quite work as is on Acrobat X Windows because the Document menu no longer exists. Simply edit the script to use ‘File’ as the parent menu.

     

    Matthew Greet

  20. Hello together,
    I use this wonderful script and want to modify it a little by changing the first and last page (front and back cover).
    I use the following code for this, found on Adobe’s website:
    app.addMenuItem({ cName: "Reverse", cParent: "Document", cExec: "PPReversePages();", cEnable: "event.rc = (event.target != null);", nPos: 0
    });
    function PPReversePages()
    {
    var t = app.thermometer;
    t.duration = this.numPages;
    t.begin();
    for (i = this.numPages - 1; i >= 0; i--)
    {
    t.value = (i-this.numPages)*-1;
    this.movePage(i);
    t.text = 'Moving page ' + (i + 1);
    }
    t.end();
    }// JavaScript Document

    Used standalone, the script runs perfectly, but I want to integrate it into the splitpages script. I’m not able to get this to run because I’m not very fluent in JavaScript.
    So is it possible to combine these scripts?
    Another question is how to keep the original PDF name: I always get an Acrobat *.tmp file, so I always have to rename the PDFs manually.
    Thank you very much

     

    Gebhard

  21. Great job…just what I needed! Thanks.

     

    Alex Lush

  22. […] wordpresser Karl Heinz Kremer has a purely Acrobat solution to this issue. His post can be found here. […]

     
  23. Working with Acrobat Pro 9.4.6 on a Mac. The Split Pages script is working like a charm, but I cannot get the Reorder Pages script to work at all. It doesn’t even appear in my Document menu, which leads me to believe I haven’t created the Javascript correctly, and I can’t figure out what I’m doing wrong. Can someone send me the script file, or post a link to it?
    Thanks much.

     

    Todd

  24. Well I finally found your solution… Thank you very much!

     

    Radek

  25. […] it into separate pages so that I have an A5-sized PDF with 64 pages in sequence. Is that possible? Look here […]

     
  26. hi everybody,

    many thanks for all the efforts you put into this. This looks very helpful indeed.

    Unfortunately, I am not very familiar with the inner workings of Adobe Acrobat. I created a JavaScript file and copied it into the folder Adobe Acrobat Pro/Contents/Resources/JavaScripts/; I am on OS X Lion and use Adobe Acrobat X Pro.

    Is this the correct folder? I couldn’t see any new menu item, etc.

    I also copied the code into the “Edit All JavaScripts” section, which you can find in the JavaScript section in Adobe. However, when running it I get several error messages.

    I would be very grateful if you could point me in the right direction (or if somebody finds the time to put up a YouTube video ;-).

    many thanks again! really great site.

    ro

     

    ro

  27. Hi ro,

    you ran into a problem that was caused by Acrobat X’s new user interface: you may notice that there are a lot fewer menus on the menu bar than in Acrobat 9. The “Document” menu, which I used in the sample code (which of course pre-dates Acrobat X by a few years), no longer exists. When you change the parent menu for your menu item to “Edit”, you’ll find the new functionality under the Edit menu:

    
    cParent: "Edit",  // this is the parent menu. The file menu would use "File"
    

     

    khk

  28. […] clunky and not a true turnkey solution to the problem. At the end of the article, there was a link to a post that featured a script available by fellow WordPresser Karl Heinz Kremer, but his solution was not […]

     
  29. Hi,

    I am new to Acrobat JavaScript. Can anybody tell me where to start? I mean, is there any software or SDK I need to install to begin with, or is JavaScript all that is needed? I am looking for some JavaScript code to split a PDF file into multiple PDF files. I need assistance from scratch since I am a beginner in Acrobat JS.

    Thanks.

     

    Samish Kumar

  30. I am using this script for a personal project. I have about 600 PDF documents as spreads, including first and last pages. I have added the script to the Edit menu, and for single documents it works perfectly under Acrobat X Pro for Mac. However, when I try building an Action to process a folder, it seems to do nothing. Am I going about this wrong?

     

    David Zizza

  31. Hi khk,
    Thank you for this script and your annotations.
    I am new to javascript and it is quite educational to see how you build your work!
    I am encountering a problem with trusted functions:
    Running Acrobat X Pro on Windows 7 x64.
    I have successfully placed the script into the app-level folder (with the cParent modified as ‘file’) and I have used edit>preferences>javascript to enable acrobat javascript and enable menu items JS execution privileges.
    I even added the file to my advanced security privileges in adobe.
    But the error still reads:
    “SyntaxError: illegal character
    1:Folder-Level:App:splitpages.js
    NotAllowedError: Security settings prevent access to this property or method.
    App.trustedFunction:2:Document-Level:Split Pages:

    Might you have a suggestion on how to resolve this?
    Thanks
    aln

     

    aln

  32. You are f****** awesome, man. Don’t you ever forget that.

    I’ve been searching for software that can do just that for almost two hours. And now I’ve found that your script does it perfectly. Wonderful! Thanks, man.

     

    Neno
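For readers adapting Dirk Rother’s reorder script from comment 3: its page-index arithmetic can be checked without Acrobat. The sketch below is a hypothetical helper, not part of either posted script, that returns the split-document page indices in the order the Reorder script inserts them. It assumes nested, folded sheets scanned front then back from the outermost sheet inward and then split with Split Pages, so an 8-page booklet arrives as 8|1, 2|7, 6|3, 4|5. It also rejects page counts that are not a multiple of four, a check the posted script does not make.

```javascript
// Hypothetical helper reproducing the index arithmetic of the Reorder Pages
// script: returns split-document page indices in reading order.
function bookletReadingOrder(n) {
  if (n % 4 !== 0) {
    throw new RangeError("page count must be a multiple of 4");
  }
  const order = [];
  // first half of the reading order: indices 1-2, 5-6, 9-10, ...
  for (let i = 1; i <= n / 4; i++) {
    order.push(i * 4 - 3, i * 4 - 2);
  }
  // then the last index of the split document
  order.push(n - 1);
  // then walk back from the end in pairs: n-4, n-5, n-8, n-9, ...
  for (let i = 1; i <= n / 4 - 1; i++) {
    order.push(n - i * 4, n - i * 4 - 1);
  }
  // finally index 0: the left half of the first scan, i.e. the back cover
  order.push(0);
  return order;
}

// An 8-page booklet after Split Pages: each entry is the printed page number
// found at that index of the split document.
const splitScan = [8, 1, 2, 7, 6, 3, 4, 5];
console.log(bookletReadingOrder(8).map((idx) => splitScan[idx]).join(", "));
// 1, 2, 3, 4, 5, 6, 7, 8
```

If your scanner feeds sheets in a different order, the index pattern changes; testing it with a short labeled array like splitScan is quicker than a round trip through Acrobat.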