
30 Mar 2009

Splitting PDF Pages

Posted by khk

No, this is not about my patent pending idea of a sheet splitter that turns duplex documents into simplex documents… This post is about a problem that comes up every now and then: When you scan a book or a magazine, chances are that you end up with two physical pages on your scanned image, and your document looks something like this:

JoinedPages.png

Pages one and two share a scan, as do pages three and four, five and six, and so on.

How can we split such a combined page into its two parts?

There are of course different solutions to this problem, some more complicated than others, some producing better results than others.

The most straightforward approach would be to write an Acrobat plug-in or a standalone application (e.g. using the iText library) that takes the source page, determines what needs to be copied to the new page that should represent the left half of the original page, and then copies just those page elements. With a scanned source document, this would potentially mean that the scanned image needs to be cropped and placed on the target page. Sounds complicated, and it is complicated. Is there an easier way to accomplish the same result?


If we have access to Acrobat, we can use JavaScript to mimic the behavior of the application just described. Acrobat’s JavaScript does not give us access to the page content elements, so we need to find a different way to end up with the same results.

JavaScript gives us access to the crop box of a PDF page (if you don’t know what that is, read up on page boxes in the PDF Reference). This means we can find out how big a page is, and we can also set the crop box to control which part of the page is displayed in the viewer or printed.

So, if our combined page is 11×17″, and we want to extract two 8 1/2×11″ pages, we first need to “select” the left half of the page, and then the right half of the same page.
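
To make the arithmetic concrete, here is a small sketch in plain JavaScript (it does not need Acrobat to run). Page boxes are measured in points, and the default PDF user space has 72 points to the inch. The helper name halfPageSize is made up for this illustration.

```javascript
// The default PDF user space uses 72 points per inch.
var POINTS_PER_INCH = 72;

// Hypothetical helper: size in points of each half of a landscape sheet
// that is split down the middle.
function halfPageSize(sheetWidthInches, sheetHeightInches) {
    return {
        width: (sheetWidthInches * POINTS_PER_INCH) / 2,
        height: sheetHeightInches * POINTS_PER_INCH
    };
}

// A 17 x 11 inch landscape sheet yields two 8.5 x 11 inch pages.
var half = halfPageSize(17, 11);
// half.width is 612 points (8.5 inches), half.height is 792 points (11 inches)
```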

Here is the script that will do just that:
SplitPages.js

Copy that file to the Acrobat JavaScript directory – for Acrobat 9 on a Windows machine that would be

c:\Program Files\Adobe\Acrobat 9.0\Acrobat\JavaScript

Now let’s take a look at the different parts of the script:

ProcessDocument = app.trustedFunction(function()
{
    // create a new document
    app.beginPriv();
    var newDoc = app.newDoc();
    app.endPriv();

This snippet shows that we are declaring a trusted function. This is necessary because we need to call app.newDoc(), which has required a privileged context since Acrobat 7. The first thing the script does is create that new document; the call is wrapped in beginPriv() and endPriv(). When a document is created in JavaScript, Acrobat not only creates the document but also adds a page to it. We don’t need that page, but every PDF document displayed in Acrobat needs at least one page. We will deal with the extra page later.

    var i = 0;
    while (i < this.numPages)
    {
        newDoc.insertPages( {
            nPage: newDoc.numPages-1,
            cPath: this.path,
            nStart: i
        });
        newDoc.insertPages( {
            nPage: newDoc.numPages-1,
            cPath: this.path,
            nStart: i
        });
        // we did this twice so that we can then split each copy of the page into a left
        // and right half. 
        i++;
    }

In these few lines we copy every page from the source document (this.path, which is the path to the active document) to our newly created document, and we do that twice. This is necessary because we need to crop out the left half of the page for the first copy, and the right half of the page for the second copy. After this loop, our new document has twice as many pages as the source document, plus the blank page it started with.

    if (newDoc.numPages > 1)
    {
        newDoc.deletePages(0);  // this gets rid of the page that was created with the newDoc call.
    }

Now that we have all the pages in our new document, we no longer need the blank page we got when we created the document. So, we delete it.
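
The page bookkeeping so far can be traced with an ordinary array standing in for the new document. This is only a simulation in plain JavaScript, not Acrobat code: the page labels and the helper insertAfterLast are invented for the illustration. Passing nPage: newDoc.numPages-1 to insertPages means "insert after the current last page", which is what the helper mimics.

```javascript
// Stand-in for the new document: an array of page labels.
// app.newDoc() starts the document with one blank page.
var pages = ["blank"];
var sourcePages = ["scan 1", "scan 2", "scan 3"]; // made-up source document

// Mimics insertPages with nPage = numPages - 1: insert after the last page.
function insertAfterLast(list, page) {
    list.splice(list.length, 0, page);
}

for (var i = 0; i < sourcePages.length; i++) {
    insertAfterLast(pages, sourcePages[i]); // copy that becomes the left half
    insertAfterLast(pages, sourcePages[i]); // copy that becomes the right half
}

// Mimics newDoc.deletePages(0): drop the blank page at index 0.
if (pages.length > 1) {
    pages.shift();
}
// pages is now: scan 1, scan 1, scan 2, scan 2, scan 3, scan 3
```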

    // at this point we have a document with every page from the source document
    // copied twice

    for (i=0; i<newDoc.numPages; i++)
    {
        // determine the crop box of the page
        var cropRect = newDoc.getPageBox("Crop", i);
        var halfWidth = (cropRect[2]-cropRect[0])/2;

We loop over all pages in our new document (twice the number of pages in the original document), and for every page we get the crop box. We also calculate a value that we will need later: half the width of the page. That is the location where the page will get split.

        var cropLeft = new Array();
        cropLeft[0] = cropRect[0];
        cropLeft[1] = cropRect[1];
        cropLeft[2] = cropRect[0] + halfWidth;
        cropLeft[3] = cropRect[3];

The previous few lines create a new array. A page box is represented as an array of four values, so we can now assign the partially modified values to the new page box. As you can see, three of the four values are just copies of the original crop box. Array element 2, however, is modified: it is the X value of the right edge of the box, and we set it to the X value of the original left edge plus half of our page width. This means the new right edge of the modified page box is now at the halfway point between the two sides.

        var cropRight = new Array();
        cropRight[0] = cropRect[2] - halfWidth;
        cropRight[1] = cropRect[1];
        cropRight[2] = cropRect[2];
        cropRight[3] = cropRect[3];

We do something similar for the new crop box for the right side of the page.
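
Both boxes can also be computed by one small function, which makes the result easy to check outside Acrobat. This is a sketch in plain JavaScript: the name splitCropBox is made up, and the sample box is an assumed 11×17″ landscape sheet measured in points. Only the X values in elements 0 and 2 matter here; the two Y values pass through unchanged.

```javascript
// Hypothetical helper: split a four-element page box down the middle.
// Elements 0 and 2 hold the X values of the left and right edges.
function splitCropBox(cropRect) {
    var halfWidth = (cropRect[2] - cropRect[0]) / 2;
    return {
        left:  [cropRect[0], cropRect[1], cropRect[0] + halfWidth, cropRect[3]],
        right: [cropRect[2] - halfWidth, cropRect[1], cropRect[2], cropRect[3]]
    };
}

// Assumed sample values: a sheet 1224 points wide and 792 points tall.
var halves = splitCropBox([0, 792, 1224, 0]);
// halves.left  is [0, 792, 612, 0]
// halves.right is [612, 792, 1224, 0]
```

The right edge of the left box and the left edge of the right box land on the same X value, so the two halves meet exactly at the middle of the sheet.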

        if (i%2 == 0)
        {
            newDoc.setPageBoxes( {
                cBox: "Crop",
                nStart: i,
                rBox: cropLeft
            });
        }
        else
        {
            newDoc.setPageBoxes( {
                cBox: "Crop",
                nStart: i,
                rBox: cropRight
            });
        }
    }
});

This is a bit tricky… Acrobat starts to count pages with page 0. The first page in the document is on the left half of our first sheet. All of a sudden, our first page is actually an even page number (0), and not like in “normal” books an odd number (1). This means that for all even numbers we need to crop the left half, and for all odd numbers we need to crop the right half.

We test for “evenness” by performing the modulo operation on our page number. If the result is 0, we know we have an even number, so we can use the left side crop box. If the operation returns 1, we are dealing with an odd page number, and we will use the right side crop box.

The new crop box gets applied with the doc.setPageBoxes() method.
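
The even/odd rule can be checked with a few lines of plain JavaScript. The helper describePage is invented for this illustration; it only reports which source page and which half a given page of the new document corresponds to, assuming the blank first page has already been deleted.

```javascript
// Hypothetical helper: which source page and which half a page of the
// new document comes from, given its zero-based index.
function describePage(index) {
    return {
        sheet: Math.floor(index / 2),              // zero-based source page
        side: (index % 2 === 0) ? "left" : "right" // even = left, odd = right
    };
}

var first = describePage(0);  // source page 0, left half
var fourth = describePage(3); // source page 1, right half
```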

// add the menu item
app.addMenuItem({
     cName: "splitPagesJS",     // this is the internal name used for this menu item
     cUser: "Split Pages",       // this is the label that is used to display the menu item
     cParent: "Document",              // this is the parent menu. The file menu would use "File"
     cExec: "ProcessDocument()",  // this is the JavaScript code to execute when this menu item is selected
     cEnable: "event.rc = (event.target != null);",       // when should this menu item be active?
     nPos: 0
});

Almost done… We’ve implemented the functionality to split the pages, now we just need a mechanism to actually start our little program. I’ve chosen to create a menu item under the “Document” menu. That menu item is only available (not grayed out) if we have an active document.

Disclaimer: Of course, there is no such thing as a sheet splitter, and therefore there is no patent application. It’s a joke.


9 Responses to “Splitting PDF Pages”

  1. Brilliant, Thank you very much. Now I only have to learn how to use it.

     

    David French

  2. Wow, thanks! Works great! (I had to adapt the script to Acrobat 6).

     

    Beat Döbeli Honegger

  3. Hi, that’s amazing, exactly what I was searching for. Based on this neat little snippet I added a reorder JavaScript which resequences booklet pages, i.e. booklets or brochures scanned double-sided with two pages on a side. If you apply Document/Reorder after Document/Split, the PDF pages are in sequential order again. Here is the code:

    ReorderDocument = app.trustedFunction(function()
    {
    // create a new document
    app.beginPriv();
    var newDoc = app.newDoc();
    app.endPriv();

    // reorder pages in booklet order into new document in sequential order

    var i = 0;

    var n = this.numPages;

    for (var i = 1; i < = n / 4; i++)

    {
    newDoc.insertPages( {
    nPage: newDoc.numPages-1,
    cPath: this.path,
    nStart: i * 4 - 3,
    nEnd: i * 4 - 2
    });
    }

    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - 1
    });

    for (var i = 1; i <= n / 4 - 1; i++)

    {
    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - i * 4

    });
    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: n - i * 4 - 1
    });
    }

    newDoc.insertPages( {
    nPage: newDoc.numPages - 1,
    cPath: this.path,
    nStart: 0
    });

    if (newDoc.numPages > 1)

    {
    newDoc.deletePages(0); // this gets rid of the page that was created with the newDoc call.
    }

    })

    // add the menu item
    app.addMenuItem({
    cName: “reorderPagesJS”, // this is the internal name used for this menu item
    cUser: “Reorder Pages”, // this is the label that is used to display the menu item
    cParent: “Document”, // this is the parent menu. The file menu would use “File”
    cExec: “ReorderDocument()”, // this is the JavaScript code to execute when this menu item is selected
    cEnable: “event.rc = (event.target != null);”, // when should this menu item be active?
    nPos: 1
    });

     

    Dirk Rother

  4. GREAT!!!! Exactly what I was looking for! Great how easily pages can be split!

    @Dirk:
    I tried your js as well, but I always receive an error message:
    “syntax error
    14:Folder-Level:App:2_sortpages.js”
    I use A9Pro Extended and copied the source 1:1. Can you give me a hint what to change?

    Thanks
    Fritz

     

    Fritz Bischof

  5. The script is wonderful. Thank you.

    Remember to change the setting in Preferences to allow execution of JavaScript from the menu.

     

    Thank You

  6. Thanks for the reply. Yes, you need to make sure that the menu setting is enabled (which is the problem that Fritz ran into earlier).

     

    khk

  7. If you are still around.
    The scripts are awesome. Exactly what we need.
    I would like to use the re-order script as well.

    I keep getting the same error as Dirk Rother.

    “syntax error
    14:Folder-Level:reorderPages.js”.

    I’ve enabled “Enable menu items JavaScript execution privileges” in the Preferences menu. If you have any further suggestions, let me know; maybe I’m looking in the wrong spot.
    Using Adobe 9 Standard.

    Thanks guys, great stuff.

     

    Jrb

  8. There is a space between “<” and “=” on line 14 in Dirk’s script. When you remove that, it should work.

     

khk

  9. Thanks Kirk.

    Using what Kirk sent, I removed that space from Dirk’s script.
    The next error was on line 63. I took the part of the original script that adds the menu item, pasted it into this one, and just changed the names accordingly.
    Works great now.

    I could not find the difference. But when I loaded them both into a javascript editor I downloaded, it identified them as different colors (the last portion). Figured something was coming up incorrectly.

    Thanks again guys, works great now.

    ReorderDocument = app.trustedFunction(function()
    {
        // create a new document
        app.beginPriv();
        var newDoc = app.newDoc();
        app.endPriv();

        // reorder pages in booklet order into new document in sequential order

        var i = 0;

        var n = this.numPages;

        for (var i = 1; i <= n / 4; i++)
        {
            newDoc.insertPages( {
                nPage: newDoc.numPages - 1,
                cPath: this.path,
                nStart: i * 4 - 3,
                nEnd: i * 4 - 2
            });
        }

        newDoc.insertPages( {
            nPage: newDoc.numPages - 1,
            cPath: this.path,
            nStart: n - 1
        });

        for (var i = 1; i <= n / 4 - 1; i++)
        {
            newDoc.insertPages( {
                nPage: newDoc.numPages - 1,
                cPath: this.path,
                nStart: n - i * 4
            });
            newDoc.insertPages( {
                nPage: newDoc.numPages - 1,
                cPath: this.path,
                nStart: n - i * 4 - 1
            });
        }

        newDoc.insertPages( {
            nPage: newDoc.numPages - 1,
            cPath: this.path,
            nStart: 0
        });

        if (newDoc.numPages > 1)
        {
            newDoc.deletePages(0); // this gets rid of the page that was created with the newDoc call.
        }
    });

    // add the menu item
    app.addMenuItem({
        cName: "reorderPagesJS", // this is the internal name used for this menu item
        cUser: "Reorder Pages", // this is the label that is used to display the menu item
        cParent: "Document", // this is the parent menu. The file menu would use "File"
        cExec: "ReorderDocument()", // this is the JavaScript code to execute when this menu item is selected
        cEnable: "event.rc = (event.target != null);", // when should this menu item be active?
        nPos: 1
    });

     

    Jrb
