23 Sep 2010

Reading PDF Form Fields with VBA

Posted by khk

Update:

Please visit the same post on my business site. The comments are closed here, so if you want to comment, you have to head over to http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/

I’ve written about VBA and Acrobat JavaScript before, and I’ve also mentioned that you can combine VBA and JavaScript to access PDF form fields, but I still owe a sample for that. I had to answer another question today about how to exactly do that, so I whipped up a quick sample program that demonstrates the use of the JavaScript Object (JSO) to read and write AcroForm fields.

We start the same way as in my old VBA sample to create a VBA program that references the Acrobat TLB and to add a button to a document. When we now use the following script as the button handler, we can work with form fields:

Private Sub CommandButton1_Click()
    Dim AcroApp As Acrobat.CAcroApp
    Dim theForm As Acrobat.CAcroPDDoc
    Dim jso As Object
    Dim text1, text2 As String

    Set AcroApp = CreateObject("AcroExch.App")
    Set theForm = CreateObject("AcroExch.PDDoc")
    theForm.Open ("C:\temp\sampleForm.pdf")
    Set jso = theForm.GetJSObject

    ' get the information from the form fields Text1 and Text2
    text1 = jso.getField("Text1").Value
    text2 = jso.getField("Text2").Value

    MsgBox "Values read from PDF: " & text1 & " " & text2

    ' set a text field
    Dim field2 As Object
    Set field2 = jso.getField("Text2")

    field2.Value = 13   ' assign the number 13 to the fields value

    ' get the information from the form fields Text1 and Text2
    text1 = jso.getField("Text1").Value
    text2 = jso.getField("Text2").Value

    MsgBox "Values read from PDF: " & text1 & " " & text2

    theForm.Close

    AcroApp.Exit
    Set AcroApp = Nothing
    Set theForm = Nothing

    MsgBox "Done"
End Sub

This program requires a PDF file with text fields called “Text1″ and “Text2″ to be stored as C:\temp\sampleForm.pdf. With the explanation in the previous two blog posts, it should not be hard to understand what’s going on here. The only new command introduced is the getField() function, which returns a form field. The form field object has a property “value” which contains the actual value that’s assigned to the field. Give it a try and let me know how it works for you. The updated form field is not saved (because the document does not get saved) – I’ll leave that up to the reader to figure out.

Also, this program will not work with XFA forms (the ones you create in Designer). For those, you need to use the XFA DOM to access the form data. For anybody interested in XFA forms, the LifeCycle Designer ES Scripting Reference is a must read.

Subscribe to Comments

26 Responses to “Reading PDF Form Fields with VBA”

  1. Hi there,

    This is exactly what I am looking for. Thanks so much.
    However, my assignments are a bit different. I am wondering if you can give me a hand.

    The data source:
    1. An excel spreadsheet storing the raw data
    2. A PDF file with an interactive form used to store the data input by the user according to the above excel spreadsheet

    My assignments:
    1. input the raw data from excel spreadsheet to the PDF interactive form
    2. double check if the data input in the PDF interactive form is correct.

    I am not allowed to convert the Excel spreadsheet to the PDF file directly as the PDF file is the template with precise paragraphing and wording embedded. It is a heavy job when there are over hundreds of number. I am thinking if the excel VBA can do both assignments automatically or at least double check my input.

    Thanks.
    Wayne

     

    Wayne

  2. Hi khk,
    I have adobe 9.1 professional and excel 2007, adobe TLB added.

    F8 stepinto F8 found “Runtime error ’424′, object required”
    debug at this code ” text1 = jso.getField(“Text1″).Value”

    don’t know why?

    Thanks
    Wayne

     

    Wayne

  3. oh, “t” not “T”, i got it.
    btw, we can use the call function to save the updates.
    call theForm.Save(PDSaveFull,”C:\temp\sampleForm.pdf).

     

    Wayne

  4. I am having the same problem as Wayne, but cannot find the solution. Error 424 Object Required at -

    text1 = jsoGetField(“TextField1″).Value

    Further, text1 changes from a “” string to an empty Variant

    Any help would be greatly appreciated. Access 2007 with Adobe 8 btw.

    Thanks,

    Todd

     

    Todd

  5. Hi Khk,

    i was looking for the similar, kind of program.

    am objective is:
    A PDF document contains a stamps like tick 1, tick 2, etc in different pages. i want to read the page numbers of each stamps. . is there any java function to read the page number of PDF ?

     

    chandu

  6. Dear Khk

    I have done this example in Visual Basic 2008 and read fields from a document that I made in Adobe Acrobat 9 Pro.
    It works fine on my PC. But when I make an installation on another PC and start the program and click the button, the message “Cannot create an ActiveX component” appear and close down the program.
    Look forward to a solution for distribution.

    Best regards
    Leif

     

    Leif

  7. Leif,
    do you have Adobe Acrobat installed on the second machine? Acrobat’s API is a tool to automate Acrobat, so Acrobat needs to be installed for your application to work. Based on the error message, I assume that you are trying to run your application on a system without Acrobat.

     

    khk

  8. Dear Khk

    Thanks for your prompt response.
    I have Adobe Reader 9 on the other PC.
    Is it necessary to also have Acrobat 9 pro installed ?
    If so, is there another way to distribute without have
    Acrobat pro installed on target PC ?

    Leif

     

    Leif Hemmingsson

  9. Khk

    Thank You for prompt answer.
    The target PC have Adobe Reader 9.
    Is there no other way to have it run without having Acrobat Pro installed?

    Leif

     

    Leif Hemmingsson

  10. Dear khk,
    Thank u for your share of this post.
    I tried the code ,everything is OK, but the line “field2.Value = 13″ didn’t work, no value assigned.
    I have done this example in MS Excel 2007 & Adobe Acrobat 9 Pro.
    I have selected the “Tools>References” menu item. On the dialog that pops up, I choose “Acrobat”, but I am not sure that it was the acrobat.tlb or acrobat.dll selected. When I check the References again, it was not the “Acrobat” selected, but the Adobe Acrobat 9.0 Type Library. “Acrobat” disappeared in the references box.
    Looking forward for your help.
    Thanks a lot.
    Best Regards.
    Casey YAN

     

    Casey

  11. Hi khk,

    I have adobe acrobat 10.0. I was triing to run the above macro, but i am getting an error ‘Activex component can’t create object’

    Looking forward for you help
    Thanks a lot
    Regards,
    Shekhar

     

    shekhar

  12. [...] several threads that suggest various methods. This one seems to be exactly what you are looking for http://www.khk.net/wordpress/2010/09…elds-with-vba/ __________________ If this resolves your issue, please use the Thread Tools and mark the thread [...]

     

    Importing data from PDF to Access

  13. Shekar,

    are you using the full version of Acrobat or the free Reader? This will not work in the Reader. If it’s Acrobat, then unfortunately I cannot help you: I am not a VB expert, and all I know is that if you follow the steps outlined above, it should work.

     

    khk

  14. Thanks so much for posting this! It was EXACTLY what I needed to solve my dilemma. I knew I needed to code but I couldn’t find anything clear and understandable about the Acrobat object in VBA anywhere.

    I ran into one snag in that I have a large number of fields to fill and my code would bomb after only three fields were filled. I figured out that it didn’t like me creating a separate form object for each field. I worked around that by creating a single object that I used in a function where I passed my field names and field values to as variables to set the field object and the field value for each field.

    Thanks again!

     

    Cathi

  15. I have recently come across your blog and it is extremely helpful. The level of detail coupled with the explanations are proving extremely helpful. I figured that while I endeavor to figure something out on my own I might as well ask in the event that you can save me a lot of time.

    You mentioned at the end of this post that this code will not work with XFA forms. I am working with an XFA form and I am trying to do something relatively strait forward: I need to reference the data entered into a specific text box. I will use that data (someone’s name) to resave the pdf with the person’s name in the file name. Would you be able to help me get the same basic code working using the XFA DOM?

    Sincerely – Joe

     

    Hi khk

  16. here is the fix for the error.
    text1 = jso.getField(“topmostSubform[0].Page1[0].f1_01_0_[0]“).Value

     

    sameer

  17. KHK:

    This is a very helpful example, so thank you for posting.

    I am trying to ADD a signature field to PDF through VBA, but am having difficulty. My code is:

    Private Sub AddSignature(docName As String, pNum As Long)

    Dim jso As Object
    Dim AcroApp As acrobat.CAcroApp
    Dim theDoc As acrobat.CAcroPDDoc
    Dim signatureField As Object

    Set AcroApp = CreateObject(“AcroExch.App”)
    Set theDoc = CreateObject(“AcroExch.PDDoc”)

    theDoc.Open (docName)
    Set jso = theDoc.GetJSObject

    signatureField = jso.addField(“Signature”, “signature”, pNum, [90, 385, 250, 420])

    theDoc.Save 1, docName
    theDoc.Close
    Set AcroApp = Nothing
    Set theDoc = Nothing
    End Sub

    I am getting a type mismatch error on the “signatureField = ” line.

    Any ideas?

    Thanks.
    MP.

     

    Martin Petrey

  18. Write a read routine to see the names of your fields. Most likely the field name is not just “signature”. Just as you see in my example. I am new to PDF and not sure how modify the field names well but will post more when I figure it out.

     

    Sameer Alzouby

  19. Hello,
    is it possible to copy data ,once selected, as table (copy as table option) in a pdf using vba?

    Thanks in advance

     

    Jeffrey

  20. Hello,
    Can we label data in pdf file with php?

     

    waqar

  21. Dear Khk,
    Thankyou somuch , It’s Very good post given by you which i used to fill data in my PDF form generated using Acrobat pro.
    It would be great if anyone give idea regarding importing/Exporting XML/XSD Data in the VB.net Code or please guide which method to use so that i can directly load PDF with my XML data instead of getting fields and iterating it to bind to respective fields

    Thanks
    Mayank

     

    Mayank Kashyap

  22. Thanks so much for your examples.

    I am trying to write a macro which will extract embedded files from a document (in this case, a Word doc), and am trying to use some of your code and other snippets I’ve found.

    If (InStr(myshape.OLEFormat.ClassType, “Acro”) > 0) Then
    myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ‘ Open the first embedded pdf

    ‘ Now I want to save it

    Set AcroApp = GetObject(, “AcroExch.App”)
    If AcroApp Is Nothing Then
    ‘ Excel is not running, create new instance
    Set AcroApp = CreateObject(“AcroExch.App”)
    Set avdoc = AcroApp.GetActiveDoc ‘ Set AcroDocument = CreateObject(“AcroExch.PDDoc”)
    End If

    ‘ older Word files had ” on each end of the IconLabel of the embedded file, resulting in a path name with “” in it
    ‘ this was causing an error 4148 when the SaveAs line executed. Go figure :)
    ‘ temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), “”), Chr(34), “”))

    outFileName = StrOutFold & “\” + temp ‘
    ‘ ok = ActiveDocument.DocSaveAs(outFileName, outFileName)
    If avdoc.Save(PDSaveFull, outFileName) = False Then
    MsgBox “Cannot save document”
    End If

    AcroApp.Exit
    Set AcroApp = Nothing
    End If

    However, I am struggling with getting a valid object reference to the .pdf document displayed using the myshape.OLEFormat.DoVerb (wdOLEVerbOpen)
    line. Any hints as to how to get a reference to this? Also, it would be best if this would work with Acrobat reader (I don’t think I need full Acrobat functionality to do this.

    Thanks for any hints!

     

    Chris

  23. Hello Sir,

    Your example above help me a lot to accomplish my project. I able to populate the PDF.

    My question is how to rename the data field programmatically, I know how using the Adobe interface.

    Example: Field1 is the field I want to rename jsoGetField(“Field1″).Value

    Also, how to insert pdf and bookmark?

    Thank you and more power to you!

     

    Rey Hernandez

  24. Hi KHK,

    I’m trying to use your code and I have a PDF here with rather complex setting with different subforms that has similar field names such as:

    (These are the field names given when I export the form to a CSV file)

    form1[0].Table1[0].Row24[0].TextField71[0]
    form1[0].Table1[0].Row24[1].TextField71[0]

    My question is….how should I put this into the getField() function?

    I’m guessing I would at least need to include the row portion since they determine which row to point to in order to get the field.

    Thanks!

     

    Min

  25. This sounds like a XFA or LiveCycle Designer form, and not a simple AcroForm. I’ve never tried to access form fields in such forms from VB. You can try to put the whole name (form1….TextField71[0]) in and hope for the best, but it may not work.

     

    khk

  26. Regarding my previous post regarding saving pdf files embedded in a Word doc, I got a variant of that code to work. It depends upon Acrobat, and not Reader, being installed and the default for AcroExch (and pdf) documents. I haven’t found a way to test what program is the default. I had both Acrobat and Reader installed on my system (not recommended), asd at the time Reader was the default. Uninstalled now. Here is a subset of the code:

    ‘=====================================
    Sub ExtractEmbeddedDocs()
    Dim MyObj As Object
    Dim xlApp As Object
    Dim xlWkb As Object

    Dim myshape As InlineShape
    Dim embedObj As OLEObject

    Dim FileExtStr As String
    Dim FileFormatNum As Long
    Dim StrInFold As String, StrOutFold As String
    Dim StrDocFile As String, Obj_App As Object, i As Long
    Dim StrFile As String, StrFileList As String, StrMediaFile As String, j As Long
    Dim outFileName As String
    Dim SBar As Boolean
    Dim exten As String
    Dim embedCount As Integer, wordCount As Integer, excelCount As Integer, visioCount As Integer, pptCount As Integer
    Dim pdfCount As Integer
    Dim msg As String, temp As String
    Dim ok As Boolean

    Dim docs As Variant, doc As Variant ‘, temp As Variant
    Dim AcroApp As Acrobat.CAcroApp
    Dim AcroPDDoc As Acrobat.CAcroPDDoc
    Dim AcroAVDoc As Acrobat.CAcroAVDoc
    Dim jso As Object

    StrInFold = ActiveDocument.Path
    If StrInFold = “” Then Exit Sub
    ‘ Store current Status Bar status, then switch on
    SBar = Application.DisplayStatusBar
    Application.DisplayStatusBar = True
    StrOutFold = StrInFold & “\Embedded Files”

    Application.ScreenUpdating = False

    ‘ On Error GoTo error_handler

    ‘Test for existing output folder, create if they don’t already exist
    If Dir(StrOutFold, vbDirectory) = “” Then MkDir StrOutFold

    embedCount = ActiveDocument.InlineShapes.Count

    ‘ This opens the embedded documents, each in their own instance of the program
    For Each myshape In ActiveDocument.InlineShapes
    If (myshape.Type = wdInlineShapeEmbeddedOLEObject) Then
    If (InStr(myshape.OLEFormat.ClassType, “Word”) > 0) Then
    ‘ handle Word file here
    End If

    If (InStr(myshape.OLEFormat.ClassType, “Visio”) > 0) Then
    ‘handle Visio file here
    End If

    If (InStr(myshape.OLEFormat.ClassType, “Excel”) > 0) Then
    ‘handle Excel here
    End If

    If (InStr(myshape.OLEFormat.ClassType, “PowerPoint”) > 0) Then
    ‘handle Powerpoint here
    End If

    If (InStr(myshape.OLEFormat.ClassType, “Acro”) > 0) Then
    myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ‘ Open the first embedded pdf

    myshape.OLEFormat.Activate ‘ probably not needed

    ‘ GetObject(,”AcroExch.App”) ‘ causes an error
    Set AcroApp = CreateObject(“AcroExch.App”)
    ‘ If AcroApp Is Nothing Then
    ‘ ‘ Acrobat is not running, create new instance
    ‘ Set AcroApp = CreateObject(“AcroExch.App”)
    ‘ End If

    Set AcroAVDoc = CreateObject(“AcroExch.AVDoc”)

    Set AcroAVDoc = AcroApp.GetActiveDoc ‘ get the logical doc
    Set AcroPDDoc = AcroAVDoc.GetPDDoc ‘ get the physical doc

    ‘ older Word files had ” on each end of the IconLabel of the embedded file, resulting in a path name with “” in it
    ‘ this was causing an error 4148 when the SaveAs line executed. Go figure :)
    temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), “”), Chr(34), “”))

    outFileName = StrOutFold & “\” + temp ‘

    If AcroPDDoc.Save(PDSaveFull, outFileName) = False Then
    MsgBox “Cannot save document”
    End If
    AcroAVDoc.Close (1)
    AcroPDDoc.Close

    pdfCount = pdfCount + 1

    AcroApp.Exit
    Set AcroApp = Nothing
    Set AcroAVDoc = Nothing
    Set AcroPDDoc = Nothing
    End If

    End If
    Next myshape

    ‘ Clear the Status Bar
    Application.StatusBar = False
    ‘ Restore original Status Bar status
    Application.DisplayStatusBar = SBar
    Application.ScreenUpdating = True

    temp = “Embedded file counts” & vbCrLf & “Total ” & vbTab & vbTab & embedCount & vbCrLf & “Word Files ” & vbTab & wordCount & vbCrLf & _
    “Excel Files ” & vbTab & vbTab & excelCount & vbCrLf & “Visio Files ” & vbTab & vbTab & visioCount & vbCrLf & “PowerPoint Files ” & vbTab & pptCount & vbCrLf
    temp = temp & “PDF Files ” & vbTab & vbTab & pdfCount & vbCrLf & “Unknown files” & vbTab & embedCount – (wordCount + excelCount + visioCount)
    msg = temp

    msg = msg & vbCrLf & vbCrLf & “You should have ” & vbTab & (wordCount + excelCount + visioCount + pptCount + pdfCount) & ” files”

    MsgBox msg, vbInformation + vbOKOnly
    End Sub
    ‘=====================================

    Thanks again for getting me started/pointed in the right direction!

     

    Chris