Is there a way to locate the value of a text field inside a PDF?

I am trying to create a service for Finder, using Automator, that works like this:

I select multiple PDF files, all of them invoices, and the service sums the values of specific fields and gives me the totals.

This is the PDF template with dummy data for your reference: xxx.pdf (hosted on ufile.io)

I want the script to do two sums.

  1. Sum all values that follow the field called VALOR BASE in all PDFs; in the dummy case this is 271,86.
  2. Sum all values that follow the field called "IVA : IVA - regime de isenção [art.º 53.º]" in all PDFs; in the dummy case this is 0,00, but it may contain a value. The name of this field may vary, but it always starts with IVA:. Note that the values use a comma as the decimal separator.
  3. Present the two sums separately.

My question is, how do I scan the PDF using AppleScript and look for these fields?

Is there a way to obtain a list of text fields, boxes or whatever and enumerate them?

Thanks in advance.

macos_boy. I don’t know how to do exactly what you want. Hopefully another forum member will be able to help.

FWIW, I wrote a script that gets the desired amounts from your sample form using a regular expression. The regex pattern matches 1 to 3 digits, followed by a comma, followed by 2 digits. The script returns two lists containing the first and second regex matches in each PDF. If the desired amounts are not the first and second regex matches in a PDF then, obviously, the script won’t do what you want. I didn’t include the actual calculations, because my locale uses a period as the decimal separator, but a simple repeat loop will do the job.

-- revised 2023.01.07

use framework "Foundation"
use framework "Quartz"
use scripting additions

set theFiles to (choose file of type {"pdf"} with multiple selections allowed) -- two PDFs selected in testing

set countOne to current application's NSMutableArray's new()
set countTwo to current application's NSMutableArray's new()
repeat with aFile in theFiles
	set theString to getStringFromPDF(aFile)
	set patternMatches to getPatternMatches(theString)
	try
		(countOne's addObject:(patternMatches's objectAtIndex:0))
		(countTwo's addObject:(patternMatches's objectAtIndex:1))
	on error
		display dialog "The correct amounts could not be extracted from a selected file" buttons {"OK"} cancel button 1 default button 1
	end try
end repeat

set theCounts to {countOne as list, countTwo as list} --> {{"271,86", "271,86"}, {"0,00", "0,00"}}

-- insert code to calculate totals from countOne and countTwo

on getStringFromPDF(theFile)
	set theURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of theFile)
	set thePDF to current application's PDFDocument's alloc()'s initWithURL:theURL
	return (thePDF's |string|())
end getStringFromPDF

on getPatternMatches(theString)
	set thePattern to "\\d{1,3},\\d{2}" --> refine if necessary
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range")
	set theMatches to current application's NSMutableArray's new()
	repeat with aRange in theRanges
		(theMatches's addObject:(theString's substringWithRange:aRange))
	end repeat
	return theMatches
end getPatternMatches
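
The totals left out of the script above could be calculated along these lines. This is only a sketch: sumAmounts is a made-up handler name, and it assumes every string is a well-formed amount with a comma as the decimal separator and no thousands separator, which is what the regex pattern returns. To avoid depending on the system locale, it parses each string with an explicit pt_PT locale, which is a guess based on the Portuguese invoice.

```applescript
use framework "Foundation"
use scripting additions

-- sumAmounts is a hypothetical helper, not part of the original script.
-- theAmounts is an AppleScript list of strings such as {"271,86", "0,00"}.
on sumAmounts(theAmounts)
	-- Assumption: amounts follow Portuguese conventions (comma as decimal separator)
	set theLocale to current application's NSLocale's localeWithLocaleIdentifier:"pt_PT"
	set theTotal to current application's NSDecimalNumber's zero()
	repeat with anAmount in theAmounts
		-- NSDecimalNumber keeps exact decimal arithmetic, so no floating-point drift
		set theNumber to (current application's NSDecimalNumber's decimalNumberWithString:(anAmount as text) locale:theLocale)
		set theTotal to (theTotal's decimalNumberByAdding:theNumber)
	end repeat
	-- Return the total as text, formatted with the same decimal separator
	return (theTotal's descriptionWithLocale:theLocale) as text
end sumAmounts

set baseTotal to sumAmounts({"271,86", "271,86"})
set ivaTotal to sumAmounts({"0,00", "0,00"})
return {baseTotal, ivaTotal}
```

In the script above, the handler would be called as sumAmounts(countOne as list) and sumAmounts(countTwo as list), and the two results could then be shown with display dialog. The returned text is not padded to two decimal places, so a zero total comes back as "0"; an NSNumberFormatter could be used if fixed formatting matters.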

I was curious if this could be done with a shortcut using the same approach as in my previous script, and it does work. The shortcut will not be used for anything, so I wrote it to accept one PDF and to return the matching strings only.

Get Amounts from PDF.shortcut (21.9 KB)

The test PDF from the OP is:

Test.pdf (286.6 KB)

As far as I know, Apple’s PDFKit doesn’t have an option to retrieve values of specific fields in PDF forms.

There are third-party frameworks that can do this, but you won’t be able to access them directly from AppleScript. They also tend to cost hundreds of dollars, although free open-source libraries that can do this may exist as well.

I’ll be happy to be proven wrong if there are actually ways to do this using built-in macOS frameworks.
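
For what it's worth, PDFKit does expose the widgets of a fillable form as annotations: PDFAnnotation has fieldName and widgetStringValue properties (on macOS 10.13 or later, as far as I know). The sketch below enumerates them. It only helps if the PDF is a true fillable form; an invoice that is plain text will simply produce an empty list, in which case extracting the text and using a regex remains the fallback. listFormFields is a made-up handler name.

```applescript
use framework "Foundation"
use framework "Quartz"
use scripting additions

-- listFormFields is a hypothetical helper, not part of PDFKit.
-- Returns a list of {fieldName, fieldValue} pairs, one per form widget.
on listFormFields(posixPath)
	set theURL to current application's |NSURL|'s fileURLWithPath:posixPath
	set thePDF to current application's PDFDocument's alloc()'s initWithURL:theURL
	if thePDF is missing value then return {} -- file missing or not a readable PDF
	set theFields to {}
	repeat with i from 0 to ((thePDF's pageCount()) - 1)
		set thePage to (thePDF's pageAtIndex:i)
		set theAnnotations to thePage's annotations()
		repeat with j from 0 to ((theAnnotations's |count|()) - 1)
			set anAnnotation to (theAnnotations's objectAtIndex:j)
			-- Form fields are annotations whose type is "Widget"
			if ((anAnnotation's |type|()) as text) is "Widget" then
				set theName to anAnnotation's fieldName()
				set theValue to anAnnotation's widgetStringValue()
				-- Unnamed or empty fields come back as missing value
				if theName is missing value then set theName to ""
				if theValue is missing value then set theValue to ""
				set end of theFields to {theName as text, theValue as text}
			end if
		end repeat
	end repeat
	return theFields
end listFormFields
```

It could be called as listFormFields(POSIX path of (choose file of type {"pdf"})). If the result is empty for the sample invoice, the amounts are ordinary page text rather than form fields.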

wow, fantastic. I never knew shortcuts could do that! Thanks!!!

Fantastic solution! WOW! I am blown away! Thanks.

thanks! Very interesting.