My Own iBook PDF exported without "Title" and "Author"

pedro1 · January 9, 2024, 6:12pm

I’m struggling with this. I store interesting web articles in my iBooks PDF library by exporting each web page to PDF. When I choose “Get Info” on a PDF in Books, the “Title” and “Author” are shown, but they are stored in the iBooks database, not in the file’s own metadata. I want to copy “Title” and “Author” into the PDF file metadata. I can do it manually, but it would take months. Is there any way to automate this?

Thanks in advance.

Peter

pedro1 · January 9, 2024, 7:43pm

It looks powerful, but I don’t know how to read “Title” and “Author” from Books.app. The info is there when I choose “Get Info”, but after copying or exporting the file it is missing from the metadata.

pedro1 · January 9, 2024, 10:25pm

I tried it and it didn’t work, unless I was doing something wrong.

Mockman · January 10, 2024, 1:18am

My system is old, so things may have changed, but on it the books that are in ibooks are kept inside a directory under ~/Library/Containers/com.apple.BKAgentService. Perhaps the ‘BK’ stands for book.

Anyway, you might try looking in there for a file that is certain to have some author/subject/keywords metadata and then run exiftool on that and look for those fields in the results.

To make it simple, I created a new pdf with a relatively unique file name and with entries for each of the metadata fields and then added it to ibooks. I then ran find in the terminal to see where it was, and finally exiftool on the resulting file.

Assuming that you can find the file you can test writing new metadata to it and then presumably script the whole thing.
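
A minimal sketch of the find step described above, in case it helps. The BOOKS_DIR default is only the container path mentioned in this thread and may differ on newer systems; find_pdf is a made-up helper name, not part of any tool.

```shell
#!/bin/sh
# Assumption: the Books container lives here (path taken from this thread).
BOOKS_DIR="${BOOKS_DIR:-$HOME/Library/Containers/com.apple.BKAgentService}"

# find_pdf NAME_FRAGMENT [DIR]
# Print the path of every PDF under DIR whose file name contains the
# fragment (case-insensitive). The fragment is used as a glob pattern.
find_pdf() {
    find "${2:-$BOOKS_DIR}" -type f -iname "*$1*.pdf" 2>/dev/null
}
```

With exiftool installed, something like `find_pdf 'my-unique-test-name' | while IFS= read -r f; do exiftool -author -title -subject -keywords "$f"; done` should then show which of those fields made it into the file.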

pedro1 · January 10, 2024, 11:46am

I haven’t scripted anything on macOS yet. On Windows I used AutoHotkey and was able to record and replay mouse clicks. This would be a breeze there. Unfortunately, there is no iBooks app for Windows.

Mockman · January 10, 2024, 1:22pm

I added a pdf that lacked both a title and an author to the ibooks app. ibooks puts a copy of it in the directory that I referred to previously. I opened Terminal, changed to the Books directory, and ran exiftool (a command-line tool which I have installed) to add a title and author tag to the specified pdf file.

In Terminal, from the Books directory, this command will write tags to the file:

exiftool -author='Siluriantechnologies' -title='Carboniferous Read me' wordservice-read-me-unmarked-copy.pdf
    1 image files updated

Then look at the pdf file using exiftool again to read the newly added tags; I can see the following:

exiftool -author -title wordservice-read-me-unmarked-copy.pdf
Author                          : Siluriantechnologies
Title                           : Carboniferous Read me

Now, in ibooks, when I type silurian in the search field, this pdf is the only result. If I open the pdf in Preview and choose Get Info, I see:

Title: Carboniferous Read me
Author: Siluriantechnologies

I should note that you can add the metadata prior to importing into ibooks. I framed it this way because you mentioned that you had an existing library of pdfs.

Finally, you can write a script that will automate the adding of the metadata to a file. There are a couple of approaches to take but you would need to provide some more information, such as where the author and title information will come from.

pedro1 · January 10, 2024, 2:06pm

Sometimes I don’t have time to finish reading a website, or I want to use the information later, so I make a PDF of it. So far the best way to convert a website to PDF has been to choose “Save to Books” in Safari. To keep the link, I paste the website link into the Author field in Books. PDFs produced this way look excellent and keep all the site’s links. Only Adobe Acrobat was able to do such a good job, but it is expensive. Now I want to be able to export PDFs from Books to be used anywhere, but the site link in Author is lost in the export.

Mockman · January 10, 2024, 2:35pm

What happens if you run mdls on an example pdf in the terminal?

$ mdls 'example.pdf'

This command will list much or all of the file’s metadata and this can include the source of the file (ie the URL).

Two prerequisites, though: Spotlight indexing must be enabled, and the file must not be quarantined.
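
If the source URL does show up, it would normally be under the kMDItemWhereFroms attribute. Here is a rough sketch for pulling the first URL out of that output; first_where_from is a made-up helper name, and whether pdfs saved through Books carry this attribute at all is an assumption that needs checking on a real file.

```shell
#!/bin/sh
# first_where_from: read `mdls` output on stdin and print the first quoted
# http or https URL it contains. Prints nothing when there is none.
first_where_from() {
    grep -o '"https*://[^"]*"' | head -n 1 | tr -d '"'
}

# Usage on macOS:
#   mdls -name kMDItemWhereFroms 'example.pdf' | first_where_from
```

An empty result just means the attribute is missing or (null), in which case the URL has to come from somewhere else.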

Actually… you can probably use the Finder’s Get Info on such a file, and it may show ‘Where from’ information. If it does, then you could likely use that as a source to update the author tag.

By the way, if you look for where the pdfs in ibooks are stored and find them, then you can just copy them out of that folder and you don’t need to go through a process of exporting them.

pedro1 · January 10, 2024, 3:14pm

I’ve tried the “mdls” command. It shows a lot of PDF info, but the Author and Title from “Books” are not there. I am nearly sure that those two are in the Books app’s internal database.

Mockman · January 10, 2024, 3:30pm

Open the pdf in Preview and Get Info (Tools > Show Inspector). The first tab (General Info) of the inspector should show what is in the ‘title’, ‘author’ and ‘subject’ fields.

Mockman · January 10, 2024, 3:41pm

FWIW, I opened this page in Safari and then saved/printed a pdf into ibooks. I left Safari with this page open. I then found the pdf in the Books folder, selected it, and ran the applescript below. There are other things that you can do but this is a simple example of how you could set the pdf author.

The script does the following:

gets the URL from the open safari page
gets the file name and location of the selected pdf
runs exiftool on the resulting file and sets the author tag to the safari URL
duplicates the file to the desktop

tell application "Safari" to set surl to URL of document 1
--> "https://www.macscripter.net/t/my-own-ibook-pdf-exported-without-title-and-author/75381/10"

tell application "Finder"
	if (target of front window as text) ends with "iBooks:Books:" and name extension of (get selection as alias) is "pdf" then
		set salt to selection as alias as text
	else
		tell me to display dialog "A pdf from the Books folder must be selected"
		return -- stop here; salt is undefined without a valid selection
	end if
end tell

set seaSalt to POSIX path of salt

set cmd to "/usr/local/bin/exiftool -author=" & quoted form of surl & space & quoted form of seaSalt
--> "/usr/local/bin/exiftool -author='https://www.macscripter.net/t/my-own-ibook-pdf-exported-without-title-and-author/75381/10' '/Users/username/Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks/Books/My Own iBook PDF exported without \"Title\" and \"Author\" - Scripting Forums - AppleScript | Mac OS X - MacScripter.pdf'"

do shell script cmd
--> "    1 image files updated"
tell application "Finder"
	duplicate (file salt as «class furl») to desktop replacing yes
end tell

pedro1 · January 10, 2024, 3:45pm

Those are empty.

Mockman · January 10, 2024, 4:00pm

Interesting. I would guess that ibooks stores the ‘author’ in a separate place. Perhaps it wants to have a single place for any of the documents it stores, whether pdf or ebook. Just a guess though.

pedro1 · January 10, 2024, 9:47pm

Thank you for this script. I will use it from now on. The old ones I will do manually, slowly. There are hundreds of them.

Mockman · January 10, 2024, 10:35pm

You can automate those as well, you just need to be able to match up the URL to the pdf.

For example, if you replace the first line of the script with this:

use scripting additions
set surl to the clipboard

Then, if the URL is on the clipboard, it will add the author tag using it.

If you had a list of the urls in a separate file, you could probably automate it so you could process the files in bunches.
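
As a rough sketch of that idea, assuming a hypothetical list file with one pdf path and one URL per line, separated by a tab. The file format and the tag_from_list name are invented for illustration; EXIFTOOL defaults to whatever exiftool is on the PATH and can be pointed at /usr/local/bin/exiftool if needed.

```shell
#!/bin/sh
# Assumption: exiftool is installed; override EXIFTOOL with a full path if needed.
EXIFTOOL="${EXIFTOOL:-exiftool}"

# tag_from_list LIST_FILE
# Each line of LIST_FILE is "<pdf path><TAB><url>". For every line whose pdf
# exists, write the url into the pdf's Author tag.
tag_from_list() {
    tab=$(printf '\t')
    while IFS="$tab" read -r pdf url || [ -n "$pdf" ]; do
        # Skip blank or incomplete lines.
        [ -n "$pdf" ] && [ -n "$url" ] || continue
        if [ ! -f "$pdf" ]; then
            echo "skipping, not found: $pdf" >&2
            continue
        fi
        # </dev/null keeps the command from consuming the rest of the list.
        "$EXIFTOOL" -author="$url" "$pdf" </dev/null
    done < "$1"
}
```

Calling `tag_from_list urls.tsv` would then tag every listed file in one go. It is worth trying it on copies first, since it edits the files in place.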

Mockman · January 11, 2024, 3:35am

And for fun… this script will take a selection of pdfs and attempt to grab the url from the footer of each pdf (ie in the Print dialogue, ‘Print headers and footers’ is checked). By default, it seems that this will put the page title in the header and the page url in the footer.

The script also depends upon being able to read the text of the pdf, from which it will extract an https line from near the bottom. I do this using a utility called pdftotext, which is a component in Poppler, a suite of command-line pdf utilities. By running this command, you can extract the text from the pdf and then use some standard tools (tail, grep) to hopefully grab the url embedded by Safari. Essentially, pdftotext sends the text to stdout, tail grabs the bottom 6 lines, and grep checks among those lines for any beginning with https. It’s obviously not ironclad but it seems to work well enough. I haven’t tested for when the url is absent however, or for what happens when there are multiple urls at the bottom of the page. There are likely other methods of getting at the text but this worked for me.

tell application "Finder"
	set salt to selection as alias list
end tell

repeat with sel in salt
	set seaSalt to quoted form of POSIX path of sel
	set getPageUrl to "/opt/local/bin/pdftotext " & seaSalt & " - | tail -n 6 | grep '^https' "
	set surl to do shell script getPageUrl
	
	set addAuthorTag to "/usr/local/bin/exiftool -author=" & quoted form of surl & space & seaSalt
	do shell script addAuthorTag
	
	tell application "Finder"
		duplicate (contents of file sel as «class furl») to desktop replacing yes
	end tell
end repeat
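
For the two untested cases mentioned above (no url, or several urls near the bottom), one possible variation on the tail/grep step is sketched below. footer_url is a made-up name, and taking the first match is just one choice; it still assumes the url sits on its own line among the last six lines of text.

```shell
#!/bin/sh
# footer_url: read extracted pdf text on stdin and print the first line,
# among the last six, that starts with http:// or https://.
# The exit status is non-zero when no such line exists.
footer_url() {
    tail -n 6 | grep -m 1 -E '^https?://'
}

# Usage (pdftotext comes from Poppler):
#   pdftotext 'example.pdf' - | footer_url || echo 'no url found' >&2
```

Because do shell script raises an error on a non-zero exit status, the AppleScript side would need a try block around that call (or `|| true` at the end of the pipeline) so that a pdf without a url is skipped rather than stopping the whole run.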

pedro1 · January 11, 2024, 9:56am

Thank you, this is very useful.