Saturday, September 19, 2020

#1 2020-02-14 07:58:53 am

Fredrik71
Member
Registered: 2019-10-23
Posts: 383

Every notes in separate file with linefeed, returns...

Hi all.

I open a PDF document containing text in Skim and highlight text in 3 locations.
The script counts the highlights and creates a text file for each one.

The problem I face: would it be possible to keep the linefeeds and carriage returns and
include them in my text file, the same way select and copy works on a document?

Right now the text of every highlight is 1 line. Maybe it's a limitation of Skim.


Edit: 14/2-2020
1. I could manually edit the notes in the document's note list with ctrl + return, but that's a bad idea.
2. I could maybe compare the text in the notes with (every paragraph of the page) and replace those with the ones from the page.

Hmm... :)

Thanks, Regards.
Fredrik

Applescript:

property _titleName_ : "Script"
property _extention_ : ".applescript"
property _destPath_ : ((path to desktop) as text)

tell application "Skim"
   tell document 1
       set noteCount to count every note
       set noteList to text of notes
   end tell
end tell

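repeat with i from 1 to noteCount
   set theText to item i of noteList
   set outputFile to _destPath_ & _titleName_ & (i) & _extention_
   try
       set fileReference to open for access file outputFile with write permission
       set eof of fileReference to 0 -- empty the file first in case it already exists
       write theText to fileReference as «class utf8» -- keep non-ASCII characters intact
       close access fileReference
   on error
       -- make sure the file is not left open if something went wrong
       try
           close access file outputFile
       end try
   end try
end repeat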

Last edited by Fredrik71 (2020-02-14 10:43:18 am)


The best knowledge is always free, we share ideas, thoughts and expressions. So we could build better worlds together.


#2 2020-02-14 12:14:51 pm

Yvan Koenig
Member
Registered: 2006-09-14
Posts: 4569


The text property of a note doesn't contain the linefeeds, so there is no way to grab them.

You may be able to achieve your goal with the script below.

Applescript:

use AppleScript version "2.5"
use scripting additions
use framework "Foundation"

property _titleName_ : "Script"
property _extention_ : ".applescript"
property _destPath_ : path to desktop as text

Germaine()

on Germaine()
   try
       set savedNumber to ((path to desktop as text) & "savedNumber.txt") as «class furl»
       set theIndex to paragraph 1 of (read savedNumber)
       set theIndex to theIndex as integer
   on error
       set theIndex to 0
   end try
   -- Copy the selection to the clipboard.
   tell application "System Events" to tell process "Preview"
       set frontmost to true
       keystroke "c" using {command down}
   end tell
   
   -- Get the clipboard contents and convert them to plain text.
   delay 0.5
   set |⌘| to current application
   set theClipboard to |⌘|'s class "NSPasteboard"'s generalPasteboard()
   set clipboardContents to (theClipboard's readObjectsForClasses:{|⌘|'s class "NSAttributedString"} options:({}))'s mutableCopy()
   
   -- The original contents may be RTF data
   set theAttributedString to clipboardContents's firstObject()'s mutableCopy()
   set theData to theAttributedString's |string|()
   
   set theIndex to theIndex + 1
   set targetHfs to _destPath_ & _titleName_ & theIndex & _extention_
   set targetURL to (|⌘|'s NSArray's arrayWithObject:(targetHfs as «class furl»))'s firstObject()
   set theResult to theData's writeToURL:targetURL atomically:true -- write the extracted text in the file
   
   set targetURL to (|⌘|'s NSArray's arrayWithObject:savedNumber)'s firstObject()
   set theData to |⌘|'s NSString's stringWithString:(theIndex as text)
   set theResult to theData's writeToURL:targetURL atomically:true -- write the index value in the text file
end Germaine

Save it as an application.
Open your PDF in Preview.
Select what you want to save.
Double-click the icon of the script application.
It will create a text file containing the selection with its linefeeds.

It also saves the index used to number the file, so if you apply the script to a new selection, the file index is incremented.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Friday, 14 February 2020 19:14:43

Last edited by Yvan Koenig (2020-02-14 02:18:22 pm)


#3 2020-02-14 01:56:14 pm

Fredrik71

Thanks Yvan.

Your workflow is like this:

Select text (a single selection), run your script, get an output file.
Go back to the PDF, make a second selection, run your script again, get another output file.

That is one way to do it, and thanks for that.

====>

My idea was to use highlight notes, because I can fill a PDF document with multiple highlights.
When that is done, I run the script to extract the text from the PDF.
I also thought it might be possible to extract only highlights in a specific color or font.

I will think about it some more... :)

Thanks, Yvan.



#4 2020-02-14 03:11:44 pm

Yvan Koenig

I understand what you wish to achieve.
Alas, as I already wrote, grabbing the properties of a note in Skim returns text with no linefeeds.
There is no color information either.

The latter isn't supposed to be a problem, as you save your data in .applescript files, which contain only plain text.
To get colors you have to save in RTF files.
I'm not sure that it would really be useful.
For instance, in applescript language guide-2013-19.pdf, the sample scripts aren't colorized.
If you store them in .applescript files, they will be colorized when you open the files in Script Editor.

If you wish to check what I wrote, run:

Applescript:

property _titleName_ : "Script"
property _extention_ : ".applescript"
property _destPath_ : ((path to desktop) as text)

tell application "Skim"
   tell document 1
       set noteCount to count every note
       set noteList to text of notes
   end tell
end tell

repeat with aNote in noteList
   log (count paragraphs of (contents of aNote))
end repeat

You will see that the blocks of text extracted from the notes are all single-paragraph ones.
We can't change that.

Starting from a pdf containing a list of handlers used here,

The script using Skim returned:
#===== (* set le_sous_dossier to my makeNewSubFolder(le_dossier, le_nom_du_sous_dossier) *) on makeNewSubFolder(dossier_hote, nom_du_sous_dossier) local chemin_du_sous_dossier, date_de_modification (* Utilise la routine horoDateur() *) tell application id "com.apple.systemevents" set chemin_du_sous_dossier to dossier_hote & nom_du_sous_dossier if exists folder chemin_du_sous_dossier then set name of disk item chemin_du_sous_dossier to (nom_du_sous_dossier & "_" & my horoDateur(modification date of disk item chemin_du_sous_dossier)) end if -- exists folder... make new folder at end of folder dossier_hote with properties {name:nom_du_sous_dossier} end tell -- System Events return chemin_du_sous_dossier end makeNewSubFolder #=====

Mine returned :
#=====
    
(*

    set le_sous_dossier to my makeNewSubFolder(le_dossier,
le_nom_du_sous_dossier) *)
on makeNewSubFolder(dossier_hote, nom_du_sous_dossier) local chemin_du_sous_dossier,
    date_de_modification
    
(*
Utilise la routine horoDateur() *)
tell application id "com.apple.systemevents"

    set chemin_du_sous_dossier to dossier_hote &
nom_du_sous_dossier
if exists folder chemin_du_sous_dossier then

    set name of disk item chemin_du_sous_dossier to
(nom_du_sous_dossier & "_" & my horoDateur(modification date of disk item
    chemin_du_sous_dossier))
end if -- exists folder...

    make new folder at end of folder dossier_hote with properties
{name:nom_du_sous_dossier} end tell -- System Events
return chemin_du_sous_dossier end makeNewSubFolder
#=====

which is not perfect but is better.
In fact it's what I get when I copy a sample code from Shane Stanley's Everyday AppleScriptObjC 3ed.pdf.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Friday, 14 February 2020 22:05:02


#5 2020-02-14 04:13:37 pm

Fredrik71

Yvan

I know you are right, so I agree with you, but that doesn't change 'how I feel'

Are you talking about Script C28-5 in Everyday AppleScriptObjc ??

Regards



#6 2020-02-15 05:13:18 am

Yvan Koenig

No, script C28.5 saves the entire text contents of a PDF.

Mine saves only the selected part, which, when copied to the clipboard, may be available as «class RTF » but not as «class utf8».

If you want to continue with Skim, you must ask its authors to modify the way they grab the text contents of notes.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Saturday, 15 February 2020 12:13:11


#7 2020-02-15 08:44:22 am

Fredrik71

On that forum the suggestion was to use option + return for notes, but on my Mac it's ctrl + return.
I will send them a message.

I could understand it if highlight notes were meant to be only 1 paragraph that includes nothing but
letters and numbers.

For people who use e.g. a PDF as a reference for coding, maybe it's enough to select and copy.

I did some tests with a command-line tool called 'pdfgrep'.

If we could store the characters of a note in a variable,
then it may be possible to search for that string in the document, select it and make a text copy.
'pdfgrep' is not so fast, but it has an option, --cache, to store the main PDF.

I guess it's not a good idea.

Another interesting thought:

Preview stores highlight notes with carriage returns in tabview.

Do you think it would be possible to use AppleScriptObjC to call PDFKit to extract them? :)

Yvan Koenig wrote:


If you want to continue with Skim, you must ask its authors to modify the way they grab the text contents of notes.



#8 2020-02-15 11:59:13 am

Yvan Koenig

Fredrik71 wrote:


Preview stores highlight notes with carriage returns in tabview.

Do you think it would be possible to use AppleScriptObjC to call PDFKit to extract them? :)



I found a thread on <applescript-users@lists.apple.com> titled "collectdata" in which Shane Stanley posted a script on 2017/03/14.
It was designed to highlight some items in a PDF.
I would not be surprised to read that Shane is able to retrieve highlighted items in such a PDF.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Saturday, 15 February 2020 18:57:56


#9 2020-02-15 04:53:19 pm

Fredrik71

Interesting.

Yvan Koenig wrote:


I would not be surprised to read that Shane is able to retrieve highlighted items in such a PDF.



===>

Today I got a message from the people at Skim.
The response was that whitespace and returns are not reliable, but selecting text is.
Skim uses Apple's PDFKit.

===>

Example: the PDF AppleScript Language Guide 2013.

Skim and Preview have problems selecting and copying text from that PDF.

Adobe Acrobat does not.

===> Solution for Preview and Skim
My solution was to change the character spacing for that paragraph, after which the text was extracted
correctly. My understanding is that the author of a PDF file has to know whether the file will be used for
selecting and copying text, and has to test that before releasing it. If they don't, it can cause problems later, because every PDF viewer uses a different algorithm to do the guessing.

Some PDF viewers have good algorithms and others do not.

So in the end...

1. Extracted text may be incorrect.
2. Extracting all the text and working on it afterwards would maybe feel better.

But if...

the PDF file was correctly made for selecting text,
I truly believe highlighting notes and extracting their text is a wonderful way to take notes.

Regards.



#10 2020-02-15 07:48:59 pm

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 6459


You can think of a PDF page as a bit like a whiteboard, with lines of text drawn on it. As a reader, you can usually tell whether two lines are separate paragraphs or not, but the rules for computing it are complex. For accuracy you need to take into account geometry, line lengths, punctuation, and so on -- and even then it's only going to be a guess. The rules for prose are going to differ from those for code, or poetry, or headlines. So software makes an educated guess, and PDFKit takes a fairly conservative approach.
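
To make the guessing concrete, here is a toy sketch of such a heuristic in shell and awk. It is only an illustration under simple assumptions (plain text lines, no geometry) and is not how PDFKit works; guess_paragraphs is a hypothetical name. A line is taken to end a paragraph when it ends in sentence punctuation and is clearly shorter than the longest line.

```shell
# Toy heuristic (an assumption for illustration, not PDFKit's algorithm):
# read lines on stdin and print one guessed paragraph per output line.
guess_paragraphs() {
  awk '
    # First pass: remember every line and the longest line length.
    { line[NR] = $0; if (length($0) > max) max = length($0) }
    END {
      para = ""
      for (i = 1; i <= NR; i++) {
        if (para == "") para = line[i]; else para = para " " line[i]
        is_short = (length(line[i]) < 0.8 * max)   # clearly shorter than a full line
        has_stop = (line[i] ~ /[.?!:]$/)           # ends in sentence punctuation
        if ((is_short && has_stop) || i == NR) { print para; para = "" }
      }
    }'
}

# Example: four drawn lines that a reader would see as two paragraphs.
printf '%s\n' \
  'The quick brown fox jumps over the lazy' \
  'dog and keeps running.' \
  'A second paragraph starts here and runs' \
  'on to its end.' | guess_paragraphs
```

Code listings, poetry or headlines would defeat this rule at once, which is the point: any such rule is a guess tuned to one kind of text.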


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/
latenightsw.com


#11 2020-02-16 04:30:15 am

Fredrik71

Shane, that's a great way to describe a PDF.

===>

If a reader can tell the difference between 2 lines drawn on a whiteboard, it could also tell
when to include a carriage return or linefeed for highlight notes.

I do not get it; why would there be a difference?

===>

Yvan pointed me to an old thread on Apple's AppleScript users list.
There I found this amazing command-line tool:

pdftotext, https://gitlab.freedesktop.org/poppler/poppler

Poppler is a PDF rendering library based on xpdf, and it is still actively developed.

$ pdftotext -layout -f 14 -l 14 'AppleScript.Language.Guide.(2013).pdf'
... wow... :)

It feels like having the PDF on the command line... I'm really impressed.
Now it is possible to use 'grep' on the output...

Using pdfgrep directly on the PDF file has maybe lost out :)
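
The pdftotext-plus-grep idea can be sketched as a small shell helper. This assumes pdftotext from Poppler is on the PATH; grep_pdf_page is a hypothetical name, and the file name and page number in the usage line are only examples.

```shell
# Hypothetical helper: print the lines (with line numbers) on one page of a
# PDF that match a pattern. Assumes pdftotext (Poppler) is installed.
grep_pdf_page() {
  pdf=$1
  page=$2
  pattern=$3
  # "-" as the output file name makes pdftotext write to standard output.
  pdftotext -layout -f "$page" -l "$page" "$pdf" - | grep -n -- "$pattern"
}
```

Usage would look like: grep_pdf_page 'AppleScript.Language.Guide.(2013).pdf' 14 'tell application'. Quoting the file name matters here because of the parentheses.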



#12 2020-02-16 04:37:08 am

Shane Stanley

Fredrik71 wrote:

I do not get it; why would there be a difference?



It's the complexity of the decision-making, and the trade-off in time involved.



#13 2020-02-16 04:55:29 am

Fredrik71

Apple always says: 'We create tools that we (Apple employees or executives) like to use.'

Maybe they have never highlighted text notes in a PDF and thought, 'I'd like to send these notes
to someone', or 'put them in my Keynote'. And if they did, maybe they did it with select and copy.

There are some people who think outside the box, who make highlight text notes on a mobile device
and are able to send them by email.

Shane Stanley wrote:


It's the complexity of the decision-making, and the trade-off in time involved.

Last edited by Fredrik71 (2020-02-16 04:56:20 am)



#14 2020-02-17 07:02:17 am

Fredrik71

I know how to get a specific page of a PDF from AppleScript.
I know how to get the highlight notes from the PDF.
I know how to output text from a PDF with good styling.

Now I need to figure out the best approach to make all this work together. :)

So to get the text I could use this, which stores the output in a file:
pdftotext -layout AppleScript.Language.Guide.pdf - > AppleScript.txt

or

This will print page 50 of AppleScript.Language.Guide.pdf on standard output (stdout).
pdftotext -layout -f 50 -l 50 AppleScript.Language.Guide.pdf -


Let's say I have 1 paragraph of text (we call this the highlight note text).

1. Turn the string (paragraph) into a list of words.
2. Get items 1 & 2 --> this will be the start of the search.
3. Get the last 2 items of the list --> this will be the end of the search.

4. Run pdftotext -layout (-f myBeginPageNumber -l myLastPageNumber) on the page(s)...

5. (Step 2) gives me the beginning of the regex search.
6. (Step 3) gives me the end...

7. Get all paragraphs between (step 5 & step 6), plus the lines matched in step 5 and step 6.

8. Print the result...
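
The steps above can be sketched in shell roughly as follows. This is a minimal sketch under assumptions, not a finished tool: extract_by_note is a hypothetical name, the page text arrives on standard input (for example from pdftotext -layout ... -), the end marker is simplified to the last word only, and the words are matched as fixed strings instead of a regex. Because -layout can pad words with extra spaces, the two-word start marker may fail to match on some pages.

```shell
# Hypothetical helper: print the lines of stdin from the line containing the
# note's first two words through the next line containing its last word.
extract_by_note() {
  note=$1
  # Steps 1-3: split the note on whitespace (globbing off), then keep the
  # first two words and the last word.
  set -f
  set -- $note
  set +f
  start="$1 $2"
  for last in "$@"; do :; done
  # Steps 5-7: print from the start marker through the end marker.
  # Note: awk -v interprets backslash escapes, so notes containing
  # backslashes would need extra care.
  awk -v s="$start" -v e="$last" '
    inside       { print; if (index($0, e)) inside = 0; next }
    index($0, s) { inside = 1; print }
  '
}

# Example with a made-up page; the note is the flattened one-line text.
printf '%s\n' 'Some heading' 'on makeNewSubFolder(a, b)' '   set x to a & b' \
  '   return x' 'end makeNewSubFolder' 'Trailing prose' |
  extract_by_note 'on makeNewSubFolder(a, b) set x to a & b return x end makeNewSubFolder'
```

Matching fixed strings with index() avoids having to worry about characters in the note that would be special in a regular expression.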

Good luck to me... this will be fun :)



#15 2020-02-17 04:56:53 pm

Fredrik71

UPDATED VERSION: 18/2/2020

Applescript:

property _titleName_ : "Script"
property _extention_ : ".applescript"
property _destPath_ : ((path to desktop) & "AppleScriptLanguageGuide_Scripts:") as text

set thePDF to POSIX path of (path to desktop) & "AppleScript.Language.Guide.pdf"

tell application "Skim"
   tell document 1
       -- set theName to name
       set noteCount to count every note
       set noteList to text of notes
   end tell
end tell

repeat with i from 1 to noteCount
   -- Get the page number from 'Skim' for the current note
   tell application "Skim" to tell document 1 to set thePageIndex to get index for (page of (get item i of notes))
   log "Page: " & thePageIndex
   -- Get the text string from the current note
   set theText to item i of noteList
   log theText
   set AppleScript's text item delimiters to " "
   set theTextList to theText as text
   set resultText to styledPDFAsText(thePDF, thePageIndex, thePageIndex, (words 1 thru 2 of theTextList), (word -1 of theText))
   set resultText to "-- AppleScript Language Guide 2013, Script from page: " & ¬
       thePageIndex & linefeed & linefeed & resultText
   set outputFile to _destPath_ & _titleName_ & (i) & _extention_
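   try
       set fileReference to open for access file outputFile with write permission
       set eof of fileReference to 0 -- empty the file first in case it already exists
       write resultText to fileReference as «class utf8» -- keep non-ASCII characters intact
       close access fileReference
   on error
       try
           close access file outputFile
       end try
   end try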
end repeat

(*
   The handler uses pdftotext, from the Poppler PDF rendering library, to extract only the page of the current note, so there is less text to work with
       https://poppler.freedesktop.org
*)

on styledPDFAsText(thePDF, pageBeginNumber, pageEndNumber, beginString, endString)
   set parms to "-layout -f" & space & pageBeginNumber & space & "-l" & space & pageEndNumber
   -- Print only the lines from a line matching beginString through the next line matching endString
   set getString to "sed -n '/" & beginString & "/,/" & endString & "/p'"
   -- "-" as the output file makes pdftotext write to standard output
   do shell script "/usr/local/bin/pdftotext " & parms & space & quoted form of thePDF & " -" & " | " & getString
end styledPDFAsText
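
One caveat with the sed range in the handler: the note's words are used as regular expressions, so a word containing ".", "*", "[" or "/" can match the wrong line or break the command. Below is a minimal sketch of one way around that, assuming a POSIX shell and sed; sed_escape and print_range are hypothetical helper names, not part of the script above.

```shell
# Escape the characters that are special in a sed basic regular expression,
# plus the "/" used as the address delimiter.
sed_escape() {
  printf '%s\n' "$1" | sed 's|[][\\/.^$*]|\\&|g'
}

# Print the lines of stdin from a line containing the literal text $1
# through the next line containing the literal text $2.
print_range() {
  begin=$(sed_escape "$1")
  end=$(sed_escape "$2")
  sed -n "/$begin/,/$end/p"
}

# Example: without escaping, "a.b" would also match the line "axb".
printf 'axb\nstart a.b (x)\nmiddle\nend /here/\nafter\n' | print_range 'a.b' '/here/'
```

From AppleScript, the escaped strings would still have to be passed through quoted form of before being handed to do shell script, since a single quote in a note would otherwise end the quoted sed program.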

Last edited by Fredrik71 (2020-02-17 07:44:57 pm)

