How can I convert an RTF file that contains both hyperlinks and standard text?
The file is about 200 pages and ALWAYS follows this format:
Intro text (max 4 or 5 words)
HYPERLINK 1 (hides my student's link to the web page with his data)
HYPERLINK 2 (hides my student's link to his homework page)
HYPERLINK 3 (hides my student's link to an additional page)
Date in standard text
Converting from RTF to TXT loses the links.
Doing it by hand in TextEdit is possible, but every link has to be selected and pasted manually;
for 200 pages that is a nightmare.
I have found this script,
however:
1st, it does not return the plain-text part of the file, so I lose the “date” information.
2nd, it does not write the results to a file that I can later import into FileMaker.
set rtfFile to (choose file with prompt "Choose the RTF file.")
set filePath to rtfFile
set startDelimiter to "{HYPERLINK \""
set endDelimiter to "\"}}"
set hyperlinks to {}
set rtfText to read file filePath
set text item delimiters to startDelimiter
set theItems to text items of rtfText
if (count of theItems) is greater than 1 then
	set text item delimiters to endDelimiter
	repeat with i from 2 to count of theItems
		set a to text items of (item i of theItems)
		set end of hyperlinks to item 1 of a
	end repeat
end if
set text item delimiters to ""
return hyperlinks
writeTo(adjustedText, filePath, false, string)

(*==================== SUBROUTINES ===================*)
on writeTo(this_data, target_file, append_data, mode) -- append_data is true or false, mode is string etc. (no quotes around either)
	try
		set target_file to target_file as text
		set target_file to POSIX file target_file as text
		set the open_target_file to open for access file target_file with write permission
		if append_data is false then set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof as mode
		close access the open_target_file
		return true
	on error
		try
			close access file open_target_file
		end try
		return false
	end try
end writeTo
Thanks to the kind people who will help me solve this problem.
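One likely reason the script above never writes a file is that `return hyperlinks` ends the run before `writeTo` is reached, and `adjustedText` is never defined. The following is only a minimal sketch of the same delimiter approach that saves the links, one per line, next to the chosen file. The output name (the RTF path plus ".links.txt") is an assumption made for illustration, and the sketch still does not recover the plain-text date.

```applescript
-- Sketch: collect every HYPERLINK target and save the list as plain text.
set rtfFile to (choose file with prompt "Choose the RTF file.")
set rtfText to read rtfFile

set hyperlinks to {}
set oldDelims to AppleScript's text item delimiters

-- Split the raw RTF at each field start; every chunk after the first begins with a URL.
set AppleScript's text item delimiters to "{HYPERLINK \""
set theItems to text items of rtfText

-- The URL ends at the closing quote and braces of the field instruction.
set AppleScript's text item delimiters to "\"}}"
repeat with i from 2 to count of theItems
	set end of hyperlinks to text item 1 of (item i of theItems)
end repeat

-- Join the links with line feeds so each one becomes its own line.
set AppleScript's text item delimiters to linefeed
set outputText to hyperlinks as text
set AppleScript's text item delimiters to oldDelims

-- Assumed output location: same folder and name as the RTF, plus ".links.txt".
set outPath to (POSIX path of rtfFile) & ".links.txt"
set fileRef to open for access (POSIX file outPath) with write permission
try
	set eof of fileRef to 0 -- overwrite any earlier result
	write outputText to fileRef
	close access fileRef
on error errMsg
	close access fileRef
	error errMsg
end try
```

The resulting text file has one link per line, which FileMaker can import as a single-field text file.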
I wrote a very primitive Command Line Tool which extracts the links from an RTF text,
the AppleScript usage is
do shell script "/path/to/RTFLinkParser /path/to/file.rtf"
The result is the plain text. If there is a link in a paragraph, the link is put after the plain text, separated by a tab character.
If there is more than one link in a single paragraph, only the first link is considered.
You can download it here: RTFLinkParser
It should work on PPC and Intel 10.5 or higher
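To illustrate the output format described above (plain text, then an optional tab and link), here is a small sketch that splits such output into text and link parts. The value of `sampleOutput` is made-up stand-in data, not real output from the tool, and the record labels `plainPart` and `linkURL` are names chosen only for this example.

```applescript
-- Sketch: split "text<TAB>link" lines into records.
-- sampleOutput is invented data standing in for the result of do shell script.
set sampleOutput to "Intro text" & tab & "http://example.com/student" & return & "5 May 2011"

set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab

set rows to {}
repeat with aLine in paragraphs of sampleOutput
	set lineText to contents of aLine
	set fields to text items of lineText
	if (count of fields) > 1 then
		-- Paragraph with a link: first field is the text, second is the URL.
		set end of rows to {plainPart:item 1 of fields, linkURL:item 2 of fields}
	else
		-- Paragraph without a link (for example the date line).
		set end of rows to {plainPart:lineText, linkURL:""}
	end if
end repeat

set AppleScript's text item delimiters to oldDelims
return rows
```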
Where do I need to put the tool on my Mac? Unfortunately I am not familiar with UNIX, and I do not have the developer tools installed, as I do not develop anything.
Also: is there a way to create an AppleScript that keeps the plain text in the resulting document?
Example in plain English
Find file with prompt
set para to paragraphs in file
repeat
	from para to last para
	if para is unicode txt
		copy para + tab
	else
		do shell script RTFLinkParser
	end if
end repeat
I put the parser in the AppleScript MyScripts folder,
but how do I choose the file to process?
set tFile to choose file with prompt "Choose the tfile" without invisibles
do shell script "/path/to/RTFLinkParser /path/to/file.rtf" & quoted form of POSIX path of tfile
also:
do shell script "/path/to/RTFLinkParser /path/to/file.rtf" & quoted form of POSIX path of (choose file of type "rtf")
I always get the same error:
error “sh: /path/to/RTFLinkParser: No such file or directory” number 127
set tFile to choose file with prompt "Choose the tfile" without invisibles
do shell script "/path/to/RTFLinkParser " & quoted form of POSIX path of tfile
Replace /path/to/RTFLinkParser with the full (POSIX) path to the executable.
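If typing the full path is awkward, one option is to pick the executable with `choose file` as well, so that both paths come from dialogs. This is only a sketch: it assumes the downloaded RTFLinkParser file is executable, and the output name (the RTF path plus ".links.txt") is an assumption made for illustration.

```applescript
-- Sketch: choose the tool and the RTF file, then save the tool's output to a text file.
set parserFile to choose file with prompt "Locate the RTFLinkParser executable"
set tFile to choose file with prompt "Choose the RTF file" without invisibles

-- quoted form protects spaces and special characters in the paths.
set parserPath to quoted form of POSIX path of parserFile
set rtfPath to quoted form of POSIX path of tFile

-- Assumed output location: next to the RTF file, with ".links.txt" appended.
set outPath to quoted form of ((POSIX path of tFile) & ".links.txt")

-- The shell redirect (>) writes the tab-separated result to the output file.
do shell script parserPath & " " & rtfPath & " > " & outPath
```

Because the tool separates text and link with a tab, the saved file can be imported into FileMaker as tab-separated text.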
→ error “2011-05-05 15:33:11.988 RTFLinkParser[2057:60f] (null): unrecognized selector sent to class 0x7fff70a88698
2011-05-05 15:33:11.990 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100111610 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.990 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100111660 of class NSException autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.990 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x1001159f0 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100115a50 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100115d70 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100116890 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.991 RTFLinkParser[2057:60f] *** __NSAutoreleaseNoPool(): Object 0x100115e60 of class NSConcreteMutableData autoreleased with no pool in place - just leaking
2011-05-05 15:33:11.992 RTFLinkParser[2057:60f] *** Terminating app due to uncaught exception ‘NSInvalidArgumentException’, reason: ‘(null): unrecognized selector sent to class 0x7fff70a88698’
*** Call stack at first throw:
(
0 CoreFoundation 0x00007fff824157b4 __exceptionPreprocess + 180
1 libobjc.A.dylib 0x00007fff83f820f3 objc_exception_throw + 45
2 CoreFoundation 0x00007fff8246f1a0 __CFFullMethodName + 0
3 CoreFoundation 0x00007fff823e791f forwarding + 751
4 CoreFoundation 0x00007fff823e3a68 _CF_forwarding_prep_0 + 232
5 RTFLinkParser 0x0000000100000a0a main + 44
6 RTFLinkParser 0x00000001000009bc start + 52
)
terminate called after throwing an instance of ‘NSException’” number 1006
The second part also has a Chinese font set (fcharset128 HiraKakuProN-W3; \f3\fnil\fcharset134 STHeitiSC-Light;o),
but I think the command line tool works with any charset, so this should not be the problem: I ran a test with Latin characters only and the errors are the same:
‘NSInvalidArgumentException’, reason: '(null): unrecognized selector sent to class
I tried copying the same file from this site, reopening it and saving it as RTF in TextEdit, but I still get the same errors in the result window.
Am I doing something wrong?
error “2011-05-05 17:26:35.652 RTFLinkParser[2500:60f] (null): unrecognized selector sent to class 0x7fff70a88698
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100111610 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100111660 of class NSException autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x1001159f0 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.654 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100115a50 of class _NSCallStackArray autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100115d70 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100116890 of class NSCFString autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** __NSAutoreleaseNoPool(): Object 0x100115e60 of class NSConcreteMutableData autoreleased with no pool in place - just leaking
2011-05-05 17:26:35.655 RTFLinkParser[2500:60f] *** Terminating app due to uncaught exception ‘NSInvalidArgumentException’, reason: ‘(null): unrecognized selector sent to class 0x7fff70a88698’
*** Call stack at first throw:
(
0 CoreFoundation 0x00007fff824157b4 __exceptionPreprocess + 180
1 libobjc.A.dylib 0x00007fff83f820f3 objc_exception_throw + 45
2 CoreFoundation 0x00007fff8246f1a0 __CFFullMethodName + 0
3 CoreFoundation 0x00007fff823e791f forwarding + 751
4 CoreFoundation 0x00007fff823e3a68 _CF_forwarding_prep_0 + 232
5 RTFLinkParser 0x0000000100000a0a main + 44
6 RTFLinkParser 0x00000001000009bc start + 52
)
terminate called after throwing an instance of ‘NSException’” number 1006