I have several long documents with different content and encodings. Each contains a keyword that recurs throughout the text and lets me break the document into parts.
The keyword is the same within a given document, but it is different for every document.
Example for document_01 in Folder Named “Collected Docs”
keyword: Mountain
text text text
keyword: Mountain (same as previous)
text text text
Example for document_02 still in Folder Named “Collected Docs”
keyword: China
text text text
keyword: China (same as previous)
text text text
and so on
I hope to get some kind help with a script that would:
1st assign the first keyword (for example, Mountain for document_01) to the script and flag document_01 as “done”
2nd break document_01 into several files, according to how many times the script finds the keyword
3rd move all the new files from document_01 into a folder which I can name at the end of the script
4th go back to the “Collected Docs” folder and repeat the process on document_02 with the second keyword (for example, China for document_02), and so on until every document has been processed
I assume that this piece of code may be a starting point.
(*
Structure of the parameters file :
document01.txt<TAB>keyword1
document02.txt<TAB>keyword2
document03.txt<TAB>keyword3
Yvan KOENIG (VALLAURIS, France)
2010/08/13
*)
property nom_du_fichier_parametres : "parameters.txt"
property extension_txt : ".txt"
on run
    set le_dossier to "" & (choose folder)
    tell application "System Events"
        set fichiers_text to name of every file of folder (le_dossier) whose type identifier is "public.plain-text"
    end tell
    if fichiers_text does not contain nom_du_fichier_parametres then
        error "The file \"" & nom_du_fichier_parametres & "\" is unavailable !"
    end if
    set les_parametres to paragraphs of (read file (le_dossier & nom_du_fichier_parametres))
    repeat with ref_des_parametres in les_parametres
        set {nom_source, delimiteur} to my decoupe(ref_des_parametres, tab)
        if nom_source ends with extension_txt then
            set nom_du_dossier_cible to text 1 thru -(1 + (length of extension_txt)) of nom_source
            set chemin_cible to le_dossier & nom_du_dossier_cible
            tell application "System Events"
                if exists folder chemin_cible then set name of disk item chemin_cible to (nom_du_dossier_cible & (do shell script "date +_%Y%m%d-%H%M%S"))
                make new folder at end of folder le_dossier with properties {name:nom_du_dossier_cible}
            end tell -- System Events
            set les_blocs to my decoupe(read file (le_dossier & nom_source), delimiteur)
            repeat with i from 1 to count of les_blocs
                set nom_numero_i to nom_du_dossier_cible & "#" & text -3 thru -1 of ("000" & i) & extension_txt
                tell application "System Events" to make new file at end of folder chemin_cible with properties {name:nom_numero_i}
                write "" & item i of les_blocs to file (chemin_cible & ":" & nom_numero_i)
            end repeat
        end if -- nom_source ends.
    end repeat
end run
--=====
on decoupe(t, d)
    local oTIDs, l
    set oTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to d
    set l to text items of t
    set AppleScript's text item delimiters to oTIDs
    return l
end decoupe
--=====
I edited it a bit because I wasn’t fully satisfied by the date_time stamp applied to existing folders.
Now, I no longer use the current date_time but the modification_date_time of the existing folder.
(*
Structure of the parameters file :
document01.txt<TAB>keyword1
document02.txt<TAB>keyword2
document03.txt<TAB>keyword3
Yvan KOENIG (VALLAURIS, France)
2010/08/13
changed the date_time stamp of existing folders.
Now it's no longer built according to current date_time but to the folder's modification date.
*)
property nom_du_fichier_parametres : "parameters.txt"
property extension_txt : ".txt"
on run
    set le_dossier to "" & (choose folder)
    tell application "System Events"
        if not (exists file (le_dossier & nom_du_fichier_parametres)) then
            error "The file \"" & nom_du_fichier_parametres & "\" is unavailable !"
        end if
    end tell
    set les_parametres to paragraphs of (read file (le_dossier & nom_du_fichier_parametres))
    repeat with ref_des_parametres in les_parametres
        set {nom_source, delimiteur} to my decoupe(ref_des_parametres, tab)
        if nom_source ends with extension_txt then
            set nom_du_dossier_cible to text 1 thru -(1 + (length of extension_txt)) of nom_source
            set chemin_cible to my makeNewFolder(le_dossier, nom_du_dossier_cible)
            set les_blocs to my decoupe(read file (le_dossier & nom_source), delimiteur)
            repeat with i from 1 to count of les_blocs
                set nom_numero_i to nom_du_dossier_cible & "#" & text -3 thru -1 of ("000" & i) & extension_txt
                tell application "System Events" to make new file at end of folder chemin_cible with properties {name:nom_numero_i}
                write "" & item i of les_blocs to file (chemin_cible & ":" & nom_numero_i)
            end repeat
        end if -- nom_source ends.
    end repeat
end run
--=====
on makeNewFolder(dossier_hote, sous_dossier)
    tell application "System Events"
        set chemin_cible to dossier_hote & sous_dossier
        if exists folder chemin_cible then
            set date_de_modification to modification date of disk item chemin_cible
            set les_secondes to time of date_de_modification
            set name of disk item chemin_cible to (sous_dossier & "_" & year of date_de_modification & text -2 thru -1 of ("0" & (month of date_de_modification as integer)) & text -2 thru -1 of ("0" & day of date_de_modification) & "_" & text -2 thru -1 of ("0" & les_secondes div 3600) & text -2 thru -1 of ("0" & (les_secondes mod 3600) div 60) & text -2 thru -1 of ("0" & les_secondes mod 60))
        end if
        make new folder at end of folder dossier_hote with properties {name:sous_dossier}
    end tell -- System Events
    return chemin_cible
end makeNewFolder
--=====
on decoupe(t, d)
    local oTIDs, l
    set oTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to d
    set l to text items of t
    set AppleScript's text item delimiters to oTIDs
    return l
end decoupe
--=====
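Both versions above call read without an "as" clause, so the parameters file and the documents are decoded with the system's primary (legacy) encoding. If a document is actually saved as UTF-8 (an assumption, since the thread does not say which encodings are involved), a keyword containing non-ASCII characters may then never match. A minimal sketch for checking one file by hand; the variable names are hypothetical and not part of the scripts above:

```applescript
-- Assumption: the chosen file is saved as UTF-8.
set docAlias to choose file with prompt "Choose a UTF-8 text file"
-- Without the "as" clause, read would decode the bytes with the primary (legacy) encoding.
set theText to read docAlias as «class utf8»

set theKeyword to text returned of (display dialog "Keyword to look for:" default answer "")
if theKeyword is "" then return -- nothing to search for

-- Count the occurrences by splitting the text at the keyword.
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to theKeyword
set nbParts to count of text items of theText
set AppleScript's text item delimiters to oldTIDs

display dialog "The keyword occurs " & (nbParts - 1) & " time(s)."
```

If the count is 0 for a keyword that is visibly in the document, the encoding is a likely suspect. For a file saved as UTF-16, "as Unicode text" would be the clause to try instead.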
In any case, the script unfortunately stops once it has finished scanning the first document, before breaking it up at the keyword.
The texts in question are notes taken during trips, and the keywords are exactly as I described: unique to each document. I checked this in TextWrangler and I am certain that a document's keyword does not appear in any other document.
If “wrong chars” are the problem, I could change them all at once in TextWrangler.
I tried to record a script there, since it has powerful scripting capabilities, but the results were poor.
Give this a try. It works on only 1 file and the script will ask you for 1) the file to convert, 2) the keyword, and 3) the folder where you want the created text files saved.
set theFile to choose file of type {"txt"} with prompt "Choose the file to convert."
set theKeyword to text returned of (display dialog "What is the keyword for this file?" default answer "")
set outFolder to (choose folder with prompt "Where would you like the output files saved?") as text
-- read the file
try
    set theText to read theFile
on error
    display dialog "The file could not be read:" & return & (theFile as text) buttons {"OK"} default button 1 with icon stop
    return
end try
-- get the parts of the file separated by the keyword
set text item delimiters to theKeyword
set textList to text items of theText
set text item delimiters to ""
set {Nm, Ex} to getName_andExtension(theFile)
repeat with i from 1 to count of textList
    -- calculate the output file path
    set theNum to text -3 thru -1 of ("000" & (i as text))
    set newPath to outFolder & Nm & "_" & theNum & ".txt"
    -- write the text to newPath
    set success to writeTo(item i of textList & theKeyword, newPath, false, text)
    if success is false then
        display dialog "There was a problem writing to the file:" & return & newPath buttons {"OK"} default button 1 with icon stop
        exit repeat
    end if
end repeat
(*============== SUBROUTINES =============*)
on writeTo(fileData, targetFile, apendData, mode) -- apendData is true or false... mode is string, list, or record (no quotes around either)
    try
        set targetFile to targetFile as text
        if targetFile does not contain ":" then set targetFile to POSIX file targetFile as text
        set openFile to open for access file targetFile with write permission
        if apendData is false then set eof of openFile to 0
        write fileData to openFile starting at eof as mode
        close access openFile
        return true
    on error
        -- make sure the file does not stay open after a failed write
        try
            close access file targetFile
        end try
        return false
    end try
end writeTo
on getName_andExtension(F)
    set F to F as Unicode text
    set {name:Nm, name extension:Ex} to info for file F without size
    if Ex is missing value then set Ex to ""
    if Ex is not "" then
        set Nm to text 1 thru ((count Nm) - (count Ex) - 1) of Nm
    end if
    return {Nm, Ex}
end getName_andExtension
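The script above writes the keyword at the end of each part. In the sample documents the keyword opens each section, so one possible variant is to put it back at the start of every part that followed a keyword. A minimal sketch of that idea; splitAtKeyword and the sample text are hypothetical, introduced only for illustration:

```applescript
-- Hypothetical helper: split theText at each occurrence of theKeyword,
-- keeping the keyword at the start of the part it introduces.
on splitAtKeyword(theText, theKeyword)
    set oldTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to theKeyword
    set rawParts to text items of theText
    set AppleScript's text item delimiters to oldTIDs

    set theParts to {}
    repeat with i from 1 to count of rawParts
        set thisPart to item i of rawParts
        if i > 1 then
            -- Every part after the first followed a keyword, so put it back in front.
            set end of theParts to theKeyword & thisPart
        else if thisPart is not "" then
            -- Text before the first keyword, if any, is kept as its own part.
            set end of theParts to thisPart
        end if
    end repeat
    return theParts
end splitAtKeyword

-- Sample modelled on document_01 from the question.
set sampleText to "Mountain" & return & "text text text" & return & "Mountain" & return & "text text text"
set theParts to splitAtKeyword(sampleText, "Mountain")
count of theParts -- 2: one part per keyword, each starting with "Mountain"
```

The list returned by such a handler could take the place of textList in the main loop, with writeTo then called on item i of the list alone rather than on the item followed by theKeyword.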