Search docs B, C & D for the number from each line of doc A

Objective: take the number (IP address) on each line of TextEdit document A and check whether it exists in TextEdit documents B, C or D. If it exists, do nothing and move on to the next line. If it does not exist in B, C or D, delete that entire line from document A and move on to the next line. An alternative is to copy each found address into a new document and save it to the Desktop as Document X once all lines have been searched.

The numbers to search for are in IP address format, with a colon introducing the port number. The search should use only the part of the number before the colon and ignore anything that comes after it, be it numbers, spaces or words.
Example of listing:
20.1.255.255:3424 Test .
12.73.8.125:82943 Pilot .
254.5.37.95:12635 Beta .
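
As a minimal sketch of the "part before the colon" rule described above, a plain AppleScript handler could look like the following. The handler name addressBeforeColon is hypothetical, and it assumes each line starts with the address, as in the example listing. A line with no colon (or an empty line) yields an empty string, so the caller can decide what to do with it.

```applescript
-- Hypothetical helper: return the text that precedes the first colon of a line.
on addressBeforeColon(aLine)
	-- offset returns 0 when the colon is absent
	set colonPos to offset of ":" in aLine
	-- no colon, or a colon in first position: nothing usable before it
	if colonPos < 2 then return ""
	return text 1 thru (colonPos - 1) of aLine
end addressBeforeColon

addressBeforeColon("20.1.255.255:3424 Test .")
--> "20.1.255.255"
```

Everything after the colon (port, spaces, words) is simply never looked at, which is what the request asks for.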

It is easiest to assume all documents are located on the Desktop.

I’m not even sure where to start on this, so I apologise for having no starting script. I’ve searched for comparable scripts but cannot find anything close enough.

I’ve no doubt such a script could be adapted for other purposes.

Hi,

Assuming the TextEdit files are plain text files, here is a starting point to extract all IPv4 (and even IPv6) addresses as a list.
It uses regex and AppleScriptObjC.

The result is in the variable ipList


use framework "Foundation"
use scripting additions

set sourceText to read (choose file of type "txt")
set sourceTextAsNSString to current application's NSString's stringWithString:sourceText
set pattern to "([0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}|(\\d{1,3}\\.){3}\\d{1,3}"
set regex to current application's NSRegularExpression's regularExpressionWithPattern:pattern options:(current application's NSRegularExpressionCaseInsensitive) |error|:(missing value)
set range to {location:0, |length|:(sourceTextAsNSString's |length|())}
set matches to regex's matchesInString:sourceTextAsNSString options:0 range:range
set ipList to {}

repeat with aMatch in matches
	set end of ipList to (sourceTextAsNSString's substringWithRange:(aMatch's range)) as text
end repeat
log ipList


You may try :

set p2d to path to desktop as text
set a to p2d & "A.txt"
set B to p2d & "B.txt"
set C to p2d & "C.txt"
set D to p2d & "D.txt"
set largeList to (read file B) & (read file C) & (read file D)

set smallList to paragraphs of (read file a)
repeat with i from 1 to count smallList
	set itemI to item 1 of my decoupe(smallList's item i, ":")
	if (count my decoupe(largeList, itemI)) = 1 then set smallList's item i to missing value
end repeat

smallList's every text

#=====

on decoupe(t, D)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, D}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

Yvan KOENIG (VALLAURIS, France) Sunday 5 July 2015 15:23:55

Hey lotr,

I prefer to use the Satimage.osax AppleScript Extension for this sort of job. It adds regular expression support and a host of other enhancements to AppleScript.

This script operates on the front Finder window, although you can easily change that.

It grabs all .txt files from the specified folder and assumes the following:

A) File 1 is the control-file.
B) Files 2-x are the files to test against.

All found lines are written to a date-stamped report on the Desktop.


------------------------------------------------------------
set foundLines to {}

# Get the location where a new folder would be placed (front window or Desktop).
tell application "Finder" to set srcFolder to insertion location as alias

# Get all .txt files as an alias list.
set fileList to glob "*.txt" from srcFolder as alias

# Separate the first (control) file.
set fileA to item 1 of fileList

# Separate the rest of the files.
set otherFiles to rest of fileList

# Extract IP-Addresses from Control File.
set ipAddressList to find text "\\b(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}):" in fileA using "\\1" with regexp, all occurrences and string result

# Extract lines with Control-IP-Addresses from file-list.
if ipAddressList ≠ {} then
	repeat with i in ipAddressList
		set ipAddress to change "." into "\\." in i
		repeat with _file in otherFiles
			set foundLines to foundLines & (find text ".*\\b" & ipAddress & "\\b.*" in _file with regexp, all occurrences and string result)
		end repeat
	end repeat
end if

# Coerce 'foundLines' list to text for report.
set reportText to join foundLines using linefeed

# Write the report to the Desktop with a date-stamp.
writetext reportText to ("~/Desktop/IP-Address-Report " & (strftime (current date) into "%Y.%m.%d · %H.%M.%S") & ".txt")
------------------------------------------------------------

Note that glob (a function of the Satimage.osax) will take a standard or home-folder-based POSIX path, so you can easily give it a fixed location to search.

set fileList to glob "*.txt" from "~/Desktop" as alias

I would urge you to NOT use TextEdit for plain-text processing. TextWrangler (freeware) or BBEdit are much better choices and are very scriptable.

Hi. I’ll leave this TextWrangler method here as an option.

set searchTerm to "(\\d+\\.)(\\d+\\.)(\\d+\\.)(\\d+\\:)"
set theFolder to ((path to desktop) as text) & "BCD" --assumes B, C, D files live in this folder
set theList to {}

tell application "TextWrangler" to repeat with isFound in (find searchTerm searching in theFolder options {search mode:grep, showing results:0, returning results:1})'s found matches
	set theList's end to isFound's match_string
end repeat

tell application "TextWrangler"'s document 1 to repeat with lineIndex from (count lines) to 1 by -1 --assumes an open document is A
	if (find searchTerm searching in line lineIndex options {search mode:grep, showing results:0, returning results:1})'s found text is not in my theList then set line lineIndex's contents to ""
end repeat

This builds on Stefan’s answer and goes through the rest of the process: removing addresses that aren’t found from the original document.

use AppleScript version "2.3.1"
use framework "Foundation"
use scripting additions

set mainFile to (choose file of type "txt" with prompt "Choose the main file:")
set otherFiles to (choose file of type "txt" with prompt "Choose the other files:" with multiple selections allowed)

-- get a set of all addresses in B, C, D, etc
set foundSet to current application's NSMutableSet's |set|() -- will hold each address at most once each
set pattern to "[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}"
set regex to (current application's NSRegularExpression's regularExpressionWithPattern:pattern options:(current application's NSRegularExpressionCaseInsensitive) |error|:(missing value))
repeat with aFile in otherFiles
	set sourceText to read aFile as «class utf8»
	set sourceTextAsNSString to (current application's NSString's stringWithString:sourceText)
	set range to {location:0, |length|:(sourceTextAsNSString's |length|())}
	-- find addresses and add to set
	set matches to (regex's matchesInString:sourceTextAsNSString options:0 range:range) as list
	repeat with i from 1 to count of matches
		set foundRange to (item i of matches)'s range()
		(foundSet's addObject:(sourceTextAsNSString's substringWithRange:foundRange))
	end repeat
end repeat

-- read main file and get its addresses
set mainFoundSet to current application's NSMutableSet's |set|() -- will hold each address at most once
set mainText to read mainFile as «class utf8»
set mainNSMutableString to current application's NSMutableString's stringWithString:mainText
set range to {location:0, |length|:(mainNSMutableString's |length|())}
-- find addresses and add to main set
set matches to (regex's matchesInString:mainNSMutableString options:0 range:range)
repeat with i from 1 to count of matches
	set foundRange to (item i of matches)'s range()
	(mainFoundSet's addObject:(mainNSMutableString's substringWithRange:foundRange))
end repeat

-- delete addresses found in B, C, D files from main set
mainFoundSet's minusSet:foundSet
set addressesToDelete to mainFoundSet's allObjects() as list

-- delete appropriate paragraphs
repeat with anAddress in addressesToDelete
	-- make pattern for whole paragraph beginning with address
	set pattern to ((current application's NSRegularExpression's escapedPatternForString:anAddress) as text) & ".+(\\n|\\r)?"
	(mainNSMutableString's replaceOccurrencesOfString:pattern withString:"" options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:(mainNSMutableString's |length|())})
end repeat
-- overwrite the main file
set eof mainFile to 0
write (mainNSMutableString as text) to mainFile as «class utf8»

Edited to fix regex

Many thanks to everyone who replied with very good script suggestions. Thanks for your time and effort. I will need to check I have scripting additions installed.

I’m still on OS X 10.6, which I suspect Shane realised. TextEdit appears to have been downgraded in later versions of OS X, and it always had some anomalies, such as inconsistent line-break types. I also greatly dislike the auto-save feature of TextEdit in OS X 10.8 and later, which cannot be disabled. That proved to be a major headache.

I don’t know about older systems, but with Yosemite we can disable Auto-Save if we don’t want it.
To check once more that it can be done, I decided to disable it for TextEdit.

It’s not the easiest case.

The preferences file is stored as :
:Users::Library:Containers:com.apple.TextEdit:Data:Library:Preferences:com.apple.TextEdit.plist

I opened it with TextWrangler.

At the beginning I found :

<key>CheckGrammarWithSpelling</key>

I inserted the two ApplePersistence lines shown below, just before it.

<key>ApplePersistence</key>
<false/>
<key>CheckGrammarWithSpelling</key>

Save the file.
Reboot.

After that, TextEdit doesn’t auto-save.

During the process I discovered an odd behavior.

When I open the preferences through the menu TextEdit > Preferences,
I see a checkbox labelled “Adapter à la page” (Wrap to Page), on the same line as Format RTF.
The corresponding key appears when I open the file in TextWrangler:
ShowPageBreaks

but it doesn’t when I open it with the good old Property List Editor or with Pref Setter.
But that’s not the most important point.
I tested both settings. New RTF documents are always created with a rectangular frame, and I must use the shortcut command + shift + W to get rid of it.

Yvan KOENIG (VALLAURIS, France) Tuesday 7 July 2015 12:35:31

FWIW, I suspect this from the command line will do it too:

defaults write com.apple.TextEdit ApplePersistence -bool no

I’d forgotten. As it recedes, it’s probably worth putting your OS in a sig or mentioning it in posts – it saves people working up solutions that can’t be used.

Thanks Shane

Once again I forgot the simple protocol.

Is TextEdit behaving as I described on your machine ?

Yvan KOENIG running Yosemite 10.10.4 (VALLAURIS, France) Tuesday 7 July 2015 15:14:17

To be honest, I haven’t tried it. I actually like auto-saving…

You only need the Satimage.osax for my script, and it should work without issue on Snow Leopard.

http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html

It’s also very easy to change, so if it doesn’t work quite the way you need let me know.

I’m presuming none of these scripts are complete? I know replies are often just the main part of the script that handles the question of the post.

I’ve tried the following scripts on both SL and ML 10.8. (I do have ML for testing purposes & did have Mav.)
Installed Satimage osax and TextWrangler.

Yvan Koenig your script:
Reads txt doc’s B,C,D & then A listing them under Events, then:
Result:
error “Can’t get item 1 of {}.” number -1728 from item 1 of {}

Syntax Error: Expected end of line, etc. but found “"”. I googled glob and tried various methods to list this, all to no avail.

searchTerm Syntax Error: Expected “,” but found identifier.

:pattern options Syntax Error: Expected “,” but found “:”.
use framework “Foundation” ? Is this built-in? Excuse my ignorance.

Re: TextEdit: one of the problems I have is when I copy-paste paragraphs from TE to another text app, be it Tex-Edit Plus or Word or open the document on Windows or copy-paste to Windows basic text program such as WordPad or NotePad, I often find there’s been one or more blank lines added.

Model: MacPro
Browser: Firefox 33.0
Operating System: Mac OS X (10.6)

Hello

I tested my version and Shane’s in detail. Both work flawlessly.
For these tests, B.txt, C.txt and D.txt contained 10,000 lines.
A.txt contained 30,015 lines.

Something is wrong in the way you tested them.
You’re not the first one and alas you will not be the last.
Here is a version in which I added some log instructions to trace what is done.

set p2d to path to desktop as text
set a to p2d & "A.txt"
set B to p2d & "B.txt"
set C to p2d & "C.txt"
set D to p2d & "D.txt"
set largeList to (read file B) & (read file C) & (read file D)
log largeList

set smallList to paragraphs of (read file a)
log smallList
repeat with i from 1 to count smallList
	set fullItemI to smallList's item i
	log fullItemI
	set itemI to item 1 of my decoupe(fullItemI, ":")
	if (count my decoupe(largeList, itemI)) = 1 then set smallList's item i to missing value
end repeat

smallList's every text

#=====

on decoupe(t, D)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, D}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

Please run it with the event log pane open.
When it stops, select the entire content of the event log pane, paste it into a TextEdit document and send it as an attachment to :
koenig yvan sfr fr

I’m quite sure that with that I will be able to discover what is going wrong.

Yvan KOENIG (VALLAURIS, France) Saturday 11 July 2015 10:45:52

It is – but only in 10.9 or later.

Hello all of you.

I received the asker’s files. The problem is that in the file A.txt, some lines don’t contain the required colon (in fact some lines are empty).

Attached is an edited version of my proposal.
Of course it would be good if the other helpers could edit their proposals too.

--run script Germaine

--script Germaine

property smallList : missing value

set p2d to path to desktop as text
set mainFile to (p2d & "A1.txt")
set B to (p2d & "B1.txt")
set C to (p2d & "C1.txt")
set D to (p2d & "D1.txt")
set destFile to (p2d & "ABCD1.txt") as «class furl»

tell current application
	set largeList to (read (file B) as «class utf8») & (read (file C) as «class utf8») & (read (file D) as «class utf8»)
	set my smallList to paragraphs of (read file mainFile as «class utf8»)
end tell
repeat with i from 1 to count my smallList
	if my smallList's item i does not contain ":" then
		set my smallList's item i to missing value
	else
		set itemI to item 1 of my decoupe(my smallList's item i, ":")
		# With this new test the script runs faster
		if largeList does not contain itemI then set my smallList's item i to missing value
		--if (count my decoupe(largeList, itemI)) = 1 then set my smallList's item i to missing value
	end if
end repeat
set cleanText to my recolle(my smallList's every text, return)

my writeto(destFile, cleanText, «class utf8», false)

--end script

#=====

on decoupe(t, D)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, D}
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

#=====

on recolle(l, D)
	local oTIDs, t
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, D}
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end recolle

#=====
(*
Handler borrowed from Regulus6633 - http://macscripter.net/viewtopic.php?id=36861
*)
on writeto(targetFile, theData, dataType, apendData)
	-- targetFile is the path to the file you want to write
	-- theData is the data you want in the file.
	-- dataType is the data type of theData and it can be text, list, record etc.
	-- apendData is true to append theData to the end of the current contents of the file or false to overwrite it
	try
		set targetFile to targetFile as «class furl»
		set openFile to open for access targetFile with write permission
		if not apendData then set eof of openFile to 0
		write theData to openFile starting at eof as dataType
		close access openFile
		return true
	on error
		try
			close access targetFile
		end try
		return false
	end try
end writeto

#=====

Yvan KOENIG running Yosemite 10.10.4 (VALLAURIS, France) Saturday 11 July 2015 13:15:51

Many thanks Yvan, this works very well! Excellent.
I didn’t realise there were blank lines, but your fix overcomes that issue.
This script will save me long, tedious hours of searching for these items one by one through multiple docs.
Without this script, I could not have kept up this tedious work for much longer.

My apologies to everyone for not mentioning my OSX in the beginning.

The other thing TextEdit has changed is the search function. Instead of a universal search box where you can use the same search term & search multiple open docs one by one by making each active, the newer TE only has an inline search & does not include case, wrap around or the contains/begins with/full word. Nor does it contain a search history. TextWrangler seems to use a similar search box to the old TE. I also found TE on SL can open more file-types than in ML or Mavericks. Apple may have downgraded this for security reasons.

Model: MacPro
Browser: Firefox 33.0
Operating System: Mac OS X (10.6)

It doesn’t make any difference; you hit command-F, and the same search string appears in the new document.

Yes it does – they’re all in the search field’s menu. Plus it adds search by pattern.

Can you provide any examples? The thing with TextEdit is that the code is available, and I don’t believe that side of things has changed in ages.

If you don’t like newer versions of the OS and prefer 10.6, fine. But it sounds like your comparison of TextEdit versions is based on a fairly superficial look.

My script was complete (and tested), but the portion you quoted is truncated; the end repeat is necessary to terminate that block.

Thanks, Yvan. If this is the case, my example’s search should not require the colon. I’m assuming a space might take its place.

set searchTerm to "(\\d+[.])(\\d+[.])(\\d+[.])(\\d+[: ]?\\d+)"
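
For comparison, here is a plain AppleScript sketch of the same idea (the address ends at the first colon or the first space), using a list of text item delimiters. As far as I know, a list of delimiters is honoured from AppleScript 2.1 (OS X 10.6) onwards, so treat that as an assumption to check on your system. The handler name addressPart is hypothetical.

```applescript
-- Hypothetical helper: return the leading address of a line,
-- whether it is followed by a colon or by a space.
on addressPart(aLine)
	set oldTIDs to AppleScript's text item delimiters
	-- assumption: a list of delimiters is supported (AppleScript 2.1 or later)
	set AppleScript's text item delimiters to {":", " "}
	set parts to text items of aLine
	set AppleScript's text item delimiters to oldTIDs
	-- an empty line gives an empty list, so guard before taking item 1
	if parts is {} then return ""
	return item 1 of parts
end addressPart

addressPart("12.73.8.125 Pilot .")
```

Restoring the previous delimiters before returning keeps the handler from affecting the rest of the script.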

Here is a slightly modified version of Shane’s script.

As is, it requires at least 10.10 but it may easily be edited for using it under Mavericks (10.9.x).

The given version processed lotd’s text files in 7.446277022362 seconds.

use AppleScript version "2.3.1"
use framework "Foundation"
use scripting additions

--run Germaine

--script Germaine

(*
set mainFile to (choose file of type "txt" with prompt "Choose the main file:")
set otherFiles to (choose file of type "txt" with prompt "Choose the other files:" with multiple selections allowed)
*)
set beg to current application's NSDate's |date|()

--repeat 1 times
set p2d to path to desktop as text
# destfile is used only by the script so define it as class furl
set destfile to (p2d & "A.txt") as «class furl»

# In real life mainfile and otherfiles are returned by choose file so they are alias
set mainfile to (p2d & "A.txt") as alias
set otherfiles to {(p2d & "B.txt") as alias, (p2d & "C.txt") as alias, (p2d & "D.txt") as alias}

-- get a set of all addresses in B, C, D, etc
set foundSet to current application's NSMutableSet's |set|() -- will hold each address at most once each
set pattern to "[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}"
set regex to (current application's NSRegularExpression's regularExpressionWithPattern:pattern options:(current application's NSRegularExpressionCaseInsensitive) |error|:(missing value))
repeat with aFile in otherfiles
	set sourceText to read aFile as «class utf8»
	set sourceTextAsNSString to (current application's NSString's stringWithString:sourceText)
	set range to {location:0, |length|:(sourceTextAsNSString's |length|())}
	-- find addresses and add to set
	set matches to (regex's matchesInString:sourceTextAsNSString options:0 range:range) as list
	repeat with i from 1 to count of matches
		set foundRange to (item i of matches)'s range()
		(foundSet's addObject:(sourceTextAsNSString's substringWithRange:foundRange))
	end repeat
end repeat

-- read main file and get its addresses
set mainFoundSet to current application's NSMutableSet's |set|() -- will hold each address at most once
set mainText to read mainfile as «class utf8»
set mainNSMutableString to current application's NSMutableString's stringWithString:mainText
set range to {location:0, |length|:(mainNSMutableString's |length|())}
-- find addresses and add to main set
set matches to (regex's matchesInString:mainNSMutableString options:0 range:range)
repeat with i from 1 to count of matches
	set foundRange to (item i of matches)'s range()
	(mainFoundSet's addObject:(mainNSMutableString's substringWithRange:foundRange))
end repeat

-- delete addresses found in B, C, D files from main set
mainFoundSet's minusSet:foundSet
set addressesToDelete to mainFoundSet's allObjects() as list

-- delete appropriate paragraphs
repeat with anAddress in addressesToDelete
	-- make pattern for whole paragraph beginning with address
	set pattern to ((current application's NSRegularExpression's escapedPatternForString:anAddress) as text) & ".+(\\n|\\r)?"
	(mainNSMutableString's replaceOccurrencesOfString:pattern withString:"" options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:(mainNSMutableString's |length|())})
end repeat
-- overwrite the main file
try
	set openFile to open for access destfile with write permission
	set eof of openFile to 0
	write (mainNSMutableString as text) to openFile as «class utf8»
	close access openFile
on error
	try
		close access destfile
	end try
end try
--end repeat


set theDiff to (beg's timeIntervalSinceNow()) as real
tell application "SystemUIServer" to display dialog "done in: " & -theDiff & " seconds"
--end script

Huge thanks to Shane for this gem.

Yvan KOENIG (VALLAURIS, France) Saturday 11 July 2015 15:04:06