Search docs B, C & D for the number from each line of Doc A

I think the issue was blank lines. I’m not so sure any numbers lacked a colon.

Thanks, I had not noticed it was placed into the menu bar. Off-hand I cannot provide TE examples, just by test opening various files.
It’s getting off-topic, but petty as my reasons may be, it comes down to personal habits from using OS X (& earlier).

A shortcut I frequently use changed after many years: selecting multiple random files using Cmd-click.
Opening the same folders on startup: I always have 9+ folders open, and Mavericks removed the option to open folders in a new window (https://discussions.apple.com/thread/5467614?start=0&tstart=0), though there’s a workaround. The hidden Library too, though I know there’s a workaround for that as well.

It’s those little things that just don’t suit my normal way of using a Mac. I do like the open same docs option when opening an app.

I use a couple of apps (one an audio editor) that are not Intel-based, thus Rosetta. I know eventually I will probably run SL in a VM. But for now there’s not a lot I find appealing about Mavericks (I haven’t seen Yosemite). Many of the improvements I might rarely if ever use.

Not a workaround, a definitive answer :

do shell script "chflags nohidden ~/Library"

For Yosemite (I’m not sure if it applies to Mavericks), there is an official GUI option to make the Library folder visible.
From the Finder, open your Home folder, then press command + j.
Among the view options you should see a checkbox that lets you display the Library folder.
I’m puzzled by the fact that most of the users I have met are unaware of this neat feature.

Yvan KOENIG (VALLAURIS, France) Saturday 11 July 2015 17:12:24

Hey lotr,

Huh?

My script is complete and tested although only against the incomplete data sample you provided.

You did install the Satimage.osax, YES? My script requires it.

http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html

As for the glob line example you gave, it is short:

You cite:

set fileList to glob "*.txt"

But the actual code is:

set fileList to glob "*.txt" from srcFolder as alias

glob is plainly defined in the Satimage.osax’s AppleScript dictionary:


glob (verb): list the files or the folders matching a Unix pathname pattern (from the Satimage File Additions suite, defined in Satimage.osax)
function syntax

set theResult to glob string or list of string ¬
     from alias ¬
     invisibles boolean ¬
     of extension list of string ¬
     not conforming to list of string ¬
     after date ¬
     before date ¬
     names only boolean ¬
     as type class

All of these work:


-----------------------------------------------------------------
tell application "Finder" to set srcFolder to target of front window as alias
-----------------------------------------------------------------
set fileList to glob "*.txt" from srcFolder as alias
-----------------------------------------------------------------
set srcFolderPosix to POSIX path of srcFolder
set fileList to glob "*.txt" from srcFolderPosix as alias
-----------------------------------------------------------------
set srcFolderPosixHome to "~/Downloads/"
set fileList to glob "*.txt" from srcFolderPosixHome as alias
-----------------------------------------------------------------

So you have quite a lot of choice in how you represent the folder to be searched.

as alias in the glob command sets the output format, which can also be text, path, or posix path.

So again, you have a lot of choice.

If you provide me with the files in question I’d be happy to make certain it works correctly.

I’m embarrassed: I discovered I had accidentally installed the XMLLib osax instead. Oops! After installing Satimage, the script works fine! :slight_smile: Sorry, I had shortened the script reference earlier simply to show where AS displayed the error.

Comparing the search results of your script:
A line count of 38764, compared to 2743 for Yvan’s script. I noticed that one number listed 11 times in Doc A is listed 132 times in the results (Yvan’s result listed it 11 times). I realise one reason for the extra listings is that it also lists the number as found in Docs B-D, which is good in that I know exactly which doc it was found in. But the number that was listed 132 times is found only once in Doc D, so the search results appear to be repeated.

Docs A-D can be found here. Many thanks for taking the time and concern.

What is the correct/wanted behavior?
My script does what you described in your original question:

If the number does not exist in doc. B, C or D then delete that entire line from document A and move onto next line to search.
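
For reference, the rule quoted above can be sketched in a few lines of shell. This is only a sketch under assumptions: it assumes the "number" is an IPv4 address at the start of each line, and that "exists" means the same address starts a line in one of the other docs. filter_control is a made-up helper name, not something from the scripts in this thread.

```shell
#!/bin/sh
# Sketch: print only the lines of the control file (first argument) whose
# leading IPv4 address also starts a line in one of the data files.
# filter_control is a hypothetical helper name.
filter_control() {
    control=$1
    shift
    awk -v control="$control" '
        # Return the leading dotted-quad of a line, or "" if there is none.
        function lead_ip(s) {
            if (match(s, /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
                return substr(s, RSTART, RLENGTH)
            return ""
        }
        # Data files come first: remember every leading address.
        FILENAME != control {
            ip = lead_ip($0)
            if (ip != "") seen[ip] = 1
            next
        }
        # Control file comes last: keep the line only if its address was seen.
        {
            ip = lead_ip($0)
            if (ip != "" && (ip in seen)) print
        }
    ' "$@" "$control"
}
```

Called as filter_control A.txt B.txt C.txt D.txt > A-Filtered.txt, it leaves A.txt untouched and writes the surviving lines to a new file. Blank lines and lines without a leading address are dropped.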

Yvan KOENIG (VALLAURIS, France) Sunday 12 July 2015 09:53:13

Hey lotr,

Not having seen the actual data files I didn’t account for duplicate search results OR duplicates in the control IP-Address list.

Now that I’ve seen a proper data sample I’ve made a couple of adjustments.

This script removes all duplicates from both.

On my system it produces a report with 896 lines.


------------------------------------------------------------
# Edit: 2015/07/13 13:08
# Vers: 1.01
-------------------------------------------------------------------------------------------
set foundLines to {}
set largeDataText to ""

# Get the location where a new folder would be placed (front window or Desktop).
tell application "Finder" to set srcFolder to insertion location as alias
set srcFolderPosix to POSIX path of srcFolder

# Get all .txt files as an alias list.
set fileList to glob "*.txt" from srcFolder as alias

# Separate the first (control) file.
set fileA to item 1 of fileList

# Separate the rest of the files.
set otherFiles to rest of fileList
repeat with i in otherFiles
	set largeDataText to largeDataText & (read i as «class utf8») & linefeed
end repeat
set largeDataText to change "^[[:blank:]]*\\n" into "" in largeDataText with regexp
set largeDataText to change "\\A\\s+|\\s+\\Z" into "" in largeDataText with regexp
set largeDataText to join (sortlist (get paragraphs of largeDataText) with remove duplicates) using return

# Extract IP-Addresses from Control File.
set ipAddressList to find text "^\\b(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\b" in fileA using "\\1" with regexp, all occurrences and string result
# Remove Duplicates.
set ipAddressList to sortlist ipAddressList with remove duplicates
set ipAddressList to change "." into "\\." in ipAddressList

# Extract lines with Control-IP-Addresses from file-list.
if ipAddressList ≠ {} then
	repeat with i in ipAddressList
		set foundLines to foundLines & (find text ".*\\b" & i & "\\b.*" in largeDataText with regexp, all occurrences and string result)
	end repeat
end if

# Remove Duplicates and coerce 'foundLines' list to text for report.
set reportText to join (sortlist foundLines with remove duplicates) using linefeed

# Write the report to the source folder with a date-stamp.
writetext reportText to (srcFolderPosix & "IP-Address-Report " & (strftime (current date) into "%Y.%m.%d · %H.%M.%S") & ".txt")
------------------------------------------------------------

Hey lotr,

I’ve taken a different approach with this script.

It deletes unfound IP lines from a COPY of the original A.txt file (A-Filtered.txt), in the same directory as the other files.

It consolidates the B.txt, C.txt, & D.txt files into one temporary file (Temp_Data_File.txt), in the same directory as the other files.

This script runs in about 16 seconds from FastScripts on my system.

It’s a bit slow, because I’m reading A.txt line-by-line, finding against Temp_Data_File.txt on disk, and writing that line to A-Filtered.txt ONLY if it is FOUND in the temp file.

AppleScript can get bogged down with really large files and/or really large lists, but this approach can handle quite massive files albeit with a speed-hit.

I’ll rewrite it to do everything in memory later.


-------------------------------------------------------------------------------------------
# Auth: Christopher Stone <scriptmeister@thestoneforge.com>
# dNam: lotr IP Script { write-file-version }
# dCre: 2015/07/06 16:09
# dMod: 2015/07/12 04:38
# Appl: Finder & Satimage.osax
# Task: Filter NOT-FOUND IP-Addresses out of file A.txt and write to A-Filtered.txt.
# Osax: Satimage.osax { http://tinyurl.com/dc3soh }
# Tags: @Applescript, @Script, @Finder, @Filter, @IP-Addresses
# Vers: 1.00
-------------------------------------------------------------------------------------------
# Approximately 16 seconds runtime from FastScripts.
-------------------------------------------------------------------------------------------

tell application "Finder" to set srcFolder to insertion location as alias
set srcFolderPosix to POSIX path of srcFolder
set tempDataFile to srcFolderPosix & "Temp_Data_File.txt"
set aFilteredFile to srcFolderPosix & "A-Filtered.txt"

# Get all .txt files as a list of POSIX paths.
set fileList to glob "*.txt" from srcFolder as POSIX path

# Separate the first (control) file.
set fileA to item 1 of fileList

# Separate the rest of the files.
set otherFiles to rest of fileList

# Create 1 consolidated data file (Temp_Data_File.txt) from otherFiles in the same dir as the other files.
repeat with i in otherFiles
	writetext (readtext i) to tempDataFile with append
end repeat

try
	set foundIpCounter to 0
	set fNum to open for access fileA
	
	repeat
		# Read Line in Control File.
		set _line to read fNum until linefeed
		try
			# Extract IP-Address from read line.
			set ipAddress to find text "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*" in _line using "\\1" with regexp and string result
			# Create appropriate regular expression from extracted IP-Address.
			set regexStr to ".*\\b" & (change "." into "\\." in ipAddress) & "\\b.*"
			# Search against contents of Temp_Data_File.txt
			find text regexStr in POSIX file tempDataFile with regexp and string result
			# If IP-Address regular expression is found then write original line to new file.
			writetext _line to aFilteredFile with append
			set foundIpCounter to foundIpCounter + 1
		end try
	end repeat
	
on error
	try
		close access fNum
		beep
	end try
end try

-------------------------------------------------------------------------------------------

The scripts work very well and as requested!
The size issue with Chris’s script was overcome when I changed the source to a sub-folder on the Desktop instead of the top level of the Desktop.

Thanks everyone for your great input. :slight_smile:

Edit: I didn’t realize Chris had replied.

This works well. I found this works faster if I use a sub-folder instead of top-level of Desktop. (Desktop clutter I guess lol)

Hey lotr,

Very good. In that case I’ll leave it be (unless I decide to write it in Perl later for fun).

The Desktop is a dangerous place… :cool:

Since I didn’t know your file-name convention would be static, the script as is looks for any text file in the Finder insertion location.

Here’s some code to look explicitly for files A.txt, B.txt, C.txt, & D.txt:


tell application "Finder" to set frontFinderFolder to insertion location as alias
set fileList to glob "[ABCD].txt" from frontFinderFolder as alias
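
The same bracket pattern works in a plain shell glob, which can be handy for checking what the pattern would pick up. A minimal sketch: list_abcd is a hypothetical helper name, and the folder is whatever path you pass in.

```shell
#!/bin/sh
# Sketch: list the files in a folder whose names match [ABCD].txt,
# i.e. exactly one of the letters A, B, C or D followed by ".txt".
# list_abcd is a hypothetical helper name.
list_abcd() {
    for f in "$1"/[ABCD].txt; do
        # With no match the pattern stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}
```

A name such as AB.txt or E.txt is not matched, because the brackets stand for a single character from the set.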

By the way, would you confirm for me that insertion location works in the Finder on Snow Leopard?

Hey lotr,

I had some free time and wrote it in Perl for fun and for practice.

It’s a trifle crude, because I haven’t been practicing my Camel riding lately. :cool:

But it runs in sub-1-second times on my Mid-2010 i7 MacBook Pro.

I’ve tested it only on OSX 10.9.5 and run it from both BBEdit and FastScripts.

It is best if you continue to use a sub-folder to contain only those files operated on, as the script zeros in on the front Finder folder.

The script might or might not run on Snow Leopard. If it does NOT then please determine what version of Perl you have and let me know.

In the Terminal run:

perl --version

I’d also want to see any error messages generated.

The script should just run from BBEdit or TextWrangler.

If you run it from FastScripts or another script-runner you might have to make it executable.
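
Making a script executable is a one-liner in the Terminal (chmod u+x on the script file). Here is a small sketch that also checks for the shebang line an executable script needs so the system knows which interpreter to use. prepare_script is a hypothetical helper name, and the file path is up to you.

```shell
#!/bin/sh
# Sketch: make a script executable, but only if it starts with a shebang line.
# prepare_script is a hypothetical helper name.
prepare_script() {
    script=$1
    first_line=
    # Read just the first line; the second test keeps a line that lacks a newline.
    IFS= read -r first_line < "$script" || [ -n "$first_line" ] || return 1
    case $first_line in
        '#!'*) chmod u+x "$script" ;;
        *) echo "no shebang line in $script" >&2; return 1 ;;
    esac
}
```

The function returns non-zero and leaves the permissions alone when the file is missing, empty, or has no shebang line.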

The script still creates an ‘A-Filtered.txt’ file in the same folder as the source files, but it no longer creates a consolidated data-file.

It does specifically target files A.txt, B.txt, C.txt, & D.txt though.

#! /usr/bin/perl 
    use strict; use warnings;
    use File::Slurp;
#--------------------------------------------------------------------------------------
# Auth: Christopher Stone <scriptmeister@thestoneforge.com>
# dCre: 2015/07/12 18:00
# dMod: 2015/07/12 21:37 
# Appl: Perl & App to run Perl
# Task: Filter lines out of control file if ip-address is not found in data-file-set.
# Tags: @Perl, @Script, @Filter, @IP-Address
# Vers: 1.00
#--------------------------------------------------------------------------------------
# Notes:
#
# Written for user lotr on MacScripter.net
#--------------------------------------------------------------------------------------

my ($applescript, $controlFile, $dataFileText, $fh, $myDir, $outputFilePath, $regExStr, @controlFileData, @newFileData);

$applescript = 'osascript -e "
    tell application \"Finder\" to set frontFolderAlias to insertion location as alias
    return POSIX path of frontFolderAlias
"';

$myDir           = `$applescript`;
chomp $myDir;
$outputFilePath  = $myDir.'A-Filtered.txt';

$controlFile     =  read_file( $myDir."A.txt" ) ;
$controlFile     =~ s!(?m)^[[:blank:]]*\n!!g;
$dataFileText    =  read_file( $myDir."B.txt" ) . "\n";
$dataFileText    =  $dataFileText . read_file( $myDir."C.txt" ) . "\n";
$dataFileText    =  $dataFileText . read_file( $myDir."D.txt" ) . "\n";
$dataFileText    =~ s!(?m)^[[:blank:]]*\n!!g;
@controlFileData =  split("\n", $controlFile);

foreach (@controlFileData) {

    if ( m!^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b! ) {
        $regExStr = quotemeta($&);
        $regExStr = '(?m)^'.$regExStr.'\b';

        if ( $dataFileText =~ m!$regExStr! ) {
            push(@newFileData, $_);
        }
    }

}

open($fh, '>', $outputFilePath) or die "Could not open file '$outputFilePath' $!";
$, = "\n";
print $fh @newFileData;
close $fh;

Yes this works on SL with all the script versions you’ve posted. :wink: The names didn’t matter for your earlier script(s).

I just realised I had not tested this version of your script:

set largeDataText to largeDataText & (read i as «class utf8») & linefeed

read i as «class utf8»
error “Can’t make some data into the expected type.” number -1700
But this version does run for the most part on ML. Again, insertion location works fine. However, there was an error, “An error of type -10 has occurred.”, upon the final writetext line.

	use strict; use warnings;

strict
Expected end of line, etc. but found identifier.
SL has perl v5.10.0, but I get the same error in ML (perl v5.12.4), perhaps because my ML is missing module additions?

Model: MacPro
Browser: Firefox 33.0
Operating System: Mac OS X (10.6)

Hello Chris.

The Perl script worked flawlessly under 10.10.4.

Yvan KOENIG running Yosemite 10.10.4 (VALLAURIS, France) Monday 13 July 2015 16:15:21

Hey Yvan,

Thanks for letting me know.

Hey lotr,

Good.

Nor should they have if there were only the 4 of them named as they were, but if you had more than those 4 text files in the insertion location the additional files would have been added to the B, C, & D data.

I looked at that and realized that I had hard-coded the report path; now fixed in the original post.

The Perl script failed?

Hmm… ‘File’ is supposed to be a core module.

I’ll look into it some more.

I’m still fiddling with this.

My MacPorts-installed Perl v5.16.3 doesn’t run it either, due to a missing File::Slurp module. I had to back off to the default system Perl in Mavericks.

This may not be completely comprehensive, so if someone more knowledgeable than me can improve upon it please do.


-------------------------------------------------------------------------------------
# Create Report of Installed Perl Modules in TextEdit.
-------------------------------------------------------------------------------------
set shCMD to text 2 thru -1 of "
DIR=$(/usr/bin/perl -e 'print join(\"\\n\", @INC);' | sed '$d' | tr '\\n' ' ');
find $DIR -iname \"*.pm\" 2>/dev/null | open -f
"
do shell script shCMD
-------------------------------------------------------------------------------------
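
To check for one particular module rather than listing everything, it may be quicker to simply ask Perl to load it. A minimal sketch, assuming perl is on the PATH; has_perl_module is a made-up helper name. (File::Slurp is distributed on CPAN rather than with core Perl, which would explain a Perl install lacking it.)

```shell
#!/bin/sh
# Sketch: succeed if the given Perl module can be loaded by the perl on the PATH.
# has_perl_module is a hypothetical helper name.
has_perl_module() {
    # -M<module> loads the module before running the trivial program "1";
    # perl exits non-zero when the module cannot be found.
    perl -M"$1" -e 1 2>/dev/null
}
```

For example: has_perl_module File::Slurp || echo "File::Slurp is missing". Note that it tests whichever perl comes first on the PATH, which may not be the one named in a script's shebang line.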

Update: 2015/07/13 21:52

I was able to update the MacPorts Perl without too much trouble:

sudo port -d install p5-file-slurp

Once I did that the original Perl script worked fine with my normal shebang line:

#! /usr/bin/env perl

Hey lotr,

Okay, I rewrote the script without using any modules, so hopefully it will work for you now.

I changed things up quite a bit too.

Globbed the data-files before reading them into a variable with a loop.

Worked the control-file from disk instead of reading it into a variable.

Still sub-1-second run-times on my system.

#! /usr/bin/env perl
	use v5.010; use strict; use warnings;
#--------------------------------------------------------------------------------------
# Auth: Christopher Stone <scriptmeister@thestoneforge.com>
# dCre: 2015/07/12 18:00
# dMod: 2015/07/13 21:48
# Appl: Perl & App to run Perl
# Task: Filter lines out of control file if ip-address is not found in data-file-set.
# Tags: @Perl, @Script, @Filter, @IP-Address
# Vers: 1.50
#--------------------------------------------------------------------------------------
# Notes:
#
# Written for user lotr on MacScripter.net
#--------------------------------------------------------------------------------------

my (@files, @NewFileData, $AppleScript, $ControlFilePath, $DataFileText, $fh, $file, $MyDir, $OutputFilePath, $RegExStr);

$AppleScript = 'osascript -e "
	tell application \"Finder\" to set frontFolderAlias to insertion location as alias
	return POSIX path of frontFolderAlias
"';
$MyDir           = `$AppleScript`;
chomp $MyDir;

$OutputFilePath  = $MyDir.'A-Filtered.txt';
@files = glob("\"$MyDir\"".'[ABCD].txt');
$ControlFilePath = $files[0];
shift(@files);
$DataFileText = "";

foreach $file (@files) {
 	open($fh, '<', $file) or die "Could not open file: $file $!";
 	{
		local $/;
		$DataFileText = $DataFileText.<$fh>."\n";
	}
	close($fh);
}

open($fh, '<', $ControlFilePath) or die "Could not open file $ControlFilePath $!";

while (<$fh>) {
	chomp;
	if ( m!^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b! ) {
		$RegExStr = '(?m)^'.quotemeta($&).'\b';

		if ( $DataFileText =~ m!$RegExStr! ) {
			push(@NewFileData, $_);
		}
	}
}

open($fh, '>', $OutputFilePath) or die "Could not open file '$OutputFilePath' $!";
$, = "\n";
print $fh @NewFileData;
close $fh;

This now works. :wink:
Unfortunately I must be lacking something on my systems because AS refuses to compile the Perl version.
Syntax error: Expected end of line, etc. but found identifier. It highlights v5 in the 2nd line: use v5.010; use strict; use warnings;
But as Yvan said, it works fine on Yosemite & obviously on Mav.

I didn’t run the Perl script from Script Editor.
I opened it in TextWrangler and then triggered the menu item #! > Run (as it was written in the first message delivering the code).

Yvan KOENIG running Yosemite 10.10.4 (VALLAURIS, France) Tuesday 14 July 2015 09:59:01

Oops. Yes, the Perl script works fine in SL & took a second to finish! :slight_smile: The previous version of the Perl script also works on SL.
Now I see why everyone recommends TextWrangler. Super fast, particularly the perl script!

A huge thanks for all your time and effort Chris and also the above contributors for your excellent scripts.
This will reduce multiple hours or days of laborious searching through docs to a fraction of the time. :smiley:
