I posted this earlier to the TextWrangler list, and thought I’d post here as well (I can use all the help I can get).
I’ve been using AppleScript and TextWrangler to massage large volumes of XML files, and it’s working very well. I just hit a new problem, though, while trying to massage some large MIF text files, and I’m wondering whether I should concentrate on learning grep more thoroughly or rely more on AppleScript to handle things. Knowing where to start will (hopefully) save me a huge amount of time.
In a nutshell: I need to scour a large MIF file containing markers of various types inside Para and ParaLine statements and, depending on each marker’s type, move it to the beginning or end of its <ParaLine, allowing in some cases for a <TextRectID ###> between the start of the <ParaLine and the start of the <Marker.
The following example contains two Markers; the first, a ‘Cross-Ref’, is already in the right place, while the second, a ‘Type 18’, should be moved to just before the end of the ParaLine:
Before:
<Para
 <Unique 1009287>
 <PgfTag `HeadP18'>
 <PgfNumString `5.1'>
 <ParaLine
  <TextRectID 581>
  <Marker
   <MType 9>
   <MTypeName `Cross-Ref'>
   <MText `SPECIFICATIONS'>
   <MCurrPage `1'>
   <Unique 1026941>
  > # end of Marker
  <String `SP'>
  <String `E'>
  <Marker
   <MType 18>
   <MTypeName `Type 18'>
   <MText `Specifications'>
   <MCurrPage `1'>
   <Unique 1026948>
  > # end of Marker
  <String `CIFICATIO'>
  <String `N'>
  <String `S'>
 > # end of ParaLine
> # end of Para
After:
<Para
 <Unique 1009287>
 <PgfTag `HeadP18'>
 <PgfNumString `5.1'>
 <ParaLine
  <TextRectID 581>
  <Marker
   <MType 9>
   <MTypeName `Cross-Ref'>
   <MText `SPECIFICATIONS'>
   <MCurrPage `1'>
   <Unique 1026941>
  > # end of Marker
  <String `SP'>
  <String `E'>
  <String `CIFICATIO'>
  <String `N'>
  <String `S'>
  <Marker
   <MType 18>
   <MTypeName `Type 18'>
   <MText `Specifications'>
   <MCurrPage `1'>
   <Unique 1026948>
  > # end of Marker
 > # end of ParaLine
> # end of Para
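For the single case in this Before/After example, the move can be written as one regular-expression search-and-replace: capture the Type 18 Marker block, capture everything after it up to the ParaLine's closing line, and swap the two. A minimal Python sketch, assuming FrameMaker-style output: one statement per line, `<MType` on the line right after `<Marker`, `\n` line endings (Python's text mode converts old Mac `\r` endings on read), and the `# end of Marker` / `# end of ParaLine` comments present. The pattern and the name `move_type18_to_end` are hypothetical, my own.

```python
import re

# Group 1: a Marker block whose first child is <MType 18>, through "> # end of Marker".
# Group 2: everything after it, up to the ParaLine's closing line (non-greedy,
#          so it stops at the first "> # end of ParaLine", i.e. the same ParaLine).
# Group 3: the ParaLine's closing line itself.
# Relies on the "# end of ..." comments FrameMaker writes into MIF (assumption).
TYPE18_TO_END = re.compile(
    r"( *<Marker\n *<MType 18>\n.*?> # end of Marker\n)"
    r"(.*?)"
    r"( *> # end of ParaLine)",
    re.S,  # let "." match newlines so a block can span several lines
)


def move_type18_to_end(mif_text):
    """Swap each matched Type 18 marker with the rest of its ParaLine."""
    return TYPE18_TO_END.sub(r"\2\1\3", mif_text)
```

This moves at most one Type 18 marker per ParaLine: if a ParaLine holds two, only the first is moved, so files like that would need a more structural approach. The same idea could be tried as a TextWrangler grep search-and-replace, though line endings and the dot-matches-newline behavior would likely need adjusting.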
I’ve been having great luck using AppleScript with TextWrangler’s grep searches, but I haven’t done anything this complex (search, test, and move if appropriate, rather than plain search-and-replace). The files can run from 100,000 to 200,000 lines, though the lines are short. I have a large batch of these files coming in around the end of the month, and it would be really great if I could do most of the cleanup with AppleScript (or some other scripting language, if needed, as long as it’s relatively easy to get up to speed on) instead of moving the markers and such manually.
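Because the job is search, test, and move rather than plain search-and-replace, it may be easier to walk each ParaLine as a list of child statements, classify the Markers by MType, and write them back in a new order. A sketch in Python under these assumptions: the MIF was written by FrameMaker with each statement starting on its own line (as in the example); `MOVE_TO_START` and `MOVE_TO_END` are hypothetical sets read off the example (Cross-Ref is MType 9, Type 18 is MType 18); and start-of-line markers belong after any leading `<TextRectID`. Nesting is tracked by counting `<` and `>` outside backtick-quoted strings and `#` comments, so it does not depend on the `# end of` comments, and it handles several markers in one ParaLine.

```python
import re

# Hypothetical mapping, read off the example: Cross-Ref (MType 9) goes at the
# start of its ParaLine, Type 18 goes at the end. Extend as needed.
MOVE_TO_START = {9}
MOVE_TO_END = {18}
MOVED = MOVE_TO_START | MOVE_TO_END


def net_depth(line):
    """Count '<' minus '>' on one MIF line, skipping `quoted' strings and # comments."""
    depth, in_str, i = 0, False, 0
    while i < len(line):
        c = line[i]
        if in_str:
            if c == "\\":
                i += 1  # skip the escaped character (e.g. \> or \q)
            elif c == "'":
                in_str = False
        elif c == "`":
            in_str = True
        elif c == "#":
            break  # the rest of the line is a comment
        elif c == "<":
            depth += 1
        elif c == ">":
            depth -= 1
        i += 1
    return depth


def marker_type(item):
    """Return the MType of a Marker statement (a list of lines), else None."""
    if not item[0].lstrip().startswith("<Marker"):
        return None
    for line in item:
        m = re.search(r"<MType\s+(\d+)\s*>", line)
        if m:
            return int(m.group(1))
    return None


def reorder(items):
    """Leading TextRectID first, then start-markers, other children, end-markers."""
    start = [it for it in items if marker_type(it) in MOVE_TO_START]
    end = [it for it in items if marker_type(it) in MOVE_TO_END]
    rest = [it for it in items if marker_type(it) not in MOVED]
    lead = []
    while rest and rest[0][0].lstrip().startswith("<TextRectID"):
        lead.append(rest.pop(0))
    return lead + start + rest + end


def move_markers(lines):
    """Return a new list of MIF lines with markers moved within each ParaLine."""
    out, i = [], 0
    while i < len(lines):
        line = lines[i]
        i += 1
        out.append(line)
        if not line.lstrip().startswith("<ParaLine"):
            continue
        depth = net_depth(line)  # 1 when "<ParaLine" stands alone on its line
        items, current, closing = [], [], None
        while i < len(lines) and depth > 0:
            line = lines[i]
            i += 1
            depth += net_depth(line)
            if depth == 0:
                closing = line  # the ParaLine's own closing ">"
                break
            current.append(line)
            if depth == 1:  # one complete child statement (String, Marker, ...)
                items.append(current)
                current = []
        for item in reorder(items):
            out.extend(item)
        out.extend(current)  # leftover lines if the file ended mid-statement
        if closing is not None:
            out.append(closing)
    return out
```

To run it on a file, something like `lines = open(path).read().splitlines()` and then writing `"\n".join(move_markers(lines)) + "\n"` back out should do; it is a single pass over the lines, so 200,000 short lines are not a problem in principle. Markers of types not listed in either set stay where they are.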
As always, any help is greatly appreciated.