More regex questions?

Just as a point of information, the \s metacharacter matches a space but also matches a tab and various line endings. In some instances these can be used interchangeably but not always.
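To make the difference concrete, here is a minimal sketch in Python. It uses Python's re module rather than the Shortcuts regex engine, and the sample text and the simplified pattern are made up for illustration only.

```python
import re

# Hypothetical sample: a road code followed by a line ending, not a space.
text = "A34\nHigh Street"

# \s matches the newline, so the code and the line break are both removed.
with_s = re.sub(r"A\d{1,4}\s", "", text)

# A literal space does not match the newline, so nothing is replaced.
with_space = re.sub(r"A\d{1,4} ", "", text)

print(repr(with_s))      # 'High Street'
print(repr(with_space))  # 'A34\nHigh Street'
```

So \s can quietly join two lines where a literal space would leave them alone.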

I have inserted spaces in the text after M1, A34, and A1, and the following is the result with technomorph’s pattern:

The following is identical except that I have replaced the last \s with a literal space:

I don’t know which dbrewood wants, but it’s good to be aware of the difference.

Wow, thanks guys you are flipping awesome.

I guess I want to handle it as per the second option. In reality, I’d actually be removing them completely, so replacing with ‘null’.
I’ve tried to copy out the suggested regex but I’m missing something, as it’s not removing the ‘Toll’ in a Toll road designation. Also, the preceding spaces are still in place (as far as I can see; poor eyesight).

In the final shortcut the .$$$. marker will be replaced with ‘null’ to eradicate the road codes completely.

Hoping that all makes sense? Very much getting there though!

@peavine is there any chance you could post the shortcut?

I’ve included the second shortcut below:

Test.shortcut (21.7 KB)

BTW, if a literal space is hard to see, you might consider using the \h horizontal whitespace metacharacter. Also, I don’t think you need to escape a comma.
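A rough Python sketch of both points, for anyone who wants to check them outside Shortcuts. Python's re module has no \h, so the class [ \t] (space and tab only) stands in for it here, which is an assumption of this sketch; the sample strings are made up.

```python
import re

sample = "A34, Station Road"  # hypothetical input

# An escaped comma and a plain comma match exactly the same thing.
escaped = re.findall(r"\,", sample)
unescaped = re.findall(r",", sample)

# [ \t] stands in for \h: it matches a space or a tab but not a newline.
horizontal = re.findall(r"[ \t]", "a b\tc\nd")

print(escaped == unescaped)  # True
print(horizontal)            # [' ', '\t']
```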

Many thanks, I will start to get this lot understood one day :slight_smile:

I’ll put it in place now :slight_smile:

I think / hope that this is the last of the regex changes, the rest I need to work with is more regular shortcut work.

Right… I’ve just done some testing (renamed shortcut below):

contains ROAD CODE 7 FINAL.shortcut (21.9 KB)

As you can see here, entries such as a road name followed by a comma and space leave the comma and space behind.

I’m quite happy to do another replace on ‘, ’ if needed, but I thought it worth a final mention… :slight_smile:

You just need to add the comma to the end of the pattern just before the space. The horizontal whitespace metacharacter is easier to see than a literal space, so my suggestion would be:

(?m)^,?\h?[ABCDHMUQV]\d{1,4}(?:(M)|\h?[Tt]oll)?,?\h?
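For anyone who wants to try this outside Shortcuts, here is a minimal Python sketch of the same pattern. Python's re module does not support \h, so [ \t] is swapped in for it (an assumption of this sketch), the codes are replaced with an empty string rather than the .$$$. marker, and the sample lines and the strip_road_codes name are made up.

```python
import re

# The pattern as posted, with \h replaced by [ \t] for Python's re module.
PATTERN = re.compile(
    r"(?m)^,?[ \t]?[ABCDHMUQV]\d{1,4}(?:(M)|[ \t]?[Tt]oll)?,?[ \t]?"
)

def strip_road_codes(text: str) -> str:
    """Remove a road code (plus trailing comma/space) at the start of each line."""
    return PATTERN.sub("", text)

# Hypothetical sample lines.
sample = "A34, Station Road\nM6 Toll, Example Lane\nThe Alnwick Garden, B1340"
result = strip_road_codes(sample)
print(result)
```

Because the pattern is anchored with ^, only codes at the start of a line are removed. In this sketch the last line keeps its trailing code.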

It may well be worth forgetting about this. I’ve just come across another ‘issue’ where the road code comes at the end of a street, as you can see here for The Alnwick Garden:

I can see this going on for a long time. It seems it’s not that easy to identify and delete the ‘road code’ portion of a string.

contains ROAD CODE 7 FINAL.shortcut (22.0 KB)

Feel free to forget about it and I’ll carry on manually editing the data prior to posting.

We crossed posts there… Your prior post did fix the ‘Wroxeter’ issue :slight_smile:

Would it be easier to use the regex just to identify the ‘road code’ itself and then I’ll deal with the remnant commas and spaces etc?

That might be a workable approach. The following is on a proof-of-concept basis, and it removes whitespace at the beginning and end of each line of the string, a comma followed by a whitespace within the string, and then uses Nigel’s regex pattern to remove the road codes. A Replace Text action only takes a few milliseconds to run, so having two such actions should not be a concern.

contains ROAD CODE 7 FINAL.shortcut (22.1 KB)

It could well work, but it’d still need to take care of roads like ‘The Alnwick Garden, B1340’ where the road code is at the end.
Is this possible?

I downloaded and installed a fresh copy of the shortcut. It did remove B1340 from the end of The Alnwick Garden and inserted .$$$. in its place. This can be seen in the screenshot above. I added another similar entry with the road code at the end, and it was also removed and replaced as expected.

This is so weird, I see the same as well myself after redownloading:


I’ll look to put it in place tomorrow.
Many thanks as always!

Hence adding the (?m) option.
It makes the start and end anchors (^ and $) match at line endings.
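A small Python sketch of what (?m) changes, assuming a simplified pattern and made-up input (this is Python's re module, not the Shortcuts action itself):

```python
import re

# Hypothetical two-line input.
text = "A1 North Road\nB6346 Castle Lane"

# Without (?m), ^ matches only at the very start of the whole string.
without_m = re.sub(r"^[AB]\d{1,4} ", "", text)

# With (?m), ^ also matches at the start of every line.
with_m = re.sub(r"(?m)^[AB]\d{1,4} ", "", text)

print(repr(without_m))  # 'North Road\nB6346 Castle Lane'
print(repr(with_m))     # 'North Road\nCastle Lane'
```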

That is because, as I explained before, all of the parts of your pattern, especially group 1, are OPTIONAL.

You have a pattern that will always match.

Learn about look ahead and look behind match conditions, both positive and negative.

Learn about capturing your address into a group. You can scan and ignore everything else, i.e. .*(whatyouwanttocapture)
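As a starting point, here is a hedged Python sketch of those ideas. The patterns are illustrations written for this example, not the ones in the shortcut, and the "Unit XB12" string is invented.

```python
import re

line = "The Alnwick Garden, B1340"

# Positive lookbehind: match a code only when ", " sits right before it.
# The ", " is checked but is not part of the match.
after_comma = re.search(r"(?<=, )[ABCDHMUQV]\d{1,4}", line)

# Negative lookbehind/lookahead: reject a code glued to other word characters.
standalone = re.compile(r"(?<!\w)[ABCDHMUQV]\d{1,4}(?!\w)")
in_line = standalone.search(line)
in_word = standalone.search("Unit XB12")  # B12 follows X, so no match

# Capture group: keep the address and ignore the trailing code.
address = re.match(r"(.*?), [ABCDHMUQV]\d{1,4}$", line)

print(after_comma.group())  # B1340
print(in_line.group())      # B1340
print(in_word)              # None
print(address.group(1))     # The Alnwick Garden
```

The lookarounds only test what is next to the match, so nothing around the code gets consumed or removed by accident.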

Thanks… Appreciated.

I put the new code in place today and noticed some weirdness. When run over a street of:

Preston Mill, Preston Road

It changed it to:

Preston MillPreston Road
That is, in addition to the ‘road code’ removal, it seems to have stripped the ‘comma space’ out of it. I duplicated this in the revised test shortcut attached.

ROAD CODE 10 FINAL.shortcut (22.2 KB)
@peavine Did I do something wrong?

That happened because the first regex removed all instances of a comma followed by a space. I’ve edited the first regex to remove a comma-space only at the beginning of each line of the string. I also edited the second regex to remove a comma-space after a road name.

After testing the above, I found that a comma-space was left after retained text (like the Alnwick Garden), so I added a comma-space before the road name to the regex. Everything in the Text action now seems to work OK.
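The idea can be sketched in Python roughly as follows. The patterns and the function names too_greedy and cleaned are hypothetical stand-ins, not the regexes in the attached shortcut, and the codes are replaced with an empty string instead of the .$$$. marker.

```python
import re

# Hypothetical road-code pattern, used only in this sketch.
CODE = r"[ABCDHMUQV]\d{1,4}"

def too_greedy(text: str) -> str:
    # Removes every comma-space, which also joins names that should stay apart.
    return re.sub(r", ", "", text)

def cleaned(text: str) -> str:
    # 1. Drop a comma-space only at the start of each line.
    text = re.sub(r"(?m)^, ", "", text)
    # 2. Drop a road code with an optional comma-space before or after it.
    return re.sub(rf"(?:, )?\b{CODE}\b(?:, )?", "", text)

print(too_greedy("Preston Mill, Preston Road"))  # Preston MillPreston Road
print(cleaned("Preston Mill, Preston Road"))     # Preston Mill, Preston Road
print(cleaned("The Alnwick Garden, B1340"))      # The Alnwick Garden
print(cleaned("A34, Station Road"))              # Station Road
```

A code sitting in the middle of a line would lose the comma-space on both sides with this sketch, so that case would need more thought.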

ROAD CODE 10 FINAL.shortcut (22.2 KB)

BTW, did you upgrade to macOS Tahoe RC1? I did and everything seems to be working well. :slight_smile:

Here are some great resources for learning.

I use RegExKit all the time.
It allows you to work on your RegExs and see immediate results, with lots of great hints. It shows you what’s capturing, etc., and has code generation for different languages (I use PHP generation for AppleScript / Objective C as it escapes everything properly).

This site is awesome for explaining more advanced topics and even simple ones.
It is also helpful for making your RegExs more efficient.

Look ahead/behind

https://www.regular-expressions.info/lookaround.html

Thanks guys, all the effort is appreciated. I’ll put the revisions in place later and test. Many thanks.