More regex questions?

Just as a point of information, the \s metacharacter matches a space, but it also matches a tab and line-ending characters such as a newline. In some cases \s and a literal space can be used interchangeably, but not always.
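A minimal sketch of the difference in Python, using made-up sample text. Python's `re` module is only a stand-in here (Shortcuts uses ICU regex), but `\s` behaves the same way for these characters. The first line holds nothing but a road code, so with `\s` the pattern consumes its line break and that line disappears, while a literal space leaves it alone.

```python
import re

# Hypothetical sample text: a line holding only a road code,
# followed by a line that starts with a road code and a space.
text = "A34\nB1340 Bridge Street"

# \s matches the space after B1340, but also the newline after A34.
with_s = re.sub(r"(?m)^[AB]\d{1,4}\s", "", text)

# A literal space only matches a space, so the bare "A34" line survives.
with_space = re.sub(r"(?m)^[AB]\d{1,4} ", "", text)

print(repr(with_s))      # 'Bridge Street'
print(repr(with_space))  # 'A34\nBridge Street'
```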

I have inserted spaces in the text after M1, A34, and A1, and the following is the result with technomorph’s pattern:

The following is identical except that I have replaced the last \s with a literal space:

I don’t know which dbrewood wants, but it’s good to be aware of the difference.

Wow, thanks guys, you are flipping awesome.

I guess I want to handle it as per the second option? In reality, I’d be removing them completely, so I'd be replacing them with ‘null’.
I’ve tried to copy out the suggested regex, but I’m missing something, because it’s not removing the ‘Toll’ in the toll road designation. Also, the preceding spaces are still in place (as far as I can see; poor eyesight).

In the final shortcut, the .$$$. marker will be replaced with ‘null’ to eradicate the road codes completely.

Hoping that all makes sense? Very much getting there though!

@peavine is there any chance you could post the shortcut?

I’ve included the second shortcut below:

Test.shortcut (21.7 KB)

BTW, if a literal space is hard to see, you might consider using the \h horizontal whitespace metacharacter. Also, I don’t think you need to escape a comma.
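Python's `re` module has no `\h`, so this sketch approximates it with the character class `[^\S\r\n\v\f]` (any whitespace other than line breaks, vertical tab, and form feed). That class is an assumption for illustration, not how Shortcuts implements `\h`. The sketch also shows that an escaped comma and a bare comma match exactly the same thing.

```python
import re

# Assumed stand-in for ICU's \h: any whitespace except CR, LF, VT, and FF.
H = r"[^\S\r\n\v\f]"

def is_horizontal(ch: str) -> bool:
    """Return True if ch counts as horizontal whitespace under H."""
    return re.fullmatch(H, ch) is not None

print(is_horizontal(" "), is_horizontal("\t"), is_horizontal("\n"))  # True True False

# A comma is not a metacharacter, so escaping it makes no difference.
print(re.fullmatch(r",", ",") is not None)   # True
print(re.fullmatch(r"\,", ",") is not None)  # True
```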

Many thanks, I’ll get to understand all this one day :slight_smile:

I’ll put it in place now :slight_smile:

I think (and hope) this is the last of the regex changes; the rest of what I need to do is more regular shortcut work.

Right… I’ve just done some testing (renamed shortcut below):

contains ROAD CODE 7 FINAL.shortcut (21.9 KB)

As you can see here, entries where a road name is followed by a comma and a space leave the comma and space behind.

I’m quite happy to do another replace on ‘, ’ if needed, but I thought it worth a final mention… :slight_smile:

You just need to add the comma to the end of the pattern, just before the space. The horizontal whitespace metacharacter is easier to see than a literal space, so my suggestion would be:

`(?m)^,?\h?[ABCDHMUQV]\d{1,4}(?:\(M\)|\h?[Tt]oll)?,?\h?`
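To try the suggestion outside Shortcuts, here is a Python sketch run against made-up sample lines. Python's `re` has no `\h`, so each `\h` is replaced by the assumed stand-in class `[^\S\r\n\v\f]`. The motorway alternative is written as `\(M\)` so that it matches a literal `(M)` suffix such as A1(M).

```python
import re

# Assumed stand-in for ICU's \h, which Python's re does not support.
H = r"[^\S\r\n\v\f]"

# The suggested pattern with every \h replaced by H.
# \(M\) matches a literal "(M)" suffix, as in A1(M).
ROAD_CODE = re.compile(
    rf"(?m)^,?{H}?[ABCDHMUQV]\d{{1,4}}(?:\(M\)|{H}?[Tt]oll)?,?{H}?"
)

# Hypothetical sample lines, one entry per line.
sample = "\n".join([
    "A34, Newbury",
    "M6 Toll, Cannock",
    "A1(M) Junction 5",
    "The Alnwick Garden, B1340",
])

cleaned = ROAD_CODE.sub("", sample)
print(cleaned)
```

The last line comes through unchanged because `^` anchors every match to the start of a line, so a road code at the end of an entry (the Alnwick Garden case raised later in the thread) is not touched.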

It may well be worth forgetting about this. I’ve just come across another ‘issue’ where the road name comes at the end of a street, as you can see here for the Alnwick Garden:

I can see this going on for a long time. It seems it’s not that easy to identify and delete the ‘road code’ portion of a string.

contains ROAD CODE 7 FINAL.shortcut (22.0 KB)

Feel free to forget about it and I’ll carry on manually editing the data prior to posting.

We crossed posts there… Your prior post did fix the ‘Wroxeter’ issue :slight_smile:

Would it be easier to use the regex just to identify the ‘road code’ itself and then I’ll deal with the remnant commas and spaces etc?

That might be a workable approach. The following is a proof of concept. It removes whitespace at the beginning and end of each line of the string and any comma followed by whitespace within the string, and then uses Nigel’s regex pattern to remove the road codes. A Replace Text action only takes a few milliseconds to run, so having two such actions should not be a concern.
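A rough Python equivalent of chaining two Replace Text actions, again with hypothetical sample text. Nigel’s pattern isn’t quoted in this thread, so the second pass uses a stand-in: the earlier suggested pattern without its leading `,?\h?`, since the first pass already trims leading whitespace. The comma-plus-whitespace step is left out because its exact replacement isn’t spelled out here, and `\h` is again approximated with `[^\S\r\n\v\f]`.

```python
import re

# Assumed stand-in for ICU's \h, which Python's re does not support.
H = r"[^\S\r\n\v\f]"

# Pass 1: trim horizontal whitespace at the start and end of every line.
TRIM = re.compile(rf"(?m)^{H}+|{H}+$")

# Pass 2: hypothetical road-code pattern (not Nigel's actual pattern).
ROAD_CODE = re.compile(
    rf"(?m)^[ABCDHMUQV]\d{{1,4}}(?:\(M\)|{H}?[Tt]oll)?,?{H}?"
)

def strip_road_codes(text: str) -> str:
    """Trim each line, then delete a road code at the start of each line."""
    trimmed = TRIM.sub("", text)
    return ROAD_CODE.sub("", trimmed)

sample = "  A34, Newbury  \n\tM6 Toll, Cannock"
print(repr(strip_road_codes(sample)))  # 'Newbury\nCannock'
```

As with the single-pass version, an entry whose road code comes last, such as ‘The Alnwick Garden, B1340’, passes through unchanged.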

contains ROAD CODE 7 FINAL.shortcut (22.1 KB)

It could well work, but it would still need to take care of entries like ‘The Alnwick Garden, B1340’, where the road code is at the end.
Is this possible?