How: Regular Expression to insert string

khannan

Newcomer
Joined
Dec 28, 2004
Messages
1
I have some relative URLs like the following:

<a href="/some/folder/index.html">Sports</a>
<a href="some2/folder2/default.htm">Weather</a>

What I want to do:

<a href="http://www.domain.com/some/folder/index.html">Sports</a>
<a href="http://www.domain.com/some2/folder2/default.htm">Weather</a>

Basically, I want to insert the domain name at a given position. I can match the regular expression without any problem, but I don't want to walk through the captured groups myself, because then I would have to use a while loop. Instead, I want to use the regular expression replace function in C# to insert the domain name.

This is something I have used in the past:

(?<regSRC>href=[^"']*["'])

Then I could easily replace the entire text in C# with a different one using the ${regSRC} variable. I want to use a similar trick for this problem. Can anyone help?

Thanks.
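
For readers following along, here is a minimal sketch of the named-group trick described above, written in Python rather than C# purely for illustration. Python spells the named group `(?P<regSRC>...)` and refers to it as `\g<regSRC>` in the replacement, where .NET uses `(?<regSRC>...)` and `${regSRC}`. The function name `insert_after_href` is hypothetical, and the domain is the one from the example links.

```python
import re

# The pattern from the post, in Python's named-group syntax.
# On the sample links it captures just `href="` (or `href='`).
HREF = re.compile(r'''(?P<regSRC>href=[^"']*["'])''')

def insert_after_href(html: str) -> str:
    """Re-emit the captured text, then append the domain (hypothetical helper)."""
    return HREF.sub(r'\g<regSRC>http://www.domain.com', html)

print(insert_after_href('<a href="/some/folder/index.html">Sports</a>'))
# <a href="http://www.domain.com/some/folder/index.html">Sports</a>
```

This does the insertion in one replace call with no loop, but it makes no attempt to tell relative links from absolute ones.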
 
Simple answer?

You could search for

\<a href\=\"

and replace with

<a href="http://www.domain.com

but this may be too simple. Is this what you have in mind, or am I misunderstanding your question?
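
A rough sketch of that literal search and replace, using Python's `re` module as a stand-in (an assumption for illustration; the question is about C#). `BASE` and `prefix_all_hrefs` are made-up names.

```python
import re

# Hypothetical base URL, taken from the example in the question.
BASE = "http://www.domain.com"

def prefix_all_hrefs(html: str) -> str:
    """Insert BASE right after every literal `<a href="`."""
    return re.sub(r'<a href="', '<a href="' + BASE, html)

print(prefix_all_hrefs('<a href="/some/folder/index.html">Sports</a>'))
# <a href="http://www.domain.com/some/folder/index.html">Sports</a>

print(prefix_all_hrefs('<a href="some2/folder2/default.htm">Weather</a>'))
# <a href="http://www.domain.comsome2/folder2/default.htm">Weather</a>
```

Note that the second sample link has no leading slash, so the domain runs straight into the path. The replacement is also applied to every link, whatever its href starts with.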
 
Hi Richard,
I think this will replace all URL patterns, such as
<a href="some/x.htm">...
<a href="www.somesite.com/x.htm">...
<a href="http://www.xtremedotnettalk.com

The last two of these should not be matched.
I think that, apart from matching <a href=", we should keep only those URLs that do not start with http or www,
or at least make sure the URL doesn't already start with the literal string he wants to insert.

I've just started working with regexes, so maybe I'm wrong.
 
You are correct

You are correct. :cool:

My suggestion would do just as you said, so you have a good understanding of regex. My suggestion was based on my assumption (and you know what assume does) that all his candidate strings were of the form of his example, which did not show a www or http as part of the URL. If his data does contain www or http, then further analysis is warranted, and situations like the ones you have brought up would have to be handled.

To handle situations you have brought up you could search for:

(\<a href\=\")([^hw][^tw][^tw])

and replace with:

\1http://www.domain.com\2

This says search for:

<a href="
followed by 3 characters where the first is not an h or w, the second and third are not t or w

This will find strings where, of the first three characters after the double quote, the first is neither h nor w and the second and third are neither t nor w. That rules out htt and www. Now....depending on the data this might also exclude some desirable strings like two.three (its second character is a w) and so forth. However, the search string above errs on the side of safety.

Parentheses in the search string allow reference to groups. The first pair of parentheses is group 1, the second is group 2, etc. This comes in handy in the replacement string. Using this ability, the replacement string above inserts the desired string in between the two parenthesized groups of the search string.
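
To see the two groups at work, here is a small sketch that runs the search and replacement strings above unchanged through Python's `re.sub`, which happens to accept the same `\1` and `\2` syntax. Using Python is an assumption made for illustration, and `insert_domain` is a made-up name.

```python
import re

# Search pattern and replacement string exactly as posted above.
PATTERN = re.compile(r'(\<a href\=\")([^hw][^tw][^tw])')
REPLACEMENT = r'\1http://www.domain.com\2'

def insert_domain(html: str) -> str:
    """Put the domain between group 1 (`<a href="`) and group 2 (next 3 chars)."""
    return PATTERN.sub(REPLACEMENT, html)

print(insert_domain('<a href="/some/folder/index.html">Sports</a>'))
# <a href="http://www.domain.com/some/folder/index.html">Sports</a>

print(insert_domain('<a href="http://www.xtremedotnettalk.com">'))
# unchanged: the first character after the quote is an h

print(insert_domain('<a href="two.three">'))
# unchanged as well: the second character is a w
```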

Folks please comment on this, because there are many ways to accomplish regex things, all depending on data analysis and desired results, as we have seen by dev2dev's response. :cool:
 
Richard Crist said:
To which post are you referring?
The one I posted in response to your new regex, i.e., my post before the one I posted yesterday.

I wrote a very lengthy post. I can't rewrite it completely now, but in short:

The regex you gave in your previous post has a logical error:
it skips URLs like
<a href="wwwtutorial/chap1.htm">
<a href="http/basics.asp">

I think it's better to skip only URLs that start with http://, https://, ftp://, or www.

What do you say?
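
One way to express "skip only those prefixes" is a negative lookahead. The sketch below is in Python for illustration (an assumption; the original question is about C#), and `BASE`, `RELATIVE_HREF`, and `absolutize` are made-up names. It also swallows an optional leading slash and always emits exactly one, which is an extra assumption taken from the desired output in the first post.

```python
import re

# Hypothetical base URL, taken from the example in the question.
BASE = "http://www.domain.com"

# `<a href="` NOT followed by http://, https://, ftp:// or www.
# The trailing `/?` consumes a leading slash when the path has one.
RELATIVE_HREF = re.compile(r'(<a href=")(?!https?://|ftp://|www\.)/?')

def absolutize(html: str) -> str:
    """Prefix relative hrefs with BASE; leave absolute-looking ones alone."""
    return RELATIVE_HREF.sub(r'\1' + BASE + '/', html)

print(absolutize('<a href="some2/folder2/default.htm">Weather</a>'))
# <a href="http://www.domain.com/some2/folder2/default.htm">Weather</a>

print(absolutize('<a href="wwwtutorial/chap1.htm">'))
# <a href="http://www.domain.com/wwwtutorial/chap1.htm">

print(absolutize('<a href="http://www.xtremedotnettalk.com">'))
# <a href="http://www.xtremedotnettalk.com">
```

Because the lookahead tests whole prefixes rather than single characters, links such as wwwtutorial/chap1.htm and http/basics.asp are still rewritten.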
 