  #1  
Old 10-25-2004, 10:39 PM
mj0lnr mj0lnr is offline
Newcomer
 
Join Date: Oct 2004
Posts: 7
mj0lnr is on a distinguished road
Default Parsing XML data using RegEx

I have some data that needs to be stripped of all its excess XML. When I first volunteered to do this, I thought to myself, 'it's just excess text, how hard can that be?' Oh, how little I knew about RegEx. I'm beating my head against a wall, and would just like someone to help get me going.

I followed along in this thread, but I've had no luck figuring it out on my own without a little bit of coaxing.

So, here's my question: how do I strip out this excess data so that all I'm left with is the coach's name? It needs to go into a variable for a combo box, so that the coach of the team can make changes to the roster below his name. Or, since the coach's name always stays the same, should I use Regex.IsMatch() instead of trying to strip the excess data, given that it needs to be saved as XML anyway and sent back as a plain TXT file? So many questions, which is why I'm coming to the gurus...!

and here's a sample of what my XML looks like....

Code:
<team name="Sacramento" coach="Randy" teamid="1" picture="madison.gif" abbreviation="SAC" email="">
  <roster>
    <player name="Jose Fuentes" pos="QB1" />
    <player name="JoJo Jones" pos="RB1" />
    <player name="Johnnie Vee" pos="RB2" />
    <player name="Tom Waddle" pos="WR1" />
    <player name="Sherman Deary" pos="WR2" />
    <player name="Dan Graham" pos="TE1" />
    <player name="John Hall" pos="K" />
    <player name="CHA" pos="DEF" />
    <player name="Wheeler Chandells" />
    <player name="Carlin Patton" />
    <player name="Sidney Iverson" />
    <player name="Nicky Santoro" />
  </roster>
</team>
  #2  
Old 10-26-2004, 06:45 AM
HJB417 HJB417 is offline
Contributor

Preferred language:
c#, c++, j#
 
Join Date: Mar 2003
Location: Lowell, MA
Posts: 609
HJB417 is on a distinguished road
Default

Code:
(?i)<team[^>]+coach="(?<coach>[^"]*)"[^>]*>
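For anyone following along, the pattern reads roughly like this: `(?i)` turns on case-insensitive matching, `<team[^>]+` matches the start of the team tag plus whatever attributes come before `coach`, `(?<coach>[^"]*)` captures everything up to the closing quote into a group named `coach`, and `[^>]*>` consumes the rest of the tag. Below is a minimal VB.NET sketch of applying it. The names `CoachDemo` and `ExtractCoach` are made up for illustration, and the sample string is a shortened version of the tag from the first post. Note that quotes inside a VB string literal have to be doubled.

```vb
Imports System
Imports System.Text.RegularExpressions

Module CoachDemo
    ' Hypothetical helper: returns the coach attribute of the first <team> tag,
    ' or an empty string when the pattern does not match.
    Function ExtractCoach(ByVal input As String) As String
        ' Quotes inside a VB string literal are written as ""
        Dim pattern As String = "(?i)<team[^>]+coach=""(?<coach>[^""]*)""[^>]*>"
        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then
            Return m.Groups("coach").Value
        End If
        Return String.Empty
    End Function

    Sub Main()
        Dim sample As String = "<team name=""Sacramento"" coach=""Randy"" teamid=""1"">"
        Console.WriteLine(ExtractCoach(sample)) ' prints Randy
    End Sub
End Module
```

The returned string can go straight into a combo box's item list.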
  #3  
Old 10-26-2004, 08:38 AM
mj0lnr mj0lnr is offline
Newcomer
 
Join Date: Oct 2004
Posts: 7
mj0lnr is on a distinguished road
Default

Quote:
Originally Posted by HJB417
Code:
(?i)<team[^>]+coach="(?<coach>[^"]*)"[^>]*>

Thank you.....I'll try it out when I get to the house....
A.
  #4  
Old 10-26-2004, 11:23 PM
mj0lnr mj0lnr is offline
Newcomer
 
Join Date: Oct 2004
Posts: 7
mj0lnr is on a distinguished road
Default

again...no luck
any clue as to what I'm doing wrong...?

Code:
    Public Function ReturnValues(ByVal RegularExpression As String, ByVal mytext As String, ByVal item As String) As String()
        Dim myRegExp As New Regex(RegularExpression, RegexOptions.IgnoreCase)
        Dim Matchs As MatchCollection = myRegExp.Matches(mytext)
        Dim currentMatch As Match

        Dim matchedValues As New ArrayList()


        For Each currentMatch In Matchs
            Dim myCaptures As CaptureCollection = currentMatch.Groups(item).Captures
            Dim currentItem As Capture
            For Each currentItem In myCaptures
                matchedValues.Add(currentItem.Value)
            Next

        Next

        Return CType(matchedValues.ToArray(GetType(String)), String())
    End Function


    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim myPattern As String = "\<team name="Sacramento"\(?i)<team[^>]+coach="(?<coach>[^"]*)"[^>]*>"
        Dim myText As String = "<team name="Sacramento" coach="Randy" teamid="1" picture="madison.gif" abbreviation="SAC" email="">"
        Dim oneValues() As String = ReturnValues(myPattern, myText, "coach1")
        'Dim twoValues() As String = ReturnValues(myPattern, myText, "itemtwo")
    End Sub
TIA,
A.
  #5  
Old 10-27-2004, 06:01 AM
HJB417 HJB417 is offline
Contributor

Preferred language:
c#, c++, j#
 
Join Date: Mar 2003
Location: Lowell, MA
Posts: 609
HJB417 is on a distinguished road
Default

Code:
    
    ' Needs: Imports System.Collections and Imports System.Text.RegularExpressions
    Public Function ExplicitTest() As ArrayList
        ' Quotes inside a VB string literal are escaped by doubling them
        Dim input As String = "<team name=""Sacramento"" coach=""Randy"" teamid=""1"" picture=""madison.gif"" abbreviation=""SAC"" email="""">"
        ' THIS IS NOT THE REGEX I GAVE YOU
        ' Dim pattern As String = "\<team name="Sacramento"\(?i)<team[^>]+coach="(?<coach>[^"]*)"[^>]*>"
        Dim pattern As String = "(?i)<team[^>]+coach=""(?<coach>[^""]*)""[^>]*>"
        Dim matches As MatchCollection = Regex.Matches(input, pattern)
        Dim captures As New ArrayList(matches.Count)

        ' Collect the value of the named group "coach" from every match
        For Each m As Match In matches
            captures.Add(m.Result("${coach}"))
        Next

        Return captures
    End Function
I don't know VB, but that example should work. Here's a C# example:

Code:
ArrayList FooBar(string input)
{
	string pattern = "(?i)<team[^>]+coach=\"(?<coach>[^\"]*)\"[^>]*>";
	MatchCollection matches = Regex.Matches(input, pattern);
	ArrayList captures = new ArrayList(matches.Count);
	foreach(Match match in matches)
		captures.Add(match.Result("${coach}"));
	return captures;
}
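One more thing that would trip up the VB code in post #4 even with the right pattern: it asks for a group called "coach1", while the pattern names its group "coach". As far as I know, asking a Match for a group name that isn't in the pattern doesn't throw. It returns a group whose Success is False and whose Value is empty, so the loop over Captures never adds anything. A small sketch of that behaviour (`GroupNameDemo` and `GroupValue` are made-up names):

```vb
Imports System
Imports System.Text.RegularExpressions

Module GroupNameDemo
    ' Hypothetical helper: returns the value of the named group for the first match.
    Function GroupValue(ByVal input As String, ByVal groupName As String) As String
        Dim pattern As String = "(?i)<team[^>]+coach=""(?<coach>[^""]*)""[^>]*>"
        Dim m As Match = Regex.Match(input, pattern)
        Return m.Groups(groupName).Value
    End Function

    Sub Main()
        Dim input As String = "<team name=""Sacramento"" coach=""Randy"" teamid=""1"">"

        ' "coach" is the group name defined in the pattern
        Console.WriteLine("[" & GroupValue(input, "coach") & "]")

        ' "coach1" is not defined anywhere in the pattern, so the value is empty
        Console.WriteLine("[" & GroupValue(input, "coach1") & "]")
    End Sub
End Module
```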
  #6  
Old 10-27-2004, 07:43 AM
fenris fenris is offline
Newcomer
 
Join Date: Sep 2002
Location: Canada
Posts: 22
fenris is on a distinguished road
Default

Why not create a class that loads that particular XML structure, then write it out as a different XML file with only the nodes and attributes that are required?
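As a rough illustration of that idea, and assuming the file is well-formed XML with <team> as the root element, the framework's XmlDocument class can pull the coach attribute out directly, with no regex involved. `XmlCoachDemo` and `ReadCoach` are made-up names for this sketch:

```vb
Imports System
Imports System.Xml

Module XmlCoachDemo
    ' Hypothetical helper: reads the coach attribute from the <team> root element.
    ' GetAttribute returns an empty string when the attribute is missing.
    Function ReadCoach(ByVal xmlText As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        Dim team As XmlElement = doc.DocumentElement
        Return team.GetAttribute("coach")
    End Function

    Sub Main()
        Dim sample As String = _
            "<team name=""Sacramento"" coach=""Randy"" teamid=""1"">" & _
            "<roster><player name=""Jose Fuentes"" pos=""QB1"" /></roster>" & _
            "</team>"
        Console.WriteLine(ReadCoach(sample)) ' prints Randy
    End Sub
End Module
```

For a file on disk, `doc.Load(path)` takes the place of `doc.LoadXml(xmlText)`.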
  #7  
Old 10-27-2004, 12:26 PM
mj0lnr mj0lnr is offline
Newcomer
 
Join Date: Oct 2004
Posts: 7
mj0lnr is on a distinguished road
Default

Quote:
Originally Posted by fenris
Why not create a class that loads that particular XML structure, then write it out as a different XML file with only the nodes and attributes that are required?

Fen, I'm completely open to suggestions.....I thought I knew VB, until I found out what regular expressions were. Whew!! They've thrown me for a loop
  #8  
Old 10-27-2004, 12:41 PM
fenris fenris is offline
Newcomer
 
Join Date: Sep 2002
Location: Canada
Posts: 22
fenris is on a distinguished road
Default

I hear that!

Regular expressions are an entirely different language, designed to parse text very well. I don't think you need to use them for your particular circumstances.

I would use a couple of classes like this:

Code:
Public Class Team
    Private _Name As String
    Private _CoachName As String
    Private _ID As String
    Private _Picture As String 'could be an IMAGE object as well
    Private _NameAbbreviation As String
    Private _Email As String
    Private _Players As New Collection() 'holds Player objects

    Public Property Players() As Collection
        Get
            Return _Players
        End Get
        Set(ByVal Value As Collection)
            _Players = Value
        End Set
    End Property

    Public Property Email() As String
        Get
            Return _Email
        End Get
        Set(ByVal Value As String)
            _Email = Value
        End Set
    End Property

    Public Property NameAbbreviation() As String
        Get
            Return _NameAbbreviation
        End Get
        Set(ByVal Value As String)
            _NameAbbreviation = Value
        End Set
    End Property

    Public Property Picture() As String
        Get
            Return _Picture
        End Get
        Set(ByVal Value As String)
            _Picture = Value
        End Set
    End Property

    Public Property ID() As String
        Get
            Return _ID
        End Get
        Set(ByVal Value As String)
            _ID = Value
        End Set
    End Property

    Public Property CoachName() As String
        Get
            Return _CoachName
        End Get
        Set(ByVal Value As String)
            _CoachName = Value
        End Set
    End Property

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal Value As String)
            _Name = Value
        End Set
    End Property

End Class

Public Class Player
    Private _Name As String
    Private _Position As String

    Public Property Position() As String
        Get
            Return _Position
        End Get
        Set(ByVal Value As String)
            _Position = Value
        End Set
    End Property
    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal Value As String)
            _Name = Value
        End Set
    End Property
End Class
Then I would load the xml into vb and create the collections from there. Once the classes are created, you can then create the new xml files any way you please. You can also output the data to text.

Here is an example to get you started.
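The example referred to above doesn't appear in the text of the thread. As a separate, minimal sketch of the loading step just described (not the original example), assuming the XML layout from the first post: `TeamInfo`, `PlayerInfo`, `RosterLoader` and `LoadTeam` are hypothetical names, and the two classes are cut-down stand-ins for Team and Player rather than the full versions.

```vb
Imports System
Imports System.Collections
Imports System.Xml

' Cut-down stand-ins for the Team and Player classes (public fields
' instead of properties, purely to keep the sketch short).
Public Class PlayerInfo
    Public Name As String
    Public Position As String
End Class

Public Class TeamInfo
    Public Name As String
    Public CoachName As String
    Public Players As New ArrayList()
End Class

Module RosterLoader
    ' Hypothetical loader: builds a TeamInfo from the <team> XML text.
    Function LoadTeam(ByVal xmlText As String) As TeamInfo
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)

        Dim root As XmlElement = doc.DocumentElement
        Dim result As New TeamInfo()
        result.Name = root.GetAttribute("name")
        result.CoachName = root.GetAttribute("coach")

        ' Each <player> under <roster> becomes one PlayerInfo
        For Each node As XmlNode In root.SelectNodes("roster/player")
            Dim el As XmlElement = CType(node, XmlElement)
            Dim p As New PlayerInfo()
            p.Name = el.GetAttribute("name")
            p.Position = el.GetAttribute("pos") ' empty string when pos is missing
            result.Players.Add(p)
        Next

        Return result
    End Function

    Sub Main()
        Dim sample As String = _
            "<team name=""Sacramento"" coach=""Randy"">" & _
            "<roster>" & _
            "<player name=""Jose Fuentes"" pos=""QB1"" />" & _
            "<player name=""Nicky Santoro"" />" & _
            "</roster></team>"

        Dim t As TeamInfo = LoadTeam(sample)
        Console.WriteLine(t.CoachName & ": " & t.Players.Count & " players")
    End Sub
End Module
```

Once the objects are filled in, the coach names can be added to the combo box and the roster edited in memory before anything is written back out.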
  #9  
Old 10-27-2004, 01:29 PM
mj0lnr mj0lnr is offline
Newcomer
 
Join Date: Oct 2004
Posts: 7
mj0lnr is on a distinguished road
Default

THANK YOU!!! I'll definitely look at that when I get to the house tonight....and btw, you might find this really funny....this is for an online football league I'm in, my team name is The Fenris Wolves.

A.
  #10  
Old 10-27-2004, 01:31 PM
fenris fenris is offline
Newcomer
 
Join Date: Sep 2002
Location: Canada
Posts: 22
fenris is on a distinguished road
Default




Regex is great for parsing text. I use it (well, at least I try to use it) for parsing HTML tables from downloaded HTML source.
  #11  
Old 11-02-2004, 01:07 PM
mj0lnr mj0lnr is offline
Newcomer
 
Join Date: Oct 2004
Posts: 7
mj0lnr is on a distinguished road
Default

Jeez, Fen...I've tried 10 or 11 different ways to set up what you gave me, but no dice.....I can't even get it started... I hope it's not too much to ask for a little more help....(as my son would say) PEASE????? LOL

A.
  #12  
Old 11-02-2004, 01:08 PM
fenris fenris is offline
Newcomer
 
Join Date: Sep 2002
Location: Canada
Posts: 22
fenris is on a distinguished road
Default

What do you have so far?
  #13  
Old 11-02-2004, 01:39 PM
mj0lnr mj0lnr is offline
Newcomer
 
Join Date: Oct 2004
Posts: 7
mj0lnr is on a distinguished road
Default

Quote:
Originally Posted by fenris
What do you have so far?

Fen, I got so frustrated, I scrapped everything