Full Version: Request For Perl Help
Diogenes
I'm writing a perl script to pull the data from XML tags and list it in a program. I have posted the code below.

CODE
sub get_TagData {
    if ($file[$x]=~/(<.*>)/) {
        $_=$file[$x];
        /(<\/w:t>)(.*?)(<w:t>)/ig; # the group of formatting tags begins with "</w:t>" and ends with "<w:t>"; this is line #4
        $tagset=$1.$2.$3;
        push (@tagset, $tagset);
    }
}


The XML document I'm working with is made up of two lines (one with the filename, and one with the content), which are read into two array elements in the array @file. Initially, line 4 contained
CODE
/(<\/w:t>)(.*)(<w:t>)/i;
, but in addition to giving me all of the tags I was looking for, it also gave me all of the text of the XML document between the first and last tag sets. I don't mind having the text in there, but the program then pushed all of the tag sets into a single array element, at which point I could not deal with them. I then added the ? modifier to the middle group (as in the full subroutine shown above). That fixed the problem of multiple tag sets landing in one array element, but I then ended up with only one element in the array. I tried adding the g modifier to make the search global, but that did not help. Is there any way to take a string (since that's essentially what I'm working with) and return every instance of a pattern? I appreciate any help.
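
A match with the g modifier only walks through every occurrence when it is evaluated repeatedly in scalar context (for example as a loop condition) or once in list context. Written as a bare statement it matches once, so $1, $2 and $3 hold a single tag set. A minimal sketch of the loop form, where the sample string in $line is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample: text separated by two runs of formatting tags.
my $line = 'one</w:t></w:r><w:r><w:t>two</w:t></w:r><w:r><w:rPr/><w:t>three';

my @tagsets;
# As a loop condition (scalar context), each evaluation of the /g match
# resumes where the previous match ended, so every tag set is visited.
while ($line =~ /(<\/w:t>)(.*?)(<w:t>)/ig) {
    push @tagsets, $1 . $2 . $3;
}

print "$_\n" for @tagsets;
```

Each pass through the loop pushes one tag set, so @tagsets ends up with one element per run of tags rather than one element in total.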
groovicus
I edited your post to remove the font tags. I'm sorry, but they were distracting me from understanding your problem.

One thing I noticed about your tags is that they seem to be reversed. The tag that starts your pattern has the slash (which makes it a closing tag), and the tag that ends it does not. Was that a typo? Shouldn't it be the other way around?

To try to clarify your issue: you are parsing an XML file and trying to pull the data from between specified tags. Will your XML file have multiple occurrences of the specified tags? And do you only want the data in between the tags, in order to build an array of just that data? Any chance you could post a snippet of the data?

I'm a bit noobish at Perl myself, but I am starting to build a deep appreciation of what it can do. At any rate, I would be glad to help; I am just not sure I understand your issue.
Diogenes
1) No, the reversal of the tags was not a typo. The data I want to process in my program appears outside of the "text", which is what the "<w:t>" and "</w:t>" tags are marking. To give an example of what I'm working with:
CODE
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed mauris leo, suscipit ac, aliquet eu, varius non, odio. Sed ac felis non sem blandit venenatis. Suspendisse commodo dictum lorem. </w:t></w:r><w:r w:rsidR="00AC3608"><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:color w:val="FF0000"/></w:rPr><w:t>Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Integer feugiat justo vitae lectus vulputate elementum. Maecenas in felis. Donec venenatis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos.

(except that there are multiple sets of tags.) The tags contain formatting information that I want to rearrange into another array. Right now, I have a for loop that looks like this:
CODE
(@tagset[0..256], $last) = /(<\/w:t>.*?<w:t.*?>)/ig;
for ($x=0; $x<=$#tagset; $x++) {
    if (!$tagset[$x]) {
        splice (@tagset, $x, $x);
    }
}

Which is a bit sloppy, but it gets the job done until there are over 257 sets of tags in the document. Is there a cleaner way to do this?
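
In list context, a match with the g modifier and a single capture group returns every captured substring, so the result can be assigned straight to an array with no fixed-size slice and no cleanup pass. A sketch using the pattern from the post above; the sample data in $line is abbreviated and hypothetical:

```perl
use strict;
use warnings;

# Hypothetical, shortened sample in the shape of the data shown above.
my $line = 'Lorem </w:t></w:r><w:r w:rsidR="00AC3608"><w:rPr>'
         . '<w:color w:val="FF0000"/></w:rPr><w:t>ipsum</w:t></w:r>'
         . '<w:r><w:t xml:space="preserve"> dolor';

# List-context /g: one array element per match, however many there are.
my @tagset = $line =~ /(<\/w:t>.*?<w:t.*?>)/ig;

print scalar(@tagset), " tag sets found\n";
print "$_\n" for @tagset;
```

Because the array only ever receives real matches, there are no empty elements to splice out afterwards, and the 257-element ceiling goes away.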
groovicus
OK, you want to build an array of attributes. So in this tag
CODE
<w:r w:rsidR="00AC3608">

you want to pull out w:rsidR="00AC3608".

Am I now correctly understanding? (Still have not had enough coffee this morning.)
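
Pulling attributes out of a single tag like the one above could be sketched with another list-context match. This assumes every attribute value is double-quoted and contains no embedded double quotes. The variable names are illustrative only, and a real XML parser would be more robust than a regex here.

```perl
use strict;
use warnings;

my $tag = '<w:r w:rsidR="00AC3608">';

# Whole name="value" pairs as strings.
my @attributes = $tag =~ /([\w:.-]+="[^"]*")/g;

# With two capture groups, list-context /g returns name, value, name, value...
# which can be assigned directly to a hash.
my %attribute = $tag =~ /([\w:.-]+)="([^"]*)"/g;

print "$_\n" for @attributes;
print "$_ => $attribute{$_}\n" for sort keys %attribute;
```

The element name w:r is skipped because it is not followed by an equals sign, so only the attribute itself is captured.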
Diogenes
Yes, that was right. Actually, I've solved that problem by sticking the get tagsets process inside a "while" loop, and it works fairly well. I'm almost finished with the program, and there's another big problem: to run the program, I need to be able to unzip one file, modify the contents, and zip it into another file. I've gone to CPAN and tried getting the appropriate module, but the modules that I got somehow didn't agree with each other, and I don't have time to sort out which module is which without downloading them all onto my computer (another bad idea.) Using only the File::Path module, what Perl commands can I use to unzip and rezip the files?

Thanks for your help so far.
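
File::Path on its own only creates and removes directory trees, so it cannot read or write zip archives. One possible route that avoids extra CPAN downloads, assuming Perl 5.10 or later, is the bundled IO::Uncompress::Unzip and IO::Compress::Zip modules. The sketch below is only an illustration under that assumption: the helper names read_zip and write_zip are made up, and it holds the whole archive in memory.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw($ZipError);
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: read every member of a zip archive into memory.
# Returns a list of [name, contents] pairs in archive order.
sub read_zip {
    my ($path) = @_;
    my $unzip = IO::Uncompress::Unzip->new($path)
        or die "Cannot open $path: $UnzipError\n";
    my @members;
    my $status;
    for ($status = 1; $status > 0; $status = $unzip->nextStream()) {
        my $name = $unzip->getHeaderInfo()->{Name};
        my ($data, $chunk) = ('', '');
        # read() overwrites $chunk on each call, so accumulate into $data.
        while (($status = $unzip->read($chunk)) > 0) {
            $data .= $chunk;
        }
        last if $status < 0;
        push @members, [$name, $data];
    }
    die "Error reading $path: $UnzipError\n" if $status < 0;
    $unzip->close();
    return @members;
}

# Hypothetical helper: write [name, contents] pairs to a new zip archive.
sub write_zip {
    my ($path, @members) = @_;
    my $zip;
    for my $member (@members) {
        my ($name, $data) = @$member;
        if ($zip) {
            $zip->newStream(Name => $name);    # start the next member
        } else {
            $zip = IO::Compress::Zip->new($path, Name => $name)
                or die "Cannot create $path: $ZipError\n";
        }
        $zip->print($data);
    }
    $zip->close() if $zip;
}
```

A round trip would then call read_zip on the source archive, edit the contents of whichever pairs need changing, and pass the list to write_zip with a different output path.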