Request For Perl Help


4 replies to this topic

#1 Diogenes

Diogenes

  • Members
  • 19 posts
  • OFFLINE
  •  
  • Gender:Male
  • Local time:09:10 AM

Posted 14 July 2006 - 12:39 PM

I'm writing a Perl script to pull the data out of XML tags and list it in a program. I have posted the code below.

sub get_TagData {
	if ($file[$x]=~/(<.*>)/) {
		$_=$file[$x];
		/(<\/w:t>)(.*?)(<w:t>)/ig; # the group of formatting tags begins with "</w:t>" and ends with "<w:t>"; this is line #4
		$tagset=$1.$2.$3;
		push (@tagset, $tagset);
	}
}

The XML document I'm working with is made up of two lines (one with the filename and one with the content), which are read into two elements of the array @file. Initially, line 4 contained
/(<\/w:t>)(.*)(<w:t>)/i;
but in addition to giving me all of the tags I was looking for, it also gave me all of the text of the XML document between the first and last tag sets. I don't mind having the text in there, but the program then pushed all of the tag sets into one array element, at which point I could not deal with them. I then added the ? modifier to the middle group to make it non-greedy (as in the full subroutine above). That stopped multiple tag sets from landing in one element, but I then ended up with only one element in the array. I tried adding the g modifier to make the search global, but it still didn't. Is there any way to take a string (since that's essentially what I'm working with) and return every instance of a pattern? I appreciate any help.
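One way to get every match is to put the /g match in the condition of a while loop. In scalar context each evaluation resumes where the previous match ended, so the body runs once per tag set. This is a minimal sketch, not a drop-in fix: get_tag_sets is a hypothetical stand-in for get_TagData that takes the text as an argument instead of reading @file, and the sample string is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_TagData: the text is passed in
# rather than read from the global @file array.
sub get_tag_sets {
    my ($text) = @_;
    my @tagset;
    # In scalar context, /g continues after the previous match on each pass,
    # so the loop body runs once per "</w:t> ... <w:t>" group.
    while ($text =~ /(<\/w:t>)(.*?)(<w:t>)/ig) {
        push @tagset, $1 . $2 . $3;
    }
    return @tagset;
}

# Made-up sample containing two formatting groups.
my $sample = 'one</w:t><w:r><w:t>two</w:t><w:r/><w:t>three';
my @found  = get_tag_sets($sample);
print "$_\n" for @found;
```

A single /g match outside a loop is evaluated only once, which would explain why adding g alone still produced one element.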
-WR, child of DOS & Windows 3.1
Win98SE, XP Pro, Vista, Red Hat 8.0, Red Hat Enterprise 4.0

"Strive at all times to bend, fold, spindle, and mutilate."

#2 groovicus

groovicus

  • Security Colleague
  • 9,963 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Centerville, SD
  • Local time:09:10 AM

Posted 14 July 2006 - 06:24 PM

I edited your post to remove the font tags. I'm sorry, but they were distracting me from understanding your problem.

One thing I noticed about your tags is that they seem to be reversed. Your pattern starts with the closing tag (the one with the slash) and ends with the opening tag. Was that a typo? Shouldn't that be the other way around?

Then, to try to clarify your issue: you are parsing an XML file and trying to pull the data from between specified tags. Will your XML file have multiple occurrences of the specified tags? And do you only want the data from between the tags, in order to build an array of just that data? Any chance you could post a snippet of the data?

I'm a bit noobish at Perl myself, but I am starting to build a deep appreciation of what it can do. At any rate, I would be glad to help; I am just not sure I understand your issue.

#3 Diogenes


Posted 17 July 2006 - 10:17 AM

No, the reversal of the tags was not a typo. The data I want to process in my program appears outside of the "text", which is what the "<w:t>" and "</w:t>" tags mark. To give an example of what I'm working with:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed mauris leo, suscipit ac, aliquet eu, varius non, odio. Sed ac felis non sem blandit venenatis. Suspendisse commodo dictum lorem. </w:t></w:r><w:r w:rsidR="00AC3608"><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/><w:color w:val="FF0000"/></w:rPr><w:t>Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Integer feugiat justo vitae lectus vulputate elementum. Maecenas in felis. Donec venenatis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos.
(except that there are multiple sets of tags). The tags contain formatting information that I want to rearrange into another array. Right now, I have a match and a for loop that look like this:
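(@tagset[0..256], $last) = /(<\/w:t>.*?<w:t.*?>)/ig;  # at most 257 tag sets fit in the slice
for ($x=0; $x<=$#tagset; $x++) {
	if (!$tagset[$x]) {
		splice (@tagset, $x, 1);  # remove just this empty slot
		$x--;                     # re-check the element that shifted into position $x
	}
}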
This is a bit sloppy, but it gets the job done until there are over 257 sets of tags in the document. Is there a cleaner way to do this?
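A possibly cleaner approach is to assign the /g match to a plain array. In list context the match returns every captured substring, so neither the fixed 0..256 slice nor the cleanup loop is needed. The sample text below is made up for illustration; the pattern is the one from the post.

```perl
use strict;
use warnings;

# Made-up sample: two formatting groups between runs of text.
my $text = 'a</w:t></w:r><w:r><w:t>b</w:t></w:r><w:r><w:t xml:space="preserve">c';

# In list context, a /g match returns all captured substrings at once,
# so the array grows to whatever size the document needs.
my @tagset = $text =~ /(<\/w:t>.*?<w:t.*?>)/ig;

# Drop empty entries, if any, without splicing inside a loop.
@tagset = grep { defined && length } @tagset;

print scalar(@tagset), " tag sets\n";
```

Since only successful matches are returned, the grep is a safety net rather than a requirement here.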

#4 groovicus


Posted 17 July 2006 - 10:34 AM

Ok, you want to build an array of attributes. So in this tag
<w:r w:rsidR="00AC3608">
you want to pull out w:rsidR="00AC3608"?

Am I understanding correctly now? (I still have not had enough coffee this morning.)
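Pulling the attributes out of a single tag could be sketched as follows. tag_attributes is a hypothetical helper, and the pattern assumes double-quoted values with no embedded double quotes, which holds for the sample in this thread but not for XML in general. A real XML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: return [name, value] pairs for the attributes in one tag.
sub tag_attributes {
    my ($tag) = @_;
    my @pairs;
    # Assumption: values are double-quoted and contain no double quotes.
    while ($tag =~ /([\w:.-]+)="([^"]*)"/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

for my $pair (tag_attributes('<w:r w:rsidR="00AC3608">')) {
    print "$pair->[0] = $pair->[1]\n";
}
```

Returning pairs rather than a hash keeps the attributes in document order, which may matter when rearranging formatting information.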

#5 Diogenes


Posted 20 July 2006 - 12:54 PM

Yes, that was right. Actually, I've solved that problem by putting the tag-set extraction inside a "while" loop, and it works fairly well.

I'm almost finished with the program, but there's another big problem: to run it, I need to be able to unzip one file, modify the contents, and zip them into another file. I went to CPAN and tried getting the appropriate module, but the modules I got somehow didn't agree with each other, and I don't have time to sort out which module is which without downloading them all onto my computer (another bad idea). Using only the File::Path module, what Perl commands can I use to unzip and rezip the files?

Thanks for your help so far.
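File::Path only creates and removes directory trees, so it cannot read or write zip archives by itself. One option that avoids extra CPAN downloads on a current Perl is the pair IO::Uncompress::Unzip and IO::Compress::Zip, which I believe ship with Perl 5.10 and later. The sketch below copies every member of one archive into another and passes each member's contents through a callback. rewrite_zip and its arguments are hypothetical names, each member is held in memory, and member metadata such as timestamps is not preserved.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);
use IO::Compress::Zip qw($ZipError);

# Hypothetical helper: copy each member of $in_zip into $out_zip,
# passing (name, contents) through the $edit code ref first.
sub rewrite_zip {
    my ($in_zip, $out_zip, $edit) = @_;

    my $u = IO::Uncompress::Unzip->new($in_zip)
        or die "Cannot open $in_zip: $UnzipError";
    my $z;    # output archive, created when the first member is seen

    my $status;
    for ($status = 1; $status > 0; $status = $u->nextStream()) {
        my $name = $u->getHeaderInfo()->{Name};

        # Read the whole member into memory, one block at a time.
        my ($data, $buff) = ('', '');
        while (($status = $u->read($buff)) > 0) {
            $data .= $buff;
        }
        last if $status < 0;

        $data = $edit->($name, $data);

        if ($z) {
            $z->newStream(Name => $name);    # start the next member
        } else {
            $z = IO::Compress::Zip->new($out_zip, Name => $name)
                or die "Cannot create $out_zip: $ZipError";
        }
        $z->print($data);
    }
    die "Error reading $in_zip: $UnzipError" if $status < 0;
    $z->close() if $z;
    return;
}
```

A call might look like rewrite_zip('in.zip', 'out.zip', sub { my ($name, $data) = @_; return $data; }), with the callback changing $data only for the member that holds the document text. The file names here are placeholders.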
