
Save all links under a domain's subpages



#1 wbpwns

  • Members
  • 2 posts

Posted 21 June 2016 - 04:23 AM

Hi, I am new here. I have a bit of a project going on and I am trying to find a way to do this.

 

Here is the situation:

 

There is a website for example website.com

 

There are subpages, of course. One of the main subpages is website.com/events

 

Under that page, there are many subpages:

website.com/events/1

website.com/events/2

website.com/events/3

and so on,

 

Each of these shows 20 actual event links (the idea is like a Google search page showing X results per page).

 

I want to copy all event links from website.com/events/1 through website.com/events/X (where X is the last page number) into a .txt (Notepad) file, one event link per line, without downloading any actual content from the website (I literally just need the links), and ignoring all other subdomains and subpages of website.com.

 

Is this possible using any tool?

 

If this was posted in the wrong section, I apologize; feel free to move it to the correct section.

 

Thanks in advance.




#2 Guest_hollowface_*

  • Guests

Posted 22 June 2016 - 02:25 PM


I want to copy all event links from website.com/events/1 through website.com/events/X (where X is the last page number) into a .txt (Notepad) file, one event link per line


Is this possible using any tool?

 

You'll need to create a script of some kind that downloads each webpage, extracts the links from it, and saves the results to a text file. How you do this will depend on the website: you'll need to examine the pages to find a pattern you can use to identify the links you want, and that determines which tools/features your script relies on. I don't have the know-how to do this, so I can't help; perhaps another member can. If not, you could try asking on http://stackoverflow.com/ , but make sure to read their site rules first.

 

I assume you'll probably want to write this in PowerShell. Perhaps these links will get you headed in the right direction (a rough sketch follows after the links):

- https://msdn.microsoft.com/en-us/library/ez801hhe%28v=vs.110%29.aspx

- http://ss64.com/ps/select-string.html

- http://ss64.com/ps/get-content.html
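
To make that concrete, here is a minimal sketch in PowerShell, since that's the direction suggested above. It is assumption-heavy: it guesses that the pages are numbered 1 through some known last page and that event links can be recognized by the substring "event" in their href. The base URL, last page number, and filter would all need to be adjusted for the real site.

# Minimal sketch: fetch each numbered events page, collect matching links,
# and write them to a text file, one per line.
# ASSUMPTIONS: pages run 1..$lastPage and event links contain "event" in
# their href; adjust $baseUrl, $lastPage, and the filter for the real site.
$baseUrl  = "http://website.com/events"
$lastPage = 10               # replace with the real last page number
$outFile  = "events.txt"

$links = foreach ($i in 1..$lastPage) {
    # Invoke-WebRequest parses the page and exposes its <a> tags via .Links
    $page = Invoke-WebRequest -Uri "$baseUrl/$i" -UseBasicParsing
    $page.Links.href | Where-Object { $_ -like "*event*" }
}

# One link per line, duplicates removed
$links | Sort-Object -Unique | Set-Content $outFile

Running that from a PowerShell prompt should leave an events.txt with one link per line, and nothing gets downloaded beyond the listing pages themselves.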



#3 wbpwns
  • Topic Starter

  • Members
  • 2 posts

Posted 23 June 2016 - 12:46 AM

Thanks for the reply.

 

After much searching, I found the correct technical term for what I need. It seems to fall under "Web Scraping".

 

I tried several web scraping programs with free functionality, and I already achieved what I needed.

 

P.S. I am not a programmer/coder. I was actually asking for a program (in the first post).


Edited by wbpwns, 23 June 2016 - 12:47 AM.




