[Solved] Can't create a folder to place the downloaded images in using Python (script to download images from a webpage)



  • Thanks Ranjith. You are always an ideal preceptor. Btw, I kinda bypassed the problem and got the output. I don't know whether this is good practice, though.

    import requests
    import os
    import urllib.request
    from lxml import html
    
    def Startpoint():
        url = "https://www.aliexpress.com/"
        response = requests.get(url)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//div[@class="item-inner"]')
        for title in titles:
            Pics="https:" + title.xpath('.//span[@class="pic"][email protected]')[0]
            os.chdir("E:\\images\\")
            urllib.request.urlretrieve(Pics, Pics.split('/')[-1])
    
    Startpoint()
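
    For reference, the folder itself can also be created from within the script rather than having to exist beforehand. A minimal sketch (the path is just the example used above); os.makedirs with exist_ok=True creates the directory if it is missing and does nothing if it already exists:

    import os

    folder = "E:\\images\\"
    # Create the download folder if it doesn't exist yet,
    # then make it the working directory for urlretrieve
    os.makedirs(folder, exist_ok=True)
    os.chdir(folder)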


  • But if I try the same with this site "https://www.yify-torrent.org/search/1080p/", I get an error: "raise HTTPError(req.full_url, code, msg, hdrs, fp) ... urllib.error.HTTPError: HTTP Error 403: Forbidden". I tried it like this:

    import requests
    import urllib.request
    import os
    from lxml import html
    
    def PictureScraping():
        url = "https://www.yify-torrent.org/search/1080p/"
        response = requests.get(url)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//div[@class="movie-image"]')
        folder = 'E:\\images\\'
        for title in titles:
            image_url = "https:" + title.xpath([email protected]')[0]
            file = folder + image_url.split('/')[-1]
            urllib.request.urlretrieve(image_url,file)
    
    PictureScraping()

  • administrators

    The server is blocking your request because it can tell that it's not a request from a normal browser and that someone is trying to scrape its content. They are probably using sessions or cookies to secure it. You should spoof the request so that it looks like it was made from a real browser.
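
    For example, a minimal sketch of sending a browser-like User-Agent header with requests (the User-Agent string below is just an illustrative example):

    import requests

    # Pretend to be a regular desktop browser instead of the default
    # "python-requests" user agent
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    response = requests.get("https://www.yify-torrent.org/search/1080p/",
                            headers=headers)
    print(response.status_code)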



  • Thanks Ranjith for your invaluable reply. I set a header parameter to make it look like a real browser, but that did not do the trick. However, it does bring results if I make a second request:

    import requests
    import urllib.request
    import os
    from lxml import html
    
    def PictureScraping():
        url = "https://www.yify-torrent.org/search/1080p/"
        response = requests.get(url)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//div[@class="movie-image"]')
        for title in titles:
            image_url = "https:" + title.xpath([email protected]')[0]
            image_name = image_url.split('/')[-1]
            response = requests.get(image_url)
            os.chdir('E:\\image\\')
            with open(image_name, 'wb') as f:
                for chunk in response.iter_content(1024):
                    f.write(chunk)
    
    PictureScraping()


  • And this is the way you showed me:

    import requests
    import urllib.request
    import os
    from lxml import html
    
    def PictureScraping():
        url = "https://www.yify-torrent.org/search/1080p/"
        response = requests.get(url)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//div[@class="movie-image"]')
        folderName = "E:\\images\\"
        for title in titles:
            image_url = "https:" + title.xpath([email protected]')[0]
            file = folderName + image_url.split('/')[-1]
            response = requests.get(image_url)
            with open(file, 'wb') as f:
                for chunk in response.iter_content(1024):
                    f.write(chunk)
    
    PictureScraping()


  • One last question related to working with Python. I've written a script to parse the titles of 248 videos from wiseowl.co.uk. The crawler currently parses only 134 of them. There are thirteen categories, and each category contains a list of videos.

    However, when a list exceeds 20 videos, the page is paginated and the rest of the content is displayed on the following pages. So if a category contains 60 videos, it takes 3 pages to display its full content.

    If a category contains 20 or fewer videos, it is displayed on a single page and there is no pagination option.

    My script only parses the video titles of the categories that have more than 20 videos, i.e. the ones with a pagination option. If a category doesn't have a pagination option (20 or fewer videos), my script skips that page.

    If I comment out the "Midpoint" portion of my script and run it, it fetches the titles from the first page of each category.

    "Midpoint" in my script handles the pagination links. I think an "if statement" or something similar could fix this issue and let the script scrape video titles regardless of whether a page has pagination or not. I would be very happy if you took a look at it and told me what I'm missing. Here is the code I tried. Site link: "http://www.wiseowl.co.uk/videos/"

    import requests
    from lxml import html
    
    url="http://www.wiseowl.co.uk/videos/"
    def Startpoint(links):
        slnk="http://www.wiseowl.co.uk"
        response = requests.get(links)
        tree = html.fromstring(response.text)
        titles = tree.xpath("//ul[@class='woMenuList']")
        for title in titles:
            Names=title.xpath(".//li[@class='woMenuItem'][email protected]")
            for name in Names:
                if not "year" in name and not "author" in name:
                    page=slnk + name
                    Midpoint(page)
    
    def Midpoint(fullurl):
        slnk="http://www.wiseowl.co.uk"
        response = requests.get(fullurl)
        tree = html.fromstring(response.text)
        titles = tree.xpath("//div[contains(concat(' ', @class, ' '), ' woPaging ')]")
        for title in titles:
            Names=title.xpath('.//a[@class="woPagingItem"][email protected]')
            for name in Names:
                page=slnk + name
                Endpoint(page)
    
    def Endpoint(pageurl):
        response = requests.get(pageurl)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p[@class="woVideoListDefaultSeriesTitle"]')
        for title in titles:
            vids=title.xpath('.//a/text()')
            for vid in vids:
                print(vid)
    Startpoint(url)


  • On the other hand, if I try it like this, it scrapes exactly 248 videos. But I wanted to get to the links step by step, as I did in my earlier post.

    import requests
    from lxml import html
    
    def Startpoint(mpage):
        default="http://www.wiseowl.co.uk"
        page=1
        while page<=mpage:
            link="http://www.wiseowl.co.uk/videos/default-"+str(page)+".htm"
            response = requests.get(link)
            tree = html.fromstring(response.text)
            titles = tree.xpath('//p[@class="woVideoListDefaultSeriesTitle"]')
            for title in titles:
                Names = title.xpath('.//a/text()')[0]
                links = default + title.xpath('.//a/@href')[0]
                mash = (Names, links)
                print(mash)
            page+=1
    
    Startpoint(14)


  • Got it resolved finally. The trick was to scrape each page's titles first and only then follow any pagination links it has, keeping a list of already-visited pages so the recursion doesn't repeat them; that way categories without pagination are no longer skipped. Here is the full code.

    import requests
    from lxml import html
    
    unique=[]
    url="http://www.wiseowl.co.uk/videos/"
    def startpoint(links):
        slnk="http://www.wiseowl.co.uk"
        response = requests.get(links)
        tree = html.fromstring(response.text)
        titles = tree.xpath("//ul[@class='woMenuList']//li[@class='woMenuItem'][email protected]")
        for title in titles:
            if not "year" in title and not "author" in title:
                page=slnk + title
                midpoint(page)
    
    def midpoint(fullurl):
        slnk="http://www.wiseowl.co.uk"
        response = requests.get(fullurl)
        unique.append(fullurl)
        tree = html.fromstring(response.text)
        names = tree.xpath('//p[@class="woVideoListDefaultSeriesTitle"]//a/text()')
        for name in names:
            print(name)
    
        items = tree.xpath("//div[contains(concat(' ', @class, ' '), ' woPaging ')]//a[@class='woPagingItem'][email protected]")
        for item in items:
            page=slnk + item
            if page not in unique:
                midpoint(page)
    
    startpoint(url)
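
    One small design note: the unique list records every pagination URL that has already been visited, so the recursive midpoint() calls can never process the same page twice. If the number of visited pages were large, a set (unique = set() together with unique.add(fullurl)) would make the "page not in unique" check faster, though for a site this size a list works fine.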


  • Dear Ranjith, could you provide me with a link from which I can learn the "Scripting Dictionary" stuff in VBA?


  • administrators

    There isn't really anything to learn about the Scripting Dictionary. It's just a key-value pairs object.
    http://stackoverflow.com/documentation/vba/3667/scripting-dictionary-object#t=201705202049493519399
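
    For example, a minimal sketch in VBA (using late binding, so no library reference is needed):

    Sub DictionaryDemo()
        ' Create the dictionary via late binding
        Dim dict As Object
        Set dict = CreateObject("Scripting.Dictionary")

        ' Add key-value pairs
        dict.Add "python", "scraping scripts"
        dict.Add "vba", "office automation"

        ' Look up a value by its key
        If dict.Exists("vba") Then
            Debug.Print dict("vba")   ' prints "office automation"
        End If
    End Sub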

