[Solved] Help me get Image link from Craigslist



  • Hi Ranjith! Hope you are doing well. I have written a scraper to parse images from Craigslist, but it is not working at all. I'm definitely doing something wrong. I hope you will take a look at it. Thanks in advance. Here is the faulty code.

    Sub Craigslist()
    Const URL = "https://newyork.craigslist.org/search/ata"
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, posts As Object, topic As Object
    Dim i As Long, x As Long, y As Long
    x = 2
    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText
    Set topics = html.getElementsByClassName("result-image gallery")
    For Each topic In topics
        Cells(x, 1) = topic.getElementsByTagName("img")(0).src
        x = x + 1
    Next topic
    End Sub
    

    For reference, here is the HTML element in question:

    <a href="/mnh/atq/6033903864.html" class="result-image gallery" data-ids="1:00l0l_auIVAPKuweh"><img alt="" class="" src="https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg">
                    <span class="result-price">$120</span>
            </a>
    



  • Isn't it possible to parse the images using regex in this case?


  • administrators

    Craigslist uses client-side rendering here. The img elements are built in the browser from the data-ids attribute, so they are not in the HTML you get back from a plain HTTP GET request. Use data-ids to build the image source instead.

    Sample image source - https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg

    This is in the format of https://images.craigslist.org/{data-id}_{xDimension}x{yDimension}.jpg

    Parse the data-ids from each result-image element, drop the prefix before the colon (the `1:` in `1:00l0l_auIVAPKuweh`), and create URLs in the above format.
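
    The steps above can be sketched as a small helper. This is only an illustration: the function name `BuildImageUrls`, the `imgSize` parameter, and the assumption that multiple entries in data-ids are comma-separated are mine, not something Craigslist documents.

    ```vba
    ' Hypothetical helper: turns a data-ids value such as "1:00l0l_auIVAPKuweh"
    ' into image URLs. Assumes entries are comma-separated and that each entry
    ' has the shape "<prefix>:<id>".
    Function BuildImageUrls(ByVal dataIds As String, _
                            Optional ByVal imgSize As String = "300x300") As Collection
        Const BASE_URL As String = "https://images.craigslist.org/"
        Dim result As New Collection
        Dim entries() As String, entry As Variant
        Dim pos As Long

        If Len(dataIds) > 0 Then
            entries = Split(dataIds, ",")
            For Each entry In entries
                pos = InStr(entry, ":")
                ' Skip entries that do not have an id after the colon
                If pos > 0 And pos < Len(entry) Then
                    result.Add BASE_URL & Mid$(entry, pos + 1) & "_" & imgSize & ".jpg"
                End If
            Next entry
        End If

        Set BuildImageUrls = result
    End Function
    ```

    For the element shown earlier, `BuildImageUrls("1:00l0l_auIVAPKuweh")(1)` gives https://images.craigslist.org/00l0l_auIVAPKuweh_300x300.jpg, which matches the sample image source.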



  • Dear Ranjith, it's always a great pleasure to learn something new from you. I sort of solved it by making two HTTP requests. However, I would like to scrape the images the way you described above, but I'm not advanced enough to follow along without a helping hand. Thanks again.



  • Thanks Ranjith, you really are a gem. Following your instructions, I have got it working.

    Here is the code:

    Sub Craigs()
        Const URL = "https://newyork.craigslist.org/search/ata"
        Const pref = "https://images.craigslist.org/"
        Const suff = "_300x300.jpg"
        Dim http As Object: Set http = CreateObject("MSXML2.serverXMLHTTP")
        Dim str As Variant, ids As String
        Dim i As Long, x As Long
        x = 2
        http.Open "GET", URL, False
        http.send
        ' Split the raw response on each result-image anchor
        str = Split(http.responseText, " class=""result-image gallery")
        On Error Resume Next   ' leave the cell empty when an anchor has no data-ids
        For i = 1 To UBound(str)
            ids = ""
            ' Raw attribute value, e.g. 1:00l0l_auIVAPKuweh
            ids = Split(Split(str(i), " data-ids=""")(1), """")(0)
            Cells(x, 1) = ids
            ' First id, without its prefix, wrapped in the image URL pattern
            Cells(x, 2) = pref & Split(Split(ids, ":")(1), ",")(0) & suff
            x = x + 1
        Next i
    End Sub
    


  • By the way, have you worked with Python?


  • administrators

    @shahin2137 You need not do all the string manipulation with split functions there. Just use the getAttribute method to get the data-ids.

    element.getAttribute("data-ids")
    

    That's all you need. And yes, I've worked with Python.



  • I believe this is what you suggested.

    Sub Craigslist()
        Const URL = "https://newyork.craigslist.org/search/ata"
        Const pref = "https://images.craigslist.org/"
        Const suff = "_300x300.jpg"
        Dim html As New HTMLDocument
        Dim topics As Object, post As Object
        Dim x As Long

        With New MSXML2.XMLHTTP60
            .Open "GET", URL, False
            .send
            html.body.innerHTML = .responseText
        End With
        Set topics = html.getElementsByClassName("result-image gallery")
        On Error Resume Next   ' leave the row blank for anchors without a usable data-ids value
        For Each post In topics
            x = x + 1
            ' data-ids looks like 1:00l0l_auIVAPKuweh; keep the first id, minus its prefix
            Cells(x, 1) = pref & Split(Split(post.getAttribute("data-ids"), ":")(1), ",")(0) & suff
        Next post
        Set html = Nothing: Set topics = Nothing
    End Sub
