[Solved] Can't parse Yellow Pages Australia the right way

  • Hi, I hope you are doing well. I recently tried to write a parser for Yellow Pages Australia. When I run my code it extracts the name, address and phone number correctly, but they end up in the wrong places: the values don't line up across the columns because they come from three different class collections, whereas the XMLHTTP approach I'm using works from one parent class. I found that the main class to use here is "listing listing-search listing-data", but I can't work out how to get the name, address and phone from that single class, because it requires solid knowledge of the HTML structure. While going through your "hackernews" parser, I noticed that you used "sibling" or "child" elements, which might apply in this case. If you could lend a hand to get my code working the right way, I would be really grateful. Thanks in advance. Here is the code I've written:

    Sub YPAusData()
        ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
        Const URL = "https://www.yellowpages.com.au/search/listings?clue=coffee+shops&locationClue=all+states&lat=&lon=&selectedViewMode=list"
        Const weblink = "https://www.yellowpages.com.au"
        Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
        Dim topics As Object, topic As Object, posts As Object, post As Object
        Dim x As Long, i As Long
        x = 2
        http.Open "GET", URL, False
        http.send   ' the request is only sent here; without it responseText is empty
        html.body.innerHTML = http.responseText
        ' Three independent collections taken from the whole page: names, phones, addresses
        Set topics = html.getElementsByClassName("listing-name")
        Set topic = html.getElementsByClassName("click-to-call contact contact-preferred contact-phone")
        Set posts = html.getElementsByClassName("listing-address mappable-address mappable-address-with-poi")
        ' Row i takes item i from each collection, so the columns only line up
        ' if every listing has all three values
        For i = 0 To topics.Length - 1
            If topics.Length > 0 And topic.Length > 0 And posts.Length > 0 Then
                Cells(x, 1) = topics(i).innerText
                Cells(x, 2) = topic(i).innerText
                Cells(x, 3) = posts(i).innerText
                x = x + 1
            End If
        Next i
    End Sub

  • administrators

    • Get all the contact cards first and store them in topics.
    • Inside the topics loop, look up the listing name, phone and address within each individual card.
      Something like this:
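    Set topics = html.getElementsByClassName("search-contact-card")
    For Each topic In topics
        ' Look up the name inside this card only; do the same for phone and address
        Cells(x, 1) = topic.getElementsByClassName("listing-name")(0).innerText
        x = x + 1
    Next topic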

  • Hi Ranjith! Thanks for jumping in. It would be great if I could do it the way you described, but apart from IE automation, every other method, be it WinHTTP or XMLHTTP, seems to follow a fixed path such as [class>tag>tag] or [tag>tag>tag]. So how can I call another class inside the loop when the main class has already been used outside it? If I'm not wrong, a class name can't be queried twice in a single subroutine unless the request is made twice.

  • I've got this far with the code. It now parses the name flawlessly, but I can't get it to work for the address and phone number.

    Set topics = html.getElementsByClassName("listing listing-search listing-data")
    For Each topic In topics
        Set posts = topic.getElementsByTagName("div")(0)   ' first div inside this listing
        Cells(x, 1) = posts.getElementsByTagName("a")(1).innerText   ' Name
        x = x + 1
    Next topic

  • administrators

    Don't overthink this. Here's the code:

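    Sub YPAusData()
        ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
        Const URL = "https://www.yellowpages.com.au/search/listings?clue=coffee+shops&locationClue=all+states&lat=&lon=&selectedViewMode=list"
        Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
        Dim topics As Object, topic As HTMLHtmlElement
        Dim i As Long
        http.Open "GET", URL, False
        http.send   ' the request is only sent here; without it responseText is empty
        html.body.innerHTML = http.responseText
        ' One collection of contact cards; each card holds its own name, phone and address
        Set topics = html.getElementsByClassName("search-contact-card")
        For i = 0 To topics.Length - 1   ' collections are zero-based
            Set topic = topics(i)
            ' Search inside the current card only, so the three values stay on the same row
            Cells(i + 2, 1).Value = topic.getElementsByClassName("listing-name")(0).innerText
            Cells(i + 2, 2).Value = topic.getElementsByClassName("contact-text")(0).innerText
            Cells(i + 2, 3).Value = topic.getElementsByClassName("listing-address")(0).innerText
        Next i
    End Sub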
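    One thing to watch with this approach: if a card has no phone number (or no address), `getElementsByClassName(...)(0)` has nothing to return and the macro stops with a runtime error. Below is a minimal sketch of a guard, assuming the same class names as above; the helper name `TextByClass` is hypothetical, not part of MSHTML.

```vba
' Hypothetical helper: returns the innerText of the first element with the
' given class inside `container`, or "" when the card has no such element.
Function TextByClass(ByVal container As Object, ByVal className As String) As String
    Dim matches As Object
    ' Search only inside this container (a card or the whole document)
    Set matches = container.getElementsByClassName(className)
    If matches.Length > 0 Then
        TextByClass = matches(0).innerText
    End If
    ' Otherwise the function keeps the String default, ""
End Function
```

    Inside the loop, the phone column can then use `TextByClass(topic, "contact-text")`, which writes an empty cell for a card without a phone instead of stopping the macro.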

  • My goodness! You, my brother, have a set of skills I rarely see in others. When you first suggested the method you applied above, I thought it was a mistake, because I never imagined that a class could be used more than once in a single sub until I saw your code. I still can't believe it's possible. By the way, this "HTMLHtmlElement" thing is awesome. You helped me a lot. Thanks a trillion. I love this site very much.

  • Dear Ranjith, there is another important web scraping topic I'd like your advice on: how to apply the Split method to responseText. I've learnt a little about it, but I'd like to go further, so I'm hoping for a link or something I can learn everything about it from. The code I'm talking about is pasted below, and it works as it should. Thanks in advance. By the way, please let me know how to put code inside the black code box on your blog.

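    Sub WebScraping()
        ' Requires a reference to "Microsoft XML, v6.0"
        Const URL = "https://www.yify-torrent.org/search/1080p/"
        Const mainlink = "https://www.yify-torrent.org"
        Const sublink = "https:"
        Dim http As New MSXML2.XMLHTTP60
        Dim P As Long, N As Long, L As Long, chunks As Variant
        L = 2
        http.Open "GET", URL, False
        http.send   ' the request is only sent here; without it responseText is empty
        ' Cut the page at every name span: chunks(0) is everything before the first
        ' listing, chunks(1..N) each start right after one <span class="name">
        chunks = Split(http.responseText, "<span class=""name"">")
        N = UBound(chunks)
        For P = 1 To N
            ' Text up to the next "<" is the displayed name
            Cells(L, 1) = Split(Split(chunks(P), "title=")(0), "<")(0)
            ' NOTE: "<a href'" looks like a typo; when it isn't found this repeats column 1
            Cells(L, 2) = Split(Split(chunks(P), "<a href'")(0), "<")(0)
            ' Text between the first href=" and the next quote, prefixed with the site root
            Cells(L, 3) = mainlink & Split(Split(chunks(P), "href=""")(1), """")(0)
            ' Same idea for the image src (presumably protocol-relative, hence "https:")
            Cells(L, 4) = sublink & Split(Split(chunks(P), "src=""")(1), """")(0)
            L = L + 1
        Next P
    End Sub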

  • administrators

    Split Function documentation here - https://msdn.microsoft.com/en-us/library/6x627e5f(v=vs.90).aspx
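    The core pattern in the code above is `Split(Split(text, startMarker)(1), endMarker)(0)`: the outer Split cuts at the start marker and `(1)` keeps what follows its first occurrence; the inner Split cuts that at the end marker and `(0)` keeps what precedes it. A minimal sketch wrapping it in a function that returns "" instead of raising an error when the start marker is missing; the name `TextBetween` is hypothetical.

```vba
' Hypothetical helper: text between the first startMarker and the next endMarker.
Function TextBetween(ByVal source As String, ByVal startMarker As String, _
                     ByVal endMarker As String) As String
    Dim pieces As Variant, tail As String
    pieces = Split(source, startMarker)
    ' Split returns a single-element array when startMarker is absent
    ' (and an empty array for an empty source), so there is no pieces(1)
    If UBound(pieces) < 1 Then Exit Function   ' result stays ""
    tail = pieces(1)
    ' Split("") returns an empty array, so (0) would fail on an empty tail
    If Len(tail) = 0 Then Exit Function
    ' If endMarker is absent, (0) is simply the whole tail
    TextBetween = Split(tail, endMarker)(0)
End Function
```

    Applied to one chunk of the page, `TextBetween(chunk, "href=""", """")` returns the same value as `Split(Split(chunk, "href=""")(1), """")(0)`, but yields an empty string rather than a "Subscript out of range" error when the marker isn't there.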

    This forum uses Markdown for formatting. To format code, put a line of three backticks (```) before and after it. Full documentation here - http://commonmark.org/help/
