[Solved] Can't parse Yellowpage Australia the right way
Hi, hope you are doing well. I recently tried to write a parser for Yellowpage Australia. When I run my code, it extracts the name, address and phone number correctly, but the values land in the wrong rows: they are not aligned across the columns, because they come from three different class elements, whereas (as far as I know) the XMLHTTP method accepts only one class as the parent. I found that the main class to use here should be "listing listing-search listing-data". However, I can't write the code for name, address and phone from this single class, because that requires a strong knowledge of HTML elements. While going through your "hackernews" parser, I noticed that you used "sibling" or "child" elements, which might be applicable in this case. If you could lend a helping hand to get my code working the right way, I would be really grateful. Thanks in advance. Here is the code I've written:
Sub YPAusData()
    Const URL = "https://www.yellowpages.com.au/search/listings?clue=coffee+shops&locationClue=all+states&lat=&lon=&selectedViewMode=list"
    Const weblink = "https://www.yellowpages.com.au"
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, topic As Object, posts As Object, post As Object
    Dim x As Long, i As Long

    x = 2
    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText

    Set topics = html.getElementsByClassName("listing-name")
    Set topic = html.getElementsByClassName("click-to-call contact contact-preferred contact-phone ")
    Set posts = html.getElementsByClassName("listing-address mappable-address mappable-address-with-poi")

    For i = 0 To topics.Length - 1
        If topics.Length > 0 And topic.Length > 0 And posts.Length > 0 Then
            Cells(x, 1) = topics(i).innerText
            Cells(x, 2) = topic(i).innerText
            Cells(x, 3) = posts(i).innerText
            x = x + 1
        End If
    Next i
End Sub
- Get all the contact cards first and assign them to topics.
- Inside the loop over topics, look up listing-name, phone and address within each individual card.
Something like this:
Set topics = html.getElementsByClassName("search-contact-card")
For Each topic In topics
    ' Search inside this card only, so the value belongs to this listing
    listingName = topic.getElementsByClassName("listing-name")(0).innerText
Next topic
Hi Ranjith! Thanks for joining in. It would be very nice if I could do it the way you described, but as far as I know, every method except IE, be it WinHTTP or XMLHTTP, follows a fixed pattern such as [class>tag>tag] or [tag>tag>tag]. So how could I look up another class inside the loop, under the main class that was set outside the loop? If I'm not wrong, a class name can't be looked up twice in a single subroutine unless the request is made twice.
I got this far with the code. It now parses the name flawlessly, but I can't make it work for the address and phone number.
Set topics = html.getElementsByClassName("listing listing-search listing-data")
For Each topic In topics
    Set posts = topic.getElementsByTagName("div")(0)
    Cells(x, 1) = posts.getElementsByTagName("a")(1).innerText 'Name
    x = x + 1
Next topic
Don't overthink this. Here's the code:
Sub YPAusData()
    Const URL = "https://www.yellowpages.com.au/search/listings?clue=coffee+shops&locationClue=all+states&lat=&lon=&selectedViewMode=list"
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, topic As HTMLHtmlElement
    Dim i As Long

    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText

    ' Each contact card holds the name, phone and address of one listing
    Set topics = html.getElementsByClassName("search-contact-card")

    ' The loop starts at index 1, so the card at index 0 is not written out
    For i = 1 To topics.Length - 1
        Set topic = topics(i)
        ' Searching inside the card keeps the three fields on the same row
        Cells(i + 1, 1).Value = topic.getElementsByClassName("listing-name")(0).innerHTML
        Cells(i + 1, 2).Value = topic.getElementsByClassName("contact-text")(0).innerHTML
        Cells(i + 1, 3).Value = topic.getElementsByClassName("listing-address")(0).innerHTML
    Next
End Sub
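One possible refinement, offered as a sketch rather than something verified against the live page: if a card happens to have no phone number or address, indexing `(0)` on an empty collection would raise a run-time error. The helper below, `FirstText`, is a hypothetical name that is not part of the code above; it returns an empty string when the class is not found inside the card.

```vba
' Hypothetical helper (an assumption, not part of the original answer):
' returns the text of the first element with the given class inside a card,
' or an empty string when the card contains no such element.
Function FirstText(ByVal card As Object, ByVal className As String) As String
    Dim matches As Object
    Set matches = card.getElementsByClassName(className)
    If matches.Length > 0 Then FirstText = matches(0).innerText
End Function
```

Inside the loop it could be used as `Cells(i + 1, 2).Value = FirstText(topic, "contact-text")`, so a card with a missing field leaves a blank cell instead of stopping the macro.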
My goodness! You, my brother, have a set of skills that I hardly see in others. When you first suggested the method you applied above, I thought you had made a mistake, because I never imagined that looking up a class more than once in a single sub was possible until I saw your code. I still can't believe it is possible. By the way, this "HTMLHtmlElement" thing is awesome. You helped me a lot. Thanks a trillion. I love this site very much.
Dear Ranjith, I would like one more suggestion from you about web scraping: how to apply the Split method to responseText. I have learnt a little about it, but I would like to go further with this method, so I'm hoping for a link or some other resource from which I can learn it thoroughly. I'm pasting the code I'm talking about below. The pasted code works as it should. Thanks in advance. By the way, do let me know how to format code on your blog inside the black-coloured box.
Sub WebScraping()
    Const URL = "https://www.yify-torrent.org/search/1080p/"
    Const mainlink = "https://www.yify-torrent.org"
    Const sublink = "https:"
    Dim http As New MSXML2.XMLHTTP60
    Dim P As Long, N As Long, L As Long, str As Variant

    L = 2
    http.Open "GET", URL, False
    http.send

    str = Split(http.responseText, "<span class=""name"">")
    N = UBound(str)

    For P = 1 To N
        Cells(L, 1) = Split(Split(str(P), "title=")(0), "<")
        Cells(L, 2) = Split(Split(str(P), "<a href'")(0), "<")
        Cells(L, 3) = mainlink & Split(Split(str(P), "href=""")(1), """")(0)
        Cells(L, 4) = sublink & Split(Split(str(P), "src=""")(1), """")(0)
        L = L + 1
    Next P
End Sub
Split Function documentation here - https://msdn.microsoft.com/en-us/library/6x627e5f(v=vs.90).aspx
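To make the nested Split pattern easier to follow, here is a minimal sketch that runs on a made-up string instead of a live response. The sample markup and the name `SplitDemo` are assumptions for illustration only. The outer Split cuts the text into one chunk per item, and each inner pair of Splits keeps what lies between a start marker and an end marker.

```vba
Sub SplitDemo()
    ' Made-up sample markup, used only for illustration
    Const sample As String = "<span class=""name""><a href=""/movie/1"">First</a></span>" & _
                             "<span class=""name""><a href=""/movie/2"">Second</a></span>"
    Dim parts() As String, i As Long

    ' parts(0) is the text before the first marker, so items start at index 1
    parts = Split(sample, "<span class=""name"">")

    For i = 1 To UBound(parts)
        ' Text between href=" and the next double quote
        Debug.Print Split(Split(parts(i), "href=""")(1), """")(0)
        ' Text between the first > and the next <
        Debug.Print Split(Split(parts(i), ">")(1), "<")(0)
    Next i
End Sub
```

Running it should print `/movie/1`, `First`, `/movie/2` and `Second` to the Immediate window. Note that if a marker is missing from a chunk, indexing `(1)` raises a "Subscript out of range" error, so this approach depends on the page markup staying consistent.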
This forum uses Markdown for formatting. Code can be formatted using 3 backticks. Full documentation here - http://commonmark.org/help/