I'm trying to scrape HTML via VBA using this post as a guide: https://codingislove.com/parse-html-in-excel-vba/ - which is easy to understand.
I'm trying to scrape https://www.cscjes-cronfa.co.uk/events. I'm trying to pull event names, details and dates - which does not require log on to access the events list published.
The data I am looking for is in:
With the data repeating inside Div "event-browse-item". Specifically I am trying to pull "h3", "p" and "info" from each "event-browse-item".
I'm obviously missing the point - if I inspect the output of html.body.innerHTML - it seems to be asking for a log in - which the site does not require, so it appears to be going wrong before the main body of the code executes.
Any tips / suggestion greatly received.
Public Sub parsehtml() Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement Dim i As Integer Set http = CreateObject("MSXML2.XMLHTTP") http.Open "GET", "https://www.cscjes-cronfa.co.uk/events", False http.send html.body.innerHTML = http.responseText 'MsgBox html.body.innerText Added to see what's going on - seems to pull the sign in code....?? Set topics = html.getElementsByClassName("items") i = 2 For Each topic In topics Set titleElem = topic.getElementsByTagName("event-browse-item") Sheets(1).Cells(i, 1).Value = titleElem.getElementsByTagName("h3")(0).innerText Sheets(1).Cells(i, 2).Value = titleElem.getElementsByTagName("p")(0).innerText Sheets(1).Cells(i, 3).Value = titleElm.getElementsByTagName("info")(0).innerText 'Set detailsElem = topic.NextSibling.getElementsByTagName("event-browse-item")(1) 'Sheets(1).Cells(i, 3).Value = detailsElem.getElementsByTagName("span")(0).innerText 'Sheets(1).Cells(i, 4).Value = detailsElem.getElementsByTagName("a")(0).innerText i = i + 1 Next End Sub
Full disclosure - high school science teacher, self teaching VBA / HTML.