VBA scraping HTML Help?



  • Hi all

    I'm trying to scrape HTML via VBA using this post as a guide: https://codingislove.com/parse-html-in-excel-vba/ - which is easy to understand.

    I'm trying to scrape https://www.cscjes-cronfa.co.uk/events. I'm trying to pull event names, details and dates - which does not require log on to access the events list published.

    The data I am looking for is in:

    Div "items"

    With the data repeating inside Div "event-browse-item". Specifically I am trying to pull "h3", "p" and "info" from each "event-browse-item".

    I'm obviously missing the point - if I inspect the output of html.body.innerHTML - it seems to be asking for a log in - which the site does not require, so it appears to be going wrong before the main body of the code executes.

    Any tips / suggestion greatly received.

    Code:

    Public Sub parsehtml()
    Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
    Dim i As Integer
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.cscjes-cronfa.co.uk/events", False
    http.send
    html.body.innerHTML = http.responseText
    
    'MsgBox html.body.innerText  Added to see what's going on - seems to pull the sign in code....??
    
    
    Set topics = html.getElementsByClassName("items")
    i = 2
    
    For Each topic In topics
    Set titleElem = topic.getElementsByTagName("event-browse-item")
    Sheets(1).Cells(i, 1).Value = titleElem.getElementsByTagName("h3")(0).innerText
    Sheets(1).Cells(i, 2).Value = titleElem.getElementsByTagName("p")(0).innerText
    Sheets(1).Cells(i, 3).Value = titleElm.getElementsByTagName("info")(0).innerText
    'Set detailsElem = topic.NextSibling.getElementsByTagName("event-browse-item")(1)
    'Sheets(1).Cells(i, 3).Value = detailsElem.getElementsByTagName("span")(0).innerText
    'Sheets(1).Cells(i, 4).Value = detailsElem.getElementsByTagName("a")(0).innerText
    i = i + 1
    Next
    End Sub
    

    Full disclosure - high school science teacher, self teaching VBA / HTML.

    Cheers
    Glen


  • administrators

    Hi Glen,

    The data in that page is rendered client side by making a call to their API. Read more about client side rendering here and how to scrape such sites here - https://codingislove.com/best-practices-scraping-website-data/

    Here's the API Url for your understanding - https://www.cscjes-cronfa.co.uk/api/events?startDate=2017-07-26T00%3A00%3A00%2B05%3A30&take=25&skip=0&_=1501015156446

    If you make a GET request to this URL then you'll get that data.


guest-login-reply
 

reconnecting-message