Download XML file using Selenium



  • Hi there,

    I managed to log in to a website using a VBA script in Excel with the Selenium add-on, and everything works fine.

    After I am logged in, an XML file is loaded in the browser, which I want to download to a local file for further analysis in Excel.

    How can I download the XML file line by line, or otherwise save the full page to a file, using Selenium? So far all my attempts to scrape the contents have failed because the FindElementBy... methods do not find a match: the information displayed is XML, not HTML :-(

    Do you have any idea how I can save the XML file to a local file?

    Thank you very much for your support on this issue.


  • administrators

    You can write it to a local file using plain VBA. Let's say you want to download this XML file - https://bin.codingislove.com/raw/vezawabegi

    Then the code looks like this:

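    Sub downloadXmlSelenium()
        Dim bot As New WebDriver
        Dim myfile As String
        Dim fileNum As Integer

        bot.Start "chrome", "https://bin.codingislove.com"
        bot.Get "/raw/vezawabegi"

        ' Save next to the workbook (the workbook must already be saved to disk)
        myfile = Application.ActiveWorkbook.Path & "\result.xml"
        MsgBox myfile

        ' Print writes the text as-is; Write would wrap it in quotation marks
        fileNum = FreeFile
        Open myfile For Output As #fileNum
        Print #fileNum, bot.FindElementByTag("body").Text
        Close #fileNum

        bot.Quit
    End Sub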
    


  • Thank you very much for your reply. This seems to be a nice way, but unfortunately it does not work.

    The site I'm using does not embed the XML file in HTML tags, so there is no "body" tag to extract from. Chrome does provide some basic HTML structure even when the server delivers a plain XML file, and there I could grab the class "pretty-print" to extract the XML details, but I usually use Firefox, where this does not work. In addition, the XML file contains more than 5,000 entries, so working on the "body" tag or the "pretty-print" class takes a very long time...

    Isn't there another way to save the whole page to a local file instead of grabbing only defined parts of it?

    Thank you very much for your support,


  • administrators

    You are looking at a page which fetches XML from some API, renders the XML and formats it using CSS. The best solution is to find the source URL of the XML file by inspecting the XHR requests of that page. Once you find the source, make a simple GET request and grab the XML.

    Read this - https://codingislove.com/best-practices-scraping-website-data/
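    As a rough illustration of the GET-request approach, here is a minimal sketch in plain VBA. It assumes the source URL is the example file used earlier in this thread and that the request needs no login; the Sub name, the output file name and the error handling are only placeholders. If the real site requires a session, the request would probably also need the login cookies, which this sketch does not handle.

    ```vba
    Sub DownloadXmlDirect()
        Dim http As Object
        Dim stream As Object
        Dim url As String
        Dim target As String

        ' Assumption: replace with the source URL found by inspecting the XHR requests
        url = "https://bin.codingislove.com/raw/vezawabegi"
        ' Assumption: save next to the workbook (the workbook must already be saved)
        target = ThisWorkbook.Path & "\result.xml"

        ' Synchronous GET request, no browser involved
        Set http = CreateObject("MSXML2.XMLHTTP.6.0")
        http.Open "GET", url, False
        http.send

        If http.Status <> 200 Then
            MsgBox "Request failed: " & http.Status & " " & http.statusText
            Exit Sub
        End If

        ' Write the raw response bytes so the file keeps its original encoding
        Set stream = CreateObject("ADODB.Stream")
        stream.Type = 1              ' 1 = adTypeBinary
        stream.Open
        stream.Write http.responseBody
        stream.SaveToFile target, 2  ' 2 = adSaveCreateOverWrite
        stream.Close
    End Sub
    ```

    Because the whole response is saved in one step, the size of the XML file should matter much less than when reading the text of an element through the browser.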

