Extract HTML from a redirected page

HD666

Newcomer
Joined
Nov 24, 2011
Hello,

I am using Visual Basic 2005. I found the following function on the web; it extracts the HTML from a web page. It is very useful, but unfortunately it does not work with redirected pages: when I pass it the URL of a page that redirects, it returns either nothing or "error". I added ".AllowAutoRedirect = True", but it still did not work. How can I make it work for redirected pages?

I appreciate the help.
Code:
Public Function GetPageHTML(ByVal URL As String, _
      Optional ByVal TimeoutSeconds As Integer = 10) _
     As String
        ' Retrieves the HTML from the specified URL,
        ' using a default timeout of 10 seconds.
        ' Returns "error" if the request fails.
        Dim objRequest As Net.HttpWebRequest
        Dim objResponse As Net.HttpWebResponse = Nothing
        Dim objStreamReceive As System.IO.Stream
        Dim objEncoding As System.Text.Encoding
        Dim objStreamRead As System.IO.StreamReader

        Try
            ' Set up our Web request
            objRequest = DirectCast(Net.WebRequest.Create(URL), _
                Net.HttpWebRequest)
            objRequest.Method = "GET"
            objRequest.KeepAlive = True
            ' AllowAutoRedirect is True by default; set explicitly for clarity
            objRequest.AllowAutoRedirect = True
            objRequest.Timeout = TimeoutSeconds * 1000
            ' Retrieve data from request
            objResponse = DirectCast(objRequest.GetResponse(), _
                Net.HttpWebResponse)
            objStreamReceive = objResponse.GetResponseStream()
            ' Assumes the page is UTF-8 encoded
            objEncoding = System.Text.Encoding.GetEncoding( _
                "utf-8")
            objStreamRead = New System.IO.StreamReader( _
                objStreamReceive, objEncoding)
            ' Return the page content
            Return objStreamRead.ReadToEnd()
        Catch
            Return "error"
        Finally
            ' Close the response even when an error occurs
            If Not objResponse Is Nothing Then
                objResponse.Close()
            End If
        End Try
    End Function
 

haydenw

Newcomer
Joined
Dec 18, 2011
Hi,
A page that redirects sends an HTTP header (Location) back to the client giving the address of the page it redirects to. Using VB 2008, I found that the following property gives you this location:

Code:
objresponse.ResponseUri.AbsoluteUri

You could then do the following:

Code:
' Place this after objResponse = objRequest.GetResponse()
If Not URL = objResponse.ResponseUri.AbsoluteUri Then
    Return GetPageHTML(objResponse.ResponseUri.AbsoluteUri, TimeoutSeconds)
End If
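
If the automatic redirect handling still returns nothing, another option is to follow the Location header by hand, which also lets you see each hop. The following is only a rough sketch, not a tested drop-in replacement: the names RedirectSketch, ResolveLocation, GetPageHTMLFollowing and the maxHops parameter are made up for illustration, and the shared CookieContainer is there on the assumption that the site sets a cookie during the redirect, which may not apply in your case.

```vbnet
Imports System.IO
Imports System.Net

Module RedirectSketch
    ' Hypothetical helper: resolves a Location header value (which may be
    ' relative) against the URL that produced it.
    Public Function ResolveLocation(ByVal currentUrl As String, _
                                    ByVal location As String) As String
        Return New Uri(New Uri(currentUrl), location).AbsoluteUri
    End Function

    ' Hypothetical variant of GetPageHTML that follows redirects manually.
    Public Function GetPageHTMLFollowing(ByVal url As String, _
            Optional ByVal timeoutSeconds As Integer = 10, _
            Optional ByVal maxHops As Integer = 5) As String
        ' Assumption: cookies set by one hop must be sent to the next.
        Dim cookies As New CookieContainer()
        For hop As Integer = 0 To maxHops
            Dim request As HttpWebRequest = _
                DirectCast(WebRequest.Create(url), HttpWebRequest)
            request.AllowAutoRedirect = False
            request.CookieContainer = cookies
            request.Timeout = timeoutSeconds * 1000
            Using response As HttpWebResponse = _
                    DirectCast(request.GetResponse(), HttpWebResponse)
                Dim code As Integer = CInt(response.StatusCode)
                Dim location As String = response.Headers("Location")
                If code >= 300 AndAlso code < 400 _
                        AndAlso Not String.IsNullOrEmpty(location) Then
                    ' A 3xx status with a Location header: go to the next hop.
                    url = ResolveLocation(url, location)
                Else
                    ' Any other response: read the body and return it.
                    Using reader As New StreamReader( _
                            response.GetResponseStream())
                        Return reader.ReadToEnd()
                    End Using
                End If
            End Using
        Next
        Throw New WebException("Too many redirects")
    End Function
End Module
```

Calling GetPageHTMLFollowing("http://example.com/old") would then behave like GetPageHTML, except that it gives up with an exception after more than maxHops redirects instead of looping. Note that a page redirecting through a meta refresh tag or JavaScript sends no Location header, so neither approach would follow it.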

Hope this helps.
 