Capture info from a web page - new question

Started by Andy, August 21, 2011, 02:39:37 AM

Andy

August 21, 2011, 02:39:37 AM Last Edit: August 21, 2011, 11:06:03 AM by LarryMc
Hi,

I found an earlier thread here about reading the HTML code of a web page. The code does exactly that, but it stores the HTML in a file that you can read later.

This was the code posted:

sub DoEnum()
   ' input variables:
   ' a) WINDOW cont - browser window

   ' get the browser control
   IDispatch browser
   pointer p = GetPropA(cont.hWnd, "BROWSER")
   if (p)

      IDispatch tmp = *<comref>p
      if (tmp && !tmp->QueryInterface(_IID_IWebBrowser2, &browser))

         BSTR bstrHtml = browser.Document.documentElement.outerHTML
         if (bstrHtml)
            ' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
            BFILE f
            OPENFILE(f, "html dump.htm", "w")
            WRITE f, "\xFF\xFE" ' unicode LE16 BOM
            __WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
            CLOSEFILE f
            FreeComString(bstrHtml)
         endif

         browser->Release()
      endif
   endif
endsub
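
For readers unfamiliar with the two write calls above: the first emits the byte-order mark FF FE, and the second writes two bytes per character, which is why the length is multiplied by 2. A minimal sketch of the same file layout in standard C rather than IWBasic follows. The function name write_utf16le_file and the explicit uint16_t buffer are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write `count` UTF-16 code units to `path` as UTF-16LE, preceded by the
   FF FE byte-order mark. Returns 0 on success, -1 on failure. */
int write_utf16le_file(const char *path, const uint16_t *units, size_t count)
{
    assert(path != NULL && units != NULL);

    FILE *f = fopen(path, "wb");   /* binary mode: no newline translation */
    if (f == NULL)
        return -1;

    /* The BOM tells a reader that the low byte of each unit comes first. */
    int ok = (fputc(0xFF, f) != EOF) && (fputc(0xFE, f) != EOF);

    for (size_t i = 0; ok && i < count; i++) {
        /* Two bytes per code unit, low byte first (little endian). */
        ok = (fputc(units[i] & 0xFF, f) != EOF) &&
             (fputc(units[i] >> 8, f) != EOF);
    }

    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Writing the two characters "Hi" this way produces six bytes: FF FE, then 48 00, then 69 00.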

Question:

How can the code be changed so that each line of the HTML goes into a string rather than into a file?

I want to be able to read each line in turn so I can check for a specific word - can this be done?

Thanks,
Andy.

Day after day, day after day, we struck nor breath nor motion, as idle as a painted ship upon a painted ocean.

billhsln

Quote
         if (bstrHtml)
            ' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
            BFILE f
            OPENFILE(f, "html dump.htm", "w")
            WRITE f, "\xFF\xFE" ' unicode LE16 BOM
            __WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
            CLOSEFILE f
            FreeComString(bstrHtml)
         endif

Change the above to:

         if (bstrHtml)
'            code here to process *<char>bstrHtml
            FreeComString(bstrHtml)
         endif
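
If the goal is only to check whether a word occurs somewhere in the page, the string can be searched in memory with no file at all. Here is a minimal sketch in standard C rather than IWBasic, assuming the HTML is available as a null-terminated wide string. html_contains is a hypothetical helper name; wcsstr is the standard C wide-character substring search.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* Return true if `keyword` occurs anywhere in `html`.
   The comparison is case-sensitive. A null `html` counts as "not found". */
bool html_contains(const wchar_t *html, const wchar_t *keyword)
{
    assert(keyword != NULL);
    if (html == NULL)
        return false;
    return wcsstr(html, keyword) != NULL;
}
```

Because the search covers the whole buffer, it does not matter whether the page was written on one line or many.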


Bill
When all else fails, get a bigger hammer.

sapero

This is not possible: there are no lines in HTML code. You can embed line breaks in the source code, but the control will usually ignore and discard them. Even spaces and tabs are meaningless here.

An HTML document of 10MB (or more) can be written on a single line, and it will still be valid.
If you really want to extract the code line by line, tokenize it:
pointer tok = wcstok(bstrHtml, L"\n") ' defined in string.inc and wchar.inc
while (tok)
  ' *<WSTRING>tok is the first/next line
  tok = wcstok(0, L"\n")
wend

But note that tok may point to a string of any length. Do not copy it to normal WSTRING variables, and do not PRINT it if LEN returns 16KB or more.
Note 2: wcstok modifies the string pointed to by bstrHtml (it overwrites each delimiter it finds with a terminating null).
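
The same tokenizing loop can be sketched in standard C. Two assumptions: it uses the three-argument wcstok from ISO C (the two-argument call shown above appears to be the older Microsoft runtime variant), and first_line_with is a hypothetical helper name. The detail that matters is that tok is tested before every use, because wcstok returns a null pointer once the text is exhausted.

```c
#include <assert.h>
#include <wchar.h>

/* Return the 1-based index of the first non-empty line of `text` that
   contains `keyword`, or 0 if no line does.
   Note: `text` is modified, because wcstok writes terminators into it. */
int first_line_with(wchar_t *text, const wchar_t *keyword)
{
    assert(text != NULL && keyword != NULL);

    wchar_t *state = NULL;   /* wcstok's position between calls */
    int line_no = 0;

    wchar_t *tok = wcstok(text, L"\n", &state);
    while (tok != NULL) {    /* check BEFORE touching the token */
        line_no++;
        if (wcsstr(tok, keyword) != NULL)
            return line_no;
        tok = wcstok(NULL, L"\n", &state);
    }
    return 0;
}
```

wcstok treats a run of consecutive line breaks as a single delimiter, so empty lines are skipped and the returned index counts non-empty lines only.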

Andy

Thanks for the replies,

The strange thing is that when I add the posted code, it works in the browser_test example but NOT in the browser_test2 example.

I get the following compile errors:

Compiling...
browser_test2.iwb
File: C:\2\projects\browser_test2.iwb (316) Error: Undefined variable cont
File: C:\2\projects\browser_test2.iwb (316) Error: FUNCTION (GetPropA): invalid type in parameter 1 (typeOpr)
File: C:\2\projects\browser_test2.iwb (316) Error: Cannot assign none to pointer
Error(s) in compiling C:\2\projects\browser_test2.iwb

It's complaining about this line:

   pointer p = GetPropA(cont.hWnd, "BROWSER")

Any ideas on how I can fix this?

Thanks very much,
Andy.
 

sapero

The browser_test2.iwb example is a multi-browser program. Global window variables have been moved to a linked list g_list.

Change your sub to:
' version A
sub DoEnum(WINDOW cont)
   ' input variables:
   ' a) WINDOW cont - browser window

   ' get the browser control
   IDispatch browser = GETBROWSERINTERFACE(cont)
   if (browser)

      BSTR bstrHtml = browser.Document.documentElement.outerHTML
      if (bstrHtml)
         ' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
         BFILE f
         OPENFILE(f, "html dump.htm", "w")
         WRITE f, "\xFF\xFE" ' unicode LE16 BOM
         __WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
         CLOSEFILE f
         FreeComString(bstrHtml)
      endif

      browser->Release()
   endif
endsub

Or even (without the window variable) to:
' version B
sub DoEnum(IDispatch browser)

   BSTR bstrHtml = browser.Document.documentElement.outerHTML
   if (bstrHtml)
      ' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
      BFILE f
      OPENFILE(f, "html dump.htm", "w")
      WRITE f, "\xFF\xFE" ' unicode LE16 BOM
      __WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
      CLOSEFILE f
      FreeComString(bstrHtml)
   endif
endsub

Notice that the GETBROWSERINTERFACE command returns the IWebBrowser2 interface, cast to the generic IDispatch type, with its reference counter already incremented. You need to call the Release method when you have finished with it.
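
The reference-counting rule can be illustrated with a small mock in standard C. This is not the COM API: Browser, browser_create, browser_get_interface and browser_release are hypothetical names standing in for an object, a getter that increments the counter, and Release.

```c
#include <assert.h>
#include <stdlib.h>

/* A stand-in for a reference-counted object. */
typedef struct Browser {
    int ref_count;
} Browser;

/* Create the object; the creator holds the first reference. */
Browser *browser_create(void)
{
    Browser *b = malloc(sizeof *b);
    if (b != NULL)
        b->ref_count = 1;
    return b;
}

/* Hand out another pointer to the same object and increment the counter,
   the way a getter returning an interface pointer does. */
Browser *browser_get_interface(Browser *b)
{
    assert(b != NULL);
    b->ref_count++;
    return b;
}

/* Give one reference back. Returns the number of references left and
   frees the object when that number reaches zero. */
int browser_release(Browser *b)
{
    assert(b != NULL && b->ref_count > 0);
    int left = --b->ref_count;
    if (left == 0)
        free(b);
    return left;
}
```

Every successful browser_get_interface call must be paired with exactly one browser_release by the code that made the call. This is why version A releases the browser itself, while version B leaves that to its caller.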

Now it all depends on where you want to call DoEnum from.
1. From the global namespace:
a) First you'll need the pointer to the BROWSERDATA structure. It is returned from CreateBrowserWindow() and stored in the pFirstWindow pointer.
b) Call DoEnum(*pFirstWindow.cont) for version A.

2. From handler and browsehandler:
a) The "p" pointer points to the BROWSERDATA structure. The idSave command (button) calls GETBROWSERINTERFACE and executes an OLECMDID_SAVEAS OLE command. Use it as a model for your own handler for a new button. There you can call DoEnum(browser) for version B, or DoEnum(.cont) for version A.


Andy

Hi,

I have incorporated the DoEnum routine into my browser (based on the browser_test2 example - thanks sapero).

It works exactly as I want it to, except that when the browser opens a "second" dummy window, the browser crashes.

I've amended the DoEnum routine to write a binary file containing the HTML code of the web page; the file is then read back to find a keyword I am looking for.

I don't understand why it only crashes when the second window is opened.

This is the code:

      CASE @IDDOCUMENTCOMPLETE
         'best place to update toolbar buttons
         CONTROLCMD .win, 999, @TBENABLEBUTTON, idBack, BROWSECMD(.cont, @BACKENABLED)
         CONTROLCMD .win, 999, @TBENABLEBUTTON, idForward, BROWSECMD(.cont, @FORWARDENABLED)


            DoEnum(.cont)


And later the subroutine:

sub DoEnum(WINDOW cont)

OPENFILE(fl,"C:\\2\\dump.txt","W")

    ' input variables:
    ' a) WINDOW cont - browser window

    ' get the browser control
    IDispatch browser = GETBROWSERINTERFACE(cont)
    if (browser)

       BSTR bstrHtml = browser.Document.documentElement.outerHTML
       if (bstrHtml)

pointer tok = wcstok(bstrHtml, L"\n") ' defined in


DO
   *<WSTRING>tok 'is the first/next line
   tok = wcstok(0, L"\n")

ax = *<WSTRING>tok

b$ = ""
b$ = W2S(ax)
b$ = b$ + "\n"

WRITE fl,b$
UNTIL tok = 0



          CLOSEFILE fl
          FreeComString(bstrHtml)
       endif

       browser->Release()
ENDIF


DEF myfile as BFILE
DEF str as STRING
DEF ln:string

IF OPENFILE(myfile, "C:\\2\\dump.txt", "r") = 0
DO

ln = ""
    ln = space$(254)
    READ myfile,ln

IF INSTR(ln,"bbc.co.uk")

   'do something here

   CLOSEFILE myfile
   RETURN
ENDIF

UNTIL EOF(myfile)
CLOSEFILE myfile
ENDIF

RETURN
endsub

This is the bug report:

Problem signature:
  Problem Event Name:   APPCRASH
  Application Name:   NewTest New Version.exe
  Application Version:   0.0.0.0
  Application Timestamp:   4e7877ac
  Fault Module Name:   NewTest New Version.exe
  Fault Module Version:   0.0.0.0
  Fault Module Timestamp:   4e7877ac
  Exception Code:   c0000005
  Exception Offset:   0004831a
  OS Version:   6.1.7600.2.0.0.256.1
  Locale ID:   2057
  Additional Information 1:   0a9e
  Additional Information 2:   0a9e372d3b4ad19135b953a78882e789
  Additional Information 3:   0a9e
  Additional Information 4:   0a9e372d3b4ad19135b953a78882e789

Can anyone tell me why it's crashing, and what I must do to fix it - in plain English?

Thanks as always,
Andy.



