Web scraping with authentication


Hi community!

I've used the NScrape Demo from the Forge to test web scraping. So far so good, except that I would now like to know how to pass a username and password when the site I want to scrape requires authentication. I've tried a few things with no success, so here is my cry for help.

The demo shows an example with the IMDB site, and there seems to be a Server Action that would allow me to authenticate. The Server Action I'm referring to is "ScrapeHtmlFrom", and I'm passing the following parameters:

URL: IMDB sign-in link: 

https://www.imdb.com/ap/signin?clientContext=134-4873338-3399902&openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_us&openid.mode=checkid_setup&siteState=eyJvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl91cyIsInJlZGlyZWN0VG8iOiJodHRwOi8vd3d3LmltZGIuY29tL3RyYWlsZXJzP3JlZl89bG9naW4ifQ&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&&tag=imdbtag_reg-20 

formName: "signIn" (taken from the IMDB HTML)

formInputs

- Name: my IMDB username

- Value: my IMDB password

- IsCheckbox: True 

- IsSelect: True 

- IsRadio: True

I can't find these last three in IMDB's HTML, and they don't seem to be relevant in this case.

I must be doing something wrong, because I cannot log in. Any detailed hints on how to log in (I'm something of a beginner)? Are you aware of any example I could use to check how it would work?

Thanks!

Gaby

Hi Gaby,

I don't know the component and have never tried to scrape IMDB. But if you allow me, let me share my experience in the field...

More and more, websites are connected to other web services that require the site to run JavaScript and hold information those services look for.

This makes it impossible to log in programmatically, as these connections will not be made from your application.

Usually those sites prevent what you are trying to do, actively or indirectly. If they offer one, you can use an API (usually a web service); otherwise, the only workaround is to create a desktop application that is also a browser, so that you can log in as if you were really typing the information into the page (because you are).
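
For the cases where a site does accept a plain form login, it may help to see what such a login sends. Each form input goes out as a name/value pair, where the name is the "name" attribute of the HTML input (not the username itself) and the value is what you would type. Below is a minimal Python sketch of that idea. It is not how the NScrape component works internally: the sample HTML, the field names "token", "email" and "password", and the helper "build_login_payload" are all hypothetical, and a real sign-in page may use different names and extra checks.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormInputCollector(HTMLParser):
    """Collects name/value pairs of <input> elements inside one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Keep pre-filled values (for example hidden tokens); default to "".
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(html, form_name, credentials):
    """Hypothetical helper: merge the form's own inputs with the credentials."""
    collector = FormInputCollector(form_name)
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(credentials)  # keys must match the inputs' "name" attributes
    return payload


# Made-up sign-in form, only for illustration (not IMDB's real markup).
SAMPLE_HTML = """
<form name="signIn" method="post" action="/signin">
  <input type="hidden" name="token" value="abc123">
  <input type="email" name="email">
  <input type="password" name="password">
</form>
"""

payload = build_login_payload(
    SAMPLE_HTML, "signIn", {"email": "gaby@example.com", "password": "secret"}
)
body = urlencode(payload)  # the form-encoded text a browser would POST
print(body)
```

The resulting body would then be sent with a POST to the form's action URL, keeping the session cookies for the following requests. Whether that is enough depends on the site, for the reasons above.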

Cheers

Eduardo Jauch

P.S. I don't know if this is the case with IMDB.

Thanks Eduardo!