Python, web forms and cookies

Just the other day I finally got around to something I’ve wanted to play with for a fairly long time: posting web forms using Python. As an added bonus, I also took a look at handling cookies in Python.

For posting forms there is, of course, a module that makes things a lot easier, mechanize, but I wanted first to understand how to do it myself and second to avoid using anything but the standard Python modules. It turns out there isn’t much to understand. Say we have a very simple form: a login form containing two text entries.
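Posting such a form boils down to sending the field names and values URL-encoded in the request body, which is what a browser does for a form with the default `application/x-www-form-urlencoded` encoding. A small sketch with `urllib.parse.urlencode` (Python 3; in Python 2 the same function was `urllib.urlencode`), assuming hypothetical field names `username` and `password`:

```python
from urllib.parse import urlencode

# Hypothetical field names; use the name= attributes of the real form's inputs.
fields = {"username": "alice", "password": "s3cret & more"}

# This is the body a browser would send for the form
# (Content-Type: application/x-www-form-urlencoded).
# Spaces become "+" and "&" inside a value is percent-escaped.
body = urlencode(fields)
print(body)  # username=alice&password=s3cret+%26+more
```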

One way to post this form would be the following:
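A minimal sketch of that approach in Python 3, where `urllib2` became `urllib.request`. The URL and the field names `username` and `password` are hypothetical placeholders:

```python
import urllib.parse
import urllib.request

# Hypothetical login URL; in practice this is the form's action= attribute.
LOGIN_URL = "http://example.com/login"


def build_login_request(url, username, password):
    """Build a POST request carrying the form fields, like a browser would."""
    # Field names are assumptions: match them to the form's <input name=...>.
    data = urllib.parse.urlencode({"username": username, "password": password})
    # In Python 3 the request body must be bytes, not str.
    return urllib.request.Request(url, data=data.encode("ascii"))


def post_login(url, username, password):
    """Send the form and return the response body as bytes."""
    request = build_login_request(url, username, password)
    with urllib.request.urlopen(request) as resp:
        # resp is a file-like response object; printing it only shows its
        # repr, so read() is needed to get the actual page content.
        return resp.read()
```

Calling `post_login(LOGIN_URL, "alice", "secret")` returns the page as bytes; decode it (for example with `.decode("utf-8")`, if that is the page’s encoding) before printing it or writing it to a text file. urllib adds the `application/x-www-form-urlencoded` Content-Type header itself when a body is present and no Content-Type was set.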

Simple enough, I’d say. urllib2.urlopen automatically switches from GET to POST when data is supplied.
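In Python 3 the same rule lives in `urllib.request.Request.get_method()`, which reports POST whenever a body is attached. A quick check with a placeholder URL (nothing is actually sent over the network):

```python
import urllib.request

# Without data the request is a GET...
plain = urllib.request.Request("http://example.com/search")
print(plain.get_method())  # GET

# ...and supplying a body turns it into a POST.
with_body = urllib.request.Request("http://example.com/search", data=b"q=python")
print(with_body.get_method())  # POST
```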

On most sites a cookie is used to track whether a user is logged in. Extending the example above to handle this, so that subsequent requests to the site are made as a logged-in user, leads us to the CookieJar:
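A sketch of the cookie-aware version in Python 3, where `cookielib.CookieJar` became `http.cookiejar.CookieJar` and `urllib2.HTTPCookieProcessor` became `urllib.request.HTTPCookieProcessor`. The login URL and field names are hypothetical, and `login`/`fetch` are illustrative helper names:

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical login URL and field names, as in the form example.
LOGIN_URL = "http://example.com/login"

# The jar collects Set-Cookie headers from every response the opener sees
# and sends matching cookies back on later requests.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))


def login(username, password):
    """POST the login form through the cookie-aware opener."""
    data = urllib.parse.urlencode({"username": username, "password": password})
    with opener.open(LOGIN_URL, data.encode("ascii")) as resp:
        return resp.read()


def fetch(url):
    """Later requests through the same opener carry the session cookie."""
    with opener.open(url) as resp:
        return resp.read()
```

If you would rather keep calling plain `urlopen`, `urllib.request.install_opener(opener)` makes it go through this opener, at the cost of changing global state for the whole program.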

After this, cj will hold all the cookies returned in the response. You can iterate over them like this:
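A CookieJar is iterable and yields `http.cookiejar.Cookie` objects, each with attributes such as `name`, `value`, `domain` and `path`. A small sketch; `describe_cookies` is a hypothetical helper name, and the empty jar here stands in for the one filled in by the login request:

```python
import http.cookiejar


def describe_cookies(jar):
    """Return (name, value, domain, path) for every cookie in the jar."""
    # Iterating a CookieJar walks all stored cookies across domains and paths.
    return [(c.name, c.value, c.domain, c.path) for c in jar]


cj = http.cookiejar.CookieJar()  # normally the jar populated by the login request
for name, value, domain, path in describe_cookies(cj):
    print(f"{name}={value} (domain {domain}, path {path})")
```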

Making requests with a given cookie c is simple as well; just add c to the cookie jar before making the request:
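A sketch using `CookieJar.set_cookie` in Python 3. `http.cookiejar.Cookie` has no convenience constructor, so the hypothetical helper `make_cookie` spells out every attribute; the cookie name, value and domain are placeholders. Rather than hitting the network, the example calls `add_cookie_header`, which is the method the cookie processor uses to attach cookies to each outgoing request:

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, path="/"):
    """Hypothetical helper: build a plain Netscape-style (version 0) session cookie."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cj = http.cookiejar.CookieJar()
cj.set_cookie(make_cookie("session", "abc123", "example.com"))

# Any opener built on this jar now sends the cookie to example.com.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# Show the header the opener would add, without sending anything.
req = urllib.request.Request("http://example.com/")
cj.add_cookie_header(req)
print(req.get_header("Cookie"))  # session=abc123
```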

The cookie jar also has a policy object and a method, set_cookie_if_ok, that sets a cookie for a given request only if the policy allows it. So it seems fairly simple to make sure no cookies leak from one site to another when making requests to several sites. I’ll leave playing with that for another day, though.
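A sketch of what that might look like with the standard `http.cookiejar.DefaultCookiePolicy` (Python 3; `cookielib` in Python 2). The domains are placeholders and `make_cookie` is a hypothetical helper. The default policy already refuses a cookie whose domain does not match the request’s host, and `blocked_domains` rejects a domain outright:

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain):
    """Hypothetical helper: a version 0 cookie whose domain is set explicitly."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Block one domain entirely; "ads.example" is a placeholder.
policy = http.cookiejar.DefaultCookiePolicy(blocked_domains=["ads.example"])
cj = http.cookiejar.CookieJar(policy=policy)

# The policy decides whether each cookie may be set in reply to a given request.
site_request = urllib.request.Request("http://example.com/login")
ads_request = urllib.request.Request("http://ads.example/")

cj.set_cookie_if_ok(make_cookie("session", "abc123", "example.com"), site_request)  # accepted
cj.set_cookie_if_ok(make_cookie("session", "stolen", "other.org"), site_request)    # domain mismatch
cj.set_cookie_if_ok(make_cookie("track", "1", "ads.example"), ads_request)          # blocked domain

print([(c.domain, c.name) for c in cj])  # [('example.com', 'session')]
```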


Dominique Valentine

This was great. Thank you.

Travis

Great explanation, thanks.

chenz

nice recipe, maybe we can implement a session module using this, which can manage connections from the client.

james

exactly what i was looking for, thanks!

marc

wow thanks. i’ve been looking for this a long time :)

lee

thank you, i like this solution without any 3rd party module.

Matt

Very helpful- but I can’t figure out how to print (or save to txt) the results of the search.

when i do a print resp command I get this in the shell:

addinfourl at 21503312 whose fp = >

could someone please help?
