Automated access by scripts

From RavenWiki
Revision as of 11:52, 5 October 2019 by mgk25 (talk | contribs) (→‎Workaround: added security check: does redirect actually go to the real Raven site?)

Raven was designed to authenticate human users of a web browser. However, in practice, skilled users who know how to program, especially system administrators, often need to automate workflows that involve HTTP access to a Raven-protected resource. While there exist workarounds (see section "Workaround" below) that allow scripts to get past the Raven login screen, these are cumbersome and fragile.

Proposal: WLS to recognize two new cookies

The following simple proposal to make the Raven WLS more script-friendly was made by Markus Kuhn on 28 Apr 2010 on the cs-raven-discuss mailing list.

It involves having the Raven WLS accept three cookies (two of them new) that carry the login name, the password, and a mode flag, and that, if presented, allow a client to bypass the manual login screen and any additional manual confirmation screens during the Raven authentication sequence.

A script that tries to access any Raven-protected content will first add three cookies for https://raven.cam.ac.uk/ to the "cookie jar" of its HTTP client tool/library:

  • Ucam-WLS-ID=your-crsid
  • Ucam-WLS-Passwd=your-raven-password
  • Ucam-WLS-mode=automatic

In addition, the script needs to instruct its HTTP client tool/library to automatically follow any HTTP redirects that it encounters.

Ucam-WLS-ID is already understood by the Raven WLS today. It replaces the login-name form element on the login page with the provided value, so that users do not have to type in their CRSID. It is currently set if a user asks https://raven.cam.ac.uk/auth/account/ to pre-fill their login name in the password form.

Ucam-WLS-Passwd would equivalently allow a client to tell Raven the password in advance, so that there is no need for Raven to display any interactive password form if both Ucam-WLS-ID and Ucam-WLS-Passwd are provided.

Ucam-WLS-mode=automatic would tell Raven explicitly that the client is a machine and is therefore not interested in any interactive notification or confirmation screens, as it can't understand English prose anyway. If Ucam-WLS-mode=automatic is present, any new interactive notifications or confirmations are postponed until the user logs in the next time without the cookie Ucam-WLS-mode=automatic.

With these three cookies set correctly, the WLS would either immediately redirect the client back to the application server's WAA where it came from (HTTP result code 302 or 303), or – if the login was not successful (wrong or missing password) – abort with an HTTP error (403 "Forbidden" seems appropriate).
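From a script's point of view, these outcomes differ only in the HTTP status code. The following is a minimal sketch of how a script might tell them apart, assuming the proposal were implemented exactly as described above. The function name wls_outcome and the outcome labels are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: map the HTTP status code returned by the WLS
# (under the proposed cookie scheme) to an outcome a script can act on.
wls_outcome() {
    case $1 in
        302|303) echo redirect ;;    # login accepted, client is sent back to the WAA
        403)     echo denied ;;      # wrong or missing password
        *)       echo unexpected ;;  # anything else needs human attention
    esac
}
```

The status code itself could be captured with curl's -w'%{http_code}' option on a request made without -L, so that the redirect is reported rather than followed.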

Example

Say you want to access a Raven-protected web page using curl, a popular Unix command-line tool for making HTTP requests:

To deliver the above three cookies safely (i.e. only to https://raven.cam.ac.uk/), create a file "/tmp/cookiejar.txt" with content

raven.cam.ac.uk	FALSE	/	TRUE	2147483647	Ucam-WLS-ID	your-crsid
raven.cam.ac.uk	FALSE	/	TRUE	2147483647	Ucam-WLS-Passwd	your-raven-password
raven.cam.ac.uk	FALSE	/	TRUE	2147483647	Ucam-WLS-mode	automatic

[Columns: domain, tailmatch, path, secure, expires, name, value]

It is important to set the "secure" flag to TRUE for Ucam-WLS-Passwd so that this cookie is not accidentally submitted over an insecure non-HTTPS connection. The "expires" value is just 2^31−1 (19 Jan 2038), the maximum time_t value supported on 32-bit platforms.
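Because the fields of this file format are separated by tab characters, it can be safer to generate the file than to type it by hand. The sketch below is one possible way to do that; the function name write_raven_cookiejar is hypothetical, and it assumes that neither the CRSID nor the password contains a tab or newline.

```shell
#!/bin/bash
# Sketch: write the three cookies from the proposal to a cookie file.
# write_raven_cookiejar is a hypothetical name, not part of Raven or curl.
write_raven_cookiejar() {
    local jar=$1 crsid=$2 passwd=$3
    local expires=$(( (1 << 31) - 1 ))   # 2^31-1 = 2147483647
    # fields: domain, tailmatch, path, secure, expires, name, value
    local fmt='raven.cam.ac.uk\tFALSE\t/\tTRUE\t%s\t%s\t%s\n'
    rm -f "$jar"
    # create the file without access for group/others; noclobber (set -C)
    # makes this fail if a file reappeared at that path after the rm
    ( umask 077 && set -C && : > "$jar" ) || return 1
    {
        printf "$fmt" "$expires" Ucam-WLS-ID     "$crsid"
        printf "$fmt" "$expires" Ucam-WLS-Passwd "$passwd"
        printf "$fmt" "$expires" Ucam-WLS-mode   automatic
    } >> "$jar"
}
```

A call such as write_raven_cookiejar /tmp/cookiejar.txt your-crsid your-raven-password would then produce the three lines shown above.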

Then a single curl call of the form

curl -L -b/tmp/cookiejar.txt -c/tmp/cookiejar.txt ....

will not only get you past Raven, but will also leave in your cookie jar the application's session cookie that then makes further Raven calls unnecessary (until timeout). Option -L causes curl to automatically follow redirects, option -b reads the cookie jar, and option -c writes back an updated cookie jar.
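Since the same three options would be needed on every call, a script might bundle them in a small wrapper. The following is only a sketch under the assumption that the proposal has been implemented; the function name raven_curl and the CURL override variable are hypothetical conveniences, the latter allowing the resulting command line to be previewed without any network access.

```shell
#!/bin/bash
# Hypothetical wrapper around the three curl options described above.
# Setting CURL=echo prints the arguments instead of running curl.
raven_curl() {
    local jar=$1
    shift
    # -L follows redirects, -b reads the cookie jar, -c writes it back
    "${CURL:-curl}" -L -b "$jar" -c "$jar" "$@"
}
```

For example, raven_curl /tmp/cookiejar.txt -s followed by the URL of a Raven-protected page would fetch that page and keep the application's session cookie for later calls.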

In other words, getting past Raven with curl will become as easy as creating one temporary file that contains the username and password, plus providing three additional command-line options to each invocation of curl.

Most other HTTP scripting tools and libraries have equivalent facilities to set cookies and automatically follow redirects, and would therefore benefit equally from this simple extension.

Workaround

The following example illustrates how one can currently access a Raven-protected URL from within a shell script using curl. Note that the script has to rewrite the WLS URL that contains the session state, replacing the login page authenticate.html with the redirect page authenticate2.html. This script will fail if there are any interactive confirmations or notifications after the login stage, or if the WLS authors make any changes to the URL structure, form fields, or other aspects of the site design.

#!/bin/bash
# Demonstration of Raven authentication into Lookup using curl
# Markus Kuhn -- 2010-04-08 -- updated 2019-10-05
#
# Usage: login=mgk25 passwd=... ./raven-client-demo
#
# some parameters
# login=mgk25                                   # Raven userid to be used
# passwd=...                                    # Raven password to be used
url='https://www.lookup.cam.ac.uk/person/crsid/mgk25' # some URL that triggers a redirect to Raven
#
# storage space for session cookies (secret)
cookiejar=/tmp/raven-demo-$USER.cookiejar
rm -f "$cookiejar" ; ( umask 077 ; touch "$cookiejar" )
#
# firstly, trigger and then handle the Raven redirect
curlopt="-s --output /dev/null -b$cookiejar -c$cookiejar"
# get redirected to Raven's "authenticate.html" form with the right cookie
redirect1=`curl -w'%{redirect_url}' $curlopt "$url"`
if [ "${redirect1:0:47}" = "https://raven.cam.ac.uk/auth/authenticate.html?" ] ; then
    # fill in the password form by accessing Raven's "authenticate2.html" page
    redirect2=`curl -w'%{redirect_url}' $curlopt --data userid="$login" --data-urlencode pwd="$passwd" --data submit=Submit "${redirect1/authenticate.html/authenticate2.html}"`
    # now follow the redirect back and obtain the application's session
    # cookie that attests our successful login
    curl $curlopt "$redirect2"
fi

# and now we can get on to do some real work ...

# Example 1: download the Lookup webpage $url
curl -b$cookiejar -c$cookiejar -s "$url"

# Example 2: download the CSV list of CRSIDs that are members of an institution
institution=WOLFC
curl -b$cookiejar -c$cookiejar -s --data sort=crsid --data _action_download_members=Download https://www.lookup.cam.ac.uk/inst/$institution/bulk-update-members

Tip: use the Firefox add-on Live HTTP headers on an interactive session to understand how to run the same transaction automatically, e.g. using curl.

Warning: An earlier version of the above example script did not check if the redirect1 URL received actually starts with https://raven.cam.ac.uk/auth/authenticate.html?. Therefore, a malicious web site could redirect this script to somewhere else, where it can then get hold of the submitted Raven password.
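The prefix check described in this warning can also be isolated into a small function, which makes it easy to test without contacting Raven. The sketch below is one way to do this. The function name raven_form_target is hypothetical, and the authenticate2.html rewrite is the one used in the script above, so it is equally fragile if the WLS changes its URL structure.

```shell
#!/bin/bash
# Hypothetical helper: accept a redirect URL only if it points at the
# real Raven login page, then print the matching authenticate2.html URL.
raven_form_target() {
    local url=$1
    local prefix='https://raven.cam.ac.uk/auth/authenticate.html?'
    case $url in
        "$prefix"*)
            # keep the query string (session state), swap the page name
            printf '%s\n' "https://raven.cam.ac.uk/auth/authenticate2.html?${url#"$prefix"}"
            ;;
        *)
            # any other destination must never receive the password
            return 1
            ;;
    esac
}
```

In the script above, the test on ${redirect1:0:47} could then be replaced by something like: if target=$(raven_form_target "$redirect1") ; then ... fi, with the password form submitted to "$target".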