Automated access by scripts

From RavenWiki

Latest revision as of 11:52, 5 October 2019

Raven was designed to authenticate human users of a web browser. However, in practice, skilled users who know how to program, especially system administrators, often need to automate workflows that involve HTTP access to a Raven-protected resource. While there exist workarounds (see section "Workaround" below) that allow scripts to get past the Raven login screen, these are cumbersome and fragile.

Proposal: WLS to recognize two new cookies

The following simple proposal to make the Raven WLS more script-friendly was made by Markus Kuhn on 28 Apr 2010 on the cs-raven-discuss mailing list.

It involves having the Raven WLS recognize three cookies (one of which it already understands today) that carry the login name, the password and a mode flag, and that, if presented, allow a client to bypass the manual login screen and any additional manual confirmation screens during the Raven authentication sequence.

A script that tries to access any Raven-protected content will first add three cookies for https://raven.cam.ac.uk/ to the "cookie jar" of its HTTP client tool/library:

  • Ucam-WLS-ID=your-crsid
  • Ucam-WLS-Passwd=your-raven-password
  • Ucam-WLS-mode=automatic

In addition, the script needs to instruct its HTTP client tool/library to automatically follow any HTTP redirects that it encounters.

Ucam-WLS-ID is already understood by the Raven WLS today. It replaces the login-name form element on the login page with the provided value, such that the user does not have to type in their crsid. It is currently set if a user asks https://raven.cam.ac.uk/auth/account/ to pre-fill their login name in the password form.

Ucam-WLS-Passwd would equivalently allow a client to tell Raven in advance the password, such that there is no need for Raven to display any interactive password form if both Ucam-WLS-ID and Ucam-WLS-Passwd are provided.

Ucam-WLS-mode=automatic would tell Raven explicitly that the client is a machine and is therefore not interested in any interactive notification or confirmation screens, as it can't understand English prose anyway. If Ucam-WLS-mode=automatic is present, any new interactive notifications or confirmations are postponed until the user logs in the next time without the cookie Ucam-WLS-mode=automatic.

With these three cookies set correctly, the WLS would either immediately redirect the client back to the application server's WAA where it came from (HTTP result code 302 or 303), or – if the login was not successful (wrong or missing password) – abort with an HTTP error (403 "Forbidden" seems appropriate).
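
A calling script could then tell these outcomes apart from the HTTP status code alone. The following is a minimal sketch, assuming the proposed behaviour were implemented (it is not part of the WLS today); the function name wls_outcome is hypothetical and only illustrates the mapping described above.

```shell
# Hypothetical helper: classify the HTTP status code that the WLS would
# return under this proposal (an assumption; the proposal is not implemented).
wls_outcome() {
    case "$1" in
        302|303) echo "redirect" ;;    # login accepted, redirect back to the WAA
        403)     echo "denied" ;;      # wrong or missing password
        *)       echo "unexpected" ;;  # anything else: treat as a failure
    esac
}

# Usage, e.g. with a status code obtained via curl -w '%{http_code}':
wls_outcome 303
wls_outcome 403
```

Treating every other status code as a failure keeps the script from continuing after an unexpected interactive page.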

Example

Say you want to access a Raven-protected web page using curl, a popular Unix command-line tool for making HTTP requests.

To deliver the above three cookies safely (i.e. only to https://raven.cam.ac.uk/), create a file "/tmp/cookiejar.txt" with content

raven.cam.ac.uk	FALSE	/	TRUE	2147483647	Ucam-WLS-ID	your-crsid
raven.cam.ac.uk	FALSE	/	TRUE	2147483647	Ucam-WLS-Passwd	your-raven-password
raven.cam.ac.uk	FALSE	/	TRUE	2147483647	Ucam-WLS-mode	automatic

[Columns: domain, tailmatch, path, secure, expires, name, value]

It is important to set the "secure" flag to TRUE for Ucam-WLS-Passwd so that this cookie is never submitted over an insecure non-HTTPS connection. The "expires" value is just 2^31-1 seconds after the Unix epoch (19 Jan 2038), the maximum value of a signed 32-bit time_t.
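
Because the fields of the cookie file must be separated by single TAB characters, it can be easier to generate the file than to type it. The following is a minimal sketch of one way to do that; the variable names and the placeholder values are assumptions for illustration, and the cookies only have the described effect if the proposal is implemented.

```shell
# Sketch: write the three cookies in the tab-separated cookie-file format.
# File name and placeholder values are assumptions for illustration.
cookiejar=/tmp/cookiejar.txt
crsid='your-crsid'
passwd='your-raven-password'
expires=$(( (1 << 31) - 1 ))   # 2^31 - 1 = 2147483647

umask 077                      # new files are readable by the owner only
rm -f "$cookiejar"
{
    # columns: domain, tailmatch, path, secure, expires, name, value
    printf 'raven.cam.ac.uk\tFALSE\t/\tTRUE\t%s\t%s\t%s\n' "$expires" Ucam-WLS-ID     "$crsid"
    printf 'raven.cam.ac.uk\tFALSE\t/\tTRUE\t%s\t%s\t%s\n' "$expires" Ucam-WLS-Passwd "$passwd"
    printf 'raven.cam.ac.uk\tFALSE\t/\tTRUE\t%s\t%s\t%s\n' "$expires" Ucam-WLS-mode   automatic
} > "$cookiejar"
```

On a shared machine, a private directory or a name obtained from mktemp would be a safer location than a fixed file name in /tmp.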

Then a single curl call of the form

curl -L -b/tmp/cookiejar.txt -c/tmp/cookiejar.txt ....

will not only get you past Raven, but will also leave in your cookie jar the application's session cookie that then makes further Raven calls unnecessary (until timeout). Option -L causes curl to automatically follow redirects, option -b reads the cookie jar, and option -c writes back an updated cookie jar.
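
To avoid repeating these options, they could be wrapped in a small shell function. This is only a sketch: the function name raven_curl is hypothetical, and the call would get past Raven only if the WLS supported the proposed cookies.

```shell
# Hypothetical convenience wrapper around the curl options described above.
cookiejar=/tmp/cookiejar.txt

raven_curl() {
    # -L follows redirects, -b reads the cookie jar, -c writes it back
    curl -L -b "$cookiejar" -c "$cookiejar" "$@"
}

# Usage (assumes the proposed cookies are honoured by the WLS):
#   raven_curl -s https://www.lookup.cam.ac.uk/person/crsid/mgk25
```

All further arguments are passed through unchanged, so any other curl option can still be used.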

In other words, getting past Raven with curl will become as easy as creating one temporary file that contains the username and password, plus providing three additional command-line options to each invocation of curl.

Most other HTTP scripting tools and libraries have equivalent facilities to set cookies and automatically follow redirects, and would therefore benefit equally from this simple extension.

Workaround

The following example illustrates how one can currently access a Raven-protected URL from a shell script using curl. Note that the script has to rewrite the WLS URL that contains the session state, replacing the login page authenticate.html with the redirect page authenticate2.html. This script will fail if there are any interactive confirmations or notifications after the login stage, or if the WLS authors make any changes to the URL structure, form fields, or other aspects of the site design.

#!/bin/bash
# Demonstration of Raven authentication into Lookup using curl
# Markus Kuhn -- 2010-04-08 -- updated 2019-10-05
#
# Usage: login=mgk25 passwd=... ./raven-client-demo
#
# some parameters
# login=mgk25                                   # Raven userid to be used
# passwd=...                                    # Raven password to be used
url='https://www.lookup.cam.ac.uk/person/crsid/mgk25' # some URL that triggers a redirect to Raven
#
# storage space for session cookies (secret)
cookiejar=/tmp/raven-demo-$USER.cookiejar
rm -f $cookiejar ; touch $cookiejar ; chmod go-rwx $cookiejar
#
# firstly, trigger and then handle the Raven redirect
curlopt="-s --output /dev/null -b$cookiejar -c$cookiejar"
# get redirected to Raven's "authenticate.html" form with the right cookie
redirect1=`curl -w'%{redirect_url}' $curlopt "$url"`
if [ "${redirect1:0:47}" = "https://raven.cam.ac.uk/auth/authenticate.html?" ] ; then
    # fill in the password form by accessing Raven's "authenticate2.html" page
    redirect2=`curl -w'%{redirect_url}' $curlopt --data userid="$login" --data-urlencode pwd="$passwd" --data submit=Submit "${redirect1/authenticate.html/authenticate2.html}"`
    # now follow the redirect back and obtain the application's session
    # cookie that attests our successful login
    curl $curlopt "$redirect2"
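else
    # do not carry on silently if we were not sent to the Raven login form
    echo "unexpected redirect: $redirect1" >&2
    exit 1
fi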

# and now we can get on to do some real work ...

# Example 1: download the Lookup webpage $url
curl -b$cookiejar -c$cookiejar -s "$url"

# Example 2: download the CSV list of CRSIDs that are members of an institution
institution=WOLFC
curl -b$cookiejar -c$cookiejar -s --data sort=crsid --data _action_download_members=Download https://www.lookup.cam.ac.uk/inst/$institution/bulk-update-members

Tip: use the Firefox add-on Live HTTP headers on an interactive session to understand how to run the same transaction automatically, e.g. using curl.

Warning: An earlier version of the above example script did not check if the redirect1 URL received actually starts with https://raven.cam.ac.uk/auth/authenticate.html?. Therefore, a malicious web site could redirect this script to somewhere else, where it can then get hold of the submitted Raven password.
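
The prefix check can also be written as a small reusable function, and the submit URL can be built from a hardwired base rather than by editing whatever URL was received. The following is a minimal sketch of that idea; the function and variable names are hypothetical, and it assumes the WLS URL structure used by the script above.

```shell
# Sketch of the redirect check as reusable functions (names are hypothetical).
raven_login_prefix='https://raven.cam.ac.uk/auth/authenticate.html?'

is_raven_login_url() {
    case "$1" in
        "$raven_login_prefix"*) return 0 ;;  # genuine Raven login form
        *)                      return 1 ;;  # anything else: do not send the password
    esac
}

# Build the submit URL from a hardwired base plus the received query string.
raven_submit_url() {
    query=${1#"$raven_login_prefix"}         # strip the checked prefix
    printf '%s\n' "https://raven.cam.ac.uk/auth/authenticate2.html?$query"
}

# Usage:
#   if is_raven_login_url "$redirect1" ; then
#       submit=$(raven_submit_url "$redirect1")
#   fi
```

Because the prefix ends with the "?" that follows the path, a host name such as raven.cam.ac.uk.example.org does not pass the check.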