Introspecting django urls for fun and profit

One of the projects I work on has a fairly large and old code base, and is unfortunately somewhat lacking in the test suite department. We are working to improve this, but in the medium term we are stuck with a fair amount of manual testing.

As part of some recent development, a lot of the urls were getting a new parameter, and during some (manual) testing I discovered a view that hadn't had its signature updated to match the new url pattern. This got me thinking that there ought to be some way to look for this automatically. Ideally such bugs should be covered by other automatic tests, but maybe some code in(tro/)spection could help with some of these issues in the short to intermediate term.

Definitely worth a couple of hours to see what we can do.

Gameplan

  1. Get a list of url patterns from django
  2. Check which keywords are captured in the url
  3. Match these keywords against the arguments in the view
  4. Highlight any mismatches
  5. Profit

1. Get a list of url patterns from django

Some googling turned up a fairly simple way to get at all the patterns:

import urls

def traverse(patterns):
    for entry in patterns:
        do_something(entry)
        if hasattr(entry, 'url_patterns'):
            # recurse into included patterns
            traverse(entry.url_patterns)
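
For a version that can be tried without a django project, here is a minimal sketch of the same walk written as a generator. The FakePattern and FakeResolver classes are hypothetical stand-ins, not django classes; the only assumption carried over from above is that entries which include other patterns expose a url_patterns attribute.

```python
class FakePattern(object):
    """Hypothetical stand-in for a leaf url entry."""
    def __init__(self, name):
        self.name = name


class FakeResolver(object):
    """Hypothetical stand-in for an entry that includes other patterns."""
    def __init__(self, name, url_patterns):
        self.name = name
        self.url_patterns = url_patterns


def iter_entries(patterns):
    """Yield every entry, descending into nested url_patterns."""
    for entry in patterns:
        yield entry
        if hasattr(entry, 'url_patterns'):
            for child in iter_entries(entry.url_patterns):
                yield child


patterns = [
    FakePattern('home'),
    FakeResolver('events', [FakePattern('event-list'),
                            FakePattern('event-detail')]),
]
names = [entry.name for entry in iter_entries(patterns)]
print(names)
```

Yielding entries instead of calling a fixed function keeps the walk separate from whatever check is run on each entry.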

2. Check for keywords captured in the url

Each entry from the url patterns contains a reference to the regex it uses to match urls. It turns out there are a few nice ways to introspect regexes.

regex.pattern returns the string with the regex pattern. For some reason tab completion (I use IPython) doesn't help you discover this:

>>> import re
>>> regex = re.compile('some pattern')
>>> regex. # <TAB>
r.__copy__      r.findall       r.match         r.search        r.sub
r.__deepcopy__  r.finditer      r.scanner       r.split         r.subn

>>> regex.pattern
'some pattern'

So we can access the pattern. Now to find captured groups and their names. Pass the pattern through a custom regex? No need, it turns out. A regex object has more hidden properties we can use to introspect it.

>>> regex = re.compile(r'capture (?P<some>\w+) group')
>>> regex.groups
1

Nice! However, this isn't quite what we want:

>>> regex = re.compile('capture (?P<some>(foo|bar)baz) group')
>>> regex.groups
2

Sure, there are two groups, but I only care about the named ones. Back in the docs, we find

>>> regex.groupindex
{'some': 1}

Aha! Perfect, just what we want. It even contains the name of the captured parameter. (The value is the position of the group in the regex.)
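
As a small sketch of how this might be used, the snippet below pulls the named groups out of a url-style pattern in the order they appear. The pattern itself is made up for illustration and is not taken from the project.

```python
import re

# Hypothetical url pattern with two named groups and one unnamed group.
regex = re.compile(r'^events/(?P<event_slug>[\w-]+)/(?P<kind>(foo|bar)baz)/$')

# groupindex maps each named group to its position; sorting by that
# position gives the names in the order they appear in the pattern.
captured = sorted(regex.groupindex, key=regex.groupindex.get)

print(regex.groups)  # counts the unnamed (foo|bar) group as well
print(captured)
```

Only the names matter for comparing against a view signature, so the unnamed inner group is ignored.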

Not quite everything

There are a few more considerations. For example, django lets us pass extra keyword arguments to the view, aside from ones captured in the url. These may be accessed via entry.default_args. Throughout I will also assume that we only use named capture groups in our urlpatterns.

3. Match the keywords against the arguments in the view

The view can be accessed as entry.callback. Not every url pattern has a callback, since some just import other patterns.

Once we have a reference to the function, we can start using the inspect module from the standard library.

>>> import inspect
>>> callback = entry.callback
>>> inspect.getargspec(callback)
ArgSpec(args=['request', 'event_slug'], varargs=None, keywords=None, defaults=None)

All useful stuff. args lists the names of the function's positional arguments. varargs and keywords contain the names of the * and ** variables respectively; typically these are named args and kwargs. In my experience, these are unusual in view functions, but as we shall see later, they do appear if we are using function decorators. defaults may be a tuple of default argument values, which correspond to the last n arguments.
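
To make the "last n arguments" rule concrete, here is a small sketch with a hypothetical view, event_detail. It uses inspect.getfullargspec, which is an assumption about the reader's environment: recent Python 3 releases no longer provide getargspec, and getfullargspec names the ** field varkw rather than keywords.

```python
import inspect


def event_detail(request, event_slug, page=1, fmt='html'):
    """Hypothetical view, for illustration only."""


spec = inspect.getfullargspec(event_detail)
n_defaults = len(spec.defaults or ())
split = len(spec.args) - n_defaults

# The defaults line up with the last n entries of args, so everything
# before the split point has to be supplied by the caller.
required = spec.args[:split]
optional = dict(zip(spec.args[split:], spec.defaults or ()))

print(required)
print(optional)
```

Slicing at len(args) - n rather than at -n also gives the right answer when there are no defaults at all, where args[:-0] would be empty.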

So, we should now be able to do something like:

def check_entry(entry):
    default_args = getattr(entry, 'default_args', {}).keys()
    # add the 'request' argument, which won't be in the url
    kwargs_provided = set(['request'] + default_args)
    captured_kwargs = entry.regex.groupindex.keys()
    kwargs_provided.update(captured_kwargs)

    argspec = inspect.getargspec(entry.callback)

    args = argspec.args
    defaults = argspec.defaults
    if defaults:
        required_args = args[:-len(defaults)]
    else:
        required_args = args

    missing_kwargs = set(required_args) - kwargs_provided
    if missing_kwargs:
        print "%s: view requires kwargs %s not in the url kwargs" % (
            entry, list(missing_kwargs))

    if argspec.keywords:
        # signature contains **kwargs, so can't do second check
        return

    extra_kwargs = kwargs_provided - set(args)
    if extra_kwargs:
        print "%s: url provides kwargs %s not in the view signature" % (
            entry, list(extra_kwargs))
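
The same comparison can be tried in isolation from django. The sketch below is a variation, not the code from the command: it assumes Python 3, uses inspect.signature instead of getargspec, and returns the two sets instead of printing. compare_kwargs and edit_member are hypothetical names.

```python
import inspect


def compare_kwargs(provided, callback):
    """Return (missing, extra) kwarg names for callback, given the names provided."""
    params = list(inspect.signature(callback).parameters.values())
    named = [p for p in params
             if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)]
    accepts_any = any(p.kind == p.VAR_KEYWORD for p in params)

    required = set(p.name for p in named if p.default is p.empty)
    accepted = set(p.name for p in named)

    missing = required - set(provided)
    # with **kwargs in the signature, nothing counts as extra
    extra = set() if accepts_any else set(provided) - accepted
    return missing, extra


def edit_member(request, event_slug):
    """Hypothetical view that was not updated for a new url parameter."""


provided = ['request', 'event_slug', 'member_id']
missing, extra = compare_kwargs(provided, edit_member)
print(sorted(missing), sorted(extra))
```

Returning sets makes the check easy to assert on in a test, and leaves the reporting to the caller.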

4. Highlight any mismatches?

Running check_entry against my urlconf threw up a whole bunch of false alarms. It turns out the problem was decorators. A whole bunch of my views were decorated, e.g. with @permission_required. Of course, entry doesn't know that its callback has been switched out for another function, but maybe we can figure it out. Some investigation turned up the func_closure property of functions, which contains references to the items in their closure. And decorated functions contain a reference to the original function there. To get at it, some trickery and guessing is required:

import collections

def guess_wrapper(cells):
    # look through the function closure for cells that look
    # like wrapped functions. for multiple functions,
    # attempt to guess which is the more likely candidate

    # we add points for
    # * non-lambda
    # * 'request' in arg list

    functions = collections.defaultdict(list)
    for cell in cells:
        contents = cell.cell_contents
        if not inspect.isfunction(contents):
            continue

        score = 0
        if contents.__name__ != '<lambda>':
            score += 1
        if 'request' in inspect.getargspec(contents).args:
            score += 1

        functions[score].append(contents)

    for score in reversed(sorted(functions.keys())):
        return functions[score][0]
    return None

def unwrap(callback):
    if callback is not None:
        while callback.func_closure:
            do_compare(callback)
            cells = callback.func_closure
            guess = guess_wrapper(cells)
            if guess:
                callback = guess
            else:
                # nothing in the closure was a function, so give up
                break
        do_compare(callback)
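
To see why the closure is worth searching, here is a minimal sketch, assuming Python 3, where func_closure is spelled __closure__. The decorators and the event_detail view are hypothetical. It also shows a different technique from the guessing above: when a decorator uses functools.wraps, the original is recorded as __wrapped__, and inspect.unwrap can follow it without any scoring.

```python
import functools
import inspect


def plain_decorator(view):
    # no functools.wraps: the original is only reachable via the closure
    def wrapper(request, *args, **kwargs):
        return view(request, *args, **kwargs)
    return wrapper


def wraps_decorator(view):
    # functools.wraps records the original function as __wrapped__
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        return view(request, *args, **kwargs)
    return wrapper


def event_detail(request, event_slug):
    return event_slug


plain = plain_decorator(event_detail)
found = [cell.cell_contents for cell in plain.__closure__
         if inspect.isfunction(cell.cell_contents)]

wrapped = wraps_decorator(event_detail)

print(found[0] is event_detail)
print(inspect.unwrap(wrapped) is event_detail)
```

The closure search is still needed for decorators that don't use functools.wraps, which is the case the guessing code is written for.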

Profit?

Putting this all together in a django management command, we get a nice tool to help with finding bugs, especially when refactoring a django project. The complete thing may be found on github, as django-urls-introspect.

Using the command I was able to find a few issues and get them sorted.

./manage.py check_urls
edit-members: url provides kwargs ['member_id'] not in the view signature

Is this a stupid idea? Do let me know. Any other thoughts/comments are also welcome.
