Tuesday, March 10, 2015

Unit Testing Scrapy - part 1 - Integrating with DjangoItem

In my previous post I showed how to scrape a page that requires multiple form submissions, along with one way to save the scraped data to the database: the Django integration with DjangoItem.

In this post I want to show how we can unit test scrapers using just the standard Python unit test framework, and how we need to configure our testing environment when referencing a Django model from our Items.


Basic unit testing


Continuing with the example in my previous post, let's recall our project layout

├── mappingsite
│   ├── mappingsite
│   └── storemapapp
└── storedirectoryscraper
    └── storedirectoryscraper
        └── spiders

We built a scraper in the storedirectoryscraper project, but we haven't written any unit or integration tests for it yet (you may want to try a little TDD afterwards instead of testing last, but it certainly helps to have an idea of where we are heading when learning a new tool).

So you can go ahead and create a tests.py file inside the storedirectoryscraper top level folder, and add the following code to it

import unittest


class TestSpider(unittest.TestCase):

    def test_1(self):
        pass


if __name__ == '__main__':
    unittest.main()


Running

python -m unittest storedirectoryscraper.tests

from the top level folder will display the Python unittest success message.

Now, let's see what happens when we try to import our spider to test it. Add the following line to the top of the tests.py file

from storedirectoryscraper.spiders import rapipago


and run the test again. You should see an error message like the one below.

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/unittest/__main__.py", line 12, in <module>
    main(module=None)
  File "/usr/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib/python2.7/unittest/main.py", line 149, in parseArgs
    self.createTests()
  File "/usr/lib/python2.7/unittest/main.py", line 158, in createTests
    self.module)
  File "/usr/lib/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python2.7/unittest/loader.py", line 100, in loadTestsFromName
    parent, obj = obj, getattr(obj, part)
AttributeError: 'module' object has no attribute 'tests'



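What happened here is that the unittest framework is not aware of our Scrapy project configuration: it is not Scrapy running our tests, it is Python directly, so the configuration in our settings file does not take effect. The AttributeError itself is misleading. On Python 2.7 the test loader swallows the ImportError raised while importing the tests module (here, most likely because the Django application is not on the path) and only then fails to find the tests attribute.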
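One way to see the import failure behind that AttributeError is to import the test module by hand. The following is a minimal sketch; find_import_error is a hypothetical helper written for this post, not part of Scrapy or unittest.

```python
import importlib


def find_import_error(module_name):
    """Return the ImportError raised while importing module_name, or None."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # This is the error that the unittest loader hides.
        return exc
    return None
```

Calling find_import_error('storedirectoryscraper.tests') from the top level scrapy folder should then return the original ImportError instead of the AttributeError.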
One way to solve this is to simply add the Django application to our Python path so the test runner can find it when invoked. We are going to need this for every test, but we certainly don't want to add it to our path permanently, only while testing the scraper. So we can create a test package and alter the path in its __init__.py file.

So let's do that. Create a tests folder at the same level as our tests.py file. Add an __init__.py file to it, and move the tests.py file to that directory. After the changes, the project should look like this

storedirectoryscraper
    ├── scrapy.cfg
    └── storedirectoryscraper
        ├── __init__.py
        ├── items.py
        ├── pipelines.py
        ├── settings.py
        ├── spiders
        │   ├── __init__.py
        │   └── rapipago.py
        └── tests
            ├── __init__.py
            └── tests.py


Now add the following lines to the __init__.py file you just created


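import os
import sys

# tests/ sits inside the inner storedirectoryscraper package; BASE_DIR is that package.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Make the Django project importable, for the test run only.
sys.path.append(os.path.join(BASE_DIR, '../../mappingsite'))
# Tell Django which settings module to load.
os.environ['DJANGO_SETTINGS_MODULE'] = 'mappingsite.settings'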

Now if you run

python -m unittest storedirectoryscraper.tests.tests

or just

python -m unittest discover

from the top level scrapy folder, you should get a success message again.

Setting up the test database


Now that we have made our tests work with the Django model, we need to be careful about which database we run our tests against. We wouldn't want our tests modifying our production database.

Looking at how we integrated our Django app into the testing environment, it turns out to be very easy to configure a testing database, separate from our development one. We just need to create different settings for dev, test and prod environments in our Django application. Let's create a settings file for testing for now.
Inside our mappingsite module, create a folder called settings. Add an empty __init__.py file to tell Python this is a package. Move our settings file inside that folder, and rename it to base.py.


├── manage.py
├── mappingsite
│   ├── __init__.py
│   ├── settings
│   │   ├── base.py
│   │   └── __init__.py
│   ├── urls.py
│   └── wsgi.py


Now create two new modules, dev.py and test.py, and cut and paste the DATABASES declaration into those two files. Change the database name to something that makes sense for each environment (you can also change the engine if desired) so that they won't collide.
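As an illustration, the DATABASES declaration in test.py might look like the sketch below. The SQLite engine, the file name mappingsite_test.sqlite3 and its location are assumptions made for the example; keep whatever engine your project already uses.

```python
# mappingsite/settings/test.py (sketch)
import os

DATABASES = {
    'default': {
        # Assumed engine; any Django database backend works here.
        'ENGINE': 'django.db.backends.sqlite3',
        # Hypothetical file name, placed in the current working directory
        # and distinct from the development database.
        'NAME': os.path.join(os.getcwd(), 'mappingsite_test.sqlite3'),
    }
}
```

The dev.py module would carry the same structure with a different NAME.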

Now you can take several approaches to resolve the correct environment. For this case, we will just add

from .base import *

at the top of each file and replace our settings module in wsgi.py, the scraper's settings.py and tests/__init__.py with the correct one. This is not the recommended solution though, as we would need to change those settings when deploying to a different environment (let's say production, or a staging server). You can read more on this in the Django docs.
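Another of those approaches is to pick the settings module from an environment variable, so that nothing has to be edited per deployment. This is a minimal sketch, assuming a hypothetical variable named MAPPINGSITE_ENV and a hypothetical helper resolve_settings_module; neither is provided by Django or Scrapy.

```python
import os


def resolve_settings_module(environ=None, default='dev'):
    """Build the settings module path from the MAPPINGSITE_ENV variable."""
    environ = os.environ if environ is None else environ
    # Fall back to the development settings when the variable is not set.
    env_name = environ.get('MAPPINGSITE_ENV', default)
    return 'mappingsite.settings.%s' % env_name
```

In wsgi.py or tests/__init__.py it could then be used as os.environ.setdefault('DJANGO_SETTINGS_MODULE', resolve_settings_module()), with MAPPINGSITE_ENV=test exported before running the tests.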

Summary


In this post we have seen how to set up our environment for unit testing when using Django models. With these changes, we can now start unit testing our scraper and even add some integration tests to check that we can actually populate our database. In future posts we will go deeper into how to unit test our scraper, and later on we will look into an alternative that Scrapy provides, Contracts.
