.. -*- rest -*-
.. NB! Keep this document a valid restructured document.

Developing Scipy
================

:Author: Pearu Peterson <pearu@cens.ioc.ee>
:Last changed: $Date: 2004/04/22 15:55:11 $
:Revision: $Revision: 1.16 $
:Discussions to: scipy-dev@scipy.org

.. Contents::

Introduction
------------

Scipy aims to be a robust and efficient "super-package" of a number
of modules, each of which is complex and large enough to be a highly
non-trivial package in its own right. For "Scipy integration" to work
flawlessly, all Scipy modules must follow certain rules that are
described in this document. Hopefully this document will be helpful to
Scipy contributors as well as to developers as a basic reference about
the structure of the Scipy package.

Scipy structure
---------------

Currently Scipy consists of the following files and directories:

  INSTALL.txt
    Scipy prerequisites, installation, testing, and troubleshooting.

  PACKAGERS.txt
    Information on how to package Scipy and related tools.

  THANKS.txt
    Scipy developers and contributors. Please keep it up to date!!

  DEVELOPERS.txt
    Scipy structure (this document).

  setup.py
    Script for building and installing Scipy. It also calls
    scipy_core/setup.py with the same command line arguments
    as given to setup.py. You'll find scipy_core related
    files in scipy_core/{dist,build}.

  MANIFEST.in
    Additions to distutils-generated Scipy tar-balls.
    Its usage is deprecated, in general.

  scipy_core/
    Contains four modules, scipy_base, scipy_distutils, scipy_test,
    and weave, that all Scipy modules may depend on. As a rule,
    scipy_distutils is required only for building, scipy_test for
    running tests, and scipy_base contains various tools for runtime
    usage.

  Lib/
    Contains Scipy __init__.py and the directories of Scipy modules.

  tutorial/
    Scipy tutorial.

  util/
    Various tools [Not useful in general. Could we get rid of this?].


Scipy module
------------

In the following, a *Scipy module* is defined as a Python package, say
xxx, that is located in the Lib/ directory.  All Scipy modules should
follow these conventions:

* Ideally, each Scipy module should be as self-contained as
  possible, that is, it must be usable standalone and have minimal
  dependencies on other packages or modules, even if they are
  also Scipy modules. The exception is ``scipy_base``: its usage is
  encouraged as a replacement for the ``Numeric`` or ``numarray`` modules
  to simplify the future Numeric->Numarray transition.

* Directory ``xxx/`` must contain 

  + a file ``setup_xxx.py`` that defines
    ``configuration(parent_package='',parent_path=None)`` function.  
    See below for more details.

  + a file ``info_xxx.py``. See below for more details.

* Directory ``xxx/`` may contain 

  + a directory ``tests/`` that contains files ``test_<name>.py``
    corresponding to modules ``xxx/<name>{.py,.so,/}``.  See below for
    more details.

  + a file ``MANIFEST.in`` that may contain only the line ``include setup.py``.
    DO NOT specify sources in MANIFEST.in; you must specify all sources
    in the setup_xxx.py file. Otherwise the Scipy tar-ball will miss these
    sources.

[Open issue: where should we put documentation?]

File xxx/setup_xxx.py
---------------------

The setup_xxx.py file of each Scipy module should contain a function
``configuration(..)`` that returns a dictionary which must be usable
as keyword arguments to the distutils setup function.

For example, a minimal setup_xxx.py file for a pure Python Scipy
module xxx would be::

  def configuration(parent_package='',parent_path=None):
      package = 'xxx'
      from scipy_distutils.misc_util import default_config_dict
      config = default_config_dict(package,parent_package)
      return config

  if __name__ == '__main__':
      from scipy_distutils.core import setup
      setup(**configuration(parent_path=''))

A Scipy module may also have a ``xxx/setup.py`` file that should contain
a single statement::

  execfile('setup_xxx.py')

Ideally there should be no need for this file, but
``distutils/command/bdist_rpm.py`` (Python versions <=2.3) has the name
``setup.py`` hardcoded, and therefore building .rpm files without
the ``setup.py`` file described above will fail.  This is only
relevant when you wish to distribute a Scipy module separately from
scipy.


get_path
++++++++

``scipy_distutils.misc_util`` provides a function
``get_path(modulename,parent_path=None)`` that returns the directory
of ``modulename``. In a ``setup_xxx.py`` file this can be used to
determine the local directory name as follows::

  local_path = get_path(__name__, parent_path)

If ``parent_path`` is not ``None`` then the returned path is relative
to the parent path. This avoids overly long paths.

If a ``setup_xxx.py`` script uses ``os.path.join`` a lot,
defining the following functions can be handy::

  import glob
  import os

  def local_join(*names):
      return os.path.join(local_path, *names)

  def local_glob(*names):
      return glob.glob(os.path.join(local_path, *names))
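
As an illustration only, the sketch below exercises two such helpers
against a temporary directory. It is written for present-day Python 3,
and ``local_path`` is set by hand (an assumption made so that the
example is self-contained) instead of being obtained from ``get_path``.

```python
import glob
import os
import tempfile

# Stand-in for the value that get_path(__name__, parent_path) would return.
local_path = tempfile.mkdtemp()

def local_join(*names):
    """Join path components relative to the module directory."""
    return os.path.join(local_path, *names)

def local_glob(*names):
    """Expand a glob pattern relative to the module directory."""
    return glob.glob(local_join(*names))

# Create two dummy C sources so that the glob has something to match.
os.mkdir(local_join('src'))
for name in ('a.c', 'b.c'):
    open(local_join('src', name), 'w').close()

# glob does not guarantee any ordering, hence the explicit sort.
c_sources = sorted(local_glob('src', '*.c'))
```

Sorting the result keeps the order of the source list stable from one
build to the next.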


Building sources 
++++++++++++++++

Often building an extension module involves a step where sources are
generated by, for example, SWIG or F2PY. However, such a step
should be carried out only when building a module and, in general,
should be skipped when creating a distribution, for instance.

Scipy_distutils provides built-in support for building sources from .i
(SWIG) and .pyf (F2PY) files. These files should be listed in the
``sources`` list passed to the ``Extension`` constructor, and
Scipy_distutils takes care of processing them.

For examples, see

::

  scipy_distutils/tests/f2py_ext/
  scipy_distutils/tests/swig_ext/

In addition, Scipy_distutils allows building sources by whatever
means is most suitable for you. All you need to do is to put
auxiliary functions with the following signatures in the ``sources``
list:

::

  def build_sources(extension, build_dir):
      ...
      return <list of generated source files>

  def build_source(extension, build_dir):
      ...
      return <name of the generated source file>

Here the ``extension`` argument refers to the corresponding ``Extension``
instance so that all its attributes are available to be used or
changed inside these functions. The ``build_dir`` argument is the
suggested (and highly recommended) location for saving generated
source files. If you use ``build_dir`` as a prefix for all
generated source files, then Scipy_distutils will be able to build
source distributions that contain the built sources, and on the user's
side these will be used instead of being regenerated.

For an example, see

::

  scipy_distutils/tests/swig_ext/gen_ext/

Note that generated source files may be C or Fortran source files as
well as Python files.

All dependencies on auxiliary files (e.g. Python files, header files,
etc. that are used to generate sources and should not be installed)
should be specified in the ``depends`` list of the ``Extension``
constructor.
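
A source-generating function with the ``build_source`` signature shown
above might look like the following sketch. It is written for
present-day Python 3 and does not import Scipy_distutils: ``FakeExtension``
is a hypothetical stand-in that carries only a ``name`` attribute, and the
generated file name and contents are invented for illustration.

```python
import os
import tempfile

class FakeExtension:
    """Hypothetical stand-in for an Extension instance (name only)."""
    def __init__(self, name):
        self.name = name

def build_source(extension, build_dir):
    """Generate one C source file under build_dir and return its name."""
    os.makedirs(build_dir, exist_ok=True)
    # Using build_dir as the prefix keeps generated files out of the source tree.
    target = os.path.join(build_dir, extension.name + '_generated.c')
    with open(target, 'w') as f:
        f.write('/* generated for %s */\n' % extension.name)
        f.write('int answer(void) { return 42; }\n')
    return target

# Call the function by hand, the way a build step would.
build_dir = os.path.join(tempfile.mkdtemp(), 'build', 'src')
generated = build_source(FakeExtension('xxx'), build_dir)
```

In a real ``setup_xxx.py`` the function object itself, not its return
value, would be placed in the ``sources`` list.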

SourceGenerator [deprecated]
++++++++++++++++++++++++++++

Often building a module involves a step where sources are generated by
whatever means. However, such a step should be carried out only when
building modules and should be skipped when creating a distribution,
for instance. To facilitate this, ``scipy_distutils.misc_util``
provides a class ``SourceGenerator(func,target,sources=[],*args)``
that encapsulates the process of source generation.  Here
``func`` is a function ``func(target,sources,*args)`` that is called
whenever ``target`` needs to be generated. ``target`` is the name of the
source file that ``func`` must create. ``sources`` is a list of files
that ``target`` depends on; ``target`` is regenerated whenever
any of these dependencies changes. ``args`` can be used to pass extra
arguments on to ``func``. A ``SourceGenerator`` instance can be used
in the ``sources`` list argument of the Extension class constructor.
See ``Lib/xxx/setup_xxx.py`` for a typical example of
``SourceGenerator`` usage.

If ``func`` is ``None`` then the ``target`` must exist, and whenever
``sources`` are modified, the ``target`` file is touched.  This
feature is useful for adding non-standard dependencies to
Extension instances: just put them in the ``sources`` list.  See the
fastumath module in ``scipy_core/scipy_base/setup_scipy_base.py`` for an
example.


SourceFilter [deprecated]
+++++++++++++++++++++++++

Different platforms may require different sources to build a
module. If this distinction is made in the ``configuration()`` function
by defining different sources for an Extension instance, then
portability issues (e.g. missing files) may occur when a source
tar-ball is created on a platform other than the user's platform.

To overcome this difficulty, ``scipy_distutils.misc_util`` provides a
``SourceFilter(func,sources,*args)`` class that can be used to define
a holder of all sources. The function ``func(sources,*args)`` should
return the list of sources that are relevant for building the module on
the particular platform. A ``SourceFilter`` instance can be used in the
``sources`` list argument of the Extension class.
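
A filter function of the form ``func(sources,*args)`` could be sketched
as follows, written for present-day Python 3. The file naming convention
(``*_win32.c`` and ``*_posix.c``) and the ``platform`` argument are
assumptions made for this example; they are not part of Scipy_distutils.

```python
import sys

def select_sources(sources, platform=None):
    """Return the subset of ``sources`` relevant for ``platform``.

    Hypothetical convention: ``*_win32.c`` files are used only on Windows,
    ``*_posix.c`` files only elsewhere, and all other files everywhere.
    """
    if platform is None:
        platform = sys.platform
    is_windows = platform.startswith('win')
    selected = []
    for src in sources:
        if src.endswith('_win32.c') and not is_windows:
            continue
        if src.endswith('_posix.c') and is_windows:
            continue
        selected.append(src)
    return selected

# The full list is what would be shipped in the source tar-ball.
all_sources = ['common.c', 'timer_win32.c', 'timer_posix.c']
```

Because the complete list is always declared and the selection happens
only at build time, a tar-ball created on one platform still contains
the files needed on another.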

File xxx/info_xxx.py
--------------------

The Scipy setup.py and Lib/__init__.py files assume that each Scipy module
contains an info_xxx.py file. The following information is looked up
in this file:

__doc__
  The documentation string of the module.

__doc_title__
  The title of the module. If not defined then the first non-empty 
  line of __doc__ will be used.

standalone
  Boolean variable indicating whether the module should be installed
  as standalone or under scipy. Default value is False.

dependencies
  [Support not implemented yet, maybe it is YAGNI?]
  List of module names that the module depends on. The module will not
  be installed if any of the dependencies is missing. If the module
  depends on another Scipy module, say yyy, and that is not going to
  be installed standalone, then use full name, that is, ``scipy.yyy``
  instead of ``yyy``.

global_symbols
  List of names that should be imported to scipy name space. To import
  all symbols to scipy name space, define ``global_symbols=['*']``.
  This option is effective only when ``standalone=False``.

ignore
  Boolean variable indicating whether the module should be ignored.
  Default value is False. Useful when the module is platform
  dependent or badly broken.

postpone_import
  Boolean variable indicating whether importing the module should be
  postponed until it is first used. Default value is False.
  This option is effective only when ``standalone=False``.
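
Putting these variables together, a small ``info_xxx.py`` file might look
like the sketch below. The documentation text and the exported name
``zzz`` are placeholders chosen for illustration.

```python
"""
Module xxx
==========

Example routines for illustration only.
"""

# Install under the scipy name space rather than as a standalone package.
standalone = False

# Names to be imported to the scipy name space (hypothetical name).
global_symbols = ['zzz']

# Do not skip this module.
ignore = False

# Postpone importing the module until it is first used.
postpone_import = True
```

Since ``__doc_title__`` is not defined here, the first non-empty line of
the documentation string, ``Module xxx``, would serve as the title.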

File xxx/__init__.py
---------------------

To speed up the import time as well as to minimize memory usage, scipy
uses ppimport hooks to transparently postpone importing large modules
that might not be used during a Scipy session. But in order to
have access to the documentation of all Scipy modules, including the
postponed ones, the documentation string of a module (that would
usually reside in the __init__.py file) should also be copied
to the info_xxx.py file.

So, the header of a typical xxx/__init__.py file is::

  #
  # Module xxx - ...
  #

  from info_xxx import __doc__
  ...
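
The idea of a postponed import can be sketched in a few lines of
present-day Python 3. The class below is not the ppimport
implementation; ``PostponedModule`` is a hypothetical, simplified proxy
that only shows why a module's documentation has to live somewhere
other than the module itself if it is to be read without triggering the
import.

```python
import importlib

class PostponedModule:
    """Proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing has been imported yet

    def _load(self):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only for attributes that the proxy itself does not have.
        return getattr(self._load(), attr)

# Creating the proxy is cheap; the import happens on first use.
math_proxy = PostponedModule('math')
loaded_before = math_proxy._module is not None
root = math_proxy.sqrt(4.0)
loaded_after = math_proxy._module is not None
```

Any attribute access on such a proxy, including a request for the
documentation string of the real module, forces the import, which is
what keeping a copy of the documentation in ``info_xxx.py`` avoids.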

File xxx/tests/test_yyy.py
--------------------------

Ideally, each Python module, extension module, or subpackage in the
``xxx/`` directory should have a corresponding ``test_<name>.py``
file in the ``xxx/tests/`` directory. This file should define classes
that are derived from ``ScipyTestCase`` (or from ``unittest.TestCase``)
and have names starting with ``test``. The methods of these classes
whose names start with ``bench``, ``check``, or ``test`` are passed
on to the unittest machinery. In addition, the value of the first optional
argument of these methods determines the level of the corresponding
test. The default level is 1.

A minimal example of a ``test_yyy.py`` file that implements tests for
a module ``xxx.yyy`` containing a function ``zzz()`` is shown below::

  import sys
  from scipy_test.testing import *

  set_package_path()
  # import xxx symbols
  from xxx.yyy import zzz
  restore_path()

  set_local_path()
  # import modules that are located in the same directory as this file.
  restore_path()

  class test_zzz(ScipyTestCase):
      def check_simple(self, level=1):
          assert zzz()=='Hello from zzz'
      #...

  if __name__ == "__main__":
      ScipyTest('xxx.yyy').run()

``ScipyTestCase`` is derived from ``unittest.TestCase`` and it
implements an additional method ``measure(self, code_str, times=1)``.

The ``scipy_test.testing`` module also provides the following convenience
functions::

  assert_equal(actual,desired,err_msg='',verbose=1)
  assert_almost_equal(actual,desired,decimal=7,err_msg='',verbose=1)
  assert_approx_equal(actual,desired,significant=7,err_msg='',verbose=1)
  assert_array_equal(x,y,err_msg='')
  assert_array_almost_equal(x,y,decimal=6,err_msg='')
  rand(*shape) # returns random array with a given shape

``ScipyTest`` can be used for running ``tests/test_*.py`` scripts.
For instance, to run all test scripts of the module ``xxx``, execute
in Python:

  >>> ScipyTest('xxx').test(level=1,verbosity=1)

or equivalently,

  >>> import xxx
  >>> ScipyTest(xxx).test(level=1,verbosity=1)

To run only tests for ``xxx.yyy`` module, execute:

  >>> ScipyTest('xxx.yyy').test(level=1,verbosity=1)

To take the level and verbosity parameters for tests from
``sys.argv``, use ``ScipyTest.run`` method (this is supported only
when ``optparse`` is installed).
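
How method names and levels could drive test selection is sketched below
using only the standard ``unittest`` module under present-day Python 3.
This is not the ``ScipyTest`` implementation: the helper names are
hypothetical, and the rule that a test runs when its level does not
exceed the requested level is an assumption of this example.

```python
import unittest

PREFIXES = ('bench', 'check', 'test')

def method_level(func):
    """Return the default value of the first optional argument, or 1."""
    defaults = func.__defaults__
    return defaults[0] if defaults else 1

def collect(case_class, level=1):
    """Build a suite from the methods selected by name prefix and level."""
    suite = unittest.TestSuite()
    for name, attr in sorted(vars(case_class).items()):
        if callable(attr) and name.startswith(PREFIXES):
            # Assumption: run tests whose level is at most the requested one.
            if method_level(attr) <= level:
                suite.addTest(case_class(name))
    return suite

class test_zzz(unittest.TestCase):
    def check_simple(self, level=1):
        self.assertEqual('Hello from zzz'.split()[0], 'Hello')

    def check_expensive(self, level=5):
        self.assertEqual(sum(range(1000)), 499500)

    def helper(self):
        # Not collected: the name has no recognised prefix.
        return None

quick = collect(test_zzz, level=1)   # only check_simple
full = collect(test_zzz, level=10)   # both check_* methods
```

Either suite can then be run with a standard runner, for example
``unittest.TextTestRunner().run(quick)``.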


Open issues and discussion
--------------------------

Documentation
+++++++++++++

Documentation is an important feature that Scipy is currently missing.
A few Scipy modules have some documentation, but it uses different
formats and is mostly outdated.

Currently there are

* Scipy tutorial by Travis E. Oliphant that is maintained using LyX. 
  The main advantage of this approach is that one can use mathematical
  formulas in documentation.

* I (Pearu) have used reStructuredText formatted .txt files to document
  various bits of software. This is mainly because ``docutils`` might
  become a standard tool to document Python modules. The disadvantage
  is that it does not support mathematical formulas (though we might
  add this feature ourselves using e.g. LaTeX syntax).

* Various text files with almost no formatting, most of them badly
  outdated.

* Documentation strings of Python functions, classes, and modules.
  Some Scipy modules are well documented in this sense, while others
  are very poorly documented. Another issue is that there is no
  consensus on how to format documentation strings, mainly because
  we haven't decided which tool to use to generate, for instance, HTML
  pages from documentation strings.

So, we need uniform rules for documenting Scipy modules. Here are some
requirements that documentation tools should satisfy:

* Easy to use. This is important to lower the threshold for developers
  to use the same documentation utilities.

* In general, all functions that are visible to Scipy end-users must
  have well-maintained documentation strings.

* Support for mathematical formulas. Since Scipy is a tool for
  scientific work, it is hard to avoid formulas when describing what its
  modules are good for. So, documentation tools should support LaTeX.

* Documentation of a feature should be closely related to its
  interface and implementation. This is important for keeping
  documentation up to date. One option would be to maintain
  documentation in source files (and have a tool that extracts
  documentation from sources). The main disadvantage of that is
  the inconvenience of writing documentation, as the editor would be
  in a different mode (e.g. Python mode) from the mode suitable for
  documentation.

* A clear distinction between implementation-based docs (e.g. generated
  by scanning sources) and concept-based docs (e.g. tutorial, user's
  guide, manual).
  

Configuration
+++++++++++++

[Discuss system_info.py limitations. Need a building step to determine
certain system parameters.]
