The next few months will be interesting. I was accepted into the Google Summer of Code program and I am already starting to worry (irrationally) about the project and the schedule. I will be working on a differential geometry module for SymPy (and, time permitting, some more advanced tensor algebra).
Basically, I want to create the boilerplate that permits defining a scalar/vector/form/tensor field in an arbitrary coordinate system, doing coordinate-system-independent operations on that field (with, hopefully, coordinate-system-independent simplifications) and, finally, getting the equations that describe the final result in another arbitrary coordinate system.
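As a rough illustration of the kind of workflow I have in mind, here is the transformation the module should automate, done by hand with nothing but plain SymPy substitution (the module itself does not exist yet, so none of this is its real API):

```python
from sympy import symbols, cos, sin, simplify

# A scalar field defined in Cartesian coordinates...
x, y, r, theta = symbols('x y r theta')
field_cartesian = x**2 + y**2

# ...re-expressed in polar coordinates by substituting the coordinate
# transformation by hand. The module should do this automatically for
# scalar/vector/form/tensor fields.
to_cartesian = {x: r*cos(theta), y: r*sin(theta)}
field_polar = simplify(field_cartesian.subs(to_cartesian))
print(field_polar)  # r**2
```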
With this in mind, the details about the project can be seen on the proposal page. Most of it (all except the tensor algebra that I may work on at the end) is based on the work of Gerald Jay Sussman and Jack Wisdom on “Functional Differential Geometry”. I suppose that this project started as a part of their superb book “Structure and Interpretation of Classical Mechanics” (I really have to read this book if I am to call myself a physicist) and the accompanying “Scheme Mechanics” software. By the way, reading the Scheme code is a wonderful experience. This language is beautiful! The authors are also actively updating their code and a newer, more detailed paper on the project can be found here.
Most of my work will be reading the Scheme code and tracing corner cases in SymPy. My workflow will probably consist of implementing some notion from “Functional Differential Geometry” in SymPy and, only when I get it to a semi-working state, comparing with the original Scheme code for ideas, then repeating the process on the next part of the system. This way I will be less susceptible to implementing Scheme idioms in Python.
Writing the final version of each function/class of my module will probably take very little time. Most of the time will be dedicated to removing/studying corner cases and assumptions in SymPy’s codebase (more about these later) and experimenting with different approaches for the module structure (and of course reading/deciphering the work of Wisdom and Sussman).
Finally, I will speak a bit about the aforementioned corner cases and assumptions in SymPy’s codebase. There are the obvious things, like having to derive from Expr if you want your class to be able to appear in a symbolic expression. Then there is the fact that Basic (and its subclasses like Expr) do some magic with the arguments to the constructor (saved in expr._args) in order to automagically have:
- a rebuildable expression with eval(srepr(expr)) == expr
- a rebuildable expression with type(expr)(*expr._args)
- some magic with the _hashable_content() method in order to (presumably) have efficient caching
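Both rebuilding invariants are easy to check on a stock expression (I use sympify instead of a bare eval so the class names do not have to be in scope):

```python
from sympy import Symbol, srepr, sympify

x = Symbol('x')
expr = x**2 + 1

# The srepr string is a valid constructor call that rebuilds the object.
assert sympify(srepr(expr)) == expr

# The class called on the stored arguments also rebuilds the object.
# (expr.args is the public accessor for expr._args.)
assert type(expr)(*expr.args) == expr
```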
These details make it a bit unclear how to implement things like CoordinateSystem objects, which learn during their existence how to transform to other coordinate systems (thus their implementation in code is a mutable object) but at the same time remain the same mathematical object. Anyway, from what I have seen, just having a persistent hash and a correct srepr should be enough. I wonder how taboo it is to change your _args after the instance has been created. Why I need to worry about caching (thus the hash) and rebuilding (thus the srepr) is still unclear to me, but I will dedicate whole posts to them later on when I have the explanation. The caching is presumably for performance. It is the need for all that fancy magic that does not permit duck typing in SymPy. If you do not subclass Basic, you cannot be part of SymPy, no matter the interfaces that you support.
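One way out, sketched below, is to keep only the identifying data in _args (so the hash and srepr stay persistent) and to store the learned transformations in an ordinary attribute outside of _args. This is just a minimal sketch of the idea; CoordinateSystem and connect are hypothetical names, not an existing SymPy API:

```python
from sympy import Basic, Symbol, srepr

class CoordinateSystem(Basic):
    """Identified by its name alone; hash and srepr never change."""
    def __new__(cls, name):
        obj = super().__new__(cls, Symbol(name))
        obj._transforms = {}  # mutable cache, deliberately not in _args
        return obj

    def connect(self, other, rules):
        # Learn how to transform to `other` after construction.
        self._transforms[other] = rules

cartesian = CoordinateSystem('cartesian')
polar = CoordinateSystem('polar')
cartesian.connect(polar, 'transformation rules go here')

# Learning a transformation did not change the identity of the object:
print(srepr(cartesian))  # CoordinateSystem(Symbol('cartesian'))
```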
Then there is the question of using the container subclasses of Expr, things like Add and Mul, which I would have expected to be just containers. However, they are not: they also do some partial canonicalization, and at the moment their exact role (and more importantly, what they don’t do) is very unclear to me. There was much discussion on the mailing list about ASTs and canonicalization, if you are interested, and about how exactly to separate the different duties that Add and Mul have, but as this is enough work for another GSoC, I decided to just stop thinking about it and use them in the simplest way possible: just as containers.
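A few examples of the kind of canonicalization I mean, all stock SymPy behavior:

```python
from sympy import Symbol, Add, Mul

x = Symbol('x')

# Construction is not mere containment - some canonicalization happens:
print(Add(x, x))  # 2*x   (like terms are collected)
print(Mul(x, x))  # x**2  (repeated factors become powers)
print(Add(x, 0))  # x     (identity elements are dropped)

# Passing evaluate=False suppresses this and keeps a plain container:
print(Add(x, x, evaluate=False))  # x + x
```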
There is one drawback to this approach. The sum of two vector fields, for example, is still a vector field, and the object that represents the sum should have all the methods of the object representing one of the fields; however, Add does not have the same methods as VectorField. The solution that was already used in the matrix module was to create classes like MatrixAdd, and the same was done in the quantum physics module. However, I fear such proliferation of classes, for it becomes unsustainable as the number of different modules grows. What happens when I want to combine two objects from disjoint modules? This is why I simply use Add and Mul and implement helper functions that are not part of the class. These helper functions will ideally be merged into some future canonicalizer that comes about from separating the container and canonicalization parts of Add and Mul.
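A minimal sketch of the helper-function approach, with a hypothetical ScalarField class and a module-level helper instead of a MatrixAdd-style wrapper (none of these names are real SymPy API):

```python
from sympy import Add, Expr, Symbol

x = Symbol('x')

class ScalarField(Expr):
    """Hypothetical field, wrapping an expression in the coordinate x."""
    def at(self, value):
        return self.args[0].subs(x, value)

def field_at(field, value):
    # A module-level helper instead of a method: a plain Add of fields
    # is handled by recursing into its arguments, so no AddOfFields
    # class is needed.
    if isinstance(field, Add):
        return Add(*[field_at(term, value) for term in field.args])
    return field.at(value)

f = ScalarField(x**2)
g = ScalarField(x + 1)
print(field_at(f + g, 2))  # 7
```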
One last remark: I will probably have to work on sympify and the sympification of matrices, as I will use coordinate tuples (column vectors) quite often. Then there is the distinction between Application and Function and all the magic with metaclasses, which seems very hard to justify. But I will probably write entire posts in which I try to understand why the metaclasses in the core are necessary.
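For context, this is the situation I mean (stock SymPy behavior, as far as I can tell):

```python
from sympy import sympify, Matrix

# sympify handles strings and numbers, but a nested list stays a plain
# Python list of sympified entries rather than becoming a Matrix:
print(sympify([[1, 2], [3, 4]]))  # [[1, 2], [3, 4]] (a list, not a Matrix)

# Column vectors for coordinate tuples must be built explicitly:
point = Matrix([[1], [2]])
print(point.shape)  # (2, 1)
```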