May 2016 – Navigating Uncertainty

Setting up for developing Sage is one thing, actually contributing to the development is a completely different ball game. The development process, broadly speaking, consists of the following steps:

Identify a defect, bug, enhancement or new feature that Sage has or requires.
Open a ticket on Trac describing the exact problem.
Enter discussions with other community members on clarifying the issue and discuss solution and implementation ideas.
Write code and submit.
Get it reviewed by other at least one other independent programmer.
If review is positive, the ticket is closed and code is successfully merged. Otherwise, go back to step 3 and repeat.

As a contributor, one needs to know how to do both, review as well as successfully write code. And that is what this post describes. I’ll use tickets that I worked on, to illustrate these aspects of development in further detail. In order to avoid confusion, I’ll use Code while talking about Coding Theory and code to talk about programming.

Reviewing a Ticket:

David and Johan (my mentors for the project) suggested Ticket #20113. The zero method in the linear_code.py file was supposed to return the zero vector of the associated Code. The original implementation called the gens method (the generators of the Code), took the first element of the returned list, multiplied it by 0. This resulted in all zero vector of length of the Code. While the solution wasn’t incorrect, this was a rather tedious way of obtaining it. David had already submitted a patch that simply called the zero element from the ambient space of the Code to produce the answer.

My job was to review this new code and at first glance, this seems trivially correct. And it was. But this may not always be the case. Sage is a massive beast and a small change in one part of the code could break something in some other seemingly unrelated part of the code. There are a lot of things that need to be verified and the best way is to simply follow the Reviewer’s Checklist. I’ll clarify some of the terms from there first and then provide a general rule-of-thumb workflow on how to interact with the git-trac system to review.

Doctests and Coverage: The documentation for each of the methods in Sage contains examples of code that explains how to use the method. Python can search for and extract these commands, run them and compare the output to the one mentioned in the documentation. If the patch changes code, the doctests for all the changed methods must still hold true. To do this, run the command `./sage --coverage src/sage/coding/linear_code.py.
Manuals and Building: Manuals serve as a reference book and can be incredibly useful for quick reference. Sage adds material to its manuals from the documentation in the source code itself. Which is why the documentation should follow a specific template. See here for examples and notice the formatting and indentation. Run `./sage –docbuild reference html` to build the html version of the “Reference” Manual. Others include _tutorial, developer, constructions and installation_. These are also available in pdf versions.
Speedup: Generally, it is hard to check the speed. Standard Python time-management (import time, time_clock()) can be used. The main idea is to ensure that the new code does not utterly slow down Sage. Patchbot reports can also be used but caution should be exercised and error reports should be double-checked. Run methods on large inputs, compare to other softwares that offer similar functionality, try boundary cases and consistency checks based on mathematical foundations.

Review Workflow:

git trac checkout TICKET_NUM (This will create a local branch say t/20113)
git checkout t/20113
./sage -b (See if the code builds without errors)
Run through the Checklist above and note down errors, if any.
Run various test examples in an attempt to “break” the code.
Go to Trac and write comment(s).
Change ticket status.
Write your full name in the Reviewer field of the ticket.

Writing a Ticket:

David and Johan suggested improving the efficiency of the decode_to_code method of the Nearest Neighbour Decoder. I opened a Ticket #20201 on Sage to fix this. The previous implementation created and stored the distances between the received word and every codeword. It then sorted the entire list in order to find the closest codeword. This took exponential memory and time in the size of the Code which can be very inefficient for large input. An obvious solution is to compute the Hamming weight for the first codeword and set that as the minimum. Then, iterating over the rest of the codewords and updating the minimum drastically brings down the memory and run time requirements.

There are some major differences when it comes to actually opening a new ticket and writing code for it. Whenever you write new code in Sage, you might want to add some print statements and code, run the files, test, make small changes and test again. In that case, you have to build sage (./sage -b) or in some cases even run make distclean && make which can take a very long time. Instead, the best way is to write your new code in a myfile.sage file and then run ./sage myfile.sage.

Write Workflow:

Open a new ticket on Sage. Or if a ticket that has commits already exists, create a local branch on your system. And then pull the changes from the ticket onto your branch. Do not directly checkout the ticket. It can result in very weird errors.
Discuss solution idea on trac.
On the local branch (NOT master) corresponding to the ticket, write your code.
Test your code.
git add <changed_file(s)>
git commit -m "insert detailed message here."
`git trac push TICKET_NUM`
Once you’re satisfied with your commits, set the ticket field to “needs_review”.
If positive, ticket closed. Else, go back to step 2.

As a final note, Sage is guaranteed to break down at any point for seemingly no valid, discernible reason and I’ve lost track of the time I’ve spent in trying to fix it. But in doing so, I’ve learnt a lot about the codebase and once you successfully manage to rebuild Sage, there is a weirdly awesome feeling of accomplishment!

Sage is a viable open source mathematics software package that offers an extensive toolbox
for algebra, combinatorics and numerical computations. This massive project can seem daunting to someone who wants to begin contributing to it, especially in the short GSoC application period. In what follows, I cover a constantly updating collection of advice for new programmers, to simplify the process of applying to Sage.

1) Where to find Sage

Mailing Lists:

The following are the primary ones:

sage-devel – development discussion group of Sage
sage-support – help in running/using Sage
sage-gsoc – related discussions regarding the GSoC program

Apart from these, there are tons of other topic based lists dedicated to specific modules. For example, sage-coding-theory. When applying to GSoC, find a project from the Ideas page and post on sage-gsoc (with a cross post on the related topic-based list, if it exists) indicating your interest in the project and request more details; instead of simply asking for help in “getting started” since one is “very interested in contributing to open source”.

Source Code:

Source Code – Download your copy of Sage from here
git.sagemath.org – Online Sage repository. A good way to read the code online
Documentation – Developer’s guide to the Sage
Trac – Organizing development and code review through tickets

2) Contributing to Sage

The following is more of a cheat sheet to setup everything you need so you can begin contributing. When in doubt or need of further clarity, always refer to the official documentation. It is really well written!

Installation:

Install Sage: Assuming the source code is downloaded, unzip it and go inside the directory. And run “make”. This should take a few hours. Once it finishes, you can run it by typing “./sage” on the command prompt.
Configure git: Assuming you have git installed, run the following two configuration commands (if you haven’t already)

git config –global user.name “Your Name”
git config –global user.email you@yourdomain.example.com
Install git-trac:Now, all development tasks can be performed in the usual way using git and a browser. Alternatively git-trac can simplify the communication process between git and trac. To use this, run:

git clone https://github.com/sagemath/git-trac-command.gitsource
git-trac-command/enable.sh (enables it temporarily)
cd git-trac-command
sudo python setup.py install
Obtain Sage Trac account: Development happens through tickets on Trac. In order to comment, review, commit or interact in any other active way with the source code, a Trac account is required. To do that, send an email to sage-trac-account@googlegroups.com, containing your full name, preferred username, contact email and reason for needing an account. Once the request is approved, a username and password will be assigned.
Configure git-trac: Authentication mechanism is required for read-write access. Simply put, sage needs to be told about trac with verification. To do this, run:

git trac config –user USERNAME –pass ‘PASSWORD’
ssh-keygen (to generate a public/private rsa key pair)

Run the Sage prompt (./sage) and upload it to trac using the command “dev.upload_ssh_key()“. Use the credentials obtained above for completing the authentication.

Again, the Sage documentation is magnificent and it should be referred to in case of doubts or further explanations. Once all this is done, one can start contributing by checking out an existing ticket or opening a new one. There is a lovely cheat sheet on how to do all that, already available. The way development happens is, someone opens a ticket regarding a bug, defect, new feature or enhancement and writes code for it. Once the commit is ready, it is opened for review whereupon an independent second programmer checks it and suggests changes or gives it a thumbs up.

Communicating with the Sage community:

While the people in the community are extremely helpful, it is necessary to ask for help wisely. Through the communications, three aims need to be achieved:

Form a positive impression on the mentors and admins: Ask precise questions (and don’t hesitate to ask them), ask for help if you don’t understand something but do your homework before asking. For instance, I was asked to review a tiny patch and I didn’t know what review entailed. Sage documentation has a Reviewer’s Checklist and I tried to follow that and asked questions when I didn’t understand something from there.
Make a small contribution(s) to the code: Ask for tickets related to the project you are applying for since that will not only help you understand the development process better, but also the module itself. Ideally, I’ve found that a review of one ticket and writing another new ticket on an associated module can suffice.
Understand and further define the project: Mentors have usually given the project more thought and therefore it is a good idea to talk to them and get an idea about what they have in mind for it first. What is not available, is an exact statement on what all needs to be added and how. Ask for reading material, get clarity on the various modules that need to be implemented as part of the project and come up with a plan on what needs to be done to achieve that.

3) Writing a GSoC Proposal

A good proposal cannot be prepared without a proper understanding of the organization and the project idea itself. And simply having a good understanding is not enough, if that is not visible through the proposal. There is no right way of preparing a proposal. The best way is to go over as many of them as you can find and then suitably adapting elements from each to suit the particular project in question. Here’s mine for what it’s worth: Rank-Metric Codes in Sage

There’s plenty of other advice on how to write a good proposal available; the recurring points being concise writing, clarity on project deliverables, structured timeline and review from mentors.

Month: May 2016

First Contributions