Sage

Project Linear Rank Metric Codes in Sage – GSoC 2016

 

Introduction:

This document is meant to be a reference for people who want to understand the work done on implementing Skew Polynomials and Rank Metric Codes in Sage as a part of this GSoC 2016 project. My mentors for this project are Johan Rosenkilde and David Lucas. This post provides a brief mathematical background, summary and status of the tickets related to the project and further links to detailed discussions and the code. It is assumed that the reader is familiar with the basic concepts of abstract algebra, coding theory and Sage itself.

Background:

A Skew Polynomial is given by the equation:

        F(X) = a_{n} \times X^{n} + ... + a_0

where the coefficients a_i \in R and X is a formal variable. Let R be a commutative ring equipped with an automorphism \sigma. Addition between two skew polynomials is defined by the usual addition operation and the modified multiplication is defined by the rule X \times a = \sigma(a) \times X for all a in R. Skew polynomials are thus non-commutative and the degree of a product is equal to the sum of the degrees of the factors. Then the ring of skew polynomials can be written as R[X, \sigma].

Linear Rank Metric Codes are linear error correcting codes over the rank-metric and not the Hamming metric. The distance between two matrices is defined as the rank of the difference between them. And this forms a metric. They are a way to build codes that can detect and correct multiple random rank errors. Gabidulin Codes are an example of linear rank metric codes.

A Gabidulin Code Gab[n, k] over \mathbb{F}_{q^m} of length n (at most m) and dimension k (at most n) is the set of all codewords, that are the evaluation of a skew polynomial f(x) belonging to the skew polynomial constructed over the base ring \mathbb{F}_{q^m} and the twisting automorphism \sigma.

        \text{Gab[n, k]} = \big\{ (f(g_0) f(g_1) ... f(g_{n-1})) = f(\textbf{g}) : \text{deg}(f(x)) < k \big\}

where the fixed evaluation points g_0, g_1,..., g_{n-1} are linearly independent over \mathbb{F}_{q}. These codes are the rank-metric equivalent of Reed Solomon Codes.

Implementation:

During the community bonding period, we started with a small fix to the wtdist_gap method from the module linear_code.py. This involved improving its documentation, changing to a better name and making the method private. We also opened a ticket for a new abstract class for Golay Codes. However, there was not enough time to finish this ticket since the coding period started. 

David had already implemented a class to manage representation of elements of a field extension into a smaller field. We went through the review checklist in Sage. This has now been merged into Sage.

Xavier Caruso had authored a ticket on Skew Polynomials about four years ago. This was not reviewed at the time. And the category framework in Sage underwent massive restructuring and change since then. As a result, this ticket was very incompatible. We first started by resolving the several hundred doctest errors some of which were due to deprecated syntax errors while others were more technical. This ticket was the most difficult and complicated module and it took a lot longer than originally anticipated. Debugging the code ended up breaking Sage several times before all originally present doctest errors were resolved. Once the ticket started working, a lot of the methods and classes still did not have proper documentation and/or tests and we added those. Since this was a huge ticket, we then decided to break it up into five smaller tickets based on the content and dependencies to make the work more manageable. We majorly refactored the code in the first ticket to make it cleaner and then added methods such as operator evaluation, interpolation, minimum vanishing polynomial, etc to the appropriate various classes. For the purposes of the other targets for the GSoC project, we practically needed only the first of the five tickets (mentioned above) which dealt with the basic implementation of skew polynomials and skew polynomial rings. So the primary focus has been to get that merged. The second ticket based on skew polynomials over finite fields has also been worked upon. The remaining three tickets were more or less lifted directly from the original skew polynomial ticket and no effort has been made yet to accommodate for changes to these.

We then created a new abstract base class for Gabidulin Codes and implemented Generator Matrix based and Skew Polynomial Evaluation based encoders for it. This module also included a Gao Decoder for the Gabidulin Code and a Channel class to introduce random rank errors. Along with all of this, basic getter methods and methods to compute some basic combinatorial properties were also implemented. 

Lastly, we worked on a new base class called AbstractLinearRankMetricCode that inherits from the AbstractLinearCode class and from which Gabidulin Code inherits. This class provides methods that are common to any Linear Rank Metric Code so as to facilitate the creation of other such codes. It also deletes those methods from the general linear code class that are not valid in the case of rank metric codes.

Current Status: The tickets related to skew polynomials, Gabidulin Codes and AbstractLinearRankMetricCode are currently still open. We believe that the skew polynomial ticket is almost ready to be merged into Sage. It has been well tested and we believe it is in stable condition. The Gabidulin Code ticket is not as rigorously tested and the Gao decoder in particular needs to be examined further. The completion of this ticket depends on the AbstractLinearRankMetricCode class which requires also requires some work.

I am currently attending Sage Days 75 in Paris to which I was invited by my mentors. We have coding sprints all week long and we aim to wrap up all of the starred tickets (see below) this week. The next section provides links to more elaborate details about the work done during the project.

Code:

All relevant tickets, bug reports and code commits can be found below:

  1. #20565 – Fix LinearCode.wtdist_gap method (enhancement)
  2. #20787* – A class to manage Golay Codes (new)
  3. #20284 – A class to manage representations of elements of field extensions (review)
  4. #13215* – Skew polynomials (enhancement)
  5. #21088* – A class for skew polynomials over finite fields (enhancement)
  6. #21131* – Interpolation/evaluation methods for skew polynomials (content)
  7. #21259* – Karatsuba-based methods for skew polynomials (content)
  8. #21262* – Center-related content for skew polynomials (content)
  9. #21264* – Factoring/irreducibility methods for skew polynomials (content)
  10. #20970* – A class for Gabidulin codes (feature)
  11. #21226* – An abstract class for rank-metric codes (feature)

Tickets in green represent merged code, tickets in blue represent mature code and the ones in brown represent immature tickets.

The daily project progress log that provides further details about the project can be found at the following link:

https://arpitdm.wordpress.com/sage-project-log/

The blog posts related to Sage can be found here:

https://arpitdm.wordpress.com/category/open-source/sage/

 

Skew and Linearized Polynomials

Gabidulin Codes, one of the primary modules of my GSoC project, are Maximum Rank Distance (MRD) codes analogous to Reed Solomon in Rank-Metric. In this post, I give a basic overview of the one of the preliminaries, Skew and Linearized Polynomials, required in their construction in Sage.

Consider a non-commutative ring {R = \mathbb{F}[X; \sigma, \delta]} where we have a ring A of usual polynomials in variable X, a ring endomorphism {\sigma} on A and a {\delta}-derivation which is essentially a map {\delta: A \rightarrow A} such that for all a, b in A,

{\delta(a + b) = \delta(a) + \delta(b)}
{\delta(a + b) = \delta(a)b + \sigma(a)\delta(b)}
This ring with the usual addition and modified multiplication subject to the relation above is called a Skew Polynomial Ring of Endomorphism type. Recall that, a Homomorphism is a structure preserving map from one mathematical object (such as group, ring, vector space) to another. And an Endomorphism is a homomorphism from a mathematical object onto itself.

Given a finite field {\mathbb{F}_q} of order q and its extension field {\mathbb{F}_{q^m}} for some fixed positive integer m, a Linearized Polynomial in variable X is given by

{L(X) = \sum_{i=0}^{n-1} a_i X^{q^i}}

where each of the coefficients {a_i} belongs to {\mathbb{F}_{q^m}} and the exponents of the monomials are powers of q. The degree of this polynomial, also called the q-degree, is n-1. The non-commutative univariate Linearized Polynomial Ring consisting of all such polynomials L(X) over {\mathbb{F}_{q^m}} is denoted by {\mathbb{L}_{q^m}}[X]. Ernst Gabidulin used this ring to build Gabidulin codes where codewords are given by the coefficients of the polynomials, i.e. {C = (a_0, a_1, . . . , a_{n-1}) \in \mathbb{F}_{q^m}^{n} }.

Linearized and Skew Polynomial Rings are isomorphic with matching evaluation maps. More specifically, when the derivation {\delta} is 0 and the endomorphism {\sigma} is a Frobenius Automorphism (recall that an automorphism is an endomorphism that admits an inverse and for a finite field, a Frobenius Endomorphism is always also a Frobenius Automorphism), the Skew Polynomial Ring becomes a Linearized Polynomial Ring. That is to say, the latter are a special case of the former.

Sage has an open ticket that already implements the Skew Polynomial Ring and provides functionality to evaluate a given polynomial and perform basic operations such as addition, subtraction, multiplication and division. Our aim therefore, is to complete the open ticket and use that to create the appropriate, required linearized polynomial ring.

 

 

First Contributions

Setting up for developing Sage is one thing, actually contributing to the development is a completely different ball game. The development process, broadly speaking, consists of the following steps:

  1. Identify a defect, bug, enhancement or new feature that Sage has or requires.
  2. Open a ticket on Trac describing the exact problem.
  3. Enter discussions with other community members on clarifying the issue and discuss solution and implementation ideas.
  4. Write code and submit.
  5. Get it reviewed by other at least one other independent programmer.
  6. If review is positive, the ticket is closed and code is successfully merged. Otherwise, go back to step 3 and repeat.

As a contributor, one needs to know how to do both, review as well as successfully write code. And that is what this post describes. I’ll use tickets that I worked on, to illustrate these aspects of development in further detail. In order to avoid confusion, I’ll use Code while talking about Coding Theory and code to talk about programming.

Reviewing a Ticket:

David and Johan (my mentors for the project) suggested Ticket #20113. The zero method in the linear_code.py file was supposed to return the zero vector of the associated Code. The original implementation called the gens method (the generators of the Code), took the first element of the returned list, multiplied it by 0. This resulted in all zero vector of length of the Code. While the solution wasn’t incorrect, this was a rather tedious way of obtaining it. David had already submitted a patch that simply called the zero element from the ambient space of the Code to produce the answer.

My job was to review this new code and at first glance, this seems trivially correct. And it was. But this may not always be the case. Sage is a massive beast and a small change in one part of the code could break something in some other seemingly unrelated part of the code. There are a lot of things that need to be verified and the best way is to simply follow the Reviewer’s Checklist. I’ll clarify some of the terms from there first and then provide a general rule-of-thumb workflow on how to interact with the git-trac system to review.

  • Doctests and Coverage: The documentation for each of the methods in Sage contains examples of code that explains how to use the method. Python can search for and extract these commands, run them and compare the output to the one mentioned in the documentation. If the patch changes code, the doctests for all the changed methods must still hold true. To do this, run the command `./sage --coverage src/sage/coding/linear_code.py.
  • Manuals and Building: Manuals serve as a reference book and can be incredibly useful for quick reference. Sage adds material to its manuals from the documentation in the source code itself. Which is why the documentation should follow a specific template. See here for examples and notice the formatting and indentation. Run `./sage –docbuild reference html` to build the html version of the “Reference” Manual. Others include _tutorial, developer, constructions and installation_. These are also available in pdf versions.
  • Speedup: Generally, it is hard to check the speed. Standard Python time-management (import time, time_clock()) can be used. The main idea is to ensure that the new code does not utterly slow down Sage. Patchbot reports can also be used but caution should be exercised and error reports should be double-checked. Run methods on large inputs, compare to other softwares that offer similar functionality, try boundary cases and consistency checks based on mathematical foundations.

Review Workflow:

  1. git trac checkout TICKET_NUM (This will create a local branch say t/20113)
  2. git checkout t/20113
  3. ./sage -b (See if the code builds without errors)
  4. Run through the Checklist above and note down errors, if any.
  5. Run various test examples in an attempt to “break” the code.
  6. Go to Trac and write comment(s).
  7. Change ticket status.
  8. Write your full name in the Reviewer field of the ticket.

Writing a Ticket:

David and Johan suggested improving the efficiency of the decode_to_code method of the Nearest Neighbour Decoder. I opened a Ticket #20201 on Sage to fix this. The previous implementation created and stored the distances between the received word and every codeword. It then sorted the entire list in order to find the closest codeword. This took exponential memory and time in the size of the Code which can be very inefficient for large input. An obvious solution is to compute the Hamming weight for the first codeword and set that as the minimum. Then, iterating over the rest of the codewords and updating the minimum drastically brings down the memory and run time requirements.

There are some major differences when it comes to actually opening a new ticket and writing code for it. Whenever you write new code in Sage, you might want to add some print statements and code, run the files, test, make small changes and test again. In that case, you have to build sage (./sage -b) or in some cases even run make distclean &amp;&amp; make which can take a very long time. Instead, the best way is to write your new code in a myfile.sage file and then run ./sage myfile.sage.

Write Workflow:

  1. Open a new ticket on Sage. Or if a ticket that has commits already exists, create a local branch on your system. And then pull the changes from the ticket onto your branch. Do not directly checkout the ticket. It can result in very weird errors.
  2. Discuss solution idea on trac.
  3. On the local branch (NOT master) corresponding to the ticket, write your code.
  4. Test your code.
  5. git add <changed_file(s)>
  6. git commit -m "insert detailed message here."
  7. `git trac push TICKET_NUM`
  8. Once you’re satisfied with your commits, set the ticket field to “needs_review”.
  9. If positive, ticket closed. Else, go back to step 2.

As a final note, Sage is guaranteed to break down at any point for seemingly no valid, discernible reason and I’ve lost track of the time I’ve spent in trying to fix it. But in doing so, I’ve learnt a lot about the codebase and once you successfully manage to rebuild Sage, there is a weirdly awesome feeling of accomplishment!

On Applying to Sage for GSoC

Sage is a viable open source mathematics software package that offers an extensive toolbox
for algebra, combinatorics and numerical computations. This massive project can seem daunting to someone who wants to begin contributing to it, especially in the short GSoC application period. In what follows, I cover a constantly updating collection of advice for new programmers, to simplify the process of applying to Sage.

1) Where to find Sage

Mailing Lists:

The following are the primary ones:

  • sage-devel – development discussion group of Sage
  • sage-support – help in running/using Sage
  • sage-gsoc – related discussions regarding the GSoC program

Apart from these, there are tons of other topic based lists dedicated to specific modules. For example, sage-coding-theory. When applying to GSoC, find a project from the Ideas page and post on sage-gsoc (with a cross post on the related topic-based list, if it exists) indicating your interest in the project and request more details; instead of simply asking for help in “getting started” since one is “very interested in contributing to open source”.

Source Code:

  • Source Code – Download your copy of Sage from here
  • git.sagemath.org – Online Sage repository. A good way to read the code online
  • Documentation – Developer’s guide to the Sage
  • Trac – Organizing development and code review through tickets

2) Contributing to Sage

The following is more of a cheat sheet to setup everything you need so you can begin contributing. When in doubt or need of further clarity, always refer to the official documentation. It is really well written!

Installation:

  1. Install Sage: Assuming the source code is downloaded, unzip it and go inside the directory. And run “make”. This should take a few hours. Once it finishes, you can run it by typing “./sage” on the command prompt.
  2. Configure git: Assuming you have git installed, run the following two configuration commands (if you haven’t already)

    git config –global user.name “Your Name”
    git config –global user.email you@yourdomain.example.com

  3. Install git-trac:Now, all development tasks can be performed in the usual way using git and a browser. Alternatively git-trac can simplify the communication process between git and trac. To use this, run:

    git clone https://github.com/sagemath/git-trac-command.gitsource
    git-trac-command/enable.sh (enables it temporarily)
    cd git-trac-command
    sudo python setup.py install

  4. Obtain Sage Trac account: Development happens through tickets on Trac. In order to comment, review, commit or interact in any other active way with the source code, a Trac account is required. To do that, send an email to sage-trac-account@googlegroups.com, containing your full name, preferred username, contact email and reason for needing an account. Once the request is approved, a username and password will be assigned.
  5. Configure git-trac: Authentication mechanism is required for read-write access. Simply put, sage needs to be told about trac with verification. To do this, run:

    git trac config –user USERNAME –pass ‘PASSWORD’
    ssh-keygen (to generate a public/private rsa key pair)

    Run the Sage prompt (./sage) and upload it to trac using the command “dev.upload_ssh_key()“. Use the credentials obtained above for completing the authentication.

Again, the Sage documentation is magnificent and it should be referred to in case of doubts or further explanations. Once all this is done, one can start contributing by checking out an existing ticket or opening a new one. There is a lovely cheat sheet on how to do all that, already available. The way development happens is, someone opens a ticket regarding a bug, defect, new feature or enhancement and writes code for it. Once the commit is ready, it is opened for review whereupon an independent second programmer checks it and suggests changes or gives it a thumbs up.

Communicating with the Sage community:

While the people in the community are extremely helpful, it is necessary to ask for help wisely. Through the communications, three aims need to be achieved:

  1. Form a positive impression on the mentors and admins: Ask precise questions (and don’t hesitate to ask them), ask for help if you don’t understand something but do your homework before asking. For instance, I was asked to review a tiny patch and I didn’t know what review entailed. Sage documentation has a Reviewer’s Checklist and I tried to follow that and asked questions when I didn’t understand something from there.
  2. Make a small contribution(s) to the code: Ask for tickets related to the project you are applying for since that will not only help you understand the development process better, but also the module itself. Ideally, I’ve found that a review of one ticket and writing another new ticket on an associated module can suffice.
  3. Understand and further define the project: Mentors have usually given the project more thought and therefore it is a good idea to talk to them and get an idea about what they have in mind for it first. What is not available, is an exact statement on what all needs to be added and how. Ask for reading material, get clarity on the various modules that need to be implemented as part of the project and come up with a plan on what needs to be done to achieve that.

3) Writing a GSoC Proposal

A good proposal cannot be prepared without a proper understanding of the organization and the project idea itself. And simply having a good understanding is not enough, if that is not visible through the proposal. There is no right way of preparing a proposal. The best way is to go over as many of them as you can find and then suitably adapting elements from each to suit the particular project in question. Here’s mine for what it’s worth: Rank-Metric Codes in Sage

There’s plenty of other advice on how to write a good proposal available; the recurring points being concise writing, clarity on project deliverables, structured timeline and review from mentors.