Iterative Classroom Teaching

This post summarizes our paper, which will be presented at the 33rd AAAI Conference on Artificial Intelligence (AAAI-2019).

Introduction

I have been fascinated by the idea behind machine teaching ever since I was introduced to it in The Teaching Dimension of Linear Learners (JMLR-2016) by Liu et al. Machine teaching is the inverse problem of machine learning. In machine learning, we are given a set of training examples and the goal is to design a model that accurately “learns” the target concept. In machine teaching, the objective is to design an optimal set of training examples (e.g., one of smallest size) that accurately “teaches” the target concept to a given learning model.

Our paper takes inspiration from two ideas: teaching a single learner in the online setting, introduced in Iterative Machine Teaching (ICML-2017) by Liu et al., and the complexity of simultaneously teaching multiple learners in the offline setting, studied in No Learner Left Behind (IJCAI-2017) by Zhu et al. We focus on simultaneously teaching a classroom of diverse learners in the online setting. More specifically, we (i) design new, state-of-the-art teaching algorithms with provable guarantees under complete- and incomplete-information paradigms, (ii) discuss classroom partitioning strategies for improved teaching and learning experiences, and (iii) experimentally evaluate the performance of our algorithms on simulated and real-world data.

The Model

Our model consists of the following elements.

  1. Parameters: The primary parameters are the feature space of training examples, the label set (discrete for classification, continuous for regression), and the feasible hypothesis space, which includes the target hypothesis.
  2. Classroom: The classroom is a set of online projected stochastic gradient descent learners. Each learner is defined by two internal parameters: a time-varying learning rate and a time-varying internal state.
  3. Teacher: The teacher is characterized by the amount of information she has about the students in the classroom. Knowledge represents information about the students’ learning rates, and Observability represents information about their internal states.
  4. Teaching Objective: We consider two objectives: ensuring that every student in the classroom converges to the target concept, or ensuring that the classroom converges to the target concept on average, in each case as quickly as possible.
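To make the learner model concrete, here is a minimal sketch of an online projected stochastic gradient descent learner for the squared-loss regression case. The constant learning rate and the projection radius are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

class PGDLearner:
    """Online projected stochastic gradient descent learner for linear
    regression with squared loss. `eta` is the learning rate (assumed
    constant here for simplicity, though the model allows it to vary
    over time); `w` is the internal state (current hypothesis)."""

    def __init__(self, w_init, eta, radius=10.0):
        self.w = np.asarray(w_init, dtype=float)
        self.eta = eta        # learning rate
        self.radius = radius  # projection keeps ||w|| <= radius

    def update(self, x, y):
        # Squared-loss gradient for a linear model: (w^T x - y) x
        grad = (self.w @ x - y) * x
        self.w = self.w - self.eta * grad
        # Project back onto the feasible ball of radius `radius`.
        norm = np.linalg.norm(self.w)
        if norm > self.radius:
            self.w *= self.radius / norm
        return self.w
```

Given examples labeled consistently with a target hypothesis, repeated updates drive the learner's internal state toward that target.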

We consider different scenarios wherein the teacher has complete information about all the parameters and perfect Knowledge and Observability, or limited Knowledge and perfect Observability, or perfect Knowledge and limited Observability. In each scenario, the model functions over a series of interactions between the teacher and the classroom. At every interaction (or time step), the teacher receives updated information about the classroom (as defined by the scenario in question). She then chooses a training example and the corresponding label from the feature space and label set, respectively, and provides it to the classroom. Lastly, each student in the classroom performs a gradient update step. This cycle repeats until the teaching objective is achieved.

The core idea of our algorithms is the training-example construction step. The trick is to pick, at every time step, the example that minimizes the average distance between the students’ internal states and the target concept. This translates to solving an optimization task. In general, this problem is non-convex, but for learners with a squared loss we present a closed-form solution. For each of the scenarios, we also provide theoretical guarantees on the number of examples (iterations) the teacher requires to teach the target concept.
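The construction step can be sketched as follows. The paper derives a closed-form choice for squared-loss learners; the version below instead greedily searches a finite candidate pool, which is an illustrative approximation (the candidate pool, its labels, and the constant learning rates are all assumptions for this sketch).

```python
import numpy as np

def teach_classroom(W, etas, w_star, pool, labels, steps=100):
    """Greedy classroom-teaching sketch.

    W      : (k, d) array, one internal state (row) per student
    etas   : (k,) learning rates
    w_star : (d,) target hypothesis
    pool   : (m, d) candidate training examples
    labels : (m,) candidate labels
    Returns the final states and the history of average distances
    between student states and the target.
    """
    W = W.copy()
    history = []
    for _ in range(steps):
        best_cost, best_W = np.inf, None
        for x, y in zip(pool, labels):
            # Simulate one squared-loss gradient step for every student.
            grads = (W @ x - y)[:, None] * x[None, :]
            W_next = W - etas[:, None] * grads
            # Average distance to the target after this hypothetical step.
            cost = np.mean(np.linalg.norm(W_next - w_star, axis=1))
            if cost < best_cost:
                best_cost, best_W = cost, W_next
        W = best_W
        history.append(best_cost)
    return W, history
```

With labels consistent with the target (y = w*ᵀx), each round picks the example that most reduces the classroom's average error, so the history of distances shrinks over time.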

Classroom Partitioning

Individual teaching, the notion of teaching one student at a time, is extremely useful for the individual student because she receives personalised training. However, it is cumbersome for the teacher because she has to expend extra effort producing these personalised resources. On the other hand, having a large classroom of diverse students reduces the teacher’s effort while increasing the workload on the students. A well-studied solution from pedagogical research is to partition the classroom into smaller groups. However, there isn’t a consensus on the optimal partitioning strategy.

Figure 1: Teaching Paradigms

We propose two straightforward strategies for homogeneous partitioning that follow from our model setting: grouping students by their learning rates, and grouping them by their internal states. The first is helpful because faster learners can quickly converge to the target concept, whereas slower learners require more examples; placing a fast learner in the same group as a slow learner imposes unnecessary extra effort on the fast learner. The second is helpful because it groups similar kinds of students together. See the next section for a more concrete example of this.
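Both strategies can be sketched in a few lines. The equal-size split and the plain k-means clustering below are illustrative choices for this sketch, not prescriptions from the paper.

```python
import numpy as np

def partition_by_rate(etas, n_groups):
    """Homogeneous partitioning by learning rate: sort students by eta
    and split them into contiguous, roughly equal-sized groups, so fast
    and slow learners end up in different groups."""
    order = np.argsort(etas)
    return np.array_split(order, n_groups)

def partition_by_state(W, n_groups, iters=20, seed=0):
    """Homogeneous partitioning by internal state via a plain k-means
    sketch: students with similar current hypotheses share a group."""
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), n_groups, replace=False)]
    for _ in range(iters):
        # Assign each student to the nearest center.
        dists = np.linalg.norm(W[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute centers (keep the old center if a group empties).
        for g in range(n_groups):
            if np.any(assign == g):
                centers[g] = W[assign == g].mean(axis=0)
    return [np.flatnonzero(assign == g) for g in range(n_groups)]
```

With either partition, the teacher can then run the teaching loop separately on each group, trading a little extra teacher effort for a lighter workload per student.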

Experiment: Classifying Butterflies and Moths

We conduct an experiment on real-world data to demonstrate the usefulness of our teaching paradigm on a binary classification task. Our data consists of 160 images (40 each) of four species of insects: (i) Caterpillar Moth, (ii) Tiger Moth, (iii) Ringlet Butterfly, and (iv) Peacock Butterfly. Given an image, the task is to classify it as a butterfly or a moth. Figure 2(a) below shows a low-dimensional embedding of the data set and the optimal separating hyperplane (target concept).

Figure 2: Classifying Butterflies and Moths

We also have the initial hypotheses (internal states) of 67 human learners for this classification task, collected from Amazon Mechanical Turk workers. These learners represent our classroom. Conceptually, we identify four different types of students based on their internal states, as shown by the four coloured lines. For instance, group P1 (green) represents students who incorrectly classify Tiger Moths as butterflies but accurately classify the other three species. Similarly, P4 comprises students who classified every image as a butterfly. What P1 needs are examples of Tiger Moths, so they can learn to identify them as moths rather than butterflies; showing them examples of (say) Ringlet Butterflies adds unnecessary effort, since they can already identify those correctly.

Figure 2(c) illustrates these ideas more concretely. It depicts thumbnails of training examples generated by our algorithm for the entire classroom (CT) as well as for each group in the partitioned classroom. Without any specific context about butterflies and moths, the algorithm intelligently chooses Tiger Moth examples only for P1, and alternates between Tiger Moth and Caterpillar Moth examples for P4. This allows for efficient teaching of the target concept and a good balance between teacher effort and student workload. In the paper, we also demonstrate the performance of the algorithms on simulated data.

Conclusion

Our work reveals the potential of machine teaching for online learning. We achieve a training sample complexity that is exponentially smaller than that of the standard individual teaching strategy, and we show that our model can generate interpretable training examples in real-world situations.


For further details about our models, algorithms and experiments, please check out our full paper that has been accepted to AAAI 2019. For questions and comments about the work, please feel free to drop an email to me (Arpit Merchant) at arpitdm [at] gmail [dot] com.

Readings in 2018


The following is the list of books, stories, and essays I read in 2018.

BOOKS:

(I) Non-Fiction:

  1. The Undoing Project, by Michael Lewis
  2. Engines of Creation, by K. Erik Drexler
  3. The Age of Em: Work, Love, and Life When Robots Rule the Earth, by Robin Hanson
  4. Data Feminism, by Catherine D’Ignazio and Lauren F. Klein
  5. How Professors Think, by Michèle Lamont
  6. The Sense of Style, by Steven Pinker
  7. In Spite of the Gods, by Edward Luce

(II) Fantasy Fiction:

  1. Alloy of Law – Book 1 of Mistborn Era 2 (Wax and Wayne Series), by Brandon Sanderson
  2. Shadows of Self – Book 2 of Mistborn Era 2 (Wax and Wayne Series), by Brandon Sanderson
  3. The Shadow Rising – Book 4 of the Wheel of Time Series, by Robert Jordan
  4. The Dragonbone Chair – Book 1 of the Memory, Sorrow and Thorn Trilogy, by Tad Williams
  5. The Wise Man’s Fear – Book 2 of The Kingkiller Chronicle, by Patrick Rothfuss
  6. The Fires of Heaven – Book 5 of the Wheel of Time Series, by Robert Jordan
  7. Lord of Chaos – Book 6 of the Wheel of Time Series, by Robert Jordan
  8. A Crown of Swords – Book 7 of the Wheel of Time Series, by Robert Jordan
  9. A Darker Shade of Magic – Book 1 of The Shades of Magic Trilogy, by V.E. Schwab
  10. The Colour of Magic – Book 1 of the Discworld Series, by Terry Pratchett
  11. Elantris, by Brandon Sanderson
  12. The Armored Saint, by Myke Cole

(III) Science Fiction:

  1. The Long Way to a Small, Angry Planet – Book 1 of the Wayfarers Series, by Becky Chambers
  2. Ninefox Gambit – Book 1 of the Machineries of Empire Trilogy, by Yoon Ha Lee
  3. Semiosis, by Sue Burke
  4. Children of Time, by Adrian Tchaikovsky
  5. Autonomous, by Annalee Newitz

(IV) Speculative Fiction:

  1. Station Eleven, by Emily St. John Mandel
  2. Winter Tide, by Ruthanna Emrys
  3. The Obelisk Gate – Book 2 of the Broken Earth Trilogy, by N.K. Jemisin
  4. The Stone Sky – Book 3 of the Broken Earth Trilogy, by N.K. Jemisin
  5. All The Birds in the Sky, by Charlie Jane Anders
  6. Too Like The Lightning – Book 1 of the Terra Ignota Series, by Ada Palmer

(V) Rational Fiction:

  1. Luminosity – Book 1, by Alicorn

(VI) Historical Fiction:

  1. The 7th Function of Language, by Laurent Binet

NOVELLAS:

(I) Science Fiction:

  1. And Then There Were (N-One), by Sarah Pinsker
  2. Binti – Book 1 of the Binti Trilogy, by Nnedi Okorafor
  3. A Series of Steaks, by Vina Jie-Min Prasad
  4. All Systems Red – Book 1 of the Murderbot Series, by Martha Wells

(II) Speculative Fiction:

  1. Envy of Angels, by Matt Wallace
  2. The Slow Regard of Silent Things, by Patrick Rothfuss

(III) Rational Fiction:

  1. Dark Lord’s Answer, by Eliezer Yudkowsky
  2. The Metropolitan Man, by Alexander Wales
  3. Branches on the Tree of Time, by Alexander Wales

SHORT STORIES:

(I) Speculative Fiction:

  1. Mono No Aware, by Ken Liu
  2. The Book of Sand, by Jorge Luis Borges
  3. Is God a Taoist?, by Raymond Smullyan
  4. The Sword of Good, by Eliezer Yudkowsky
  5. A Space of One’s Own, by Steve Rasnic Tem
  6. Vault, by D.A. Xiaolin Spires
  7. There Will Come Soft Rains, by Ray Bradbury
  8. When We Were Starless, by Simone Heller
  9. The Falls: A Luna Story, by Ian McDonald
  10. The Birding: A Fairy Tale, by Natalia Theodoridou
  11. Octo-Heist in Progress, by Rich Larson
  12. The Love Letters, by Peng Simeng

ESSAYS:

  1. Lessons Learned from Feminist Film Studies, by Tara McPherson
  2. Artificial Intelligence Policy in India: A Framework for Engaging the Limits of Data-Driven Decision-Making, by Vidushi Marda
  3. You and Your Research, by Richard Hamming
  4. Tolkien, Lewis and the explosion of genre fantasy, by Edward James
  5. Magical Realism, by Sharon Sieber
  6. Science Fiction and the Life Sciences, by Joan Slonczewski and Michael Levy
  7. In Praise of the Research University, by Harry Mairson