ICLR Reproducibility Interview #2: Arnout Devos, Sylvain Chatel, Matthias Grossglauser
Interview by Cecelia Shao
The second interview of our series was with Arnout Devos, Sylvain Chatel, and Matthias Grossglauser. Arnout and Sylvain are PhD students at the Swiss Federal Institute of Technology in Lausanne (EPFL), while Matthias is the head of the Information and Network Dynamics Lab at EPFL.
Interested in learning more about the Reproducibility Challenge? Read our kick-off piece here
Interview recap (TLDR)
The challenges that the team faced included:
- Missing seeds and training files
- Lack of details for the convolutional parameters (filter width, padding, and stride), as well as the stopping criterion.
- Given more time, the team would try distributed learning, given the large size of the datasets and computationally intensive nature of the tasks.
Reproducibility Report Details:
- Original Paper: https://openreview.net/pdf?id=HyxnZh0ct7
- Code: R2D2 — https://github.com/ArnoutDevos/r2d2
- MAML with CIFAR support: https://github.com/ArnoutDevos/maml-cifar-fs
- PR (contains reviews): https://github.com/reproducibility-challenge/iclr_2019/pull/150
Note: this interview has been edited and condensed for clarity
Cecelia: Just to kick things off, can you please introduce yourself?
Arnout: Hello. My name is Arnout. I’m a PhD student here at the Swiss Federal Institute of Technology in Lausanne (EPFL). My collaborators Sylvain and Matthias couldn’t be here. Sylvain is a PhD student at EPFL and Matthias is the professor who heads the Information and Network Dynamics Lab at EPFL.
Cecelia: Gotcha, and how did you find out about the challenge and why were you interested in participating?
Arnout: So near the end of October, I think, Joelle Pineau from McGill University gave a talk at EPFL about reproducibility and reinforcement learning. That’s where I learned about the challenge taking place. At the time I was doing this machine learning course and I asked the professor whether we could use the Reproducibility Challenge as kind of a final project for the course. He was enthusiastic about it and that’s how it all went down.
I think in the end, there were five or six teams from the EPFL machine learning course who took part in the challenge.
Cecelia: What made you propose this idea to your professor?
Arnout: It was partially because I think reproducibility is important, and also when looking through the papers, I found some really interesting meta-learning papers, and that’s what my PhD is basically about, or what I am trying to have my PhD be about. So it was perfect for me, and also for Sylvain, as I discovered later on.
Cecelia: Have you ever reproduced a paper or tried to reproduce research?
Arnout: Yes. While I was getting my computer science degree in the US at USC, we tried to reproduce work from DeepMind. It was very hard to do, because they had amazing results but did not release any code.
Cecelia: Is there a specific reason why you chose this paper?
Arnout: We picked this paper mostly because we only got to know about the challenge at the end of October.
Basically, the differentiable closed-form solver is a linear regression that’s done on top of a deep convolutional neural network. I thought the paper was pretty understandable. Also, given that I had researched meta-learning for a while already, the concept was easy to grasp.
We only started on the first of November, with the deadline being December 20th. So we wanted something that was manageable and was useful towards our own research and we could also learn something.
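The closed-form solver Arnout describes can be sketched as a regularized (ridge) linear regression solved in closed form on top of feature embeddings. This is a minimal NumPy sketch, not the paper's implementation: the feature matrix `X` below is random toy data standing in for CNN features, and the function name is our own.

```python
import numpy as np

def ridge_closed_form(X, y, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy example: 20 "support" examples with 5 features, 3 output classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))      # stand-in for CNN feature embeddings
true_w = rng.standard_normal((5, 3))
y = X @ true_w                        # noiseless targets for illustration
W = ridge_closed_form(X, y, lam=1e-6)
```

Because the solver has a closed form, gradients can flow through it to the feature extractor underneath, which is the core idea of making it "differentiable."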
Cecelia: Can you describe how you approached trying to reproduce the paper?
Arnout: So the first thing we did was read. We skimmed through the paper, trying to grasp the general concept. Then, and this might seem very simple at first, we checked whether the numbers referenced in the paper were actually correct (accuracy of the baseline, etc.). Sometimes, numbers are copied incorrectly, so we verified that they were consistent with the referenced papers.
Then we went on to choosing a baseline. Given the short time frame, we chose to use MAML, which is model-agnostic meta-learning. We reran that code, and also noticed some very small differences in the results.
It’s actually very hard to reproduce the results even with the original authors’ code, but at least ours were very close to the results reported in the original MAML paper. You can also find those results in our reproducibility work.
Following Joelle Pineau’s Machine Learning Reproducibility Checklist, we decided to provide a clearer algorithmic description of the R2D2 algorithm proposed in the paper. While this might look like a copy of what is described in the paper, we think it creates a lot more clarity in how the procedure is set up, and thus increases reproducibility.
To be honest, regarding the checklist, we did not provide error bars in our results figures, so as to not clutter them. We did provide all necessary information in tables next to the figures. We then went on to implement the proposed algorithm in the paper. We split the work a bit between ourselves. We were three team members, and we started hacking away at that.
One thing which is worth mentioning is that at the end of the convolutional stage, they get some number of output features, and we had to make some assumptions, because some basic parameters like filter width, padding, and stride were not mentioned in the original paper.
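The number of output features follows from the standard convolution output-size formula, which is why missing values for filter width, padding, and stride force assumptions. The sketch below uses illustrative settings (3x3 filters, stride 1, padding 1, 2x2 max-pooling, four blocks) that are our own assumptions, not the paper's actual architecture.

```python
def conv_out_size(in_size, kernel, stride, padding):
    # Standard convolution output-size formula (floor division).
    return (in_size + 2 * padding - kernel) // stride + 1

size = 32  # CIFAR-style input resolution
for _ in range(4):  # four conv blocks, each followed by a 2x2 max-pool
    size = conv_out_size(size, kernel=3, stride=1, padding=1)
    size = size // 2  # max-pool halves the spatial size
# `size` is now the spatial dimension feeding the closed-form solver
```

With different (unstated) kernel, stride, or padding choices, this final size changes, and with it the dimensionality of the features the solver receives.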
Cecelia: And you mentioned that when you started off with the MAML code, and tried reproducing that first, your results actually didn’t line up. So it’s almost like multiple levels of reproducibility.
Arnout: So the MAML paper has code available online and it’s been tested heavily. People who have tried to implement it again have had a hard time reaching the same performance.
There is a paper, ‘How to Train Your MAML’, which shows how to make the training process more stable, but we deliberately did not use it because it would invalidate the comparison with the original MAML.
Another thing is that the datasets are a bit hard to acquire. You have to download them manually because they are based on the ImageNet and CIFAR datasets. The CIFAR few-shot dataset is used in the paper we tried to reproduce, so we had to adapt MAML to ingest this new dataset.
Seeding is another issue, because there is so much variability: seeding when loading the datasets, seeding in initialization. All of this can influence the way you learn; how parameters are initialized affects the final result.
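A minimal sketch of the kind of seeding Arnout describes, fixing Python's and NumPy's generators; the PyTorch calls are shown only as comments, since the team's framework setup isn't specified here, and the helper name is our own.

```python
import random
import numpy as np

def set_seed(seed):
    """Seed the common sources of randomness (a minimal sketch)."""
    random.seed(seed)       # Python's built-in RNG (e.g. data shuffling)
    np.random.seed(seed)    # NumPy RNG (e.g. weight initialization)
    # With PyTorch one would additionally call, e.g.:
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)

set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
# Re-seeding reproduces the same draws: a == b
```

Even with all seeds fixed, GPU nondeterminism and library-version differences can still cause small run-to-run variations, which is part of why exact reproduction is hard.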
Cecelia: Are those the only challenges that you remember from reproducing the papers?
Arnout: Our limiting factor was mostly time, because we started the challenge pretty late, so in the end, we could only compare the results with MAML and reproduce the paper.
Cecelia: A follow up question is if you had more time, would you have changed your initial approach at all? Would you still have started from the MAML paper?
Arnout: We’d still start from the MAML paper. In the end, it turned out that the authors of the paper started from prototypical networks and adapted their code from that. They didn’t write it from scratch either, but this happens a lot in the meta-learning space, or in research in general, I guess.
Just running the meta-learning experiments takes a while: two or three hours on the machines that we have. If we had more time, maybe we could have considered training in a distributed manner.
Cecelia: You mentioned that some of the parameters were missing. Did you get a chance to communicate with the authors about any of those questions that you had or things that were missing?
Arnout: That’s a great question! We did get in touch with the original authors.
The reproducibility challenge encouraged interaction with the authors, so we made a small summary of our findings and posted it on the OpenReview platform. The authors actually responded, updated their paper with the convolutional parameters, and were also clearer about, for example, the stopping criterion, which was initially a bit vaguely defined as “if the error does not drop significantly for twenty thousand iterations.”
Cecelia: Last question, now that you’ve done the challenge, has it changed your perception of research and how you’ll approach your own work?
Arnout: Yes. As I mentioned, I’d look more into seeding, and how to get the same result every time. I hope frameworks like TensorFlow, PyTorch, and OpenAI Gym provide either more clarity or instructions on how to achieve this.
In my own research, I’ll try to release code that is reproducible whenever possible. In the end, what you want from science is work that others can reuse in applications or further research.
Our next interview will be published next Wednesday. Make sure to follow us on this blog to stay updated.
Interested in how Comet.ml can help your machine learning team automatically track, compare and reproduce experiment results? Find out more here.