ICLR Reproducibility Interview #4: Aniket Didolkar

Reproducing h-detach: Modifying the LSTM Gradient Towards Better Optimization


> Interested in learning more about the Reproducibility Challenge? Read our kick-off piece here

Our final interview with Aniket Didolkar covered his work reproducing a method to address the exploding gradient problem in LSTMs. Aniket was the only solo participant in the ICLR Reproducibility Challenge.

Read on to see our discussion around hacks for dealing with long-running training jobs and the importance of organization when reproducing research.

Interview recap (TLDR):

  • Aniket was able to retrieve the code repository from the paper’s authors and reproduce all the experiments except one, around the image captioning task, due to time constraints.
  • For Aniket, reproducibility means having details around hyperparameters, tuning the learning rate, library and package versions, pre-processing, data pointers, and more. Working in an organized manner with code repositories and more transparency around tested, failed hyperparameters can help the community get closer to reproducibility.
  • Given more resources, Aniket would not have had to deal with saving everything before his instance gave out and downloading the data repeatedly. He also would have run the experiments with many more seeds. Note: the authors did specify that training was slower because they were training it sequentially (normally PyTorch and TensorFlow send data in parallel)

Reproducibility Report Details:

Team members: Aniket Didolkar (solo)

The Interview

> This interview has been edited for readability

Cecelia: To kick things off, I’d love for you to introduce yourself.

Aniket: Yeah. My name is Aniket. I’m a third-year undergraduate studying at Manipal University here in India. I first got interested in deep learning in my third semester last year, and since then I’ve taken up a lot of courses, worked on a student project at my university related to robotics, and interned at a company that works with NLP. That’s basically all the work I’ve done in machine learning.

Cecelia: How did you find out about the Reproducibility Challenge?

Aniket: I found out on Quora, in an answer to a question: what are some good ways to contribute to AI? What can I do to make myself better? I don’t remember who asked, but the answer, from Professor Yoshua Bengio, was to take up this Reproducibility Challenge.

Cecelia: Why were you interested in participating?

Aniket: Because I am actually very interested in deep learning and I wanted to do a good project, and I thought reproducibility would be a really good chance to learn a lot, in depth, about the paper I took up. Sometimes I’m reading a paper and I don’t understand it properly, but actually going in and implementing it helps me understand it more deeply.

Cecelia: How do you define reproducibility?

Aniket: If the author reports certain experiments, they should be able to be reproduced by anyone under all circumstances. For example, I think I’ve found this problem a lot. Whenever I’m working, especially with reinforcement learning, there are so many papers but you’re not really able to reproduce the results and I think that is a problem. But I think the Reproducibility Challenge is a great initiative, because they have this condition that some of the experiments, you have to reproduce them and that’s really good for the community.

Cecelia: For the papers that you have tried to reproduce and had challenges with, what kind of issues did you run into?

Aniket: For example, I was working in reinforcement learning and we were working on certain algorithms like PPO and we couldn’t get our networks to converge. We were following the paper but there were certain things where they just wouldn’t converge and we were not aware of what was wrong with it. I think one of the most important reasons was that there was no supporting code. I think when there is code, it becomes very easy for people to read it alongside the paper and understand it more clearly.

Cecelia: Do you think reproducibility is important? And if so why?

Aniket: Yeah, it’s important because the research community keeps publishing new techniques and new papers, but people should be able to apply them in real life. There are so many startups coming up and so many companies in the industry that should be able to apply these algorithms, so a lot of them should be reproducible. It would also be easier for other researchers to build upon those algorithms.

Cecelia: Was there a specific reason why you chose this paper?

Aniket: The paper was based on LSTMs. It presents a new algorithm for training LSTMs that handles the exploding gradient problem. The reason I chose it was that I was fairly comfortable with LSTMs; I had worked with them before, so I was pretty confident about this. And I really liked the idea: I thought it was simple and effective, and I was able to reproduce the results, so it clearly works.

The authors reported that when the weights of the LSTM are large, the gradients along the linear temporal path get suppressed. This path is important because it carries information about temporal dependencies. To prevent this, h-detach stochastically blocks the gradient through the h-state.
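As a rough illustration of the mechanism described above, here is a minimal PyTorch sketch. This is not the authors’ implementation; the class name and the `p_detach` parameter are assumptions for illustration. With some probability, the previous hidden state is detached before computing the gates, so gradients for that step flow only through the cell state’s linear temporal path:

```python
import torch
import torch.nn as nn

class HDetachLSTMCell(nn.Module):
    """Sketch of an LSTM cell with stochastic h-detach (illustrative names)."""

    def __init__(self, input_size, hidden_size, p_detach=0.25):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.p_detach = p_detach

    def forward(self, x, state):
        h, c = state
        # Stochastically block the gradient through the h-state during
        # training; forward-pass values are unchanged.
        if self.training and torch.rand(1).item() < self.p_detach:
            h = h.detach()
        return self.cell(x, (h, c))

# Unroll the cell over a toy sequence of shape (seq_len, batch, input_size).
cell = HDetachLSTMCell(input_size=8, hidden_size=16, p_detach=0.25)
x = torch.randn(5, 3, 8)
h = torch.zeros(3, 16)
c = torch.zeros(3, 16)
for t in range(x.size(0)):
    h, c = cell(x[t], (h, c))
print(h.shape)  # torch.Size([3, 16])
```

The key design point is that only the h-path is blocked; the c-state is never detached, which is what distinguishes h-detach from the c-detach ablation discussed later in the interview.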

Cecelia: Can you describe how you approached reproducing this paper from beginning to end?

Aniket: Basically, first I read the paper two or three times, then I implemented a smaller version of it. I found it was a little slow compared to other LSTM models I had run before, like the usual LSTM implementations in PyTorch or TensorFlow. So I contacted the authors about that, and they were very kind; they gave me the code repository of their own implementation and said it was slow because they were training it sequentially, whereas PyTorch and TensorFlow generally send in the data in parallel. They did this to ensure the correctness of their algorithm. After that I used their repository, because it’s possible I might have made some small mistakes in mine; with their repo, I could focus on tuning hyperparameters. So I used their repo directly and conducted the experiments they had given in the paper. There was a copying task, where the LSTM is initially given a certain sequence of numbers and asked to reproduce the sequence after a particular time interval, and a sequential MNIST task, where the MNIST data is given to the LSTM pixel by pixel. They also performed ablation studies to show that their approach is very effective compared to normal LSTMs, and compared it to another algorithm called c-detach, which detaches the gradients in the c-state of the LSTM instead of the h-state I mentioned. In their experiments they showed it was really effective… There’s one experiment that I didn’t reproduce, the transfer copying task, because of lack of time.
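For readers unfamiliar with the copying task mentioned above, here is an illustrative data generator. This is a sketch under my own conventions, not the authors’ code; function and parameter names are assumptions. The input is a short sequence of symbols, a long blank delay, and a marker cue; the target asks the model to reproduce the sequence only at the end:

```python
import numpy as np

def copying_task_batch(batch_size=2, seq_len=10, delay=20, n_symbols=8, seed=0):
    """Generate one batch for the copying task (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    blank, marker = n_symbols, n_symbols + 1   # two extra special symbols
    seq = rng.integers(0, n_symbols, size=(batch_size, seq_len))
    total = seq_len + delay + seq_len
    x = np.full((batch_size, total), blank)
    y = np.full((batch_size, total), blank)
    x[:, :seq_len] = seq                       # symbols to remember
    x[:, seq_len + delay - 1] = marker         # cue to begin reproducing
    y[:, -seq_len:] = seq                      # copy appears after the delay
    return x, y

x, y = copying_task_batch()
print(x.shape, y.shape)  # (2, 40) (2, 40)
```

The task stresses long-term temporal dependencies because the model must carry the first `seq_len` symbols across the entire blank delay, which is exactly the regime where h-detach is claimed to help.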

Cecelia: And from end to end — from you reading the paper and finishing this challenge — how long did that take you?

Aniket: It took me about a month and a half. The main bottleneck was basically the training of the network. For every experiment, it took about three or four days.

Cecelia: OK. So you mentioned that you reached out to authors and they actually gave you this code repository to work with which is great. Do you know why they didn’t include the code repository in the paper to begin with?

Aniket: I’m not sure. They hadn’t included it at the time; they might have included it since, but I haven’t checked, and I didn’t ask them about it.

Cecelia: Gotcha. Did you communicate with the authors through Open Review or did you somehow get their contact information?

Aniket: I got their email information and I contacted them through email.

Cecelia: Gotcha. Okay, so it seems like once you had the code repository, things were pretty straightforward, but before you had the code, what were some of the challenges you encountered trying to reproduce the paper?

Aniket: First, I read the paper. I had a fair idea about the algorithm, but I was not sure I was completely correct about it, so I had to go through it and read it two or three times. I came up with a small implementation of it and tested it, and it was showing good results, but I was still not sure. Like I told you, it was really slow. With their implementation, it took three or four days, but with my implementation it was even a bit slower, so I was a little worried because I wasn’t sure if I would be able to finish all the experiments. That’s why I contacted them and asked about this.

Cecelia: You described your implementation as a small implementation. Can you go into more detail? What do you mean by small?

Aniket: Basically what I did was take a sample dataset, implement the same LSTM modification they mentioned in PyTorch, and use it directly.

Cecelia: Gotcha. So you just took some sample data from somewhere.

Aniket: Yes, but at that time I didn’t make a proper repository keeping in mind all the experiments I had to run. It was just a test run.

Cecelia: So when they gave you this repository that they had worked on, what was inside it? Was it just the code, or did they also include the data they used?

Aniket: The data was… for the copying task, the data is generated by the code itself. With MNIST, the data can be downloaded, and the code for downloading it was included. They had different branches for the separate experiments they conducted. It was fairly easy; I cloned it and it ran immediately. I didn’t have to make any changes.

Cecelia: Got it. It’s good that it was runnable code. When you ran the experiments initially, it took three or four days for an experiment to run fully. What were you running your experiments on initially?

Aniket: Initially, I was running them on Google Colab.

Cecelia: Okay and what kind of environment did they use?

Aniket: The code was in PyTorch, which I’m very comfortable with. The test code was also in PyTorch. I’m not sure which GPUs they were using. I had access to a GTX 1060 for some time in the middle and used it then, but most of the time I was using Colab.

Cecelia: Gotcha. The experiments ran slowly at first, but would you say that you faced computational limits when you were trying to reproduce the paper?

Aniket: Yup. The major thing was that one experiment took three days to run, but Google Colab only runs for twelve hours at a time. So I had to keep restarting, saving everything, and downloading the data again and again. That was the main constraint.
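The restart-and-resume workaround Aniket describes can be sketched as a simple checkpointing loop. This is illustrative only; the file name, model, and epoch count are placeholders, not details from his experiments:

```python
import os
import torch
import torch.nn as nn

CKPT = "checkpoint.pt"  # hypothetical checkpoint file name

model = nn.Linear(4, 2)  # stand-in for the real network
opt = torch.optim.SGD(model.parameters(), lr=0.1)
start_epoch = 0

# Resume from a checkpoint left behind by an interrupted session.
if os.path.exists(CKPT):
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 5):
    loss = model(torch.randn(8, 4)).pow(2).mean()  # dummy objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Save after every epoch so a 12-hour cutoff loses at most one epoch.
    torch.save({"model": model.state_dict(),
                "opt": opt.state_dict(),
                "epoch": epoch}, CKPT)
```

On Colab, writing the checkpoint to mounted Google Drive (rather than the ephemeral instance disk) is what makes it survive the session cutoff; the data re-download Aniket mentions could be avoided the same way.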

Cecelia: Gotcha. That’s pretty frustrating. So what would you do differently if you had more time or more resources?

Aniket: If I had more resources… I was able to run the experiments, but I only ran them with two seeds, while the authors reported results with many seeds. With more resources, I would test the algorithm with many more seeds. There was also one experiment [around the image captioning task] that I didn’t replicate, which I would do if I had more time.

Cecelia: It seemed like the authors of the paper were receptive.

Aniket: They were pretty supportive. I even had certain doubts about the algorithm itself and the concepts in the paper and I just emailed them and they gave me a reply and explained it properly.

Cecelia: Can you explain why? Some other people who did the challenge posted on Open Review because they couldn’t find the authors’ email. If you hadn’t been able to find their email, how would you have tried to communicate with them? Would you have asked questions on Open Review?

Aniket: I would be a little bit more careful with asking on Open Review. I would try to figure it out myself because it’s a public forum but I easily obtained the email ID and I thought it would be easy to just e-mail them.

Cecelia: Having done the challenge, you described your sentiments on reproducibility before, did the challenge change your perception of machine learning research?

Aniket: No, it didn’t change. I want machine learning research to keep consistently publishing papers, but one thing I want to change is that every paper should release some code along with it, to give people a basic idea of how to use the algorithms in the paper.

Cecelia: So to replicate an experiment, you think just the code is necessary?

Aniket: Yeah I think code or even maybe every paper should come with a section for the algorithm that explains their methodology and how everything works.

Cecelia: Having done the challenge, how will you approach your own work differently?

Aniket: Yeah I think having done the challenge, what I will do differently is that whenever I get into something, I try to first create a repository, organize it, and keep it organized from start to the end of the project. So that’s one thing that I learned from doing this project — working in a more organized manner with your code repositories.

Cecelia: So if you were to start a repository today or recommend a repository structure, what would be included in that repository?

Aniket: Basically, different branches with the code for each of the experiments I would report. And whatever data my algorithm requires, or pointers to the data with ways to download it, along with any pre-processing the data requires. I think the code repository for any NLP project should include the pre-processing steps.

Cecelia: Do you think the experience would have been different if you were collaborating with multiple people or if you had a group that you were working with?

Aniket: If we had a group, we could have had more experiments done, but that’s the only thing: we could have done more experiments. But apart from that, I don’t think there would be any difference.

Cecelia: What do you think would help make machine learning research more reproducible and the follow up to that — to kind of frame the first question — is what are the challenges that you think researchers face in making their work reproducible today?

Aniket: I think there are a lot of things involved in getting something in machine learning to converge: small hyperparameter choices make a lot of difference, along with how you tune the learning rate and everything else. I think that is one of the main obstacles to reproducibility. Sometimes the code repository is so disorganized (this happens with me too, and I’ve seen it with others) that people don’t actually release it. And secondly, there are a lot of things you have to import from outside, and different software you have to use, which can make open-sourcing code infeasible. Running any piece of code can depend on the software you’re using and its specific versions, and documenting all of that would be very lengthy, so people don’t open-source the code.

Cecelia: So you described a couple of different challenges — things around all the hyperparameters, external imports, just like all these nuances — would you consider those details as part of a code or outside of the code?

Aniket: I think sometimes when people display their results, they only show a certain set of hyperparameters and don’t talk about the other hyperparameters they tried that failed. For my paper, the hyperparameters the authors tried were on the peer-review site, not in the paper or the code repository. When I started reproducing the paper, those comments were not on Open Review yet; they stacked up over time.

Cecelia: Yeah so it’s helpful to have that initial discussion. Do you think there are tools or best practices that can make this process easier and better?

Aniket: I don’t think there are set best practices, but this initiative, the Reproducibility Challenge, could be run for every conference, not just ICLR. Having a proper reproducibility review is a practice that could be adopted across all major conferences.

Cecelia: Do you think there are certain fields of machine learning that are easier to have reproducible than others?

Aniket: From my experience, whenever I’ve worked with a particular paper in NLP or computer vision, I was able to reproduce it, but when I work on reinforcement learning, I have had a lot of problems.

Cecelia: And why do you think that is?

Aniket: I’m not actually sure, but I’ve observed it a lot. I think it may be because RL algorithms are difficult to understand; if supporting code were provided, it would be easier to understand the papers. Also, sometimes an algorithm that works well in one environment doesn’t work well in other environments. So I think it is really important for the research community to focus on building RL algorithms that generalize and give consistent results across different environments.

Want to learn how Comet can help you track, compare, explain and reproduce your machine learning experiments? Find out more.
