ICLR Reproducibility Interview #1: Francesco, Samuel, Emiljano

Reproducing the paper ‘Learning Neural PDE Solvers with Convergence Guarantees’


Results of a hyperparameter search in order to find the optimal number of layers and learning rate. See the full report here

> Interested in learning more about the Reproducibility Challenge? Read our kick-off piece here

For our first interview, we had a group discussion with master’s students at École polytechnique fédérale de Lausanne (EPFL). Francesco, Samuel, and Emiljano participated in the Reproducibility Challenge together as a final project for their EPFL Machine Learning Course.

In our interview, we covered their thoughts on what a fully reproducible paper should contain, their efforts to dig into missing details from the PDE paper, and how their commitment to reproducibility changed as their academic experience deepened.

Interview recap (TLDR):

  • The team was able to reproduce the first set of experiments, but not the second, more complex approach proposed in the paper.
  • Some issues that the team faced included: lack of access to data and code, and lack of details around the second multi-grid approach. There was also missing information around: how the model was trained, the shape of the data, what the inputs and outputs were for the model, the loss function, the regularization term, library versions and which machine was used to train the experiments.
  • To achieve reproducible research, the team mentioned that “all the proofs should be clear with no hidden assumptions and there should be at least a sample of code with the sample of the dataset that you can train with a reasonable amount of computational power.”

The Interview

Note: this interview has been edited and condensed for clarity

Cecelia: So to kick things off, can you please introduce yourselves in whatever order you’d like!

Samuel: Hi I’m Sam! I’m studying Computer Science at EPFL. Computer science with a minor in neuroscience (computational neuroscience). I did my bachelor’s at the University of Zurich and ETH in computational neuroscience and I’ve worked a bit as a software engineer and data scientist for some startups. And then I returned to the EPFL to broaden my horizons and study something else. Yeah that’s basically it.

Francesco: Well my name is Francesco and I come from Italy where I studied mechanical engineering for my bachelor’s at Politecnico di Milano. I then decided to move to Switzerland for this master’s in computational science and engineering to give me the opportunity to study both computer science and mechanical engineering subjects. I also did an internship in computational fluid dynamics and right now I’m working on my master’s thesis, where I’m combining both my competencies, working on aerodynamic shape optimization using neural networks. For the future I would love to do a PhD.

Emiljano: I’m Emiljano. I’m from Albania. I did my bachelor in Bulgaria in computer science where I also did a software engineering internship at a company called Skyscanner. Then I got a scholarship from the Albanian government to study here in Switzerland in data science, which I like because it’s a nice combination of math and computer science. I like both these fields and in the future, I think hopefully I can do a PhD in cybersecurity.

Cecelia: Gotcha. Very cool. So how did you find out about the challenge and why were you interested in participating?

Francesco: So it was basically one of the three projects that were proposed as a final project of the machine learning class.

Cecelia: Was there a specific reason that you chose this paper to reproduce?

Francesco: Well I went to the OpenReview website and typed “partial differential equation” and found this one. As I said, I have a background in numerical simulations and I wanted to share my knowledge and find something that could combine deep learning with numerical simulations.

Samuel: Well for Emiljano and me, it was the deep learning aspect that was important. This is for us very interesting and yes Francesco just wanted to do PDE.

Cecelia: Gotcha. It’s interesting that it was a requirement for the course. Are there other courses or other professors that emphasize this kind of reproducibility aspect?

Samuel: I mean it’s always part of the computer science department. It’s very important that the code is good, that it is readable and executes nicely. Apart from that, not that I’m aware of.

Francesco: I mean I was never pushed towards a challenge like this but I was always told whenever I was doing a presentation that was not reproducible, that’s not going to get a good grade.

Cecelia: Cool. So going into the challenge, how did you guys think about reproducibility? Have you ever thought about it before? What did you think it was?

Samuel: I mean in general, of course we think about it. Because every time we see a paper we want to use in a project or in a thesis, then of course there’s the question of how to actually run it and how to actually use it. And so for us, we are affected by it by not being able to replicate or to build on the work of others. So far I think this is almost the most important thing: to actually be able to reproduce models easily without many hiccups. Like basically here’s a Jupyter notebook or some sample code; this is the input and the output of a model.

Francesco: To me, it’s something very simple: something that is reproducible even with small computational power. Also, it’s important that you should be capable of following the proofs and you should be able to understand everything. There shouldn’t be hidden assumptions.

Samuel: You should be able to understand it from the beginning to the end and see all the assumptions which they are making. Maybe they made some small mistakes. I mean, everyone makes mistakes, so we should be able to see them and reason about them.

Cecelia: When you’ve tried to reproduce papers in the past or tried to incorporate them into your own work, did you think that it was easy to do?

Francesco: Well it’s not always easy because I found some papers with some mistakes that prevented me from reproducing them or using the results from the paper. That was annoying.

Samuel: We didn’t have access or we don’t have access to the data itself. Maybe you have access to the code, which is probably not really readable. Then you also don’t have the data, so you don’t know “okay, so where does the data come from?” What does it look like? I mean what is the shape of the data: is it images? Is there some preprocessing on it? Or if it’s textual then what kind of texts?

Cecelia: If you could have an ideal paper, what would they provide for you? What would the components be for a fully reproducible paper or a piece of work?

Francesco: Well the material should be understandable. All the proofs should be clear with no hidden assumptions and there should be a sample of code with a sample of the dataset that you can use with a reasonable amount of computational power. It should run end to end without 10 GPUs.

Cecelia: So you’d be fine with even a subset of the data or sample code. You don’t need the full code or the full dataset necessarily.

Samuel: For the code, you can’t run it without the full code. You need all the code; without it, it would be very hard to reason about it. I mean, the dataset is always a problem. Sometimes they’re just not able to give it to you. As long as you get an example so that you get an overview of the data and can understand it, that’s reasonable.

Cecelia: We’ve talked a little bit about how you see reproducibility and the challenges you’ve had up to this point. Do you think that reproducibility is important? If so why or why not?

Francesco: I mean it’s the basis of science right? An experiment should be reproducible. I don’t see why this shouldn’t be the case for a computer science experiment.

Cecelia: Okay, I’m going to ask a hard-hitting question that I didn’t include on the list. Have you ever been guilty of producing work that was not reproducible?

Francesco: I mean, it’s difficult to say ourselves because maybe for a project, we kept the code for ourselves.

Samuel: Like in my bachelor’s thesis, I mean I basically did a little bit of p-value hacking. The model didn’t train that well and I just searched for hours until I got the right parameters so that the performance was good. It was still reproducible given my values, but the search for these values is not very good.

Francesco: I got more into this mindset of reproducibility when I started my masters.

Cecelia: So now the fun stuff. We’re gonna start talking about the actual journey that you had reproducing this paper. So the first question is: can you describe how you decided to approach reproducing the paper and was there a specific reason why you chose that approach?

Francesco: Well at the beginning, we started by trying to reprove the proofs that they showed and trying to understand what data set they used, and how they built the whole process because it was not very clear how they built up the loss function. It was written in a formal language and we needed to interpret it. We don’t know yet if our interpretation was good or not, because we haven’t received an answer. I mean we managed to reproduce the results up to a certain point, but we didn’t know if our approach was the only one, or if it was the one they took or if there are other approaches.

Samuel: It was kind of open to interpretation. For me the most important part was to get the end-to-end solution as quickly as possible so you can kind of understand “Okay, does this work? Are we on the right track?” And for that, for me the most difficult part was “how did you train your model? What was your input? What is the output of the model?” and the loss function. This one equation was super dense and for me, it could have used a bit more explanation.

Francesco: So some other parts of the process were not carefully explained at the beginning. We started working on the paper and it got some reviews. But in the meantime, we worked by ourselves and we thought that everything was working. And then, like two days before our submission, they published something that gave more explanation.

Cecelia: Interesting. If you had had that a little bit earlier, it would have been much better. I guess a good analogy would be a hold-out data set. You want to see if your model was actually correct. So when that new follow-up came out, was your approach the same as the one that they had used?

Francesco: Yes, I think so.

Samuel: I mean still we would have liked to get a stamp of approval basically. We had discussed it with the professor and he said “Yeah, your explanations are reasonable”, but they are also not able to spend too much time on just one project in a class of 400 students. It’s hard to get a feeling if we were on the right track.

Francesco: And we got that feeling when the final paper came out and we saw that things we had discovered ourselves were added to the paper.

Samuel: I mean the paper’s authors were anonymous so we couldn’t contact them. You could only post a comment on the website. I would have liked to be able to send them an email because I mean we didn’t want to post questions like “how did you really train your model?” as a comment on this website because it didn’t really feel right.

Francesco: So we did comment at the end asking for confirmation.

Cecelia: Okay so you wanted to make sure you made some attempts first before asking more questions.

Francesco: But I think it’s good because usually when I study the literature of the research, it takes a lot of time before I understand it. You still need to spend this time with it and sometimes you go to the author at the beginning and see that it’s wasted time whereas if you started after working on it, it’s much more relevant because you understand what you’re talking about.

Samuel: But also, since we didn’t have code from them, we only had the paper, so I think the code would’ve helped a lot. But then we couldn’t have selected this paper because of the requirements that we had in our class. In general, I think it would have helped tremendously if we had code because then you can really see what they’re doing.

Francesco: We were advised to take a paper without code.

Cecelia: What about you Emiljano? What were your thoughts on the challenge?

Emiljano: I mean I totally agree and I learned a lot from it. Because usually when I read papers, I was with the mindset, “Okay I’m not going to get this or I can’t reproduce this.” Well now, I think I can approach that with a different mindset.

Cecelia: How long did you spend essentially reproducing the paper?

Francesco: So we started in the middle of November and we had to finish before Christmas. So basically we had two deadlines: one for the class which was on Christmas and then another for the Reproducibility Challenge, which was on the 7th of January. We submitted for the class and then refined the report a bit for the challenge.

Cecelia: Gotcha. And I know you have described that like you weren’t sure about the actual format of the data, what the inputs were, what the outputs were, and also didn’t know how they defined the loss function. But besides those challenges, were there any other problems that you faced when you were trying to reproduce the paper?

Francesco: I think once we got the interpretation and the process was working, everything was nice because we didn’t need data from somewhere else. We could generate it. The data are solutions of the Poisson equation, and with the finite difference method you can just solve a linear system, so it was very nice being able to generate the training set.

Samuel: It was also kind of unique I think, for this paper, because normally you have a lot of training data and we could generate it so we didn’t have this problem. I think the model didn’t take that long because it’s not that complicated. So I think this is unique in total. I mean if you train a model for road segmentation, it takes way way longer than training our simple three layer linear network.
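
Note: as a rough illustration of what Francesco describes, the sketch below generates one training pair for the 2D Poisson equation -Δu = f by building the standard five-point finite-difference matrix and solving the resulting linear system directly. The unit-square domain, zero Dirichlet boundary values, random source term, grid size, and the names poisson_matrix and make_sample are all assumptions made for illustration; the team’s actual data pipeline may have looked different.

```python
import numpy as np


def poisson_matrix(n):
    """Five-point finite-difference matrix for -Laplace(u) on an n x n interior grid.

    Assumes the unit square with zero Dirichlet boundary values and
    grid spacing h = 1 / (n + 1).
    """
    h = 1.0 / (n + 1)
    # 1D second-difference matrix: 2 on the diagonal, -1 on the off-diagonals.
    t = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    eye = np.eye(n)
    # The 2D operator is the Kronecker sum of the two 1D operators.
    return (np.kron(eye, t) + np.kron(t, eye)) / h**2


def make_sample(n, rng):
    """Return one (f, u) pair satisfying A u = f, both reshaped to (n, n)."""
    a = poisson_matrix(n)
    f = rng.standard_normal(n * n)  # hypothetical random source term
    u = np.linalg.solve(a, f)       # direct solve of the linear system
    return f.reshape(n, n), u.reshape(n, n)


rng = np.random.default_rng(0)
f, u = make_sample(8, rng)
residual = np.linalg.norm(poisson_matrix(8) @ u.ravel() - f.ravel())
print(f"residual of the linear system: {residual:.2e}")
```

Calling make_sample repeatedly with the same generator would give as many independent pairs as needed, which is why no external dataset is required in this setting. A dense solve is only practical for small grids; a sparse solver would be the natural swap for larger ones.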

Cecelia: Okay. I was doing research and read through the reviews of your report and one of the reviewers mentioned that you didn’t actually reproduce all the experiments from the paper. Is there a reason why you didn’t reproduce all of them?

Francesco: There are two main reasons. One was the time constraint: of course, we had one month to reproduce it, so we focused on the first method that they reported. The second is that the multi-grid approach that they proposed was more difficult. It should have been explained better, but they only referenced one other paper, which explains the multi-grid approach, and didn’t say how they used it. That was difficult. We also did a hyperparameter search that they didn’t provide.

Samuel: Also given our class, one requirement was that if we can we should build on top of it and expand it a bit. So we should show some more results, different results, and not only reproduced results from the others. It was kind of like going for the low-hanging fruit first and then expanding a bit.

Emiljano: Even in the final version they posted two days before our deadline, they didn’t expand on the second multi-grid method.

Cecelia: Yeah. That’s unfortunate. Do you think that if you had more time or more resources, would you have done things differently or you would have approached it the same way?

Francesco: I think that our approach would be the same. I mean, of course we could have expanded it up to the multi-grid but that would have taken a lot of time.

Samuel: Yeah. Given how much time it took us to understand the non-multi-grid part, expanding on it and building on it would’ve taken much longer.

Francesco: The bottleneck for the second part was the explanation for this multi-grid method.

Cecelia: Did you face any computational limits? It doesn’t seem like the model you were recreating was particularly intensive right?

Francesco: No, it was fine from that point of view. I’m just saying the beginning of the paper should have a sample that’s easy and immediate to reproduce without heavy computational resources.

Cecelia: What kind of tools or IDEs did you use to reproduce the paper?

Samuel: He used Emacs [pointing to Francesco]. I like Vim more. But it was programmed in PyTorch with Jupyter notebooks, and we collaborated using GitHub.

Cecelia: How was your experience communicating with the authors around the paper? Any doubts you had? I don’t think you ever had their email.

Samuel: We posted one comment but we didn’t have any way of communicating with them.

Cecelia: Did they reply to your comment?

Francesco: They didn’t answer.

Cecelia: Did you look at the other comments and the discussions on OpenReview?

Francesco: Yes, that’s how we started the challenge: by looking through the comments.

Cecelia: Okay, so I guess you tried to communicate but there wasn’t much of a response back?

Samuel: No I mean to be honest, we only posted the comment at the end. Since we didn’t see a nice, easy private way of communicating with the authors, we didn’t communicate with them.

Cecelia: Do you think that if there was some kind of way you could privately reach out to the author, that would be better? Not necessarily having it on this public review site.

Samuel: Yeah I think so. Basically some way of communicating with them in a direct channel.

Cecelia: What if other people had the same questions that you had?

Samuel: You can ask questions later on if you find a good solution to it. Then you can post it but starting with some basic questions didn’t really feel right. Like no I will not post this simple, stupid question on this website.

Francesco: Maybe we didn’t really understand the scope of peer review. Maybe it should be like a forum. Maybe we should do it in another way.

Cecelia: Do you think those basic questions should have been answered already in the paper?

Samuel: Some yes.

Francesco: Some things should have been clearer. Yes.

Cecelia: Can you point to specific things?

Samuel: Like we mentioned before like the input or the outputs, the shape of the data which they gave. Yeah, the loss function. Like see the code example. They gave you like an equation but then there was a sampling part in it — how did you sample it? I think that they kind of wrote it down, but then in the end you don’t really know the batch size or the operations of the model which you trained.

Francesco: And is there any regularization term? Because they were putting a lot of emphasis on the spectral radius to guarantee convergence, but then they didn’t say if they were actually putting some regularization on the loss function to prevent the spectral radius from growing too much. But that was also something which we worked a lot on. I think we asked about it on OpenReview.

Cecelia: Yeah and going back 15 minutes, you mentioned that you obviously did some work trying to reproduce it by yourself first before reaching out to the authors and you tried to ask your professor if you were going in the right direction or not. But given that you didn’t have all the answers to all these questions, how did you know if you were going in the right direction or not?

Francesco: Well it’s when we saw that the thing started working because the point of the paper was trying to accelerate the Jacobi method to solve the linear system of the Poisson Equation. Our net was working — so not showing instability, showing convergence.
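
Note: for readers unfamiliar with the baseline Francesco mentions, the sketch below runs a plain Jacobi iteration on a finite-difference discretization of the Poisson equation and estimates its convergence rate. This is only the classical method, not the learned solver from the paper or the team’s network. The grid size, iteration count, random source term, and zero Dirichlet boundary values are assumptions made for illustration. For this particular model problem the spectral radius of the Jacobi iteration matrix is known to be cos(πh), which is below 1 (hence convergence) but approaches 1 as the grid is refined (hence the interest in accelerating it).

```python
import numpy as np


def jacobi_step(u, f, h):
    """One Jacobi sweep for -Laplace(u) = f with zero Dirichlet boundaries.

    u and f are (n + 2, n + 2) arrays that include the boundary ring.
    """
    new = u.copy()
    # Each interior value becomes the average of its four neighbours
    # plus the scaled source term; the boundary ring is left untouched.
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
        + h**2 * f[1:-1, 1:-1]
    )
    return new


n = 15  # interior points per side (illustrative choice)
h = 1.0 / (n + 1)
rng = np.random.default_rng(0)
f = np.zeros((n + 2, n + 2))
f[1:-1, 1:-1] = rng.standard_normal((n, n))

u = np.zeros_like(f)
updates = []
for _ in range(200):
    new = jacobi_step(u, f, h)
    updates.append(np.linalg.norm(new - u))
    u = new

# The ratio of successive update norms approaches the spectral radius
# of the iteration matrix, i.e. the asymptotic convergence rate.
estimated_rate = updates[-1] / updates[-2]
theoretical_rate = np.cos(np.pi * h)
print(f"estimated rate {estimated_rate:.4f}, theory {theoretical_rate:.4f}")
```

A rate below 1 means every sweep shrinks the remaining error, which is the “not showing instability, showing convergence” behaviour described above; an iteration with a rate above 1 would diverge.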

Cecelia: Well they were arguing that the convergence also happened at a faster rate, right? Did you see that the speed-up in what you had implemented matched theirs?

Francesco: If you have our paper, there are some graphs in which we showed the convergence rate for different numbers of convolutional layers.

Samuel: To totally replicate their results, we would have needed some more inputs and information on how the number of operations were going to be reduced and how much CPU time was going to be reduced. But then how did you compute CPU time?

Francesco: Yeah they produced a time without saying whether it was a mean or if they did just one experiment.
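
Note: the ambiguity raised here (one run or a mean?) goes away if timings are reported with the number of repeats and the spread. The sketch below shows one way to do that using only the Python standard library. The workload function is a hypothetical stand-in for a solver run, and the choice of ten repeats is arbitrary.

```python
import statistics
import time


def time_runs(fn, repeats=10):
    """Run fn() several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    # Hypothetical stand-in for one solver run.
    return sum(i * i for i in range(50_000))


durations = time_runs(workload, repeats=10)
mean = statistics.mean(durations)
spread = statistics.stdev(durations)
print(f"{mean * 1e3:.2f} ms ± {spread * 1e3:.2f} ms over {len(durations)} runs")
```

time.perf_counter measures elapsed wall-clock time; swapping in time.process_time would measure CPU time of the current process instead, and a report should say which of the two was used.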

Samuel: So we could kind of replicate the results or generally the hypothesis but we couldn’t totally compare our results to their results.

Emiljano: You just measure the number of Flops (floating point operations). They didn’t specify in the paper how to do that.

Cecelia: Did the challenge change your perception of machine learning research or research in general?

Francesco: For me, it was the first time I was approaching machine learning research. And I think it helped to improve my understanding of reproducibility and to understand the errors that somebody shouldn’t make when writing a paper.

Samuel: Yeah not really no because for me the paper should be reproducible. So I was always going into a paper with the expectation that I can reproduce it and that they somehow provide me with some code to actually prove their claim, and I always get frustrated if that’s not the case because I feel like this is an obvious thing to do. And I don’t know why they don’t publish it. Maybe they feel that the code is super ugly. But still that is better than nothing.

Francesco: That’s always the case. You write some code that is not super commented or super clean and you say, “I’m not going to publish, or I will publish when it’s cleaner and better,” but then you don’t.

Samuel: Maybe it just starts with writing clean code, which of course never happens.

Cecelia: Having done the challenge, do you think you’re going to approach your own work differently in your own research?

Samuel: I mean, our work is of course always perfect.

Cecelia: Right. Of course. Well I guess if you were to write a paper and submit, would you release the code? Would you have all these very specific details about the implementation of it?

Francesco: So it’s not that trivial because sometimes maybe the code is secret or cannot be shared. And that’s something which is difficult to handle because if it’s difficult to check the code, it’s difficult to make a reproducible paper. But still I think that if I were going to publish something, I would still try to provide something which is not the code itself but something simple which proves what I want to show.

Samuel: Also sometimes, you have these space limitations. Let’s say you have four pages you can write your findings on and there is just no space to fit details on how you trained your model.

Cecelia: I feel like you could put that information in the appendix if there is a space constraint.

Samuel: You can, but sometimes there’s just no time to write that down. The professor always wants to see only the results because they already know how you trained your model, and then there is no time to put it into the appendix.

Samuel: Every time I write something, I always post a link to more information and also to the code. For me, this is necessary. You cannot publish something without this.

Cecelia: So I guess my follow up question to that is if a paper came out and it had these amazing results and it came from a reputable author, but it wasn’t reproducible because they couldn’t share the code for some reason and they couldn’t share information about the data for some reason. Would you trust those results?

Francesco: I will at least try to use the work. Even if they don’t provide the code and it’s something that could be valuable for my research, I will try to implement it by myself.

Emiljano: If it was published by a respected journal, probably their work was reviewed.

Samuel: For me, I think it’s the trust in the process. This is the basis of scientific reasoning. If you do not make sure your study design or your experiment setup is as clear as it can be, then how would we know? This is just necessary. Even if you are a well-respected person, if you do not make it clear, then how can we believe you?

Cecelia: Having gone through this experience, what do you think would help make research more reproducible either now or in the future?

Samuel: You mean in general? Like techniques or hypotheses?

Francesco: Open access to the papers. A lot of papers are not open access. There are different journals that you need to pay for. As students, there are journals which we can access but there are others we cannot and we cannot afford to pay a subscription.

Francesco: Well, maybe some journals could require the code in a repository. So if you want to publish, you would need to provide working code. That’s a good point. Why don’t you come up with a short experiment? I mean how can I trust your results? Show me what actually is going on. There could be a very small bug and of course you show the results. So there’s no doubt after the whole process. If I cannot see it, then I cannot review it. I mean it’s been peer-reviewed but it’s peer-reviewed only on the paper. It’s not peer-reviewed on the code. It’s difficult to peer review code.

Samuel: In general, I think everything should be open. There’s nothing which you should hide. With the data, if it’s personal data like health data, then of course you can’t publish it. But you still can publish statistics about the data so we know more about where this data comes from. So I think this should be included or should be open for anybody to see it. This is how we progress.

Francesco: Like in computer science, it’s one of the easiest to reproduce. I mean, a biological experiment would be much more difficult. The scientific method is super expensive. Of course you may need some additional computational resources.

Samuel: And even then you can rent a machine on Amazon for a couple of hours. It should be easy to reproduce because in the end, it is a program and a program should be runnable.

Cecelia: From the researchers’ and practitioners’ perspective, what do you think are some challenges that they face in making their work more reproducible? We talked a little bit about how sometimes the data is private as an example. Do you think there are any very practical challenges in making work reproducible?

Samuel: Time constraints. You cannot publish it. You don’t want to publish it. You have fear that someone else can see it and steal the ideas you have.

Francesco: Sometimes the libraries they’re using? Maybe it’s difficult to install all the required libraries.

Cecelia: Why would the libraries contribute to difficulties in reproducing?

Francesco: Two weeks ago I had a problem just because of TensorFlow. I was using a different version of CUDA. I changed the version of CUDA and it worked well with other processes, and then I noticed that there was one operation which was not working, and then magically when I changed the version again everything was fine.

Cecelia: So should researchers also share information about their environment? When you say code, are you including that information as well?

Samuel: Your setup should be there. Like if you’re using libraries, for example numpy version X or Y, sometimes the operations change. I think you should also release which machine you trained it on so that I know “OK so they used a supercomputer to train for an hour, so I can’t expect the code to run on my machine”. Maybe they were using a different architecture so that I would expect the differences. I think all that, again, everything should be given, everything should be easy to install, easy to use.
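
Note: the kind of setup record Samuel is asking for can be produced in a few lines. The sketch below collects the interpreter version, machine details, and installed package versions using only the Python standard library. The function name describe_environment and the package list are illustrative assumptions; a real project would list whatever it actually depends on and save the result next to its results.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, machine, and package versions as a plain dict."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than failing, so the report is complete.
            info[name] = "not installed"
    return info


env = describe_environment(["numpy", "torch"])
for key, value in env.items():
    print(f"{key}: {value}")
```

This does not capture everything (GPU model and driver or CUDA versions would need extra tooling), but it already answers the “numpy version X or Y” question for anyone trying to rerun the code.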

Francesco: I mean, for the guidelines for the course, the professor told us to go to the library to rent a laptop (a different laptop from the university) and to try to run our code on the laptop.

Samuel: Because they also needed to reproduce our results so they actually needed the code to set it up. It’s because there are 400 students, so it needed to be practical.

Francesco: At least it was working on three different laptops.

Cecelia: When you had a repository for the code and when you tried to run it, how did you ensure that you all have the same environment? Did you use a Docker container or something else?

Samuel: No, that would’ve taken too much time. We just used a conda environment.

Francesco: Yeah, for our project, there weren’t too many libraries. It was just PyTorch and numpy so it was fairly easy. Plus we had a fixed weekly meeting every Thursday where we met for one or two hours.

Cecelia: Got it. Did you all have any follow up questions or things you wanted to mention outside the questions we covered?

Samuel: The interview was quite complete. Well, I just want to say that I really like this — reproducing as a challenge. I think this is good progress in a scientific area where there’s a need for it and maybe you can actually do this without huge problems. So I really like this push to reproduce papers.

Francesco: I think it’s going to be a very relevant journal because the paper really was peer reviewed. There was this reproducibility phase in the peer review process where the paper was peer reviewed again. You can at least make sure that this is a good paper and is at least reproducible after two peer review steps.

Cecelia: Did you see the latest update for NeurIPS, where in their call for papers they now ask people to submit answers to the reproducibility checklist that was introduced last year? And Joelle, who is the organizer for this challenge, is the one who made that checklist.

Francesco: OK I think I’ve seen on the website, the reproducibility checklist.

Cecelia: So this huge conference is now encouraging people to or basically mandating that people submit responses to these questions.

Samuel: So it is really great.

Cecelia: I think some folks have argued that if you were trying to force people to submit code or you know fulfill all these requirements, there’s actually going to be less research that’s submitted. What do you think about that claim?

Francesco: No that’s not true.

Samuel: It’s not the quantity that should be important. There are too many papers published already. I cannot follow them. Even if I really want to and only want to do deep learning, there are too many papers published daily that I cannot follow up. I think that it’s worth having fewer papers to get this higher quality.

Emiljano: Probably there will be fewer bad papers.

Francesco: But what I was thinking is that this reproducibility checklist is very similar to what we are asked to do in our broader guidelines.

Cecelia: Got it. It was great to meet you guys. Thanks for sharing all this information!

Our next interview will be published this Friday. Make sure to follow us on this blog to stay updated.

Interested in seeing how Comet.ml can help your team automatically track your datasets, code, models, and experimentation history for reproducible research? Find out more.
