NIPS 2017 Summary

NIPS

tl;dr

takeaways

Deep learning is influencing bayesian methods and vice versa: deep bayesian learning and bayesian deep learning are two slightly different subfields with extremely rapid progress this year. Expect to see a lot more developement in this area. There was a great workshop on these topics and I predict there will soon be benchmark datasets and a growing base of open source software for folks to use.
Not only is model interpretability, bias, and fairness on people’s minds, but there’s also a lot of attention and research in this rapidly-developing area right now. Kate Crawford gave an excellent invited talk about bias and fairness in machine learning which paved the conversation for the rest of the week, including in several papers, symposia, and workshops. This is not just an area where the expertise of computer scientists is needed: it’s inter-disciplinary and research would be served best by decades of research by sociologists, anthropologists, pyschologists, historians, etc.
There is general acknowledgement that more theory is needed to understand the state-of-the-art empirical performance across many areas of deep learning. Rahimi and Recht’s test-of-time award talk called for simple theorems and for simple, easily reproducible experiments.
Deep reinforcement learning is all the rage. We’ve beaten humans at poker, a great entree into imperfect information games, and we’ve beaten humans at perfect-information games (Go, Shogi, and Chess) with no prior knowledge other than rules (i.e., no training data) – using the exact same set hyperparameters.
GANs are hot – but we’re still not sure how they work or how to make them more usable. Ian Goodfellow suggests we give them a few more years.

must-see papers

Neural Information Processing Systems 2017

Main takeaways from #NIPS2017

Note: throughout the week I intend to post deeper dives on tutorials, talks, symposia, and workshops.

Deep learning is influencing bayesian methods and vice versa: deep bayesian learning and bayesian deep learning are two slightly different subfields with extremely rapid progress this year. Expect to see a lot more developement in this area. There was a great workshop on these topics and I predict there will soon be benchmark datasets and a growing base of open source software for folks to use. Neural networks can make brittle predictions – how can we quantify their uncertainty better? Gaussian processes can be composed together to create richer models – how do we leverage tools used in deep learning to make using GPs easier and more performant? 1.1 See Yee Whye Teh’s talk, On Bayesian Deep Learning and Deep Bayesian Learning for more info. Video link.
Not only is model interpretability, bias, and fairness on people’s minds, but there’s also a lot of attention and research in this rapidly-developing area right now. Kate Crawford gave an excellent invited talk about bias and fairness in machine learning which paved the conversation for the rest of the week, including in several papers, symposia, and workshops. This is not just an area where the expertise of computer scientists is needed: it’s inter-disciplinary and research would be served best by decades of research by sociologists, anthropologists, pyschologists, historians, etc. Can we create some benchmark datasets over which we can define methods and problems associated with bias and fairness? Can we develop theoretical guarantees around these concepts as a foundation for further research? How do we nurture interdisciplinary research across fields whose methods and language are almost equally important but almost completely divergent (e.g. mathematical proofs vs. qualitative prose)? 2.1 See Kate Crawford’s talk, The Trouble with Bias for more info. Video link
There is general acknowledgement that more theory is needed to understand the state-of-the-art empirical performance across many areas of deep learning. Rahimi and Recht’s test-of-time award talk called for simple theorems and for simple, easily reproducible experiments. Why does deep learning work so well? How do these models generalize so well when, according to VC theory, they have so many parameters? Can I repeat the results of someone’s experiment from reading their paper? 3.1 See the talk by Ali Rahimi and Benjamin Recht, the test-of-time-award talk, for more info.
Deep reinforcement learning is all the rage. We’ve beaten humans at poker, a great entree into imperfect information games, and we’ve beaten humans at perfect-information games (Go, Shogi, and Chess) with no prior knowledge other than rules (i.e., no training data) – using the exact same set hyperparameters. We’re nowhere near DeepMind’s mission of AGI but in 2017 we’ve made some undeniably cool breakthroughs. How do we design reward functions? How do we get systems to learn new tasks more efficiently? How do we make RL systems interacting with the real world (e.g., self-driving cars) safely robust to new/rare states (avoiding an accident with a car flipping in mid-air), outliers (stop signs missing), or adversarial environments (stickers placed on signs to fool self-driving cars)? 4.1 See Pieter Abbeel’s talk, Deep Learning for Robotics for more info.
GANs are hot – but we’re still not sure how they work or how to make them more usable. Ian Goodfellow suggests we can take GANs more seriously in a few years once we work some of the foundational kinks out. In essence, they’re hard to train and the learning process is somewhat unstable. Can we draw on related insights from dynamical systems theory to create more robust training objectives/regimes? How do we avoid situations where we get close to the nash equilibrium and then the parameters diverge? 5.1 See Ian Goodfellow’s talk, Bridging Theory and Practice of GANs, for more info.

Practical talks

There were a few talks that stood out to me for immediate application with my own work.

Poincaré Embeddings for Learning Hierarchical Representations 1.1 This paper seems to implement an elegant solution that vastly outperforms earlier word2vec work and does a good job of characterizing hierarchical language relationships. Hooray for non-euclidean geometry!
Self-Normalizing Neural Networks 2.1 This paper seems to implement a simple way to avoid manually injecting batch norm, needing to be picky about activiation functions, and achieves superior results via a new activation function and provably optimal activation function parameter settings (I wouldn’t really call alpha a hyperparameter if the setting is optimal – why would you change it).

Reflections and daily summaries

Overall, NIPS 2017 is easily one of the best conferences I’ve ever attended. It was my first NIPS and my first academic, week-long, machine-learning conference, and therefore hard to compare to other, shorter, more industry-focused conferences. It was overwhelming in terms of content. I think that next year I’ll spend weeks/months preparing by reading more of the proceedings and symposia/workshop submissions so that I can get more out of NIPS. Right now, I’m writing everything down to avoid catastrophic forgetting. Having attended NIPS, I think I’ll set my sights on other similar conferences such as ICML or ICLR.

At least two aspects of attending NIPS would improve with additional preparation:

Poster sessions.
Comprehension of talks/knowing what points are most important thereto.

Because the talks are about state-of-the-art results, unless one is intimately familiar with the topic, it’s hard to know what’s important. Further, I think I could learn a lot more about being a practitioner by talking with authors of papers about their implementations and what some of their advice is for either using their methods or coding them up – what are some good hyperparameter settings? How many computational resources do I need for a reasonable implementation? What were some of your built-in assumptions about the data that weren’t explicit in your paper (e.g., perhaps known aspects of benchmark datasets that would not apply to an industrial, proprietary dataset).

Here’s a short overview of each day. I will link to my other posts covering my attendance of the conference. There were so many parallel tracks that I saw far fewer than half of NIPS but I will attempt to convey my experience.

Monday, December 4: tutorials; invited talk; opening remarks
Tuesday, December 5: invited talks; first day of talks of main NIPS proceedings
Wednesday, December 6: invited talks; second day of talks of main NIPS proceedings
Thursday, December 7: invited talks; final talks of main NIPS processings; symposia
Friday, December 8: workshops
Saturday, December 9: workshops and NIPS closing party

Fun stuff

Deep Learning textbook anniversary

MIT Press held a one-year anniversary party for the print edition of the Deep Learning textbook. Ian Goodfellow, Aaron Courville, and Yoshua Bengio were all there to cut a cake on whose surface was printed the cover of the textbook – a Google deep-dream photograph from the Strawberry Fields in Central Park. Is this the first example of the output of a neural network being rendered in food? (Edit: Google handed out cookies whose recipes were learned via a neural network).

Someone made the inevitable joke asking how many “layers” were in the cake. It looked and tasked like a fully connected MLP of about 6-7 layers to me. Not good enough for ImageNet but good enough for an afternoon snack.

Lots of people got photos or took selfies with the authors, or brought their books to this event to get them signed.

NVidia CEO Jensen Huang was excited too and spent some time hanging out with everyone. No wonder, since the huge demand for GPUs and NVidia’s sustained success over the last few years has been fueled by the cambrian explosion of deep learning – in no small part due to the authors and even the crowd at the book signing.

NIPS Closing Party

The Imposteriors, an improbably good band. The only band with 50,000 citations, amiright?

It was a blast to bring everyone together at the end and have the nerdiest rockstars (all rockstar scholars in their own right as well) give a great concert.

Toward the end they had special guest David Blei on accordian playing polka. This led to the audience self-assembling into giant concentric markov-chains: fun was had by all.

Corporate presence

As this was my first NIPS, I’m not much of a stakeholder in terms of what NIPS should be. Second, I’m an engineer and therefore already somewhat in the “enemy” camp for many of the attendees.

Zeitgeist: it seems like the general feeling is that the success of deep learning since 2012 has led to the corporate takeover of NIPS. Many or most attendees are not scholars but are in industry (would love exact stats on that); corporate sponsors steal a lot of attention from NIPS (both from talks and poster sessions) via expensive, flashy booths, free swag, heavy recruiting, and parties with open bars and other gimmicks.

This kind of attention is a double-edged sword. It’s great – look how many more people are interested in the field compared to a few years ago. It’s terrible – the hype will lead to the next AI winter, and technology is progressing too quickly to keep track of all of the potentially societally detrimental innovations such as generating realistic fake videos with geopolitical consequences or violating privacy with improved identity detection and automatic surveillance.

Personally, I felt 100% uninfluenced by the recruiting and generally did not attend the corporate parties. Overall, I thought it was great that all of the poster sessions were sponsored with food and drinks, encouraging folks to stick around late into the evenings.

Corporate party highlights:

Intel held a Flo Rida concert and announced their latest “AI” chip, derived from their acquisition of Nervana Systems. This is exactly the plot of the pilot episode of HBO’s Silicon Valley, either a perfect self-aware example of life imitating art imitating life, or a cringe-worthy coincidence.
NVidia, not to be outdone, had an orchestra play an “AI-generated” piece, unveiled a new GPU, and handed out over a dozen to the audience (MSRP ~$3k).
Tesla, who did not sponsor NIPS, held an invite-only event with Elon Musk, Andrej Karpathy, and Jim Keller in an old-school mansion – they let attendees drive brand-new Model S’s to the event.

Most of the deep-pocketed sponsors held similar events – some open, some invite-only – but those 3 are the top in terms of gimmicks that I know of.

Other corporate stuff:

During a lunch event, Apple announced they’d be open-sourcing turi-create, which is a fantastic move on their part. As a former user of graphlab-create, I am pleased that they are open-sourcing it and I’m looking forward to using it again.
Intel had a quantum computer at their booth. It was awesome.
Rennaissance Technologies had a booth with no sign on it. So badass.