Ideas from leading Machine Learning researchers

5 min read · Feb 1, 2021

Disclaimer: the ideas expressed in these videos may be wrong, or may be misunderstood and conveyed incorrectly by me.

This article covers three recent interviews (2019–2020) with some of the world’s leading researchers in Machine Learning. I have tried to pick out the ideas most interesting for thinking about and developing new ML algorithms.

Interview with Geoffrey Hinton: The Foundations of Deep Learning

  • The brain learns without a “teacher”, so the future of ML is unsupervised learning (7:57, 27:18, 36:44, 42:50, 47:09). Already today, BERT, GPT-3 and other transformer models, as well as capsule networks, rely on unsupervised pre-training (25:25). Most of the learning in the brain is unsupervised (48:07).
  • The main idea of capsule networks is that features are better represented as a vector with a direction rather than as an unordered array of numbers. According to Hinton, capsule neural networks are “finally something that works well”.
  • The brain does not use back-propagation, at least not the way convolutional neural networks do (18:04, 19:42). This motivated the idea of distillation networks (20:52): instead of back-propagating through all the layers, it may be enough to reach an agreement between neighboring layers in the stack. However, for now this is not better than a simple greedy bottom-up learning algorithm. In essence, knowledge distillation simply trains another, separate model to match the output of an ensemble, while self-distillation (or “Be Your Own Teacher”) performs knowledge distillation against a model of the same architecture (a minimal distillation sketch follows this list).
  • Neural networks are very good at recognizing textures (31:55). That is why there are adversarial examples, where two images look totally different to us but very similar to a neural net, and vice versa (a small adversarial-example sketch also follows this list).
  • A big model trained directly on the data can teach smaller, faster models that are almost as good as the big model (37:01, 38:53). The models that are good at sucking structure out of the data are not necessarily the same models that are small, agile and easy to run on a cell phone.
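
As an illustration of the distillation idea above, here is a minimal sketch of knowledge distillation in the spirit of Hinton’s approach: a small “student” is trained to match the temperature-softened outputs of a large “teacher”. The model sizes, temperature and loss weighting are illustrative assumptions, not values from the interview.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(784, 1200), nn.ReLU(), nn.Linear(1200, 10))  # big model (assumed pre-trained)
student = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))    # small, fast model

T = 4.0       # softmax temperature: softens the teacher's output distribution
alpha = 0.7   # weight of the distillation loss vs. the usual hard-label loss
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_step(x, y):
    """One training step of the student on a batch (x, y)."""
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened distributions, scaled by T^2 as in the original paper
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, y)  # ordinary supervised loss
    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Self-distillation would use the same loss, but with a teacher of the same architecture as the student.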
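
The adversarial-example point can also be made concrete with a hypothetical sketch of the fast gradient sign method (FGSM): a tiny perturbation, nearly invisible to us, can flip a network’s prediction. The stand-in model, image and epsilon are assumptions for illustration only.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # stand-in classifier
x = torch.rand(1, 1, 28, 28, requires_grad=True)          # stand-in 28x28 image
y = torch.tensor([3])                                      # assumed true label

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                                            # gradient of the loss w.r.t. the image

epsilon = 0.1                                              # perturbation budget
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0)      # adversarial image

# x_adv looks almost identical to x, yet the model's prediction may change:
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))
```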

Interview with Jeremy Howard: fast.ai Deep Learning Courses and Research

  • Python is not the programming language of the future for ML (12:59). Python is a less elegant language in nearly every way, but it has the data science libraries. There are better array-oriented programming languages (8:16), such as Swift (see Colab with Swift support, Swift for TensorFlow, Swift AI), J and Julia.
  • There is roughly a 10x shortage of doctors in the world (24:05); it would take about 300 years to train enough doctors to close that gap. If we used deep learning for some of the analytics, maybe we would not need so many highly trained doctors for diagnosis and treatment planning. In Africa there are only five pediatric radiologists for the entire continent (25:28); the person who looks at medical imaging for kids will be a nurse at best. In India and China almost no X-rays are read by any trained professional. We need algorithms for preliminary diagnosis. The money is there: the developing world is not a poor world. There is expensive diagnostic equipment, but no expertise. AI in medicine essentially started from zero in 2014 (28:28).
  • One of the research areas now is doing more with less data (33:00): transfer learning, active learning (41:49), cooperative learning, one-shot and few-shot learning, data augmentation, data generation, self-supervised learning, etc. The rise of small data, or low-data regimes (a transfer-learning sketch follows this list).
  • Most of the research in the deep learning world is a total waste of time :-) (40:55).
  • Computational photography (57:01): three cheap lenses plus a little intentional movement (to capture several frames) give enough information to reconstruct excellent pixel resolution. The same could be done with audio, but it has not been done yet.
  • Leslie Smith discovered super-convergence (58:57, 1:01:00): certain networks, with certain hyper-parameter settings, can be trained about ten times faster by using a roughly ten times higher learning rate. Nobody would publish that paper, because it is not a recognized area of academic research and the phenomenon has not yet been explained (a one-cycle learning-rate sketch follows this list).
  • The key differentiator between people who succeed and people who fail is tenacity (1:20:55, 1:27:02).
  • We don’t need more experts who produce slightly evolutionary research in areas that everybody is already studying (1:24:37). We need experts at using deep learning to diagnose malaria, to analyze fisheries and identify problem areas in the ocean, to predict mutations in viruses, etc.
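
To make the “do more with less data” point concrete, here is a hypothetical transfer-learning sketch: reuse an ImageNet-pretrained backbone and fine-tune only a new head on a small dataset. The number of classes and the choice of ResNet-18 are assumptions for illustration, not recommendations from the interview.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5                                  # assumed size of the small target problem
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet-pretrained backbone

for param in model.parameters():                 # freeze the pretrained weights
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable classification head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# from here, a normal training loop over the small dataset updates only model.fc
```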
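
And here is a minimal sketch of the one-cycle learning-rate schedule behind super-convergence, using PyTorch’s built-in OneCycleLR scheduler (fastai exposes the same idea as fit_one_cycle). The stand-in model, the peak learning rate and the epoch counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                         # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

epochs, steps_per_epoch = 5, 100
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1.0,                                  # deliberately high peak learning rate
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
)

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        x = torch.randn(32, 10)                  # dummy batch
        y = torch.randint(0, 2, (32,))
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                         # learning rate ramps up, then back down, within the run
```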

Interview with Andrew Ng: Deep Learning, Education, and Real-World AI

  • The thing we really got wrong was the early emphasis on unsupervised learning (23:33). Modern ML is based on supervised learning, although unsupervised learning is still a beautiful idea (46:10).
  • Make learning a habit (50:54), just like brushing your teeth. Start small (57:52, 1:14:33, 1:18:22; see the article AI Transformation Playbook), but remember that the ML model is less than about 5% of the entire system.
  • Career advice: try to find a good team for daily communication. Who are these 5–10 people you’ll interact with every day? (1:00:32, 1:03:10).

Several useful courses

If you want to do ML yourself, there are several useful courses worth studying. I’m a fan of “eating my own dog food” in practice, so I have either completed these courses myself over the last six months or am finishing them right now.

Quick summary

Do more with less data. Use Machine Learning as a tool in your own domain of expertise to solve practical, everyday problems. Make learning a habit. Keep learning even when you don’t understand something, because the key difference between people who succeed and people who fail is tenacity.
