• Different goals/requirements
    • In research, the team is set to achieve one specific goal (usually to beat the state of the art benchmark)
    • In production, different teams often have different goals
      • MLEs may want a model that recommends items a user is most likely to click on, while the Sales team may want a model that recommends the most expensive items, since those generate the most revenue
      • It’s important to understand the objectives of ALL stakeholders
    • For example, ensembling is used in many research projects to squeeze out the last bit of performance, but ensembles are often too complex to be useful in production
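
A minimal sketch (my own, not from the book) of why ensembles are heavier to productionize: a soft-voting ensemble of three scikit-learn models on synthetic data. Any accuracy bump may be small, but you now have three models to train, version, deploy, and monitor, and every query pays for three predictions instead of one.

```python
# Sketch only: a 3-model soft-voting ensemble vs. a single model on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single = LogisticRegression(max_iter=1000).fit(X_train, y_train)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
).fit(X_train, y_train)

print("single model accuracy:", single.score(X_test, y_test))
print("ensemble accuracy:    ", ensemble.score(X_test, y_test))
```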

Computational Priorities

Research prioritizes high throughput; production prioritizes low latency.

  • When processing queries one at a time, higher latency means lower throughput. When processing queries in batches, however, higher latency can come with higher throughput: each query waits longer, but more queries are answered per second (see the toy sketch after this list).
  • (To read this part of the book again)
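
A toy arithmetic sketch of the batching trade-off (my numbers, not the book’s): assume a fixed 2 ms overhead per model call plus 1 ms of compute per example. Bigger batches make each query wait longer (higher latency) but answer more queries per second (higher throughput).

```python
# Toy numbers only (assumed, not from the book): latency vs. throughput under batching.
def batch_latency_ms(batch_size, overhead_ms=2.0, per_example_ms=1.0):
    """Wall-clock time to answer one batch of queries, in milliseconds."""
    return overhead_ms + per_example_ms * batch_size

for batch_size in (1, 8, 32, 128):
    latency_ms = batch_latency_ms(batch_size)            # what each query in the batch waits
    throughput_qps = batch_size / (latency_ms / 1000.0)  # queries answered per second
    print(f"batch={batch_size:>3}  latency={latency_ms:>6.1f} ms  throughput={throughput_qps:>7.0f} q/s")
```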

Data

  • Research
    • The dataset is usually historical (it already exists), often clean and well formatted, allowing you to focus on model development and architectures
  • Production
    • Data is usually VERY messy and unstructured: incorrect labels (you may need to relabel everything), bias (you might not even know it’s there), and data that is sparse, imbalanced, or simply wrong (some quick checks are sketched below)
    • You work with data constantly generated by users, systems, and third-party sources
    • Privacy and regulatory issues
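
A minimal sketch (hypothetical file and column names) of the kind of quick checks this messiness forces on you before any modeling: missing values, label imbalance, duplicated events, and conflicting labels.

```python
# Sketch only: "user_events.parquet" and the "label" column are hypothetical.
import pandas as pd

df = pd.read_parquet("user_events.parquet")

# Sparse / missing features
print(df.isna().mean().sort_values(ascending=False).head(10))

# Label imbalance
print(df["label"].value_counts(normalize=True))

# Duplicates from upstream pipelines replaying events
print("duplicate rows:", df.duplicated().sum())

# Possible label noise: identical feature rows with conflicting labels
feature_cols = [c for c in df.columns if c != "label"]
print("conflicting rows:", (df.groupby(feature_cols)["label"].nunique() > 1).sum())
```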

Fairness

  • Researchers during research phase:
    • “Let’s try to get state of the art first and worry about fairness when we get to production.” (lol) By then it’s often too late
  • When ML algorithms are deployed at scale, they can discriminate against people at scale. A human operator might only make sweeping judgments about a few individuals at a time, but an ML algorithm can make sweeping judgments about millions of people in a split second.
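
A minimal sketch (my own, with hypothetical column names) of one cheap check before a model goes out at scale: compare positive-prediction rates across a protected group (the demographic parity gap). A gap that looks small in a notebook becomes discrimination against millions of people once deployed.

```python
# Sketch only: toy scored data; "group" and "prediction" are hypothetical columns.
import pandas as pd

scored = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "prediction": [1,   1,   1,   0,   1,   0,   0,   0],
})

rates = scored.groupby("group")["prediction"].mean()
print(rates)
print("demographic parity gap:", rates.max() - rates.min())
```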

Interpretability

  • Since most ML research is still evaluated on a single objective (model performance), researchers aren’t incentivized to work on interpretability. In industry, however, interpretability isn’t optional for most ML use cases; it’s a requirement.
  • Reasons
    • Users (leaders and end users) can understand why a decision was made, so they can trust the model and detect potential biases
    • Developers can debug and improve the model (see the sketch at the end of this section).
  • Most companies can’t afford to invest in ML research: it requires enormous compute budgets (tens of millions of dollars) and massive amounts of data.
  • The vast majority of ML-related jobs will be, and already are, in productionizing ML.
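
A minimal sketch (my own, not from the book) of a basic, model-agnostic interpretability tool: permutation importance in scikit-learn. It shows which features actually drive the model’s predictions, which helps both with building trust and with debugging.

```python
# Sketch only: permutation importance on a standard dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Most influential features first
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{data.feature_names[i]:<25} {result.importances_mean[i]:.3f}")
```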