Requirements for ML systems

Requirements vary from company to company, but most ML systems should have these characteristics

Reliability
- The system should continue to perform the correct function at the desired level of performance even in the face of adversity
- ML systems can fail silently: the system still returns an output, so nothing looks broken (if you use Google Translate to translate into a language you don’t know, it’s hard to check whether the result is correct)
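
One way to catch some silent failures is to run cheap sanity checks on every output. A minimal sketch for the translation example, assuming hypothetical thresholds (`min_ratio`, `max_ratio`) and a made-up function name. These checks only flag obviously broken outputs and say nothing about whether a translation is actually correct:

```python
def translation_warnings(source: str, output: str,
                         min_ratio: float = 0.3,
                         max_ratio: float = 3.0) -> list[str]:
    """Return sanity warnings for one translation; an empty list means no check fired.

    The ratio thresholds are illustrative assumptions, not tuned values.
    """
    warnings = []
    if not output.strip():
        # Nothing else is worth checking on an empty output.
        warnings.append("empty output")
        return warnings
    if output.strip() == source.strip():
        warnings.append("output identical to input")
    # Character-length ratio between output and source.
    ratio = len(output) / max(len(source), 1)
    if not (min_ratio <= ratio <= max_ratio):
        warnings.append(f"suspicious length ratio {ratio:.2f}")
    return warnings
```

In practice the warnings would be logged and counted, so a rising warning rate becomes visible even though no request ever raised an error.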
Scalability
- Ways your ML model can grow
  - Complexity: your model gains more parameters
  - Traffic volume: your model might serve 10k requests daily at first, but that can grow to 1-10M
  - Model count: you might need more than one model for your use cases
- Whichever way your system grows, there should be reasonable ways of dealing with that growth
  - Resource scaling: upscaling or downscaling
    - Autoscaling: automatically scaling up and down the number of machines depending on usage. This feature can be tricky to implement.
  - Artifact management: managing one hundred models is very different from managing one model
    - With many models, you will need to automate the monitoring process and manage code generation
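
A rough sketch of the decision at the heart of autoscaling, assuming a simple proportional rule: scale the machine count by the ratio of observed load to target load, then clamp it. The function name, parameters, and limits are all hypothetical. Real autoscalers also have to handle cooldown periods and machine startup time, which is part of why the feature is tricky:

```python
import math


def desired_replicas(current_replicas: int, observed_load: float,
                     target_load: float, min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Proportional scaling rule (illustrative only).

    observed_load and target_load are per-machine values in the same
    unit, e.g. requests per second or percent CPU.
    """
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Scale up or down in proportion to how far load is from the target.
    raw = math.ceil(current_replicas * observed_load / target_load)
    # Clamp so the fleet never drops to zero or grows without bound.
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 machines each at 90% of capacity with a 60% target would be scaled up to 6, and the same 4 machines at 15% would be scaled down to 1.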
Maintainability
- Many people (MLEs, DevOps engineers, SMEs) contribute to an ML system
- Set up your system so that other people can use it easily
  - Document your work
  - Version code, data, and artifacts
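
One simple way to version code, data, and artifacts together is to record a content hash of each in a manifest. A minimal sketch, assuming everything fits in memory as bytes. The function names and manifest layout are made up for illustration, and dedicated versioning tools do much more than this:

```python
import hashlib


def content_version(data: bytes, length: int = 12) -> str:
    """Short, deterministic version id derived from the content itself."""
    return hashlib.sha256(data).hexdigest()[:length]


def build_manifest(code: bytes, dataset: bytes, model: bytes) -> dict:
    """Tie one model artifact to the exact code and data that produced it."""
    return {
        "code": content_version(code),
        "data": content_version(dataset),
        "artifact": content_version(model),
    }
```

Because the ids come from the content, two manifests match only if the code, data, and artifact are byte-for-byte identical, which makes it easy to tell which data a given model was trained on.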
Adaptability
- The system should have some capacity both for discovering opportunities to improve performance and for allowing updates without service interruption
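
A minimal sketch of updating without service interruption inside a single process: the server keeps a reference to the current model and swaps it under a lock, so requests keep being served during an update. The `ModelServer` class and the toy models are hypothetical. Real deployments usually also need rollout strategies across machines, which this does not cover:

```python
import threading
from typing import Callable

# For illustration, a "model" is any function from a float to a float.
Model = Callable[[float], float]


class ModelServer:
    def __init__(self, model: Model, version: str):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def predict(self, x: float) -> tuple[str, float]:
        # Grab a consistent (model, version) pair, then release the lock
        # so a slow prediction never blocks an update.
        with self._lock:
            model, version = self._model, self._version
        return version, model(x)

    def update(self, model: Model, version: str) -> None:
        # Swap both fields together; in-flight requests finish on the old model.
        with self._lock:
            self._model, self._version = model, version
```

Usage: create `ModelServer(lambda x: x * 2, "v1")`, serve predictions, then call `update(new_model, "v2")` once the new model is loaded. Later calls to `predict` use the new model without a restart.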

leejunkim