FSDL Lecture 6: Deployment is now live! This lecture covers a critical step: getting your model into prod. The key message is similar to our philosophy in other parts of the ML workflow: Start simple, add complexity as you need it. fullstackdeeplearning.com/course/2022/lecture-5-deployment/


When it's time to deploy, the first step is to create a prototype you and your friends / teammates can interact with. @Gradio, @huggingface, and @streamlit are your friends at this stage. You do want this to have a basic UI and be hosted behind a webserver to reduce friction.
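
Here's a minimal sketch of what that prototype can look like with Gradio. The predict function is a placeholder for your own model code; the Gradio calls themselves are the real API, and launch(share=True) gives teammates a temporary public URL with no infra to stand up.

```python
# Minimal Gradio prototype sketch. The predict function is a placeholder --
# swap in your real model's forward pass. Gradio handles the UI + web server.
import gradio as gr

def predict(text: str) -> str:
    # Replace this echo with a real forward pass through your model.
    return f"(placeholder) prediction for: {text}"

demo = gr.Interface(fn=predict, inputs="text", outputs="text", title="My model demo")

if __name__ == "__main__":
    # share=True creates a temporary public link for teammates to try out.
    demo.launch(share=True)
```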


This is an example of the model-in-service deployment paradigm, where you just embed your model in your webserver. It's simple to implement, but will run into issues as you scale because models and web servers scale differently.
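
For concreteness, here's a minimal sketch of model-in-service using FastAPI (any web framework works the same way). DummyModel stands in for your real model; the point is that the model object lives inside the web server process, so it scales and fails together with the web tier.

```python
# Model-in-service sketch: the model is loaded inside the web server process.
# FastAPI/pydantic calls are real; DummyModel is a stand-in for your framework.
from fastapi import FastAPI
from pydantic import BaseModel

class DummyModel:
    def predict(self, text: str) -> str:
        return text.upper()  # placeholder for a real forward pass

class PredictRequest(BaseModel):
    text: str

app = FastAPI()
model = DummyModel()  # loaded once at import time, shared across requests

@app.post("/predict")
def predict(req: PredictRequest):
    # Model and web server share the same CPU/RAM -- the scaling issue above.
    return {"prediction": model.predict(req.text)}

# Run with: uvicorn app:app
```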


Once you run into some of these limitations, it's time to pull the model out of your web server. At a high level, there are two ways to do this. The first is batch prediction, where you run your model periodically on all possible inputs, and store the results in a database.
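
A hedged sketch of what such a batch job can look like, using sqlite and made-up table names as stand-ins for your own warehouse and schema; in practice you'd run it on a schedule with cron, Airflow, or similar.

```python
# Batch prediction sketch: on a schedule, score every known input and store the
# results so the app only does a cheap lookup at request time. sqlite and the
# table/column names are stand-ins for your own data warehouse and schema.
import sqlite3

def dummy_score(user_id: int) -> float:
    return (user_id * 37 % 100) / 100.0  # placeholder for a real model

def run_batch_job(db_path: str = "app.db") -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS predictions (user_id INTEGER PRIMARY KEY, score REAL)")
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]
    rows = [(uid, dummy_score(uid)) for uid in user_ids]  # "all possible inputs"
    conn.executemany("INSERT OR REPLACE INTO predictions VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    run_batch_job()
    # At request time the web server just reads the precomputed row:
    #   SELECT score FROM predictions WHERE user_id = ?
```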


Batch prediction is simple to implement, scales really well, has low latency, and has been used in production for years by top companies in large-scale systems. However:
- It doesn't work if you have a large universe of possible inputs, as in many use cases
- Predictions quickly get stale


The second way to pull the model out of the web server is to run it as a separate service. This is the right answer for most ML use cases. It lets you scale & manage it separately and reuse it across apps. The tradeoff is added latency & infra complexity.
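
From the web app's side, the change is just swapping an in-process call for a network call. Here's a sketch with a made-up internal URL and payload shape; the model service itself can be any of the serving options below.

```python
# Model-as-a-service sketch, from the calling app's side: the model runs behind
# its own endpoint and the app makes an HTTP call. The URL, payload shape, and
# timeout are illustrative, not a real service.
import requests

MODEL_SERVICE_URL = "http://model-service.internal:8000/predict"  # hypothetical

def get_prediction(text: str) -> str:
    # This network hop is the latency/infra tradeoff mentioned above, but it
    # lets the model service scale and deploy independently of the web app.
    resp = requests.post(MODEL_SERVICE_URL, json={"text": text}, timeout=2.0)
    resp.raise_for_status()
    return resp.json()["prediction"]
```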


In the lecture, we cover some of the main problems you may need to solve in the course of building and scaling your model service:
- Managing dependencies
- Optimizing the model's performance (GPUs?)
- Scaling the service horizontally
- Rolling out new versions


Since we're MLEs, not infra engineers, it rarely makes sense to solve all of these problems ourselves. Serverless options (like AWS Lambda) handle scaling out-of-the-box. They're well suited to a wide range of ML applications and are our default recommendation.
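
As an illustration, here's roughly what a Lambda-style entry point looks like. The handler(event, context) signature is Lambda's convention; the "model" here is a stand-in for deserializing your real weights.

```python
# Serverless sketch (AWS Lambda style). handler(event, context) is the real
# Lambda convention; the model is a placeholder. Loading at module scope means
# it's cached across warm invocations and only re-paid on cold starts.
import json

def _load_model():
    return lambda text: text[::-1]  # stand-in for loading real weights

MODEL = _load_model()  # module scope -> reused between invocations

def handler(event, context):
    body = json.loads(event.get("body", "{}"))
    prediction = MODEL(body.get("text", ""))
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```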


If you want a simpler developer experience or more deployment / scaling features out of the box, there are managed options. SageMaker is a good first thing to try if you're on AWS. There's also a range of startups, some of which provide more features like serverless GPUs.
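
If you go the SageMaker route, standing up an endpoint can be as short as the sketch below (SageMaker Python SDK, PyTorch flavor). The S3 path, IAM role, entry point script, and versions are placeholders to adapt to your setup.

```python
# Managed-endpoint sketch with the SageMaker Python SDK (PyTorch flavor).
# The S3 artifact path, IAM role, and versions are placeholders; check the SDK
# docs for the exact arguments your framework/version needs.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",              # hypothetical artifact
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # hypothetical role
    entry_point="inference.py",                            # your inference script
    framework_version="1.12",
    py_version="py38",
)

# Provisions a managed, autoscalable HTTPS endpoint behind the scenes.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.predict({"text": "hello"}))
```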


The last deployment paradigm we cover is deploying to the edge. Sometimes edge is your only option, like if you're deploying to a device with no internet. Edge also minimizes latency, is great for security, and scales well because users bring their own compute.


However, edge deployment is still an immature part of the stack, and it comes with pretty significant tradeoffs:
- Edge devices have limited resources
- Edge frameworks are immature
- It's difficult to update models
- It's difficult to get data back for debugging or retraining


If you do deploy to the edge, we recommend the following mindsets:
- Choose architecture with target hardware in mind
- Iterate locally, but don't make big changes without verifying on-device
- Test models on production hardware
- Build in fallbacks for failures / latency
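
As a concrete first step toward "choose architecture with target hardware in mind", you usually export the trained model to a portable format that an edge runtime can load. A toy ONNX export sketch (the model and shapes are made up; real projects also apply quantization/pruning and verify on the actual device):

```python
# Edge-prep sketch: export a toy PyTorch model to ONNX so an edge runtime
# (ONNX Runtime, TensorRT, etc.) can load it. The architecture and input shape
# are illustrative -- match them to your model and target device.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).eval()
example_input = torch.randn(1, 16)  # shape must match what the device will send

torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["features"],
    output_names=["logits"],
)
```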


To summarize, you only see if your model actually works after you deploy it, so deploy early, and deploy often! Check out the lecture if you want to learn more about deploying models to production. fullstackdeeplearning.com/course/2022/lecture-5-deployment/

