IceVision is a framework for object detection that lets us perform detection in a variety of ways using the pre-trained models it provides. It also offers data curation features along with a dashboard for exploratory data analysis. Its most useful feature is an end-to-end deep learning workflow that lets practitioners train networks with easy-to-use, robust, high-performance libraries such as PyTorch Lightning and fastai. In this article, we are going to discuss the IceVision framework for object detection with a hands-on implementation. The major points that we will discuss here are listed below.
Table of Contents
- What is IceVision?
- Installing IceVision
- Data Preparation
- Importing Libraries
- Download and Prepare a Dataset
- Parse the data
- Creating Datasets with Augmentations and Transforms
- Model Building
- Pre-Modelling Procedures
- Training
- Training using FastAI
- Training using PyTorch Lightning
Let’s begin the discussion by understanding what IceVision is.
What is IceVision?
IceVision is a framework that lets us preprocess data for object detection, train a detection model on that data, and then use the model for inference. The framework provides layered connections between deep learning engines, libraries and models. It also ships with datasets (through the companion IceData package) that can be used to learn the basics of object detection with IceVision, and its models are built on libraries such as TorchVision, MMDetection and Ultralytics YOLOv5.
We can select from the many models available in the framework and switch between them easily: we can train a model on one dataset and later change the dataset or the model as our requirements change. According to its official GitHub repository, some of the features of IceVision are listed below.
- Using the framework's auto-fix feature, we can automate the data curation and cleaning procedure.
- The framework also provides a dashboard that is helpful for exploratory data analysis.
- The framework includes models for object detection, segmentation, and classification.
- The framework is compatible with several libraries covering different aspects of computer vision programming.
- The framework provides various transformation modules that help train more robust and accurate models.
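To make the idea of automated cleaning concrete, here is a minimal sketch of the kind of check an auto-fix step performs on bounding boxes: clip coordinates that fall outside the image and drop boxes that end up empty. The `BBox` class and the `autofix_bbox` function are hypothetical names used only for illustration; they are not IceVision's API, and the real feature covers more cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BBox:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def autofix_bbox(box: BBox, img_w: int, img_h: int) -> Optional[BBox]:
    """Hypothetical helper: clip a box to the image, or return None if it becomes empty."""
    # Clip every coordinate into the valid [0, width] x [0, height] range.
    xmin = min(max(box.xmin, 0), img_w)
    xmax = min(max(box.xmax, 0), img_w)
    ymin = min(max(box.ymin, 0), img_h)
    ymax = min(max(box.ymax, 0), img_h)
    # A box with zero or negative width/height carries no usable annotation.
    if xmax <= xmin or ymax <= ymin:
        return None
    return BBox(xmin, ymin, xmax, ymax)


boxes = [BBox(10, 20, 120, 90), BBox(-5, 30, 700, 100), BBox(650, 10, 700, 50)]
fixed = [autofix_bbox(b, img_w=640, img_h=480) for b in boxes]
print(fixed)
```

The first box is already valid, the second is clipped to the image edges, and the third lies entirely to the right of a 640-pixel-wide image, so it is dropped.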
In the next part of the article, we are going to see a basic example of implementing IceVision framework.
Installing IceVision
Let’s start with the installation, which can be done using the following lines of code.
!wget https://raw.githubusercontent.com/airctic/icevision/master/icevision_install.sh
The line above downloads IceVision's installation script, which installs Torch, TorchVision, the IceVision framework, IceData, MMDetection, YOLOv5 and EfficientDet. We can run the script using the following line of code.
!bash icevision_install.sh cuda11 master
Output:
Since we are using Google Colab, some of the requirements, such as Torch and TorchVision, are already installed in the environment. The installation target can also be changed from cuda11 to cuda10 or cpu. Once the installation finishes, we need to restart the runtime, either from the Runtime menu of the notebook or with the Ctrl+M . keyboard shortcut.
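After restarting, it can be worth confirming that the packages are importable before moving on. The sketch below uses only Python's standard library (`importlib.util.find_spec`) to look for each package without importing it. The module names in `EXPECTED` are our assumption of the import names for the packages the script installs, and `check_installed` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for packages installed by the IceVision script.
EXPECTED = ["torch", "torchvision", "icevision", "icedata", "mmdet"]


def check_installed(modules):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installed(EXPECTED).items():
        print(f"{name:<12} {'found' if found else 'MISSING'}")
```

Looking up the module spec instead of importing keeps the check fast, since importing packages like torch can take several seconds.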
Data Preparation
Before moving on to modelling, we need records (annotations paired with their images) from which a model can be built. In this section of the article, we will discuss how to prepare data for modelling using the IceVision framework.
Importing Libraries
We can import all the components of the IceVision framework using the following line of code.
from icevision.all import *
Download and Prepare a Dataset
Before modelling, we need a dataset. We will use the Fridge Objects dataset, which has 134 images belonging to four classes:
- Can
- Carton
- Milk bottle
- Water bottle
The IceData package, which the installation script installs alongside IceVision, can download and load this dataset for us.
Import the Data
import icedata
path = icedata.fridge.load_data()
Output:
Parse the Data
Using the parser module of the framework, we can load the annotation files and split the data into training and validation parts. While parsing, the framework can also automatically fix common errors in the annotations (the auto-fix feature mentioned earlier).
# Create the parser
parser = parsers.VOCBBoxParser(annotations_dir=path / "odFridgeObjects/annotations", images_dir=path / "odFridgeObjects/images")
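The Fridge Objects annotations are Pascal VOC XML files, which is why we use `VOCBBoxParser`. As a rough sketch of what such a parser extracts from each file, the snippet below reads one VOC annotation with Python's built-in `xml.etree.ElementTree`. The XML content and the `parse_voc` helper are illustrative assumptions, not IceVision code or real values from the dataset.

```python
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation (values made up, not from the dataset).
VOC_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>499</width><height>666</height><depth>3</depth></size>
  <object>
    <name>milk_bottle</name>
    <bndbox><xmin>100</xmin><ymin>240</ymin><xmax>290</xmax><ymax>570</ymax></bndbox>
  </object>
  <object>
    <name>can</name>
    <bndbox><xmin>300</xmin><ymin>350</ymin><xmax>400</xmax><ymax>520</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, (width, height), [(label, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    width, height = int(size.findtext("width")), int(size.findtext("height"))
    objects = []
    # Each <object> holds one class label and one bounding box.
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), coords))
    return root.findtext("filename"), (width, height), objects


filename, img_size, objects = parse_voc(VOC_XML)
print(filename, img_size, objects)
```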
Using the following lines of code, we can parse the annotations into records split into training and validation sets, and then inspect the class map the parser built.
# Parse annotations to create records
train, valid = parser.parse()
parser.class_map
Output:
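Two details of this step are easy to miss. First, as far as we know, IceVision's class map reserves id 0 for a `background` class, so the four fridge classes get ids 1 to 4 and `len(parser.class_map)` is 5. Second, `parse()` splits the records randomly; we assume an 80/20 ratio here. The sketch below reproduces both ideas in plain Python with hypothetical helpers (`build_class_map`, `random_split`); IceVision may order the classes differently, since we sort them only for determinism.

```python
import random


def build_class_map(labels, background="background"):
    """Assign integer ids to class names, reserving id 0 for the background class."""
    classes = [background] + sorted(set(labels))
    return {name: idx for idx, name in enumerate(classes)}


def random_split(items, train_frac=0.8, seed=42):
    """Shuffle items reproducibly; the first train_frac goes to train, the rest to valid."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    return items[:n_train], items[n_train:]


# Labels as they might be collected while parsing annotations (duplicates are fine).
class_map = build_class_map(["carton", "milk_bottle", "can", "water_bottle", "can"])
train_ids, valid_ids = random_split(range(134))

print(class_map)                       # background is 0, the four classes are 1..4
print(len(train_ids), len(valid_ids))  # 107 27
```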
Creating Datasets with Augmentations and Transforms
Data augmentation and transformation help a model generalize and perform accurately on new data. The framework supports this through the Albumentations library, which defines and executes the transformations. The framework offers many transformations; in this article, we use aug_tfms, which applies transformations such as rotation, cropping, horizontal flips, and more.
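One reason a detection framework needs its own augmentation support is that geometric transforms must move the bounding boxes along with the pixels. Below is a minimal sketch for a horizontal flip, assuming boxes in (xmin, ymin, xmax, ymax) pixel coordinates. In IceVision, the Albumentations adapter takes care of this for every transform, so `hflip_bbox` (a hypothetical name) is only for illustration.

```python
def hflip_bbox(box, img_w):
    """Mirror an (xmin, ymin, xmax, ymax) box for an image flipped left-to-right."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa; y is unchanged.
    return (img_w - xmax, ymin, img_w - xmin, ymax)


box = (100, 240, 290, 570)
flipped = hflip_bbox(box, img_w=499)
print(flipped)                         # (209, 240, 399, 570)
print(hflip_bbox(flipped, img_w=499))  # flipping twice restores (100, 240, 290, 570)
```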
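Let’s define the transformation pipelines: random augmentations for the training set, and only resizing, padding and normalization for the validation set.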
train_trans = tfms.A.Adapter([*tfms.A.aug_tfms(size=384, presize=512), tfms.A.Normalize()])
valid_trans = tfms.A.Adapter([*tfms.A.resize_and_pad(384), tfms.A.Normalize()])
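The validation pipeline uses `resize_and_pad(384)`, which, as we understand it, scales the longest side of the image to 384 pixels and pads the shorter side so the result is square. The sketch below computes those numbers and maps a box into the padded image. Splitting the padding evenly between both sides is our assumption, and `resize_and_pad_params` and `map_bbox` are hypothetical helpers, not IceVision functions.

```python
def resize_and_pad_params(img_w, img_h, size=384):
    """Return (scale, (new_w, new_h), (left_pad, top_pad)) for a square letterbox."""
    # Scale so that the longest side becomes exactly `size`.
    scale = size / max(img_w, img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    # Pad the remaining space; assumption: half on each side.
    left, top = (size - new_w) // 2, (size - new_h) // 2
    return scale, (new_w, new_h), (left, top)


def map_bbox(box, scale, offset):
    """Move an (xmin, ymin, xmax, ymax) box into the resized-and-padded image."""
    left, top = offset
    xmin, ymin, xmax, ymax = box
    return (xmin * scale + left, ymin * scale + top,
            xmax * scale + left, ymax * scale + top)


scale, new_size, offset = resize_and_pad_params(499, 666)
print(new_size, offset)  # (288, 384) (48, 0)
print(map_bbox((100, 240, 290, 570), scale, offset))
```

Padding rather than stretching keeps the aspect ratio of the objects intact, so a tall milk bottle still looks tall to the model.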
Next, we apply the transforms to the records to create the training and validation datasets.
train_data = Dataset(train, train_trans)
valid_data = Dataset(valid, valid_trans)
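A useful mental model for `Dataset(records, transforms)` is that the transforms run lazily, every time an item is accessed. That explains why the visualization step that follows can index `train_data[1]` eight times and get eight different augmented views, while `valid_data[1]`, which has no random transforms, looks the same each time. The classes here (`SimpleDataset`, `RandomHFlip`) are hypothetical stand-ins written for illustration, not IceVision's implementation, and they transform only boxes, not pixels.

```python
import random


class RandomHFlip:
    """Flip a record's boxes horizontally with probability p (images omitted for brevity)."""

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)

    def __call__(self, record):
        if self.rng.random() < self.p:
            w = record["width"]
            record["boxes"] = [(w - x1, y0, w - x0, y1) for x0, y0, x1, y1 in record["boxes"]]
        return record


class SimpleDataset:
    """Hypothetical stand-in: stores records and applies `tfm` on every access."""

    def __init__(self, records, tfm=None):
        self.records = records
        self.tfm = tfm

    def __len__(self):
        return len(self.records)

    def __getitem__(self, i):
        # Work on a shallow copy so the stored record is never modified.
        record = dict(self.records[i])
        return self.tfm(record) if self.tfm else record


records = [{"width": 499, "height": 666, "boxes": [(100, 240, 290, 570)]}]
train_like = SimpleDataset(records, RandomHFlip(p=0.5, seed=0))
valid_like = SimpleDataset(records)  # no random transform

print({train_like[0]["boxes"][0] for _ in range(8)})  # usually both versions appear
print({valid_like[0]["boxes"][0] for _ in range(8)})  # always the original box
```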
Let’s visualize the data after augmentation is performed.
vis = [train_data[1] for _ in range(8)]
print("training data")
show_samples(vis, ncols=4)
Output:
training data
vis = [valid_data[1] for _ in range(8)]
print("validation data")
show_samples(vis, ncols=4)
Output:
validation data
Model Building
Before training, we need to instantiate a model, prepare the data in the format that model expects, and go through a few other preparatory steps. So let’s start with these pre-modelling procedures.
Pre-Modelling Procedures
To build a model using the IceVision framework, we need to select a library, a model, and a backbone, and each of these must be chosen from the options the framework supports.
Here we are using the RetinaNet model from the MMDetection library with the resnet50_fpn_1x backbone, which can be specified using the following lines of code.
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
Now we can instantiate the model using the following lines of code.
# backbone was already configured with pretrained=True above
model = model_type.model(backbone=backbone, num_classes=len(parser.class_map))
Since each model type expects its input batches in its own format, the data must be prepared to match the chosen model. So far we have seen how to load the data and transform it; for the model-specific part, each model type provides its own data loaders, which batch the data in the format that model needs.
# Data Loaders
train_load = model_type.train_dl(train_data, batch_size=8, num_workers=4, shuffle=True)
valid_load = model_type.valid_dl(valid_data, batch_size=8, num_workers=4, shuffle=False)
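Object detection batches need special handling because every image has a different number of boxes, so the targets cannot simply be stacked into one fixed-size array; this is part of why each model type supplies its own `train_dl` and `valid_dl`. The sketch below shows the generic idea in plain Python, with placeholder strings standing in for image tensors. `make_batches` and `collate_detection` are hypothetical helpers, and the 107 training images assume an 80/20 split of the 134 images.

```python
import math
import random


def make_batches(n_items, batch_size=8, shuffle=False, seed=None):
    """Yield lists of indices, like a data loader's sampler (the last batch may be smaller)."""
    idxs = list(range(n_items))
    if shuffle:
        random.Random(seed).shuffle(idxs)
    for start in range(0, n_items, batch_size):
        yield idxs[start:start + batch_size]


def collate_detection(samples):
    """Batch images together but keep one target dict per image, since box counts differ."""
    images = [s["image"] for s in samples]
    targets = [{"boxes": s["boxes"], "labels": s["labels"]} for s in samples]
    return images, targets


batches = list(make_batches(107, batch_size=8, shuffle=True, seed=0))
print(len(batches), math.ceil(107 / 8), len(batches[-1]))  # 14 14 3

samples = [
    {"image": "img_a", "boxes": [(1, 2, 3, 4)], "labels": [1]},
    {"image": "img_b", "boxes": [(5, 6, 7, 8), (2, 2, 9, 9)], "labels": [2, 4]},
]
images, targets = collate_detection(samples)
```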
Let’s visualize the batch for validation in the loader.
model_type.show_batch(first(valid_load), ncols=4)
Output:
To track training progress with either fastai or PyTorch Lightning, we can use the metric classes the framework provides. We just need to create a variable that holds the metrics we want to track.
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
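The COCO bounding-box metric is built on intersection over union (IoU): a predicted box counts as a correct detection only if its IoU with a ground-truth box of the same class reaches a threshold, and COCO's headline AP averages over thresholds from 0.50 to 0.95 in steps of 0.05. Here is a minimal IoU sketch, assuming (xmin, ymin, xmax, ymax) boxes; the full metric also matches predictions to ground truth and computes precision-recall curves, which this omits.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO's IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

score = iou((0, 0, 10, 10), (0, 0, 10, 8))
print(score)  # 0.8
print(sum(t <= score for t in COCO_IOU_THRESHOLDS))  # matched at 7 of 10 thresholds
```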
Training
The metrics defined above can be used when training the model with either fastai or PyTorch Lightning, since both support the same metrics.
Training using fastai
training = model_type.fastai.learner(dls=[train_load, valid_load], model=model, metrics=metrics)
Output:
Tuning the Model
training.fine_tune(20, 0.00158, freeze_epochs=1)
Output:
The output above shows part of the fine-tuning results, with the best result highlighted. The table reports the training and validation losses along with the metric we chose to track during training.
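The arguments to `fine_tune` are easier to read once its two phases are spelled out. Based on our reading of fastai v2's `Learner.fine_tune` (worth checking against your installed version), it first trains for `freeze_epochs` with the pretrained body frozen at `base_lr`, then halves the learning rate, unfreezes, and trains for `epochs` more with discriminative learning rates from `base_lr / 2 / lr_mult` (earliest layers) up to `base_lr / 2` (head), where `lr_mult` defaults to 100. `fine_tune_schedule` is a hypothetical helper that only does this arithmetic; it does not train anything.

```python
def fine_tune_schedule(epochs, base_lr, freeze_epochs=1, lr_mult=100):
    """Describe the two one-cycle phases that fastai's fine_tune runs (arithmetic only)."""
    unfrozen_lr = base_lr / 2
    return [
        {"phase": "frozen", "epochs": freeze_epochs, "max_lr": base_lr},
        {"phase": "unfrozen", "epochs": epochs,
         "lr_range": (unfrozen_lr / lr_mult, unfrozen_lr)},
    ]


for phase in fine_tune_schedule(20, 0.00158, freeze_epochs=1):
    print(phase)
# frozen: 1 epoch at 0.00158; unfrozen: 20 epochs from about 7.9e-06 up to 0.00079
```

So the call above runs 21 epochs in total, and the pretrained backbone layers are updated far more gently than the newly initialized head.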
Training using PyTorch Lightning
We can also train the model using PyTorch Lightning. The procedure is almost the same, but the code is different. We can use the following lines of code to train the model with PyTorch Lightning:
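from torch.optim import Adam
import pytorch_lightning as pl
# Adapt the IceVision model to PyTorch Lightning and choose the optimizer
class LightModel(model_type.lightning.ModelAdapter):
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-4)
light_model = LightModel(model, metrics=metrics)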
We can then create a PyTorch Lightning trainer and train the model using the following lines of code:
trainer = pl.Trainer(max_epochs=5, gpus=1)
trainer.fit(light_model, train_load, valid_load)
We can also visualize the model's predictions on the validation set using the following line of code:
model_type.show_results(model, valid_data, detection_threshold=.5)
Output:
The output above is the final result of our object detection process with IceVision: predicted boxes and labels drawn on validation images. It gives a quick visual check of the model, while the COCO metric tracked during training gives a quantitative measure. Since IceVision is open source, we can freely use it in our own projects.
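The `detection_threshold=.5` argument controls which predictions are drawn: only detections whose confidence score clears the threshold are shown. Below is a minimal sketch of that filtering, assuming each prediction carries a `score`. The dictionary layout and `filter_detections` are hypothetical, and whether IceVision compares with `>` or `>=` is not something we have verified.

```python
def filter_detections(preds, detection_threshold=0.5):
    """Keep only predictions whose confidence score is at least the threshold."""
    return [p for p in preds if p["score"] >= detection_threshold]


preds = [
    {"label": "can", "score": 0.92, "bbox": (30, 40, 120, 200)},
    {"label": "carton", "score": 0.48, "bbox": (150, 60, 260, 310)},
    {"label": "milk_bottle", "score": 0.50, "bbox": (280, 20, 360, 300)},
    {"label": "water_bottle", "score": 0.13, "bbox": (10, 10, 50, 60)},
]
kept = filter_detections(preds, detection_threshold=0.5)
print([p["label"] for p in kept])  # ['can', 'milk_bottle']
```

Raising the threshold gives fewer but more confident boxes; lowering it shows more boxes at the cost of more false positives.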
Final Words
In this article, we have seen an overview of the IceVision framework for object detection. We have also seen how to use models and data from the framework and how to put together a complete object detection workflow. I encourage readers to explore the framework further and try it on other computer vision tasks.
References