Analyze Video using AWS Services and OpenCV


In one of our last blogs, we talked about how to analyze videos in Python, using OpenCV to apply different filters to the video, like extracting the red color, computing frame differences and running a detector. For that, we had to pass a local video path to our code, run the code manually and wait for the result. But what if we could just upload the video and have the whole process run automatically, in the cloud?

Amazon Web Services (AWS) has many resources that can help us with this. We are going to work with some of them: Simple Storage Service (S3) to store the videos, both the original and the analyzed one, and Lambda to host our Python code and run it. We are also going to add a trigger between S3 and Lambda, so every time a video is uploaded to S3, the code on Lambda runs automatically, analyzes the video and uploads the result back to S3. Finally, we are going to transcode it with AWS Elastic Transcoder to get a video that can actually be played.

It’s easier to understand the code on Github, so I’m going to give you the link for that. Our file will be analyzer.py; feel free to check it first or while reading this blog.

Credentials

To have access to the AWS services from Python, create a user on IAM (another AWS resource) and get your credentials there: ACCESS KEY ID and SECRET ACCESS KEY. Instead of explaining this step, which is pretty easy, I’m going to point you to the documentation provided by Amazon, which is pretty complete. Here it is. These credentials give you the authorization to use S3, Lambda and the other resources directly from Python. Remember that your credentials give access to everything on your AWS account, so keep them safe and don’t share them with anybody!

Security

AWS credentials are so sensitive that they shouldn’t be stored just anywhere. For example, these credentials shouldn’t be in any file that you’re going to upload to Github, because other people would have access to that file, and therefore to the credentials. Fortunately, Lambda has environment variables that can store these credentials, and for local tests you can keep them in a file. My file will be settings.py, which will hold the ACCESS KEY ID and SECRET ACCESS KEY; of course, you will not see this file on my Github repository.

On settings.py:
ACCESS_KEY_ID = "XXXXXXXXXXXXX"
SECRET_ACCESS_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXXX"
Replace XXXX with your own credentials, and add this file to your .gitignore, so Git ignores it every time we push our project.

On Lambda, add the same values to your environment variables:





When uploading the analyzed video, we will see how to work with the environment variables or the local file depending on the situation.

Storage

Amazon S3 uses buckets as storage, and inside a bucket you can store any kind of file: text, video, audio, etc. Same as before, creating a bucket and the folders is pretty easy; you just have to take note of the names, since we are going to use them later. First create a bucket, and inside it create three folders. My bucket will be blog4geeks, and my folders will be original, tmp_analyzed and analyzed, pretty easy to remember. We actually need three folders because the trigger runs every time a file is uploaded: if we upload the analyzed video to the same folder as the original one, it will fire the trigger again, and again, and again. So let’s just create separate folders to avoid this. The tmp_analyzed folder will store the video produced by the analysis, and later we will take that video and transcode it into the analyzed folder.
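If you prefer to do this step from code instead of the console, here is a minimal sketch using boto3 (the bucket name matches mine; note that S3 has no real folders, the console just displays empty keys ending in "/" as folders):

import boto3

s3 = boto3.resource('s3')  # assumes your credentials are already configured
s3.create_bucket(Bucket='blog4geeks')  # outside us-east-1, add CreateBucketConfiguration={'LocationConstraint': '<region>'}

# "Folders" in S3 are just empty objects whose keys end with "/"
for folder in ('original/', 'tmp_analyzed/', 'analyzed/'):
    s3.Bucket('blog4geeks').put_object(Key=folder)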


Lambda Function

Creating the function itself is pretty easy too! Just some clicking, writing the name, setting Python 3.6 as the runtime, and we are done! Here is the process in case you need help. This creates the Lambda function with a default file called lambda_function.py, which has a function called lambda_handler with two parameters: event and context. This will be our connection between Python and Lambda: every time a file is uploaded, the trigger will run this function, passing the file information in the event parameter, like the bucket where it was uploaded, the filename, etc. From there, we can call the function that we created for the analysis.
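The generated lambda_function.py looks roughly like this; later we will replace its body with our own code:

import json

def lambda_handler(event, context):
    # event: information about whatever triggered the function (our S3 upload)
    # context: runtime information about this invocation
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }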

Dependencies

To run the analysis locally, we just install OpenCV and Numpy manually, but on Lambda it is a little bit trickier. We have to create a zip file with all the dependencies that we are going to need, like opencv, numpy and opencv-contrib. The easiest way is to create a virtual environment and then zip its packages. In our previous blog about video analysis we created the environment and installed the dependencies, so we can reuse that one. Its name is env, and I’m using Python 3.5; with this we can create our zip file.

> cd env/lib/python3.5/site-packages

> zip -r9 ../../../../function.zip .


This will create a zip file called function.zip with all our dependencies. Using this zip, we have to create a Lambda layer. Same as before: give the layer a name and set Python 3.6 as the runtime. Then go to your Lambda function and add this new layer to it. Here is some help in case you need it. With this, every time our Lambda function runs, it loads this layer first and uses it as its environment, so we have access to everything in OpenCV, Numpy and any other dependency that you want to use, just as if you were running it locally.
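I created the layer through the console, but if you prefer to script it, a hedged sketch with boto3 would look like this (the layer and function names are hypothetical; for a large zip you may have to upload it to S3 first and point Content at the S3 object instead):

import boto3

lambda_client = boto3.client('lambda')

# Publish the zip as a layer version (for big zips, use Content={'S3Bucket': ..., 'S3Key': ...})
layer = lambda_client.publish_layer_version(
    LayerName='opencv-numpy',
    Content={'ZipFile': open('function.zip', 'rb').read()},
    CompatibleRuntimes=['python3.6'],
)

# Attach the layer to the function ('video-analyzer' is a placeholder name, use yours)
lambda_client.update_function_configuration(
    FunctionName='video-analyzer',
    Layers=[layer['LayerVersionArn']],
)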

To use the AWS services from Python locally, we need boto3, which is the AWS SDK for Python. You can install it with pip: > pip install boto3. We don’t need to add it to the zip for Lambda, since Lambda includes boto3 by default.

Code

Once we set the trigger for S3, every time we upload a file, S3 automatically calls the function with the information about the uploaded file, and this information is passed in the event parameter of lambda_handler. If you print event, you will see that it is a big JSON document with a lot of information, but all we need is the filename and the bucket name, and we can get them like this:

On file lambda_function.py
> data_retrieved = event['Records'][0]['s3']
> file_name = data_retrieved['object']['key']
> bucket_name = data_retrieved['bucket']['name']

Let’s send this to our analyzer, like this.
> from analysis import Detector
> detector = Detector(file_name, bucket_name)
> detector.detect_movement()

The analyzer itself can’t work with just these two values, but it can work with the full S3 path; this way we don’t have to download the video first, and using it straight from S3 will work! From the retrieved information we have the filename and the bucket name, so let’s build our URL from them:
> video_path = "http://{}.s3.amazonaws.com/{}".format(bucket_name, file_name)

To make things easier, let’s split the file from the folder, so we can use it anywhere we want.
> filename = file_name.split("/")[-1]

If you notice, the URL is http and not https; OpenCV will not open the video if it is https, and that’s why we use plain http. Just like before, when we passed a local path to the file, we can run the detector using the S3 link instead.

So where we had cv2.VideoCapture("/tmp/vid_test.mp4"), now it should be cv2.VideoCapture("http://blog4geeks.s3.amazonaws.com/original/vid_test.mp4").
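As a quick check, a minimal read loop against the S3 URL looks the same as with a local file (this assumes the object is readable over plain http and that your OpenCV build has FFMPEG support):

import cv2

cap = cv2.VideoCapture("http://blog4geeks.s3.amazonaws.com/original/vid_test.mp4")
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # ... apply the red/diff/keypoints filters to frame here ...
cap.release()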

Writing the Analyzed Video

Our analyzer.py has a function called video_writer, which writes the analyzed video to /tmp/<filename> using OpenCV. We have to upload this file to S3 again after it is analyzed. Here is the code for video_writer.

def video_writer(self, prefix):
    # /tmp is the only writable path inside Lambda
    out = cv2.VideoWriter(
        "/tmp/"+prefix+"_"+self.filename,
        cv2.VideoWriter_fourcc(*'mp4v'),
        30, (int(self.width), int(self.height))
    )
    return out


The prefix parameter can be red, diff or keypoints, the three videos that we are going to upload, so we get red_vid_test.mp4, diff_vid_test.mp4 and keypoints_vid_test.mp4. These are the three tests that we ran on our previous blog.
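To give an idea of how the pieces fit together, here is a hypothetical sketch of a detection method (video_path and apply_red_filter are placeholder names; the real filter code lives in analyzer.py):

cap = cv2.VideoCapture(self.video_path)   # the http S3 URL built earlier
out = self.video_writer("red")            # writes to /tmp/red_<filename>
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    out.write(self.apply_red_filter(frame))   # placeholder for the actual filter
out.release()
cap.release()
self.upload_file_s3("red")                # covered in the next section
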
Uploading the Analyzed File

To upload a file to S3, we need boto3 to use the S3 service, and of course we need the bucket name and the location where the file will be saved. We also need the credentials so boto3 can access the bucket with the required permissions. Our analyzer.py file has the code for this; let’s check the upload_file_s3 function.
import boto3
from os import environ

def upload_file_s3(self, prefix):
    # On Lambda the credentials come from environment variables; locally, from settings.py
    if 'ACCESS_KEY_ID' in environ:
        ACCESS_KEY_ID = environ['ACCESS_KEY_ID']
        SECRET_ACCESS_KEY = environ['SECRET_ACCESS_KEY']
    else:
        import settings
        ACCESS_KEY_ID = settings.ACCESS_KEY_ID
        SECRET_ACCESS_KEY = settings.SECRET_ACCESS_KEY

    # Open an S3 session with our credentials
    s3_session = boto3.session.Session(
        aws_access_key_id=ACCESS_KEY_ID,
        aws_secret_access_key=SECRET_ACCESS_KEY
    ).resource('s3')

    # Read the video written by video_writer and upload it to the tmp_analyzed/ folder
    analyzed_video = open("/tmp/"+prefix+"_"+self.filename, "rb")
    s3_session.Bucket(self.bucket).put_object(
        Key="tmp_analyzed/"+prefix+"_"+self.filename,
        Body=analyzed_video,
        ContentType='video/mp4'
    )
    analyzed_video.close()

    return True


By importing environ from os, we can tell whether we are running on Lambda or locally: if 'ACCESS_KEY_ID' is in environ, the environment variable on Lambda is set and we should get our credentials from there; if not, we are running locally, so we get them from our settings.py file. Then we use boto3 and our credentials to read and upload the file. Let’s check the result.






Ummm, what? Even though we specified our file as video/mp4, it can’t be played! To fix this, we need AWS Elastic Transcoder to transcode our video into a playable one. Let’s create our pipeline on Elastic Transcoder first.





Pretty easy: we just need to set our bucket as input and output; we can take care of the folders later.
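The pipeline can also be created from Python; here is a hedged sketch (the Role ARN below is a made-up example of the IAM role Elastic Transcoder assumes, yours will be different):

import boto3

transcoder = boto3.client('elastictranscoder')
pipeline = transcoder.create_pipeline(
    Name='blog4geeks-pipeline',
    InputBucket='blog4geeks',
    OutputBucket='blog4geeks',
    Role='arn:aws:iam::123456789012:role/Elastic_Transcoder_Default_Role',
)
pipeline_id = pipeline['Pipeline']['Id']   # the ID we will pass to create_job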





Our pipeline ID is important, since we need it to run the job from our Python code. Just like before, we can call Transcoder the same way we called S3.

def aws_transcode(self, prefix):
    # Preset for a generic 720p output (see the preset list for other formats)
    preset_id = '1351620000001-000010'

    if 'ACCESS_KEY_ID' in environ:
        ACCESS_KEY_ID = environ['ACCESS_KEY_ID']
        SECRET_ACCESS_KEY = environ['SECRET_ACCESS_KEY']
    else:
        import settings
        ACCESS_KEY_ID = settings.ACCESS_KEY_ID
        SECRET_ACCESS_KEY = settings.SECRET_ACCESS_KEY

    client_job = boto3.client(
        'elastictranscoder',
        aws_access_key_id=ACCESS_KEY_ID,
        aws_secret_access_key=SECRET_ACCESS_KEY
    )

    # The transcoded video goes to the analyzed/ folder of the pipeline's output bucket
    outputs = [{
        'Key': "analyzed/"+prefix+"_"+self.filename,
        'PresetId': preset_id
    }]

    client_job.create_job(
        PipelineId='1569599713304-xg0p7a',
        Input={'Key': "tmp_analyzed/"+prefix+"_"+self.filename},
        Outputs=outputs
    )

The preset_id tells Transcoder the configuration for the output file; the one I’m using transcodes the file into a generic 720p video. Here is the preset list in case you want to change it to 480p, 1080p or even an iPhone format. We then pass all this information to our job, which takes care of the transcoding. This call is asynchronous, which means the response is immediate and our job is placed on a queue that Transcoder processes on its own. Seconds later (depending on the size of the file) we will have our result.
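In the Lambda flow we don’t need to wait for the job, but for local testing you can poll its status with read_job; a small sketch, reusing client_job from above:

import time

# Assuming we captured the response: job = client_job.create_job(...)
job_id = job['Job']['Id']

while True:
    status = client_job.read_job(Id=job_id)['Job']['Status']   # 'Submitted', 'Progressing', 'Complete', 'Error' or 'Canceled'
    if status in ('Complete', 'Error', 'Canceled'):
        break
    time.sleep(2)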

Uploading our code on Lambda


Now that everything is ready, let's put our code on Lambda. Check my make.sh file on Github; it has these commands.
> zip analyzer.zip analyzer.py

> zip analyzer.zip lambda_function.py

You can run these manually in the terminal, or create a file and call it, just like I did. To run it, use ./make.sh and it will run all the commands in the file. With the generated analyzer.zip, go to your Lambda function and upload it.




Then click on Save, and after a few seconds you will see the code in the code editor.
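If you get tired of uploading the zip by hand, the same can be done with boto3 ('video-analyzer' is a placeholder function name, use yours):

import boto3

lambda_client = boto3.client('lambda')
with open('analyzer.zip', 'rb') as zip_file:
    lambda_client.update_function_code(
        FunctionName='video-analyzer',
        ZipFile=zip_file.read(),
    )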

Setting the Trigger

AWS has triggers so that, when one service is used, it can call another automatically. Setting a trigger between S3 and Lambda is pretty easy too, since AWS supports this by default. On our bucket, go to Properties and, under Events, create a new notification.





That’s it! Every time a file with the suffix .mp4 is uploaded to original/ on our blog4geeks bucket, it triggers our Lambda function, which receives the file information and follows the flow that we already described.
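For reference, the same notification can be configured with boto3; a hedged sketch (the Lambda ARN is a made-up example, and S3 must also be allowed to invoke the function, which the console sets up for you):

import boto3

s3 = boto3.client('s3')
s3.put_bucket_notification_configuration(
    Bucket='blog4geeks',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:video-analyzer',
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'original/'},
                {'Name': 'suffix', 'Value': '.mp4'},
            ]}},
        }],
    },
)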





Our files will be stored under the analyzed/ folder, and from here you can use them anywhere you want.

Conclusions

As you can see, using Lambda and OpenCV together is pretty easy, since AWS gives you all the tools needed to do it, but it requires taking care of a lot of steps first: credentials, setting up Lambda, S3, Transcoder and the other resources. Aside from that, the code is almost the same; OpenCV works with local paths and with S3 videos, so getting the video from one or the other doesn’t affect our analysis, we just need some steps before and after to make it work.


