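Building a Python-based video server

This article builds a video server on a Python backend that handles video upload, format conversion and playback in one place. The backend is built on Django with Amazon S3 for storage, format conversion uses Encoding.com's online service, and the message queue is RabbitMQ. Once a video has been uploaded and converted, it is played in the browser with HTML5 via Video.js.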

Stickyworld’s consultation web app has supported video for a long time but it’s been hosted via a YouTube embed. When we started building the new version of the web app we wanted to take control of the video content and also free our users from YouTube’s terms of service.

I personally had worked on projects with clients in the past which did video transcoding and it was never something easy to achieve. It takes a lot to accept every video, audio and container format under the sun and output them into various video formats that the web knows and loves.

With that in mind we decided the conversion process would be handled by Encoding.com. They let you encode your first gigabyte of video with them for free and have a tiered pricing system thereafter.

Throughout the development of the code below I would upload a two-second, 178KB video to test that everything was working. When the exceptions stopped being raised, I tested larger and more exotic files.

Stage 1: The User uploads a video

At the moment the new codebase just has a quick-and-dirty HTML5-based uploading mechanism. This is the CoffeeScript for uploading from the client to the server:

    $scope.upload_slide = (upload_slide_form) ->
      file = document.getElementById("slide_file").files[0]
      reader = new FileReader()
      reader.readAsDataURL file
      reader.onload = (event) ->
        result = event.target.result
        fileName = document.getElementById("slide_file").files[0].name
        $.post "/world/upload_slide",
          data: result
          name: fileName
          room_id: $scope.room.id
          (response_data) ->
            if response_data.success? is not yes
              console.error "There was an error uploading the file", response_data
            else
              console.log "Upload successful", response_data
      reader.onloadstart = ->
        console.log "onloadstart"
      reader.onprogress = (event) ->
        console.log "onprogress", event.total, event.loaded, (event.loaded / event.total) * 100
      reader.onabort = ->
        console.log "onabort"
      reader.onerror = ->
        console.log "onerror"
      reader.onloadend = (event) ->
        console.log "onloadend", event

It would be nice to loop through document.getElementById("slide_file").files and upload each file individually instead of just the first file. This is something we'll be addressing soon.

Stage 2: Validate and upload to S3

On the backend we’re running Django and RabbitMQ. The key modules we’re using are:

    $ pip install 'Django>=1.5.2' 'django-celery>=3.0.21' \
        'django-storages>=1.1.8' 'lxml>=3.2.3' 'python-magic>=0.4.3'

I created two models: SlideUploadQueue to store references to all the uploads and SlideVideoMedia to store all the references to the processed videos.

    from datetime import datetime

    import pytz
    from django.contrib.auth.models import User
    from django.db import models

    # MediaType is another model in our codebase (not shown here);
    # filename_sanitiser is shown further down.


    class SlideUploadQueue(models.Model):
        created_by = models.ForeignKey(User)
        created_time = models.DateTimeField(db_index=True)
        original_file = models.FileField(
            upload_to=filename_sanitiser, blank=True, default='')
        media_type = models.ForeignKey(MediaType)
        encoding_com_tracking_code = models.CharField(
            default='', max_length=24, blank=True)

        STATUS_AWAITING_DATA = 0
        STATUS_AWAITING_PROCESSING = 1
        STATUS_PROCESSING = 2
        STATUS_AWAITING_3RD_PARTY_PROCESSING = 5
        STATUS_FINISHED = 3
        STATUS_FAILED = 4

        STATUS_LIST = (
            (STATUS_AWAITING_DATA, 'Awaiting Data'),
            (STATUS_AWAITING_PROCESSING, 'Awaiting processing'),
            (STATUS_PROCESSING, 'Processing'),
            (STATUS_AWAITING_3RD_PARTY_PROCESSING,
                'Awaiting 3rd-party processing'),
            (STATUS_FINISHED, 'Finished'),
            (STATUS_FAILED, 'Failed'),
        )

        status = models.PositiveSmallIntegerField(
            default=STATUS_AWAITING_DATA, choices=STATUS_LIST)

        class Meta:
            verbose_name = 'Slide'
            verbose_name_plural = 'Slide upload queue'

        def save(self, *args, **kwargs):
            # Stamp new records with a timezone-aware UTC creation time.
            if not self.created_time:
                self.created_time = \
                    datetime.utcnow().replace(tzinfo=pytz.utc)
            return super(SlideUploadQueue, self).save(*args, **kwargs)

        def __unicode__(self):
            if self.id is None:
                return 'new '
            return ' %d' % self.id


    class SlideVideoMedia(models.Model):
        converted_file = models.FileField(
            upload_to=filename_sanitiser, blank=True, default='')

        FORMAT_MP4 = 0
        FORMAT_WEBM = 1
        FORMAT_OGG = 2
        FORMAT_FL9 = 3
        FORMAT_THUMB = 4

        supported_formats = (
            (FORMAT_MP4, 'MPEG 4'),
            (FORMAT_WEBM, 'WebM'),
            (FORMAT_OGG, 'OGG'),
            (FORMAT_FL9, 'Flash 9 Video'),
            (FORMAT_THUMB, 'Thumbnail'),
        )

        mime_types = (
            (FORMAT_MP4, 'video/mp4'),
            (FORMAT_WEBM, 'video/webm'),
            (FORMAT_OGG, 'video/ogg'),
            (FORMAT_FL9, 'video/mp4'),
            (FORMAT_THUMB, 'image/jpeg'),
        )

        format = models.PositiveSmallIntegerField(
            default=FORMAT_MP4, choices=supported_formats)

        class Meta:
            verbose_name = 'Slide video'
            verbose_name_plural = 'Slide videos'

        def __unicode__(self):
            if self.id is None:
                return 'new '
            return ' %d' % self.id

Our models use a filename_sanitiser function in each models.FileField field to automatically rewrite filenames into a <model>/<uuid4>.<extension> format. This sanitises each filename and makes sure it is unique. On top of that, we use signed URLs that expire so we can control who is served our content and for how long.

    import uuid


    def filename_sanitiser(instance, filename):
        # Folder is the lower-cased model name, e.g. 'slideuploadqueue'.
        folder = instance.__class__.__name__.lower()
        # Fall back to 'jpg' when the upload has no usable extension.
        ext = 'jpg'
        if '.' in filename:
            t_ext = filename.split('.')[-1].strip().lower()
            if t_ext != '':
                ext = t_ext
        return '%s/%s.%s' % (folder, str(uuid.uuid4()), ext)

A file uploaded as testing.mov would turn into https://our-bucket.s3.amazonaws.com/slideuploadqueue/3fe27193-e87f-4244-9aa2-66409f70ebd3.mov and would be uploaded by Django Storages.
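For the signed, expiring URLs mentioned above, the boto-based S3 backend in django-storages reads a handful of Django settings. The fragment below is only a sketch of what such a settings.py might contain: the bucket name is taken from the example URL, while the environment-variable credentials and the one-hour expiry are placeholder assumptions, not our actual configuration.

```python
# settings.py (sketch): values are placeholders, not a real configuration.
import os

# Route every FileField through the boto-based S3 backend that
# ships with django-storages 1.1.x.
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

# Assumption: credentials are supplied through the environment.
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID', '')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY', '')
AWS_STORAGE_BUCKET_NAME = 'our-bucket'

# Sign generated URLs and let them expire after an hour (assumed value).
AWS_QUERYSTRING_AUTH = True
AWS_QUERYSTRING_EXPIRE = 3600
```

With query-string authentication enabled, the URL returned by a field's .url attribute carries a signature and an expiry time, so a leaked link stops working once that time has passed.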

In the backend endpoint that receives the upload from the browser, we validate the uploaded content using python-magic. This detects what kind of file it is based on its contents:

    import base64

    import magic


    @verify_auth_token
    @return_json
    def upload_slide(request):
        file_data = request.POST.get('data', '')
        # The browser sends a data URL; keep only the base64 payload.
        file_data = base64.b64decode(file_data.split(';base64,')[1])
        description = magic.from_buffer(file_data)

So if description matches something like MPEG v4 system or Apple QuickTime movie then we know it's suitable for transcoding. If it doesn't match anything like that, we can flag it up with the user.
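The check described above could be sketched roughly as follows. The helper names decode_data_url and is_transcodable, and the ACCEPTED_DESCRIPTIONS tuple, are hypothetical and not part of our codebase; the two accepted descriptions are simply the ones named in the text, and a real list would be longer.

```python
import base64

# Substrings of libmagic descriptions treated as transcodable video.
# Assumption: only the two examples from the text are listed here.
ACCEPTED_DESCRIPTIONS = ('MPEG v4 system', 'Apple QuickTime movie')


def decode_data_url(data_url):
    """Return the raw bytes carried by a base64 data URL, or None."""
    marker = ';base64,'
    if marker not in data_url:
        return None
    payload = data_url.split(marker, 1)[1]
    try:
        return base64.b64decode(payload)
    except (TypeError, ValueError):
        # Python 2 raises TypeError on malformed input; Python 3 raises
        # binascii.Error, which is a ValueError subclass.
        return None


def is_transcodable(description):
    """True if a libmagic description names a format we accept."""
    return any(known in description for known in ACCEPTED_DESCRIPTIONS)
```

In the view, the string returned by magic.from_buffer(file_data) would be passed to is_transcodable, and a None from decode_data_url would be reported back to the user instead of raising an IndexError on a malformed upload.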

Next, we'll save the video into our SlideUploadQueue model and send a job off to RabbitMQ. Because we're using Django Storages, it'll be uploaded to Amazon S3 automatically.

    slide_upload = SlideUploadQueue()
    ...
    slide_upload.status = SlideUploadQueue.STATUS_AWAITING_PROCESSING
    slide_upload.save()
    # Saving the FileField pushes the bytes to S3; the name passed in is
    # irrelevant because filename_sanitiser replaces it.
    slide_upload.original_file.\
        save('anything.%s' % file_ext, ContentFile(file_data))
    slide_upload.save()

    task = ConvertRawSlideToSlide()
    task.delay(slide_upload)

Stage 3: Send the video to a 3rd party

RabbitMQ will take over and handle the task.delay(slide_upload) call.

Here all we're doing is sending Encoding.com a URL of our video and instructions on what output formats we want. They'll give us a job number in return which we'll use to check up on the progress of the transcoding at a later point.

    class ConvertRawSlideToSlide(Task):
        queue = 'backend_convert_raw_slides'
        ...

        def _handle_video(self, slide_upload):
            mp4 = {
                'output': 'mp4',
                'size': '320x240',
                'bitrate': '256k',
                'audio_bitrate': '64k',
                'audio_channels_number': '2',
                'keep_aspect_ratio': 'yes',
                'video_codec': 'mpeg4',
                'profile': 'main',
                'vcodecparameters': 'no',
                'audio_codec': 'libfaac',
                'two_pass': 'no',
                'cbr': 'no',
                'deinterlacing': 'no',
                'keyframe': '300',
                'audio_volume': '100',
                'file_extension': 'mp4',
                'hint': 'no',
            }

            webm = {
                'output': 'webm',
                'size': '320x240',
                'bitrate': '256k',
                'audio_bitrate': '64k',
                'audio_sample_rate': '44100',
                'audio_channels_number': '2',
                'keep_aspect_ratio': 'yes',
                'video_codec': 'libvpx',
                'profile': 'baseline',
                'vcodecparameters': 'no',
                'audio_codec': 'libvorbis',
                'two_pass': 'no',
                'cbr': 'no',
                'deinterlacing': 'no',
                'keyframe': '300',
                'audio_volume': '100',
                'preset': '6',
                'file_extension': 'webm',
                'acbr': 'no',
            }

            ogg = {
                'output': 'ogg',
                'size': '320x240',
                'bitrate': '256k',
                'audio_bitrate': '64k',
                'audio_sample_rate': '44100',
                'audio_channels_number': '2',
                'keep_aspect_ratio': 'yes',
                'video_codec': 'libtheora',
                'profile': 'baseline',
                'vcodecparameters': 'no',
                'audio_codec': 'libvorbis',
                'two_pass': 'no',
                'cbr': 'no',
                'deinterlacing': 'no',
                'keyframe': '300',
                'audio_volume': '100',
                'file_extension': 'ogg',
                'acbr': 'no',
            }

            flv = {
                'output': 'fl9',
                'size': '320x240',
                'bitrate': '256k',
                'audio_bitrate': '64k',
                'audio_channels_number': '2',
                'keep_aspect_ratio': 'yes',
                'video_codec': 'libx264',
                'profile': 'high',
                'vcodecparameters': 'no',
                'audio_codec': 'libfaac',
                'two_pass': 'no',
                'cbr': 'no',
                'deinterlacing': 'no',
                'keyframe': '300',
                'audio_volume': '100',
                'file_extension': 'mp4',
            }

            thumbnail = {
                'output': 'thumbnail',
                'time': '5',
                'video_codec': 'mjpeg',
                'keep_aspect_ratio': 'yes',
                'file_extension': 'jpg',
            }

            encoder = Encoding(settings.ENCODING_API_USER_ID,
                settings.ENCODING_API_USER_KEY)
            resp = encoder.add_media(source=[slide_upload.original_file.url],
                formats=[mp4, webm, ogg, flv, thumbnail])

            media_id = None

            if resp is not None and resp.get('response') is not None:
                media_id = resp.get('response').get('MediaID')

            if media_id is None:
                slide_upload.status = SlideUploadQueue.STATUS_FAILED
                slide_upload.save()
                log.error('Unable to communicate with encoding.com')
                return False

            slide_upload.encoding_com_tracking_code = media_id
            slide_upload.status = \
                SlideUploadQueue.STATUS_AWAITING_3RD_PARTY_PROCESSING
            slide_upload.save()
            return True

Encoding.com recommended some less-than-ideal Python wrappers for communicating with their service. I added a few fixes to the module but there is still work to be done to get it to a state I'm happy with. Below is what this wrapper currently looks like in our codebase:

    import httplib
    from lxml import etree
    import urllib
    from xml.parsers.expat import ExpatError

    import xmltodict

    ENCODING_API_URL = 'manage.encoding.com:80'


    class Encoding(object):

        def __init__(self, userid, userkey, url=ENCODING_API_URL):
            self.url = url
            self.userid = userid
            self.userkey = userkey

        def get_media_info(self, action='GetMediaInfo', ids=[],
            headers={'Content-Type': 'application/x-www-form-urlencoded'}):
            query = etree.Element('query')

            nodes = {
                'userid': self.userid,
                'userkey': self.userkey,
                'action': action,
                'mediaid': ','.join(ids),
            }

            query = self._build_tree(etree.Element('query'), nodes)
            results = self._execute_request(query, headers)

            return self._parse_results(results)

        def get_status(self, action='GetStatus', ids=[], extended='no',
            headers={'Content-Type': 'application/x-www-form-urlencoded'}):
            query = etree.Element('query')

            nodes = {
                'userid': self.userid,
                'userkey': self.userkey,
                'action': action,
                'extended': extended,
                'mediaid': ','.join(ids),
            }

            query = self._build_tree(etree.Element('query'), nodes)
            results = self._execute_request(query, headers)

            return self._parse_results(results)

        def add_media(self, action='AddMedia', source=[], notify='', formats=[],
            instant='no',
            headers={'Content-Type': 'application/x-www-form-urlencoded'}):
            query = etree.Element('query')

            nodes = {
                'userid': self.userid,
                'userkey': self.userkey,
                'action': action,
                'source': source,
                'notify': notify,
                'instant': instant,
            }

            query = self._build_tree(etree.Element('query'), nodes)

            # Each output format becomes its own <format> child element.
            for format in formats:
                format_node = self._build_tree(etree.Element('format'), format)
                query.append(format_node)

            results = self._execute_request(query, headers)
            return self._parse_results(results)

        def _build_tree(self, node, data):
            # Lists are expanded into repeated elements with the same tag.
            for k, v in data.items():
                if isinstance(v, list):
                    for item in v:
                        element = etree.Element(k)
                        element.text = item
                        node.append(element)
                else:
                    element = etree.Element(k)
                    element.text = v
                    node.append(element)

            return node

        def _execute_request(self, xml, headers, path='', method='POST'):
            params = urllib.urlencode({'xml': etree.tostring(xml)})

            conn = httplib.HTTPConnection(self.url)
            conn.request(method, path, params, headers)
            response = conn.getresponse()
            data = response.read()
            conn.close()
            return data

        def _parse_results(self, results):
            try:
                return xmltodict.parse(results)
            except ExpatError as e:
                print 'Error parsing encoding.com response'
                print e
                return None
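To make the request format more concrete, here is a small standalone sketch of the XML document that a call like add_media assembles. It is written for Python 3 and uses the standard library's xml.etree.ElementTree in place of lxml so it runs without extra packages; build_tree and build_add_media_query are illustrative names, and the credentials and source URL are placeholders.

```python
import xml.etree.ElementTree as ET


def build_tree(node, data):
    """Append one child element per key; lists become repeated elements."""
    for key, value in data.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(node, key)
            child.text = item
    return node


def build_add_media_query(userid, userkey, source, formats):
    """Serialise an AddMedia query with one <format> node per output."""
    query = build_tree(ET.Element('query'), {
        'userid': userid,
        'userkey': userkey,
        'action': 'AddMedia',
        'source': source,
    })
    for fmt in formats:
        query.append(build_tree(ET.Element('format'), fmt))
    return ET.tostring(query, encoding='unicode')


query_xml = build_add_media_query(
    'example-user-id', 'example-user-key',   # placeholder credentials
    ['https://example.com/video.mov'],       # placeholder source URL
    [{'output': 'mp4', 'size': '320x240'}])
print(query_xml)
```

The printed string is what the wrapper URL-encodes into the xml form field before posting it.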

Still on the to-do list are HTTPS-only transmission with strict validation of Encoding.com's SSL certificate, and some unit tests (tickets, tickets and more tickets).
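As a rough idea of where the HTTPS item could go, the sketch below builds a connection that refuses invalid certificates. It targets Python 3's http.client rather than the Python 2 httplib used in the wrapper, and it assumes the API host also answers HTTPS on port 443, which we have not confirmed; strict_context and open_verified_connection are hypothetical names.

```python
import http.client
import ssl

# Assumption: the API host from the wrapper also serves HTTPS on port 443.
ENCODING_API_HOST = 'manage.encoding.com'


def strict_context():
    """TLS context requiring a valid certificate chain and matching hostname."""
    return ssl.create_default_context()


def open_verified_connection(host=ENCODING_API_HOST, timeout=30):
    """Build (but do not yet open) an HTTPS connection with strict validation."""
    return http.client.HTTPSConnection(
        host, 443, timeout=timeout, context=strict_context())
```

No network traffic happens until request() is called on the returned object; at that point a certificate that fails validation raises ssl.SSLError instead of being silently accepted.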

Stage 4: Download all new video formats

We have a periodic job running every 15 seconds via RabbitMQ which checks up on the progress of the video transcoding:

    class CheckUpOnThirdParties(PeriodicTask):
        run_every = timedelta(seconds=settings.THIRD_PARTY_CHECK_UP_INTERVAL)
        ...

        def _handle_encoding_com(self, slides):
            format_lookup = {
                'mp4': SlideVideoMedia.FORMAT_MP4,
                'webm': SlideVideoMedia.FORMAT_WEBM,
                'ogg': SlideVideoMedia.FORMAT_OGG,
                'fl9': SlideVideoMedia.FORMAT_FL9,
                'thumbnail': SlideVideoMedia.FORMAT_THUMB,
            }

            encoder = Encoding(settings.ENCODING_API_USER_ID,
                settings.ENCODING_API_USER_KEY)

            job_ids = [item.encoding_com_tracking_code for item in slides]
            resp = encoder.get_status(ids=job_ids)

            if resp is None:
                log.error('Unable to check up on encoding.com')
                return False

We'll go through the response from Encoding.com, validating each piece of it as we go along:

    if resp.get('response') is None:
        log.error('Unable to get response node from encoding.com')
        return False

    resp_id = resp.get('response').get('id')

    if resp_id is None:
        log.error('Unable to get media id from encoding.com')
        return False

    slide = SlideUploadQueue.objects.filter(
        status=SlideUploadQueue.STATUS_AWAITING_3RD_PARTY_PROCESSING,
        encoding_com_tracking_code=resp_id)

    if len(slide) != 1:
        log.error('Unable to find a single record for %s' % resp_id)
        return False

    resp_status = resp.get('response').get('status')

    if resp_status is None:
        log.error('Unable to get status from encoding.com')
        return False

    if resp_status != u'Finished':
        log.debug("%s isn't finished, will check back later" % resp_id)
        return True

    formats = resp.get('response').get('format')

    if formats is None:
        log.error("No output formats were found. Something's wrong.")
        return False

    for format in formats:
        try:
            assert format.get('status') == u'Finished', \
                "%s is not finished. Something's wrong." % format.get('id')

            output = format.get('output')
            assert output in ('mp4', 'webm', 'ogg', 'fl9',
                'thumbnail'), 'Unknown output format %s' % output

            s3_dest = format.get('s3_destination')
            assert 'http://encoding.com.result.s3.amazonaws.com/'\
                in s3_dest, 'Suspicious S3 url: %s' % s3_dest

            # Rewrite the result URL to its HTTPS, path-style equivalent.
            https_link = \
                'https://s3.amazonaws.com/encoding.com.result/%s' %\
                s3_dest.split('/')[-1]
            file_ext = https_link.split('.')[-1].strip()

            assert len(file_ext) > 0,\
                'Unable to get file extension from %s' % https_link

            count = SlideVideoMedia.objects.filter(slide_upload=slide,
                format=format_lookup[output]).count()

            if count != 0:
                print 'There is already a %s file for this slide' % output
                continue

            content = self.download_content(https_link)

            assert content is not None,\
                'There is no content for %s' % format.get('id')
        except AssertionError as e:
            log.error('A format did not pass all assertions: %s' % e)
            continue

At this point we've asserted everything is as it should be and we can save each of the videos:

    media = SlideVideoMedia()
    media.format = format_lookup[output]
    media.converted_file.save('blah.%s' % file_ext, ContentFile(content))
    media.save()
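Two details of the validation loop are easy to get wrong, so here is a hedged sketch of how they might be pulled into helpers. The names as_list and to_https_link are hypothetical. The first deals with xmltodict returning a single dict, rather than a list, when an element occurs only once, which would matter if a job had just one output format. The second uses a startswith check, which is slightly stricter than the substring test in the loop and is a swapped-in variation rather than what our code currently does.

```python
RESULT_PREFIX = 'http://encoding.com.result.s3.amazonaws.com/'
HTTPS_BASE = 'https://s3.amazonaws.com/encoding.com.result/'


def as_list(value):
    """Normalise an xmltodict value to a list (dict for one child, list for many)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def to_https_link(s3_dest):
    """Rewrite a trusted result URL to its path-style HTTPS equivalent."""
    if not s3_dest or not s3_dest.startswith(RESULT_PREFIX):
        raise ValueError('Suspicious S3 url: %s' % s3_dest)
    return HTTPS_BASE + s3_dest.split('/')[-1]
```

Iterating over as_list(resp.get('response').get('format')) would then behave the same whether the job produced one output or five.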

Stage 5: Video via HTML5

On our frontend we've created a page with an HTML5 video element. We're using video.js to display the video in the best-supported format for each browser.

    $ bower install video.js
    bower caching git://github.com/videojs/video.js-component.git
    bower cloning git://github.com/videojs/video.js-component.git
    bower fetching video.js
    bower checking out video.js#v4.0.3
    bower copying /home/mark/.bower/cache/video.js/5ab058cd60c5615aa38e8e706cd0f307
    bower installing video.js#4.0.3

In our index.jade file we include its dependencies:

    !!! 5
    html(lang="en", class="no-js")
      head
        meta(http-equiv='Content-Type', content='text/html; charset=UTF-8')
        ...
        link(rel='stylesheet', type='text/css', href="/components/video-js-4.1.0/video-js.css")
        script(type='text/javascript', src="/components/video-js-4.1.0/video.js")

In an Angular.js/Jade-based template we've included a <video> tag and its <source> child tags. There is also a poster attribute that shows a static image taken from the first few moments of the transcoded video.

    #main.span12
        video#example_video_1.video-js.vjs-default-skin(controls, preload="auto", width="640", height="264", poster="{{video_thumbnail}}", data-setup='{"example_option":true}', ng-show="videos")
            source(ng-repeat="video in videos", src="{{video.src}}", type="{{video.type}}")

This will print out every video format we've converted into, each as its own <source> tag. Video.js will decide which of them to play based on the browser the user is using.
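The template expects a videos list whose items have src and type, plus a video_thumbnail value. How the backend produces these is not shown in this post; one possible sketch, with a hypothetical build_player_context helper and constants that mirror SlideVideoMedia, is:

```python
# Constants mirror SlideVideoMedia.FORMAT_* and its mime_types tuple.
FORMAT_MP4, FORMAT_WEBM, FORMAT_OGG, FORMAT_FL9, FORMAT_THUMB = range(5)

MIME_TYPES = {
    FORMAT_MP4: 'video/mp4',
    FORMAT_WEBM: 'video/webm',
    FORMAT_OGG: 'video/ogg',
    FORMAT_FL9: 'video/mp4',
    FORMAT_THUMB: 'image/jpeg',
}


def build_player_context(media):
    """Split (format, url) pairs into <source> entries and a poster image."""
    videos, thumbnail = [], None
    for fmt, url in media:
        if fmt == FORMAT_THUMB:
            thumbnail = url
        else:
            videos.append({'src': url, 'type': MIME_TYPES[fmt]})
    return {'videos': videos, 'video_thumbnail': thumbnail}
```

Each (format, url) pair would come from a SlideVideoMedia row, using its format field and the signed URL of its converted_file.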

We still have a lot of work to do around fallback support, building unit tests and improving the robustness of our Encoding.com service wrapper. If this sort of work interests you please do get in touch.

Excerpted from: http://techblog.stickyworld.com/video-with-python.html

The post Building a Python-based video server appeared first on 润物无声.
