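0. Preface
Not long ago, Tencent announced that it had open-sourced PAG, its flagship shared-platform art tool. PAG comes with its own After Effects (AE) plugin and file format and supports effect preview and performance monitoring, which gives designers the WYSIWYG workflow they like best.
The transparent blend effect is commonly used in the gifting feature of live-streaming products. It gives an MP4 an alpha channel and embeds text, images, and live video into the MP4. The original post shows an example video at this point.
To learn this powerful tool, I tried to build the transparent effect with PAG and to explore how it is implemented. What follows is my reading of part of PAG's iOS source code, together with my own understanding.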
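I. PAGFile
PAGFile is the data structure that holds the layers and the basic rendering information. Its overall structure is shown below:
[Figure: overall structure of PAGFile]
The corresponding class inheritance is shown below:
[Figure: class inheritance diagram]
As shown, the video, image, and other data must be obtained before they can be converted into the data used for rendering, so we read from the bottom layer upward. For the video we need the frame data, alpha information, width, height, and so on; for the mask we need the transform information for each frame.
At this point all we have is the path of the PAG file. From it we obtain the corresponding stream, which then has to be decoded to extract this information: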
template <typename T>
void ReadTags(DecodeStream* stream, T parameter, void (*reader)(DecodeStream*, TagCode, T)) {
  auto header = ReadTagHeader(stream);
  if (stream->context->hasException()) {
    return;
  }
  while (header.code != TagCode::End) {
    auto tagBytes = stream->readBytes(header.length);
    reader(&tagBytes, header.code, parameter);
    if (stream->context->hasException()) {
      return;
    }
    header = ReadTagHeader(stream);
    if (stream->context->hasException()) {
      return;
    }
  }
}

static void ReadTag_VectorCompositionBlock(DecodeStream* stream, CodecContext* context) {
  auto composition = ReadVectorComposition(stream);
  context->compositions.push_back(composition);
}

static void ReadTag_VideoCompositionBlock(DecodeStream* stream, CodecContext* context) {
  auto composition = ReadVideoComposition(stream);
  context->compositions.push_back(composition);
}

VectorComposition* ReadVectorComposition(DecodeStream* stream) {
  auto composition = new VectorComposition();
  composition->id = stream->readEncodedUint32();
  ReadTags(stream, composition, ReadTagsOfVectorComposition);
  Codec::InstallReferences(composition->layers);
  return composition;
}
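The ReadTags loop above follows the familiar tag-length-value pattern: read a header, slice off exactly that many bytes, hand the slice to a handler, and repeat until an end tag. A minimal sketch of that pattern is given here to make the control flow concrete. ByteReader, the one-byte code and one-byte length header, and kEndTag are all assumptions invented for illustration; the excerpt does not show how PAG actually encodes its tag header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical bounded reader over a byte buffer (not PAG's DecodeStream).
struct ByteReader {
  const uint8_t* data;
  size_t size;
  size_t pos = 0;
  bool failed = false;

  uint8_t readU8() {
    if (pos >= size) {
      failed = true;
      return 0;
    }
    return data[pos++];
  }

  // Returns a sub-reader limited to the next `length` bytes.
  ByteReader readBytes(size_t length) {
    if (length > size - pos) {
      failed = true;
      return {data, 0};
    }
    ByteReader sub{data + pos, length};
    pos += length;
    return sub;
  }
};

constexpr uint8_t kEndTag = 0;  // assumed terminator code

// Reads (code, length, payload) records until the end tag and passes each
// payload to `handler` as its own bounded reader.
bool readTags(ByteReader& stream,
              const std::function<void(ByteReader&, uint8_t)>& handler) {
  while (true) {
    uint8_t code = stream.readU8();
    if (stream.failed) return false;
    if (code == kEndTag) return true;
    uint8_t length = stream.readU8();
    ByteReader body = stream.readBytes(length);
    if (stream.failed) return false;
    handler(body, code);
    if (body.failed) return false;
  }
}
```

Because each handler receives only its own slice, a handler that does not understand a tag can ignore it without throwing the outer stream out of sync, which is presumably why the real code also calls readBytes(header.length) before dispatching.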
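Once the stream is available, the width, height, alpha information, video frames, and so on are read in a fixed field order that the writer and the reader have agreed on (which bytes store which information).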
VideoSequence* ReadVideoSequence(DecodeStream* stream, bool hasAlpha) {
  auto sequence = new VideoSequence();
  sequence->width = stream->readEncodedInt32();
  sequence->height = stream->readEncodedInt32();
  sequence->frameRate = stream->readFloat();
  if (hasAlpha) {
    sequence->alphaStartX = stream->readEncodedInt32();
    sequence->alphaStartY = stream->readEncodedInt32();
  }
  auto sps = ReadByteDataWithStartCode(stream);
  auto pps = ReadByteDataWithStartCode(stream);
  sequence->headers.push_back(sps.release());
  sequence->headers.push_back(pps.release());
  auto count = stream->readEncodedUint32();
  for (uint32_t i = 0; i < count; i++) {
    auto videoFrame = new VideoFrame();
    sequence->frames.push_back(videoFrame);
    videoFrame->isKeyframe = stream->readBitBoolean();
  }
  for (uint32_t i = 0; i < count; i++) {
    auto videoFrame = sequence->frames[i];
    videoFrame->frame = ReadTime(stream);
    videoFrame->fileBytes = ReadByteDataWithStartCode(stream).release();
  }
  if (stream->bytesAvailable() > 0) {
    count = stream->readEncodedUint32();
    for (uint32_t i = 0; i < count; i++) {
      TimeRange staticTimeRange = {};
      staticTimeRange.start = ReadTime(stream);
      staticTimeRange.end = ReadTime(stream);
      sequence->staticTimeRanges.push_back(staticTimeRange);
    }
  }
  return sequence;
}
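Fields such as width and the frame count are read with readEncodedInt32 and readEncodedUint32, which suggests that integers are stored in a variable-length form. The sketch below shows one common scheme of that kind, a LEB128-style varint with seven data bits per byte and the high bit as a continuation flag. This is an assumption for illustration only: the excerpt does not show how PAG implements these calls, and writeVarUint32 and readVarUint32 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` as a little-endian base-128 varint:
// 7 payload bits per byte, high bit set while more bytes follow.
void writeVarUint32(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Decodes one varint starting at `pos` and advances `pos` past it.
// Returns false if the input is truncated or longer than 5 bytes.
bool readVarUint32(const std::vector<uint8_t>& in, size_t& pos, uint32_t& value) {
  value = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= in.size()) return false;
    uint8_t byte = in[pos++];
    value |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}
```

Under this scheme a width of 720 takes two bytes (0xD0, 0x05) instead of four, which matters for a format that stores many small integers.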
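After the video and image information has been read, it is wrapped in the corresponding Composition: the class for the video is VideoComposition, and the class for the mask image is VectorComposition.
Debugging shows that the composition at index 0 is the transparent video, the composition at index 1 is the mask image, and the Layers of the last composition correspond to the Layers generated from the first two compositions.
When the File is constructed, the last composition is taken as mainComposition, which contains both the video frame information and the mask image information, and the resulting layer count numLayers is 2.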
File::File(std::vector<Composition*> compositionList, std::vector<pag::ImageBytes*> imageList)
    : images(std::move(imageList)), compositions(std::move(compositionList)) {
  mainComposition = compositions.back();
  scaledTimeRange.start = 0;
  scaledTimeRange.end = mainComposition->duration;
  rootLayer = PreComposeLayer::Wrap(mainComposition).release();
  updateEditables(mainComposition);
  for (auto composition : compositions) {
    if (composition->type() != CompositionType::Vector) {
      _numLayers++;
      continue;
    }
    for (auto layer : static_cast<VectorComposition*>(composition)->layers) {
      if (layer->type() == LayerType::PreCompose) {
        continue;
      }
      _numLayers++;
    }
  }
}
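To see why numLayers comes out as 2 for this file, the counting rule in the constructor can be replayed on a small model. The sketch below uses simplified stand-in types rather than PAG's real classes, and the three compositions used in the check are my reading of the debug result described above (a video composition, a vector composition holding one image layer, and a main composition holding two PreCompose layers), so treat that layout as an assumption.

```cpp
#include <cassert>
#include <vector>

enum class CompositionType { Vector, Bitmap, Video };
enum class LayerType { Image, PreCompose };

// Simplified stand-ins for PAG's Layer and Composition.
struct Layer {
  LayerType type;
};

struct Composition {
  CompositionType type;
  std::vector<Layer> layers;  // only meaningful for Vector compositions
};

// Same rule as the constructor: a sequence (non-Vector) composition counts
// as one layer; inside a Vector composition, every layer except PreCompose
// layers counts, because a PreCompose layer only points at another composition.
int countLayers(const std::vector<Composition>& compositions) {
  int count = 0;
  for (const auto& composition : compositions) {
    if (composition.type != CompositionType::Vector) {
      ++count;
      continue;
    }
    for (const auto& layer : composition.layers) {
      if (layer.type != LayerType::PreCompose) {
        ++count;
      }
    }
  }
  return count;
}
```

With the assumed layout, the video contributes 1, the image layer contributes 1, and the two PreCompose layers of the main composition contribute nothing, which gives the observed total of 2.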
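The Layers are built from mainComposition. When a layer's LayerType is PreCompose, the layer wraps a composition that was pre-generated by the AE plugin; when the CompositionType is Vector, the layers can be edited at the code level.
Here, the mask image has a CompositionType of Vector and a LayerType of Image, and it carries information such as when it appears, how long it lasts, and the effect applied at a given frame.
The alpha-channel video, by contrast, is wrapped in a layer whose LayerType is PreCompose (in the code excerpts its composition is checked against CompositionType::Video), and its per-frame information is recorded in a VideoSequence.
From these two Compositions the corresponding Layers (of type PAGComposition) are built. The root rootLayer (of type PAGFile) then holds these two Layers in its layers field, and together they form the PAGFile that is handed to the business layer.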
std::shared_ptr<PAGLayer> PAGFile::BuildPAGLayer(std::shared_ptr<File> file, Layer* layer) {
  PAGLayer* pagLayer;
  switch (layer->type()) {
    ...
    case LayerType::Image: {
      pagLayer = new PAGImageLayer(file, static_cast<ImageLayer*>(layer));
      pagLayer->_editableIndex = file->getEditableIndex(static_cast<ImageLayer*>(layer));
    } break;
    case LayerType::PreCompose: {
      if (layer == file->getRootLayer()) {
        pagLayer = new PAGFile(file, static_cast<PreComposeLayer*>(layer));
      } else {
        pagLayer = new PAGComposition(file, static_cast<PreComposeLayer*>(layer));
      }
      auto composition = static_cast<PreComposeLayer*>(layer)->composition;
      if (composition->type() == CompositionType::Vector) {
        auto& layers = static_cast<VectorComposition*>(composition)->layers;
        // The index order of PAGLayers is different from Layers in File.
        for (int i = static_cast<int>(layers.size()) - 1; i >= 0; i--) {
          auto childLayer = layers[i];
          auto childPAGLayer = BuildPAGLayer(file, childLayer);
          static_cast<PAGComposition*>(pagLayer)->layers.push_back(childPAGLayer);
          childPAGLayer->_parent = static_cast<PAGComposition*>(pagLayer);
          if (childLayer->trackMatteLayer) {
            childPAGLayer->_trackMatteLayer = BuildPAGLayer(file, childLayer->trackMatteLayer);
            childPAGLayer->_trackMatteLayer->trackMatteOwner = childPAGLayer.get();
          }
        }
      }
    } break;
    default:
      pagLayer = new PAGLayer(file, layer);
      break;
  }
  auto shared = std::shared_ptr<PAGLayer>(pagLayer);
  pagLayer->weakThis = shared;
  return shared;
}
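At this point we have a fully assembled PAGFile. It contains two parts, the video information and the mask image information. At render time the PAGFile is unpacked and its content is converted into the corresponding rendering information.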
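II. PAGView
PAGView renders the effect and the mask mainly through the PAGPlayer class, which reads the video frames, images, positions, transforms, and other information from the PAGFile assembled earlier.
The basic rendering principle is: using the Layout information and the Texture information, call the GL draw operations to render.
The main responsibilities of PAGView are therefore:
Unpack the previously assembled PAGFile to obtain the Sequence information for the video and the imageBytes information for the image, and read the textures;
Read the Layout to position the video and the mask image, and finally call GL to render.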
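1. Associating PAGView with PAGFile
The PAGStage class inherits from PAGComposition, which makes it the root node of all layers. It is held by PAGPlayer, which in turn is held by PAGView.
The method below is the doAddLayer call that PAGStage performs on the PAGFile. Its purpose is to declare that every layer under the PAGFile is held by the PAGStage.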
bool PAGComposition::doAddLayer(std::shared_ptr<PAGLayer> pagLayer, int index) {
  ...
  pagLayer->attachToTree(rootLocker, stage);
  if (rootFile && file == pagLayer->file) {
    pagLayer->onAddToRootFile(rootFile);
  }
  this->layers.insert(this->layers.begin() + index, pagLayer);
  pagLayer->_parent = this;
  ...
  return true;
}

void PAGComposition::onAddToStage(PAGStage* pagStage) {
  PAGLayer::onAddToStage(pagStage);
  for (auto& layer : layers) {
    layer->onAddToStage(pagStage);
  }
}
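Finally, PAGStage binds layers, effects, and other content to specific ids so that they can be looked up later during rendering. From this point on, PAGStage knows every layer, sequence frame, image, transform effect, position, and so on used in the whole rendering process.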
void PAGStage::addReference(PAGLayer* pagLayer) {
  addToReferenceMap(pagLayer->uniqueID(), pagLayer);
  addToReferenceMap(pagLayer->layer->uniqueID, pagLayer);
  if (pagLayer->layerType() == LayerType::PreCompose) {
    auto composition = static_cast<PreComposeLayer*>(pagLayer->layer)->composition;
    addToReferenceMap(composition->uniqueID, pagLayer);
  } else if (pagLayer->layerType() == LayerType::Image) {
    auto imageBytes = static_cast<ImageLayer*>(pagLayer->layer)->imageBytes;
    addToReferenceMap(imageBytes->uniqueID, pagLayer);
    auto pagImage = static_cast<PAGImageLayer*>(pagLayer)->getPAGImage();
    if (pagImage != nullptr) {
      addReference(pagImage.get(), pagLayer);
    }
  }
  auto targetLayer = pagLayer->layer;
  for (auto& style : targetLayer->layerStyles) {
    addToReferenceMap(style->uniqueID, pagLayer);
  }
  for (auto& effect : targetLayer->effects) {
    addToReferenceMap(effect->uniqueID, pagLayer);
  }
  invalidateCacheScale(pagLayer);
}
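addReference registers the same layer under several ids (its own, its composition's, its image's, its effects'), so the map it feeds is presumably a one-to-many index from an id to every layer that uses it. A minimal sketch of such an index is shown here; ReferenceMap and LayerNode are hypothetical stand-ins, and the body of PAG's addToReferenceMap is not shown in the excerpt, so this is only one plausible shape for it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ID = uint32_t;

// Hypothetical stand-in for PAGLayer: its own id plus the id of an asset it uses.
struct LayerNode {
  ID uniqueID;
  ID contentID;
};

class ReferenceMap {
 public:
  // Registers `layer` under `id`. Several layers may share one asset id,
  // and registering the same pair twice has no further effect.
  void add(ID id, LayerNode* layer) {
    auto& list = map_[id];
    for (auto* existing : list) {
      if (existing == layer) return;
    }
    list.push_back(layer);
  }

  // Returns every layer registered under `id` (empty if there is none).
  std::vector<LayerNode*> find(ID id) const {
    auto it = map_.find(id);
    return it == map_.end() ? std::vector<LayerNode*>{} : it->second;
  }

 private:
  std::unordered_map<ID, std::vector<LayerNode*>> map_;
};
```

With an index like this, a change to one asset (for example a replaced image) can be traced back to all the layers whose caches need to be invalidated.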
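2. Per-frame callback rendering in PAGView
Each frame rendered by PAGView is driven by CADisplayLink. Every frame triggers an updateView callback, which makes PAGPlayer load the corresponding video and image information.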
+ (void)StartDisplayLink {
  caDisplayLink = [CADisplayLink displayLinkWithTarget:[ValueAnimator class]
                                              selector:@selector(HandleDisplayLink:)];
  // This originally used the default mode, which cannot render while the UI is being dragged,
  // so it was changed to common modes...
  [caDisplayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSRunLoopCommonModes];
}

- (void)actualUpdateView {
  [pagPlayer setProgress:self.animatorProgress];
  [self flush];
}

- (BOOL)flush {
  if (self.isInBackground) {
    return false;
  }
  auto result = [pagPlayer flush];
  if (self.bufferPrepared) {
    [PAGView RegisterFlushQueueDestoryMethod];
  }
  return result;
}

- (void)updateViewAsync {
  if (self.isAsyncFlushing) {
    return;
  }
  self.isAsyncFlushing = TRUE;
  NSOperationQueue* flushQueue = [PAGView FlushQueue];
  [self retain];
  NSBlockOperation* operation = [NSBlockOperation blockOperationWithBlock:^{
    [self actualUpdateView];
    self.isAsyncFlushing = FALSE;
    dispatch_async(dispatch_get_main_queue(), ^{
      [self release];
    });
  }];
  [flushQueue addOperation:operation];
}
The flush operation of PAGPlayer is shown below. Its key steps are stage->draw, lastGraphic->prepare(renderCache), and pagSurface->draw(renderCache, lastGraphic, signalSemaphore, _autoClear):
bool PAGPlayer::flushInternal(BackendSemaphore* signalSemaphore) {
  ...
  if (contentVersion != stage->getContentVersion()) {
    contentVersion = stage->getContentVersion();
    Recorder recorder = {};
    stage->draw(&recorder);
    lastGraphic = recorder.makeGraphic();
  }
  auto presentingStart = GetTimer();
  if (lastGraphic) {
    lastGraphic->prepare(renderCache);
  }
  if (!pagSurface->draw(renderCache, lastGraphic, signalSemaphore, _autoClear)) {
    return false;
  }
  ...
  return true;
}
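The first branch of flushInternal is worth isolating: the layer tree is re-recorded only when the stage's content version has changed, otherwise the previously recorded graphic is reused. The sketch below reproduces just that gating idea. Stage, Player, record, and the use of a string as the "graphic" are simplified, hypothetical stand-ins and not PAG's types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for PAGStage: bumps a version on every content change.
struct Stage {
  int version = 0;
  std::string content = "frame-0";
  int drawCalls = 0;

  void setContent(std::string c) {
    content = std::move(c);
    ++version;
  }

  // Stands in for stage->draw(&recorder) followed by recorder.makeGraphic().
  std::shared_ptr<std::string> record() {
    ++drawCalls;
    return std::make_shared<std::string>(content);
  }
};

class Player {
 public:
  explicit Player(Stage* stage) : stage_(stage) {}

  // Re-records only when the stage's version differs from the last one seen.
  std::shared_ptr<std::string> flush() {
    if (seenVersion_ != stage_->version) {
      seenVersion_ = stage_->version;
      lastGraphic_ = stage_->record();
    }
    return lastGraphic_;
  }

 private:
  Stage* stage_;
  int seenVersion_ = -1;  // differs from any real version, forcing the first record
  std::shared_ptr<std::string> lastGraphic_;
};
```

Since a display link fires on every screen refresh while the animation's content may not change that often, a version check of this kind keeps the expensive tree walk off the common path.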
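2.1 Extracting and wrapping layer information
stage->draw unpacks the PAGFile and extracts the information contained in every layer. The stage acts as the root node of the layers; it inherits from PAGComposition and calls its draw method directly: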
void PAGComposition::draw(Recorder* recorder) {
  ...
  auto count = static_cast<int>(layers.size());
  for (int i = 0; i < count; i++) {
    auto& childLayer = layers[i];
    if (!childLayer->layerVisible) {
      continue;
    }
    DrawChildLayer(recorder, childLayer.get());
  }
  ...
}
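From the earlier sections we know that stage contains two child layers, a video layer and a mask image layer, and each of them also has its own draw method called.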
void PAGComposition::draw(Recorder* recorder) {
  if (!contentModified() && layerCache->contentStatic()) {
    // No child has been modified and the content is static, so the cache can be used to
    // skip drawing all children.
    getContent()->draw(recorder);
    return;
  }
  auto preComposeLayer = static_cast<PreComposeLayer*>(layer);
  auto composition = preComposeLayer->composition;
  if (composition->type() == CompositionType::Bitmap ||
      composition->type() == CompositionType::Video) {
    auto layerFrame = layer->startTime + contentFrame;
    auto compositionFrame = preComposeLayer->getCompositionFrame(layerFrame);
    auto graphic = stage->getSequenceGraphic(composition, compositionFrame);
    recorder->drawGraphic(graphic);
  }
  ...
}
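Here we can see that stage can find the sequence frame information for a layer through getSequenceGraphic. It caches entries by the composition's id and uniqueID, looks up the corresponding sequence frame, and wraps it in a Graphic.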
std::shared_ptr<Graphic> PAGStage::getSequenceGraphic(Composition* composition,
                                                      Frame compositionFrame) {
  auto result = sequenceCache.find(composition->id);
  if (result != sequenceCache.end()) {
    if (result->second.compositionFrame == compositionFrame) {
      return result->second.graphic;
    }
    sequenceCache.erase(result);
  }
  SequenceCache cache = {};
  cache.graphic = RenderSequenceComposition(composition, compositionFrame);
  cache.compositionFrame = compositionFrame;
  sequenceCache[composition->uniqueID] = cache;
  return cache.graphic;
}

std::shared_ptr<Graphic> RenderSequenceComposition(Composition* composition,
                                                   Frame compositionFrame) {
  auto sequence = Sequence::Get(composition);
  if (sequence == nullptr) {
    return nullptr;
  }
  auto sequenceFrame = sequence->toSequenceFrame(compositionFrame);
  std::shared_ptr<Graphic> graphic = nullptr;
  if (composition->type() == CompositionType::Video) {
    graphic = MakeVideoSequenceGraphic(static_cast<VideoSequence*>(sequence), sequenceFrame);
  } else {
    auto proxy = new SequenceProxy(sequence, sequenceFrame, sequence->width, sequence->height);
    graphic =
        Picture::MakeFrom(sequence->composition->uniqueID, std::unique_ptr<SequenceProxy>(proxy));
  }
  auto scaleX = static_cast<float>(composition->width) / static_cast<float>(sequence->width);
  auto scaleY = static_cast<float>(composition->height) / static_cast<float>(sequence->height);
  return Graphic::MakeCompose(graphic, Matrix::MakeScale(scaleX, scaleY));
}
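The caching in getSequenceGraphic amounts to a single slot per composition: the cached graphic is reused only if the same frame is requested again, and otherwise it is replaced. A minimal sketch of that idea is given below. SequenceGraphicCache, its Renderer callback, and the simplified Graphic struct are hypothetical, and the sketch deliberately uses one key for both lookup and store, which is a choice of this example rather than a statement about the library.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

using ID = uint32_t;
using Frame = int64_t;

// Simplified stand-in for a rendered sequence graphic.
struct Graphic {
  ID composition;
  Frame frame;
};

class SequenceGraphicCache {
 public:
  using Renderer = std::function<std::shared_ptr<Graphic>(ID, Frame)>;

  explicit SequenceGraphicCache(Renderer render) : render_(std::move(render)) {}

  // Returns the cached graphic when the same frame is requested again;
  // otherwise renders the new frame and replaces the stale entry.
  std::shared_ptr<Graphic> get(ID compositionID, Frame frame) {
    auto it = entries_.find(compositionID);
    if (it != entries_.end() && it->second.frame == frame) {
      return it->second.graphic;
    }
    Entry entry{frame, render_(compositionID, frame)};
    entries_[compositionID] = entry;
    return entry.graphic;
  }

 private:
  struct Entry {
    Frame frame = -1;
    std::shared_ptr<Graphic> graphic;
  };

  std::unordered_map<ID, Entry> entries_;
  Renderer render_;
};
```

A one-frame cache is enough here because playback moves forward frame by frame; its job is to avoid rebuilding the graphic when the same frame is flushed more than once.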
The video sequence frame information is ultimately wrapped in an RGBAAAPicture:
static std::shared_ptr<Graphic> MakeVideoSequenceGraphic(VideoSequence* sequence,
                                                         Frame contentFrame) {
  // The exported video sequence does not record the exact total frame size, so it has to be
  // computed here from width and alphaStartX. The export plugin adds one to odd sizes, and
  // this code matches the plugin's rule.
  auto videoWidth = sequence->alphaStartX + sequence->width;
  if (videoWidth % 2 == 1) {
    videoWidth++;
  }
  auto videoHeight = sequence->alphaStartY + sequence->height;
  if (videoHeight % 2 == 1) {
    videoHeight++;
  }
  auto proxy = new SequenceProxy(sequence, contentFrame, videoWidth, videoHeight);
  RGBAAALayout layout = {sequence->width, sequence->height, sequence->alphaStartX,
                         sequence->alphaStartY};
  return Picture::MakeFrom(sequence->composition->uniqueID, std::unique_ptr<SequenceProxy>(proxy),
                           layout);
}

std::shared_ptr<Graphic> Picture::MakeFrom(ID assetID, std::unique_ptr<TextureProxy> proxy,
                                           const RGBAAALayout& layout) {
  if (layout.alphaStartX == 0 && layout.alphaStartY == 0) {
    return Picture::MakeFrom(assetID, std::move(proxy));
  }
  if (proxy == nullptr || layout.alphaStartX + layout.width > proxy->width() ||
      layout.alphaStartY + layout.height > proxy->height()) {
    return nullptr;
  }
  return std::shared_ptr<RGBAAAPicture>(new RGBAAAPicture(assetID, proxy.release(), layout));
}
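The size arithmetic in MakeVideoSequenceGraphic is easier to follow with concrete numbers. The sketch below recomputes the decoded frame size from a layout, including the round-up-to-even rule, and also shows where the alpha value for a given color pixel would sit. The helper names are hypothetical, and the alpha lookup assumes that the color region starts at the origin and the alpha region is the same picture shifted by (alphaStartX, alphaStartY); that is inferred from the field names, since the shader is not part of the excerpt.

```cpp
#include <cassert>

// Same four fields as the layout built in the excerpt above.
struct RGBAAALayout {
  int width;
  int height;
  int alphaStartX;
  int alphaStartY;
};

struct Size {
  int width;
  int height;
};

struct Point {
  int x;
  int y;
};

// Odd sizes are bumped to the next even number, matching the export rule.
inline int roundUpToEven(int v) {
  return v % 2 == 1 ? v + 1 : v;
}

// Size of the full decoded frame that holds both the color and alpha regions.
Size videoFrameSize(const RGBAAALayout& layout) {
  return {roundUpToEven(layout.alphaStartX + layout.width),
          roundUpToEven(layout.alphaStartY + layout.height)};
}

// Assumed position of the alpha sample that belongs to color pixel (x, y).
Point alphaSampleFor(const RGBAAALayout& layout, Point color) {
  return {color.x + layout.alphaStartX, color.y + layout.alphaStartY};
}
```

For a 375 x 667 effect with the alpha region placed to the right (alphaStartX = 375, alphaStartY = 0), the decoded frame is 750 x 668: the width is already even, while the odd height is rounded up by one.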
Similarly, the mask image can also be wrapped in a Graphic:
std::shared_ptr<Graphic> Picture::MakeFrom(ID assetID, const Bitmap& bitmap) {
  if (bitmap.isEmpty()) {
    return nullptr;
  }
  auto proxy = new BitmapTextureProxy(bitmap);
  return std::shared_ptr<Graphic>(
      new TextureProxyPicture(assetID, proxy, bitmap.isHardwareBacked()));
}
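2.2 Pre-rendering: loading the Reader
lastGraphic->prepare(renderCache) mainly performs pre-render decoding of the structures wrapped earlier. Only video frames do any real work here: a reader is created and stored in the renderCache cache, and it is created only before playback: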
VideoSequenceReader::VideoSequenceReader(std::shared_ptr<File> file, VideoSequence* sequence,
                                         DecodingPolicy policy)
    : SequenceReader(std::move(file), sequence) {
  VideoConfig config = {};
  auto demuxer = std::make_unique<VideoSequenceDemuxer>(sequence);
  config.hasAlpha = sequence->alphaStartX + sequence->alphaStartY > 0;
  config.width = sequence->alphaStartX + sequence->width;
  if (config.width % 2 == 1) {
    config.width++;
  }
  config.height = sequence->alphaStartY + sequence->height;
  if (config.height % 2 == 1) {
    config.height++;
  }
  for (auto& header : sequence->headers) {
    auto bytes = ByteData::MakeWithoutCopy(header->data(), header->length());
    config.headers.push_back(std::move(bytes));
  }
  config.mimeType = "video/avc";
  config.colorSpace = YUVColorSpace::Rec601;
  config.frameRate = sequence->frameRate;
  reader = std::make_unique<VideoReader>(config, std::move(demuxer), policy);
}

bool RenderCache::prepareSequenceReader(Sequence* sequence, Frame targetFrame,
                                        DecodingPolicy policy) {
  auto composition = sequence->composition;
  if (!_videoEnabled && composition->type() == CompositionType::Video) {
    return false;
  }
  usedAssets.insert(composition->uniqueID);
  auto staticComposition = composition->staticContent();
  if (sequenceCaches.count(composition->uniqueID) != 0) {
#ifdef PAG_BUILD_FOR_WEB
    sequenceCaches[composition->uniqueID]->prepareAsync(targetFrame);
#endif
    return false;
  }
  if (staticComposition && hasSnapshot(composition->uniqueID)) {
    // Static sequence frames use the bitmap caching logic; if a Snapshot has already been
    // cached by the upper layer, no prediction is needed.
    return false;
  }
  auto file = stage->getSequenceFile(sequence);
  auto reader = MakeSequenceReader(file, sequence, policy);
  sequenceCaches[composition->uniqueID] = reader;
  reader->prepareAsync(targetFrame);
  return true;
}
Once the reader has been created, it can be used during rendering to read the video data and obtain the corresponding textures.
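III. Rendering
With everything above prepared, rendering starts with a call to pagSurface->draw(renderCache, lastGraphic, signalSemaphore, _autoClear). Here pagSurface sits one level above the canvas: it holds the canvas and is responsible for some of the render scheduling.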
bool PAGSurface::draw(RenderCache* cache, std::shared_ptr<Graphic> graphic,
                      BackendSemaphore* signalSemaphore, bool autoClear) {
  if (device == nullptr) {
    device = drawable->getDevice();
  }
  auto context = lockContext();
  if (!context) {
    return false;
  }
  if (surface != nullptr && autoClear && contentVersion == cache->getContentVersion()) {
    unlockContext();
    return false;
  }
  if (surface == nullptr) {
    surface = drawable->createSurface(context);
  }
  if (surface == nullptr) {
    unlockContext();
    return false;
  }
  contentVersion = cache->getContentVersion();
  cache->attachToContext(context);
  auto canvas = surface->getCanvas();
  if (autoClear) {
    canvas->clear();
  }
  if (graphic) {
    // FBO-related work: fetching the textures and running the vertex and fragment shaders
    graphic->draw(canvas, cache);
  }
  surface->flush(signalSemaphore);
  cache->detachFromContext();
  drawable->setTimeStamp(pagPlayer->getTimeStampInternal());
  // EAGL RBO presentation
  drawable->present(context);
  unlockContext();
  return true;
}
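Rendering mainly consists of FBO (framebuffer object) and RBO (renderbuffer object) operations, which correspond to graphic->draw(canvas, cache); and drawable->present(context);
In the FBO stage, Recorder loads each of the previously wrapped layers one by one and, using the preset matrix, blendMode, and other information, builds a render chain that produces the texture and vertex coordinate information. The low-level GL interfaces are then called to perform the actual rendering. The flow is long and can be followed in the corresponding source files, so the code is not reproduced here.
Once the FBO content has been produced, a series of RBO operations follows, and the result is finally handed to EAGLContext for presentation.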
void EAGLWindow::onPresent(Context* context, int64_t) {
  auto gl = GLContext::Unwrap(context);
  if (layer) {
    gl->bindRenderbuffer(GL::RENDERBUFFER, colorBuffer);
    auto eaglContext = static_cast<EAGLDevice*>(context->getDevice())->eaglContext();
    [eaglContext presentRenderbuffer:GL::RENDERBUFFER];
    gl->bindRenderbuffer(GL::RENDERBUFFER, 0);
  } else {
    gl->flush();
  }
}
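IV. Summary and Analysis
1. PAG's workflow
For the transparent blend effect, PAG's flow consists of the following steps:
- After designing with the AE plugin, the designer exports a packaged .pag file.
- The client parses the .pag file to obtain the layer information.
- On each frame callback, the render interface is called to extract and wrap the layer information and to decode the video information.
- The low-level rendering interface is called to draw the result to the screen.
2. PAG versus MP4 for rendering the transparent blend effect
For the transparent blend effect,
PAG's approach is to pack everything the designer wants (which layers are rendered in which way on which frame) into the .pag file.
The MP4 + JSON approach is: the MP4 contains the raw effect footage, and the JSON tells the client on which frame the mask should be blended into the MP4.
Overall, PAG is a shared-platform product built by a strong team at a large company. The transparent blend effect is only a small part of what it can do: PAG is feature-complete and highly extensible, and its OpenGL-based low-level design lets one codebase be reused across multiple platforms.
PAG's code is well encapsulated at every level, with each component playing its own part in the flow from unpacking to rendering. The approach is worth learning from, but because its architecture differs considerably from our product's existing implementation, we can only borrow its ideas at an abstract level.
In terms of rendering performance and integration cost, PAG uses a relatively larger share of CPU than rendering the effect directly from MP4, possibly because unpacking the custom file format takes a certain amount of CPU.
In addition, libpag.framework is 32.4 MB. Using PAG for just one of its features feels like overkill, and the integration cost is relatively high. It would be better to consider integrating it later, if we need to use PAG's asset library at scale.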