I. Introduction
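When reading electronic documents, you often run into text that is embedded as an image. Copying it out to take notes is a real hassle: you have to retype it by hand while looking at the picture, and you end up forced into becoming a keyboard warrior...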
Left with no choice, why not build our own wheel and develop a personal text recognition tool? That is where MATLAB App Designer comes in.
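Anyone who has used MATLAB knows that it offers two tools for building graphical user interfaces. The first is GUIDE, whose result is commonly called a GUI; it will be removed in a future release. The second is App Designer, whose result is commonly called an App; it is the officially recommended framework and the mainstream going forward.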
Today we use a simple example to show how to design an image text recognition tool as an App.
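There are two main ways to build one:
- App Designer: flexible, convenient and simple; the modern approach.
- Programmatic construction based on uifigure: flexible and easy to refactor, suited to complex, large GUIs; the old-school approach.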
Here we build the tool programmatically.
II. Preparation
1. API
Text recognition relies on Optical Character Recognition (OCR). Building such a low-level wheel ourselves, with high recognition accuracy, would be exhausting.
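Fortunately, mature tools already exist: Baidu AI Cloud, Alibaba Cloud, iFLYTEK and others all provide APIs that we can simply borrow. Here we use the text recognition API of Baidu AI Cloud as the example.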
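After applying for the free text recognition service, you can find the API Key and Secret Key in the console. These two parameters are used to obtain an access_token, which is a required parameter for calling the API (see the red box in the figure below).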
From the text recognition documentation we get the request interface of General Text Recognition (standard edition):
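- HTTP method: POST
- Request URL: https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic
- URL parameter: access_token, obtained from the API Key and Secret Key (see the "Obtaining an Access Token" section of the documentation)
- Header: Content-Type: application/x-www-form-urlencoded
- Request parameter: image, the image data, Base64-encoded and then URL-encoded. After both encodings it must not exceed 4 MB; the shortest side must be at least 15 px and the longest side at most 4096 px; jpg/jpeg/png/bmp formats are supported.
- Response parameter: words_result, an array of recognition results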
The HTTP request process itself is discussed in detail below.
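The limits listed above can be checked locally before a request is sent. The sketch below is a hypothetical helper, checkOcrImage, which is not part of the tool built in this article. It assumes that "4 MB" means 4 × 1024 × 1024 bytes, measured after Base64 encoding followed by URL encoding, and it performs the URL encoding with the Java class java.net.URLEncoder.

```matlab
function [ok, msg] = checkOcrImage(fileName)
%CHECKOCRIMAGE Pre-check an image against the general_basic request limits
%   Hypothetical helper, not part of the ReadWords tool. Assumes "4 MB" means
%   4*1024*1024 bytes, measured after Base64 encoding and URL encoding.
ok = false;
msg = '';
% Supported formats: jpg/jpeg/png/bmp
[~, ~, ext] = fileparts(fileName);
if ~any(strcmpi(ext, {'.jpg', '.jpeg', '.png', '.bmp'}))
    msg = 'Unsupported image format.';
    return
end
% Shortest side >= 15 px, longest side <= 4096 px
info = imfinfo(fileName);
sides = [info(1).Width, info(1).Height];
if min(sides) < 15 || max(sides) > 4096
    msg = 'Image side length is out of range.';
    return
end
% Size limit: Base64 first, then URL encoding (+, / and = each become %XX)
fid = fopen(fileName, 'rb');
bytes = fread(fid, inf, '*uint8');
fclose(fid);
b64 = char(matlab.net.base64encode(bytes.'));
encoded = char(java.net.URLEncoder.encode(b64, 'UTF-8'));
if numel(encoded) > 4 * 1024 * 1024
    msg = 'Encoded image is larger than 4 MB.';
    return
end
ok = true;
end
```

Because URL encoding turns every +, / and = into three characters, the size after both encodings can be noticeably larger than the Base64 string alone, which is why the check is done on the final encoded text.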
2. Base64 Encoding of Images
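Base64 is one of the most common encodings on the web for transmitting 8-bit binary data. Its character set has 64 characters: the lowercase letters a-z, the uppercase letters A-Z, the digits 0-9, and the symbols + and /, while the equals sign = is used as padding at the end. Any binary data can be converted into characters from this set, and that conversion is called Base64 encoding. Base64 text is not human-readable and must be decoded before it can be read.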
Many programming languages ship with ready-made Base64 functions, and MATLAB is no exception: run help matlab.net.base64encode for details.
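To make the 64-character alphabet and the = padding concrete, the short script below encodes three ASCII strings with matlab.net.base64encode. Every 3 input bytes become 4 output characters, and a final group of only 2 or 1 bytes is padded with one or two = signs.

```matlab
% 3 bytes -> 4 characters, no padding needed
enc3 = matlab.net.base64encode(uint8('Man'));   % TWFu
% 2 bytes -> 3 characters + one '='
enc2 = matlab.net.base64encode(uint8('Ma'));    % TWE=
% 1 byte -> 2 characters + two '='
enc1 = matlab.net.base64encode(uint8('M'));     % TQ==
% Decoding restores the original bytes as uint8
dec = matlab.net.base64decode(enc3);
disp(char(dec(:).'))
```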
Below are three ways to do it in MATLAB:
- The Java class org.apache.commons.codec.binary.Base64, and matlab.net.base64encode
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n40" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> function base64string = img2base64(fileName)
%IMG2BASE64 Encode an image file as a Base64 character vector
% INPUTS:
% fileName string, an image file name
% OUTPUTS:
% base64string string, the input image's base64 code
% USAGE:
% >>base64string = img2base64('1.jpg')
% >>base64string = 'xxx'
%
try
fid = fopen(fileName, 'rb');
bytes = fread(fid);
fclose(fid);
% -------------------------------------------
% First method
% -------------------------------------------
encoder = org.apache.commons.codec.binary.Base64;
base64string = char(encoder.encode(bytes))';
% -------------------------------------------
% Second method
% -------------------------------------------
% base64string = matlab.net.base64encode(uint8(bytes)); % fread returns doubles; convert to uint8 first
catch
disp('The file does not exist!');
base64string = '';
end % end try
end % end function</pre>
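- Python's base64 module
MATLAB can call Python directly, so the base64 module from Python's standard library can be used as well (this requires a Python interpreter configured for MATLAB). The source code is as follows: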
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n45" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> function base64string = img2base64_(fileName)
%IMG2BASE64_ Encode an image file as Base64 using Python's base64 module
% INPUTS:
% fileName string, an image file name
% OUTPUTS:
% base64string string, the input image's base64 code
% USAGE:
% >>base64string = img2base64_('1.jpg')
% >>base64string = 'xxx'
%
try
f = py.open(fileName, 'rb');
bytes = f.read();
f.close();
% b64encode returns a Python bytes object; decode it to str, then convert to char
temp = py.base64.b64encode(bytes);
base64string = char(temp.decode('ascii'));
catch
disp('The file does not exist!');
base64string = '';
end % end try
end % end function</pre>
We Base64-encoded the same image (500 x 500), shown below, with each method and compared the encoding speed:
Result: '/9j/4AAQSkZ...AAAAAAD/9k='
- Java class org.apache.commons.codec.binary.Base64 ⏲ 0.000783 s
- matlab.net.base64encode ⏲ 0.017589 s
- Python base64 module ⏲ 0.000709 s
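The Java class and the Python base64 module are about equally fast, while matlab.net.base64encode is more than 20 times slower (roughly 22 to 25 times in these measurements). Even so, encoding a 500 x 500 image takes only about 0.02 seconds, which is still very fast.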
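All things considered, we recommend the org.apache.commons.codec.binary.Base64 class for Base64 encoding: it is as fast as the Python module and does not depend on a Python installation.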
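The timings above come from a single run. A sketch of a repeatable comparison between the Java class and matlab.net.base64encode follows. It writes a synthetic 500 x 500 JPEG to a temporary file (an assumption, since the original test image is not available), checks that both methods return the same string, and uses timeit, which calls each function several times. Absolute numbers will differ between machines.

```matlab
% Synthetic 500 x 500 RGB test image written to a temporary JPEG
fileName = [tempname, '.jpg'];
imwrite(uint8(randi([0, 255], 500, 500, 3)), fileName);
fid = fopen(fileName, 'rb');
bytes = fread(fid);   % doubles in [0, 255], as in img2base64
fclose(fid);
% Method 1: Java class, called the same way as in img2base64
encoder = org.apache.commons.codec.binary.Base64;
javaEnc = @() char(encoder.encode(bytes))';
% Method 2: MATLAB built-in, which takes uint8 input
matEnc = @() char(matlab.net.base64encode(uint8(bytes).'));
% Sanity check: both methods must produce the same string
assert(strcmp(javaEnc(), matEnc()));
% timeit runs each function several times and returns a typical run time
tJava = timeit(javaEnc);
tMat = timeit(matEnc);
fprintf('Java: %.6f s, MATLAB: %.6f s, ratio: %.1f\n', tJava, tMat, tMat / tJava);
```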
3. Screen Capture
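When recognizing text in scanned PDF documents, video tutorials and similar sources, we need to take a screenshot of the region that contains the text, save it as an image, and then run recognition on that image. The first step is capturing the screen, which MATLAB can do through the Java class java.awt.Robot. The screen-capture source code is as follows: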
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n61" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> function imgData = screenSnipping
%screenSnipping Capture the full screen as an image
% Output:
% imgData, uint8, image data.
% Source code from: https://www.mathworks.com/support/search.html/answers/362358-how-do-i-take-a-screenshot-using-matlab.html?fq=asset_type_name:answer%20category:matlab/audio-and-video&page=1
% Modified: Qingpinwangzi
% Date: Apr 14, 2021.
% Take screen capture
robo = java.awt.Robot;
tk = java.awt.Toolkit.getDefaultToolkit();
rectSize = java.awt.Rectangle(tk.getScreenSize());
cap = robo.createScreenCapture(rectSize);
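% Convert to an RGB image.
% getRGB returns one packed ARGB int32 per pixel, scanline by scanline;
% typecast splits each value into 4 bytes, ordered B, G, R, A on little-endian machines
rgb = typecast(cap.getRGB(0, 0, cap.getWidth, cap.getHeight, [], 0, cap.getWidth), 'uint8');
imgData = zeros(cap.getHeight, cap.getWidth, 3, 'uint8');
% Each channel reshapes to width-by-height, so transpose to height-by-width
imgData(:, :, 1) = reshape(rgb(3:4:end), cap.getWidth, [])';
imgData(:, :, 2) = reshape(rgb(2:4:end), cap.getWidth, [])';
imgData(:, :, 3) = reshape(rgb(1:4:end), cap.getWidth, [])';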
end</pre>
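screenSnipping captures the entire screen, while recognition only needs the region that contains the text. The hypothetical helper below, saveScreenRegion, crops a rectangle from the captured array and writes it to an image file that getWordsByBaiduOCR can read. It assumes the rectangle is already known as [x, y, width, height] in pixels, with (x, y) the 1-based column and row of the top-left corner; how the region is picked interactively is not covered here.

```matlab
function fileName = saveScreenRegion(imgData, rect, fileName)
%SAVESCREENREGION Crop a rectangle from a screenshot and save it as an image
%   Hypothetical helper. rect = [x, y, width, height] in pixels, where
%   (x, y) is the 1-based column/row of the top-left corner (assumed convention).
x = round(rect(1));
y = round(rect(2));
w = round(rect(3));
h = round(rect(4));
% Clamp the rectangle to the image bounds
[rows, cols, ~] = size(imgData);
r1 = max(1, y);
c1 = max(1, x);
r2 = min(rows, y + h - 1);
c2 = min(cols, x + w - 1);
if r1 > r2 || c1 > c2
    error('saveScreenRegion:EmptyRegion', 'The selected region is empty.');
end
% Keep all color channels of the selected rows and columns
region = imgData(r1:r2, c1:c2, :);
imwrite(region, fileName);
end
```

For example, img = screenSnipping; saveScreenRegion(img, [100, 100, 400, 200], 'snip.png'); produces a file that can then be passed to getWordsByBaiduOCR. A lossless format such as PNG avoids adding compression artifacts around the characters.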
4. Calling the Baidu API to Recognize Text
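As mentioned in Section 1, access_token is a required parameter for calling the API. According to the documentation, it is obtained through an HTTP request that carries the API Key and Secret Key. The core code is as follows: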
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n64" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> url = ['https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=', apiKey, '&client_secret=', secretKey];
options = weboptions('RequestMethod', 'post'); % apiKey and secretKey hold your application's keys
res = webread(url, options);
access_token = res.access_token;</pre>
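Requesting a new access_token before every recognition is unnecessary, because a token stays valid for a period of time. The sketch below caches the token in a MAT-file and reuses it until shortly before it expires. It is a hypothetical helper, getAccessTokenCached, using the same URL and weboptions as the snippet above, and it assumes the token response carries an expires_in field (lifetime in seconds) alongside access_token.

```matlab
function token = getAccessTokenCached(apiKey, secretKey, cacheFile)
%GETACCESSTOKENCACHED Reuse a saved access_token until it is about to expire
%   Hypothetical helper. Assumes the token response contains expires_in
%   (lifetime in seconds) next to access_token.
if exist(cacheFile, 'file')
    cache = load(cacheFile);
    % Reuse the cached token if it is still valid for at least one more hour
    if isfield(cache, 'token') && isfield(cache, 'expiresAt') ...
            && datetime('now') < cache.expiresAt - hours(1)
        token = cache.token;
        return
    end
end
% Otherwise request a new token, exactly as in the snippet above
url = ['https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=', ...
    apiKey, '&client_secret=', secretKey];
options = weboptions('RequestMethod', 'post');
res = webread(url, options);
token = res.access_token;
expiresAt = datetime('now') + seconds(res.expires_in);
% Store the token together with its expiry time for later calls
save(cacheFile, 'token', 'expiresAt');
end
```

The one-hour margin is an arbitrary choice that avoids sending a token which expires while the request is in flight.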
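With the access_token in hand, we can call the text recognition API. Here is the source code for recognizing text (if accessToken is empty, the function first obtains a token itself):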
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n66" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> function result = getWordsByBaiduOCR(fileName, apiKey, secretKey, accessToken, apiURL, outType)
%GETWORDSBYBAIDUOCR return recognition words
% INPUTS:
% fileName string, an image file name
% apiKey string, the API Key of the application
% secretKey string, The Secret Key of the application
% accessToken string, default is '', get the Access Token by API
% Key and Secret Key.
% apiURL string, such as:
% 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate'
% 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic'
% 'https://aip.baidubce.com/rest/2.0/ocr/v1/general_basic'
% outType, 'MultiLine|SingleLine'
% OUTPUTS:
% result char (SingleLine) | cell (MultiLine) | '' on failure
% USAGE:
% >>result = getWordsByBaiduOCR(fileName, apiKey, secretKey, accessToken, apiURL, outType)
% Date: Mar 18, 2021.
% Author: 清贫王子
%
options = weboptions('RequestMethod', 'post');
if nargin < 6 || isempty(outType) % outType is optional
outType = 'MultiLine';
end
if isempty(accessToken)
url = ['https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=', apiKey, '&client_secret=', secretKey];
res = webread(url, options);
access_token = res.access_token;
else
access_token = accessToken;
end % end if
url = [apiURL, '?access_token=', access_token];
options.HeaderFields = { 'Content-Type', 'application/x-www-form-urlencoded'};
imgBase64String = img2base64(fileName);
if isempty(imgBase64String)
result = '';
return
end % end if
res = webwrite(url, 'image', imgBase64String, options);
% A failed request returns an error description instead of words_result
if ~isfield(res, 'words_result') || isempty(res.words_result)
result = '';
return
end % end if
wordsResult = res.words_result; % N-by-1 struct array, one element per line
if strcmp(outType, 'SingleLine')
% Concatenate all recognized lines into one character vector
result = [wordsResult.words];
elseif strcmp(outType, 'MultiLine')
% One cell per recognized line
result = {wordsResult.words};
else
result = '';
end % end if
end % end function</pre>
Let's give the function a quick test: we feed it the image shown below (a screenshot of https://ww2.mathworks.cn/products/matlab/app-designer.html) and recognize the text in it.
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n69" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> >> result =
1×7 cell array
Columns 1 through 4
{'App设计工具帮助您…'} {'开发专业背景。您只…'} {'面(GUI)设计布局,…'} {'编程。'}
Columns 5 through 7
{'要共享App,您可以使…'} {' MATLAB Compile…'} {'桌面App或 Web App'}
>> result{1}
ans =
'App设计工具帮助您创建专业的App,同时并不要求软件'</pre>
The result contains 7 cells, meaning that 7 lines of text were recognized in the image: each cell holds the recognized text of one line, as result{1} shows.
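The step from the JSON reply to these cells can be reproduced offline. The script below decodes a hand-written sample response (not real API output; only words_result and its words field come from the text above) with jsondecode, on the assumption that webwrite returns the same structure for a JSON reply. It then builds both output forms the same way getWordsByBaiduOCR does.

```matlab
% Hand-written sample reply, trimmed to the field the tool uses
jsonText = ['{"words_result": [{"words": "App Designer"}, ', ...
    '{"words": "MATLAB Compiler"}]}'];
res = jsondecode(jsonText);
% words_result decodes to a 2-by-1 struct array with the single field "words"
multiLine = {res.words_result.words};   % 1-by-2 cell, one cell per line
singleLine = [res.words_result.words];  % all lines concatenated
% A text area accepts the cell array directly; strjoin gives one char vector
asText = strjoin(multiLine, newline);
```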
III. Building the Tool
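When building an App programmatically on top of uifigure, we recommend object-oriented programming (OOP). For simplicity, we encapsulate all required functionality in a single class here. A more standard approach would use a design pattern such as MVC to separate the interface from the logic, which satisfies the software design principle of being open for extension and closed for modification.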
1. Functional Requirements
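Our functional requirements are very simple; there are just two features:
- Recognize text in an existing image
- Recognize text in scanned PDF documents, video tutorials, and similar sources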
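For the first feature, we only need to load the image, call the recognition function, and display the result in a text area. For the second feature, we first take a screenshot, select the region containing the text, and save it as an image; the rest of the processing is the same as for the first feature.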
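Based on this description, we need the following controls: a load-image button, a screenshot button, an image viewer, and a text area for the recognition result. In addition, we need a clean button to clear the displayed image and result, and a settings button for configuring the API Key and Secret Key.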
For ease of explanation, here is the final design first, as shown in the figure below:
[Figure: main window of the text recognition tool | settings view]
The settings view needs two labels and two edit fields, plus two buttons. With that, all the controls we need are clear, so let's create them!
2. Implementation Details
We encapsulate the functionality in a single class named ReadWords. The class inherits from matlab.apps.AppBase, and its properties are all the controls of the interface, so it looks like this:
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n93" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> classdef ReadWords < matlab.apps.AppBase
%%
properties
UIFig matlab.ui.Figure
ContainerForMain matlab.ui.container.GridLayout
ThisTB matlab.ui.container.Toolbar
SnippingToolBtn matlab.ui.container.toolbar.PushTool
ImgLoadToolBtn matlab.ui.container.toolbar.PushTool
SetupToolBtn matlab.ui.container.toolbar.PushTool
CleanToolBtn matlab.ui.container.toolbar.PushTool
ImgShow matlab.ui.control.Image
WordsShowTA matlab.ui.control.TextArea
ContainerForSetup matlab.ui.container.GridLayout
APIKeyText matlab.ui.control.EditField
SecrectKeyText matlab.ui.control.EditField
ResetBtn matlab.ui.control.Button
SaveBtn matlab.ui.control.Button
end % end properties
%%
properties(Hidden, Dependent)
APIKeyVal
SecrectKeyVal
end % end properties
%%
properties(Access = protected)
HasSetup = false
end % end properties
end % end classdef</pre>
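Some important properties are explained below.
Public properties:
- UIFig: must be a matlab.ui.Figure object, created with uifigure; this is the main window of the tool.
- ContainerForMain: must be a matlab.ui.container.GridLayout object, created with uigridlayout; this is the layout container of the main window.
- ThisTB: must be a matlab.ui.container.Toolbar object, created with uitoolbar; this is the toolbar container that holds the four tool buttons SnippingToolBtn, ImgLoadToolBtn, SetupToolBtn and CleanToolBtn.
- ImgShow: must be a matlab.ui.control.Image object, created with uiimage; it displays the loaded or captured image.
- WordsShowTA: must be a matlab.ui.control.TextArea object, created with uitextarea; it displays the recognition result.
- ContainerForSetup: the grid layout container of the settings view.
- APIKeyText and SecrectKeyText: edit fields for entering the API Key and Secret Key.
- ResetBtn and SaveBtn: buttons that reset and save the API Key and Secret Key, respectively.
Dependent, hidden properties:
- APIKeyVal: returns the API Key entered in APIKeyText.
- SecrectKeyVal: returns the Secret Key entered in SecrectKeyText.
Protected property:
- HasSetup: indicates whether the API Key and Secret Key have been configured; defaults to false.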
At this point all properties are in place; next we write the constructor, the destructor, and the class methods.
After adding the constructor, the destructor, and the get methods of the dependent properties APIKeyVal and SecrectKeyVal, the class looks like this:
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n125" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> classdef ReadWords < matlab.apps.AppBase
%%
properties
UIFig matlab.ui.Figure
ContainerForMain matlab.ui.container.GridLayout
ThisTB matlab.ui.container.Toolbar
SnippingToolBtn matlab.ui.container.toolbar.PushTool
ImgLoadToolBtn matlab.ui.container.toolbar.PushTool
SetupToolBtn matlab.ui.container.toolbar.PushTool
CleanToolBtn matlab.ui.container.toolbar.PushTool
ImgShow matlab.ui.control.Image
WordsShowTA matlab.ui.control.TextArea
ContainerForSetup matlab.ui.container.GridLayout
APIKeyText matlab.ui.control.EditField
SecrectKeyText matlab.ui.control.EditField
ResetBtn matlab.ui.control.Button
SaveBtn matlab.ui.control.Button
end % end properties
%%
properties(Hidden, Dependent)
APIKeyVal
SecrectKeyVal
end % end properties
%%
properties(Access = protected)
HasSetup = false
end % end properties
%%
methods
% --------------------------------------
% % Constructor
% --------------------------------------
function app = ReadWords
% Create UIFigure and components
app.buildApp();
% Register the app with App Designer
registerApp(app, app.UIFig)
if nargout == 0
clear app
end
end % end Constructor
% --------------------------------------
% % Destructor
% --------------------------------------
% Code that executes before app deletion
function delete(app)
% Delete UIFigure when app is deleted
delete(app.UIFig)
end % end Destructor
% --------------------------------------
% % Get/Set methods
% --------------------------------------
% get.APIKeyVal
function apiKeyVal = get.APIKeyVal(app)
apiKeyVal = app.APIKeyText.Value;
end
% get.SecrectKeyVal
function secrectKeyVal = get.SecrectKeyVal(app)
secrectKeyVal = app.SecrectKeyText.Value;
end
end % end methods
end % end classdef</pre>
The destructor follows a fixed pattern, and so does the registerApp(app, app.UIFig) call in the constructor; the remaining method, buildApp(), builds the interface and creates each control.
We make all subsequent methods private. With buildApp() added, the complete ReadWords class looks like this:
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n128" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> classdef ReadWords < matlab.apps.AppBase
%%
properties
UIFig matlab.ui.Figure
ContainerForMain matlab.ui.container.GridLayout
ThisTB matlab.ui.container.Toolbar
SnippingToolBtn matlab.ui.container.toolbar.PushTool
ImgLoadToolBtn matlab.ui.container.toolbar.PushTool
SetupToolBtn matlab.ui.container.toolbar.PushTool
CleanToolBtn matlab.ui.container.toolbar.PushTool
ImgShow matlab.ui.control.Image
WordsShowTA matlab.ui.control.TextArea
ContainerForSetup matlab.ui.container.GridLayout
APIKeyText matlab.ui.control.EditField
SecrectKeyText matlab.ui.control.EditField
ResetBtn matlab.ui.control.Button
SaveBtn matlab.ui.control.Button
end % end properties
%%
properties(Hidden, Dependent)
APIKeyVal
SecrectKeyVal
end % end properties
%%
properties(Access = protected)
HasSetup = false
end % end properties
%%
methods
% --------------------------------------
% % Constructor
% --------------------------------------
function app = ReadWords
% Create UIFigure and components
app.buildApp();
% Register the app with App Designer
registerApp(app, app.UIFig)
if nargout == 0
clear app
end
end % end Constructor
% --------------------------------------
% % Destructor
% --------------------------------------
% Code that executes before app deletion
function delete(app)
% Delete UIFigure when app is deleted
delete(app.UIFig)
end % end Destructor
% --------------------------------------
% % Get/Set methods
% --------------------------------------
% get.APIKeyVal
function apiKeyVal = get.APIKeyVal(app)
apiKeyVal = app.APIKeyText.Value;
end
% get.SecrectKeyVal
function secrectKeyVal = get.SecrectKeyVal(app)
secrectKeyVal = app.SecrectKeyText.Value;
end
end % end methods
%%
methods(Access = private)
% buildApp
function buildApp(app)
%
% --------------------------------------
% % Main Figure
% --------------------------------------
app.UIFig = uifigure();
app.UIFig.Icon = 'icons/img2text.png';
app.UIFig.Name = 'ReadWords';
app.UIFig.Visible = 'off';
app.UIFig.Position = [app.UIFig.Position(1), app.UIFig.Position(2), 745, 420];
app.UIFig.AutoResizeChildren = 'on';
app.UIFig.Units = 'Normalized';
app.setAutoResize(app.UIFig, true);
% --------------------------------------
% % Toolbar
% --------------------------------------
app.ThisTB = uitoolbar(app.UIFig);
% SetupToolBtn
app.SetupToolBtn = uipushtool(app.ThisTB);
app.SetupToolBtn.Icon = 'icons/setup.png';
app.SetupToolBtn.Tooltip = 'Setup';
% SnippingToolBtn
app.SnippingToolBtn = uipushtool(app.ThisTB);
app.SnippingToolBtn.Icon = 'icons/snip.png';
app.SnippingToolBtn.Tooltip = 'Screenshot';
% ImgLoadToolBtn
app.ImgLoadToolBtn = uipushtool(app.ThisTB);
app.ImgLoadToolBtn.Icon = 'icons/load.png';
app.ImgLoadToolBtn.Tooltip = 'Load image';
% CleanToolBtn
app.CleanToolBtn = uipushtool(app.ThisTB);
app.CleanToolBtn.Icon = 'icons/clean.png';
app.CleanToolBtn.Tooltip = 'Clean';
% --------------------------------------
% % ContainerForMain
% --------------------------------------
app.ContainerForMain = uigridlayout(app.UIFig, [1, 2]);
% ContainerForMain
imgShowPanel = uipanel(app.ContainerForMain, 'Title', 'Original');
resultShowPanel = uipanel(app.ContainerForMain, 'Title', 'Result');
% ImgShow
imgShowPanelLay = uigridlayout(imgShowPanel, [1, 1]);
imgShowPanelLay.RowSpacing = 0;
imgShowPanelLay.ColumnSpacing = 0;
app.ImgShow = uiimage(imgShowPanelLay);
% WordsShowTA
resultShowPanelLay = uigridlayout(resultShowPanel, [1, 1]);
resultShowPanelLay.RowSpacing = 0;
resultShowPanelLay.ColumnSpacing = 0;
app.WordsShowTA = uitextarea(resultShowPanelLay);
app.WordsShowTA.FontSize = 22;
% --------------------------------------
% % ContainerForSetup
% --------------------------------------
app.ContainerForSetup = uigridlayout(app.UIFig, [4, 3]);
app.ContainerForSetup.RowHeight = {22, 22, 22, '1x'};
app.ContainerForSetup.ColumnWidth = {'1x', '1x', '2.5x'};
app.ContainerForSetup.Visible = 'off';
apiKeyLabel = uilabel(app.ContainerForSetup, 'Text', 'API Key');
apiKeyLabel.HorizontalAlignment = 'right';
apiKeyLabel.Layout.Row = 1;
apiKeyLabel.Layout.Column = 1;
% APIKeyText
app.APIKeyText = uieditfield(app.ContainerForSetup);
app.APIKeyText.Layout.Row = 1;
app.APIKeyText.Layout.Column = 2;
secrectKeyLabel = uilabel(app.ContainerForSetup, 'Text', 'Secret Key');
secrectKeyLabel.HorizontalAlignment = 'right';
secrectKeyLabel.Layout.Row = 2;
secrectKeyLabel.Layout.Column = 1;
% SecrectKeyText
app.SecrectKeyText = uieditfield(app.ContainerForSetup);
app.SecrectKeyText.Layout.Row = 2;
app.SecrectKeyText.Layout.Column = 2;
% ResetBtn
app.ResetBtn = uibutton(app.ContainerForSetup, 'Text', 'Reset');
app.ResetBtn.Layout.Row = 3;
app.ResetBtn.Layout.Column = 1;
% SaveBtn
app.SaveBtn = uibutton(app.ContainerForSetup, 'Text', 'Save');
app.SaveBtn.Layout.Row = 3;
app.SaveBtn.Layout.Column = 2;
% Set visibility for UIFig
movegui(app.UIFig, 'center');
app.UIFig.Visible = 'on';
% --------------------------------------
% % RunstartupFcn
% --------------------------------------
app.runStartupFcn(@startupFcn);
end % end buildApp
end % methods
end % end classdef</pre>
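Note that the toolbar and window icons come from https://www.easyicon.cc/, where many common icons can be downloaded for free. We have already downloaded the icons; anyone who needs them can get them from the link below:
Link: https://pan.baidu.com/s/11kIvt4SX-MhQ2ltEeC18ZA Extraction code: 5i3k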
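In addition, the statement app.runStartupFcn(@startupFcn); calls a method of the parent class matlab.apps.AppBase, and we register the callbacks of all controls inside the startupFcn method. Since startupFcn has not been written yet, comment out this statement for now and run ReadWords.m directly: the interface constructed in buildApp is displayed, as the animation below shows: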
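As you can see, clicking the toolbar buttons has no effect, because no callbacks have been registered for the controls yet. Next we do this registration in startupFcn, which also restores previously saved keys from apikey.mat and makes the two edit fields read-only. The code is as follows: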
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n135" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> classdef ReadWords < matlab.apps.AppBase
%%
properties
UIFig matlab.ui.Figure
ContainerForMain matlab.ui.container.GridLayout
ThisTB matlab.ui.container.Toolbar
SnippingToolBtn matlab.ui.container.toolbar.PushTool
ImgLoadToolBtn matlab.ui.container.toolbar.PushTool
SetupToolBtn matlab.ui.container.toolbar.PushTool
CleanToolBtn matlab.ui.container.toolbar.PushTool
ImgShow matlab.ui.control.Image
WordsShowTA matlab.ui.control.TextArea
ContainerForSetup matlab.ui.container.GridLayout
APIKeyText matlab.ui.control.EditField
SecrectKeyText matlab.ui.control.EditField
ResetBtn matlab.ui.control.Button
SaveBtn matlab.ui.control.Button
end % end properties
%%
properties(Hidden, Dependent)
APIKeyVal
SecrectKeyVal
end % end properties
%%
properties(Access = protected)
HasSetup = false
end % end properties
%%
methods
% --------------------------------------
% % Constructor
% --------------------------------------
function app = ReadWords
% Create UIFigure and components
app.buildApp();
% Register the app with App Designer
registerApp(app, app.UIFig)
if nargout == 0
clear app
end
end % end Constructor
% --------------------------------------
% % Destructor
% --------------------------------------
% Code that executes before app deletion
function delete(app)
% Delete UIFigure when app is deleted
delete(app.UIFig)
end % end Destructor
% --------------------------------------
% % Get/Set methods
% --------------------------------------
% get.APIKeyVal
function apiKeyVal = get.APIKeyVal(app)
apiKeyVal = app.APIKeyText.Value;
end
% get.SecrectKeyVal
function secrectKeyVal = get.SecrectKeyVal(app)
secrectKeyVal = app.SecrectKeyText.Value;
end
end % end methods
%%
methods(Access = private)
% buildApp
function buildApp(app)
%
% --------------------------------------
% % Main Figure
% --------------------------------------
app.UIFig = uifigure();
app.UIFig.Icon = 'icons/img2text.png';
app.UIFig.Name = 'ReadWords';
app.UIFig.Visible = 'off';
app.UIFig.Position = [app.UIFig.Position(1), app.UIFig.Position(2), 745, 420];
app.UIFig.AutoResizeChildren = 'on';
app.UIFig.Units = 'Normalized';
app.setAutoResize(app.UIFig, true);
% --------------------------------------
% % Toolbar
% --------------------------------------
app.ThisTB = uitoolbar(app.UIFig);
% SetupToolBtn
app.SetupToolBtn = uipushtool(app.ThisTB);
app.SetupToolBtn.Icon = 'icons/setup.png';
app.SetupToolBtn.Tooltip = 'Setup';
% SnippingToolBtn
app.SnippingToolBtn = uipushtool(app.ThisTB);
app.SnippingToolBtn.Icon = 'icons/snip.png';
app.SnippingToolBtn.Tooltip = 'Screenshot';
% ImgLoadToolBtn
app.ImgLoadToolBtn = uipushtool(app.ThisTB);
app.ImgLoadToolBtn.Icon = 'icons/load.png';
app.ImgLoadToolBtn.Tooltip = 'Load image';
% CleanToolBtn
app.CleanToolBtn = uipushtool(app.ThisTB);
app.CleanToolBtn.Icon = 'icons/clean.png';
app.CleanToolBtn.Tooltip = 'Clean';
% --------------------------------------
% % ContainerForMain
% --------------------------------------
app.ContainerForMain = uigridlayout(app.UIFig, [1, 2]);
% ContainerForMain
imgShowPanel = uipanel(app.ContainerForMain, 'Title', 'Original');
resultShowPanel = uipanel(app.ContainerForMain, 'Title', 'Result');
% ImgShow
imgShowPanelLay = uigridlayout(imgShowPanel, [1, 1]);
imgShowPanelLay.RowSpacing = 0;
imgShowPanelLay.ColumnSpacing = 0;
app.ImgShow = uiimage(imgShowPanelLay);
% WordsShowTA
resultShowPanelLay = uigridlayout(resultShowPanel, [1, 1]);
resultShowPanelLay.RowSpacing = 0;
resultShowPanelLay.ColumnSpacing = 0;
app.WordsShowTA = uitextarea(resultShowPanelLay);
app.WordsShowTA.FontSize = 22;
% --------------------------------------
% % ContainerForSetup
% --------------------------------------
app.ContainerForSetup = uigridlayout(app.UIFig, [4, 3]);
app.ContainerForSetup.RowHeight = {22, 22, 22, '1x'};
app.ContainerForSetup.ColumnWidth = {'1x', '1x', '2.5x'};
app.ContainerForSetup.Visible = 'off';
apiKeyLabel = uilabel(app.ContainerForSetup, 'Text', 'API Key');
apiKeyLabel.HorizontalAlignment = 'right';
apiKeyLabel.Layout.Row = 1;
apiKeyLabel.Layout.Column = 1;
% APIKeyText
app.APIKeyText = uieditfield(app.ContainerForSetup);
app.APIKeyText.Layout.Row = 1;
app.APIKeyText.Layout.Column = 2;
secrectKeyLabel = uilabel(app.ContainerForSetup, 'Text', 'Secret Key');
secrectKeyLabel.HorizontalAlignment = 'right';
secrectKeyLabel.Layout.Row = 2;
secrectKeyLabel.Layout.Column = 1;
% SecrectKeyText
app.SecrectKeyText = uieditfield(app.ContainerForSetup);
app.SecrectKeyText.Layout.Row = 2;
app.SecrectKeyText.Layout.Column = 2;
% ResetBtn
app.ResetBtn = uibutton(app.ContainerForSetup, 'Text', 'Reset');
app.ResetBtn.Layout.Row = 3;
app.ResetBtn.Layout.Column = 1;
% SaveBtn
app.SaveBtn = uibutton(app.ContainerForSetup, 'Text', 'Save');
app.SaveBtn.Layout.Row = 3;
app.SaveBtn.Layout.Column = 2;
% Set visibility for UIFig
movegui(app.UIFig, 'center');
app.UIFig.Visible = 'on';
% --------------------------------------
% % RunstartupFcn
% --------------------------------------
app.runStartupFcn(@startupFcn);
end % end buildApp
% startupFcn
function startupFcn(app, ~, ~)
% Setup APIKeyText and SecrectKeyText
if exist('apikey.mat', 'file')
temp = load('apikey.mat');
app.APIKeyText.Value = temp.key.apiKeyVal;
app.APIKeyText.Editable = 'off';
app.SecrectKeyText.Value = temp.key.secrectKeyVal;
app.SecrectKeyText.Editable = 'off';
end
% Register callback
app.SnippingToolBtn.ClickedCallback = @app.clickedSnippingToolBtn;
app.ImgLoadToolBtn.ClickedCallback = @app.clickedImgLoadToolBtn;
app.SetupToolBtn.ClickedCallback = @app.clickedSetupToolBtn;
app.CleanToolBtn.ClickedCallback = @app.clickedCleanToolBtn;
app.ResetBtn.ButtonPushedFcn = @app.callbackResetBtn;
app.SaveBtn.ButtonPushedFcn = @app.callbackSaveBtn;
end % end function
end % methods
end % end classdef</pre>
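We have now registered six callback methods for six buttons, and every one of them has to be implemented; otherwise the corresponding button will not respond when clicked. For simplicity, we use callbackSaveBtn, the callback of SaveBtn in the settings view, as the example.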
Before the API Key or Secret Key has been set, clicking SnippingToolBtn or ImgLoadToolBtn brings up a prompt asking you to set them up first.
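Both toolbar callbacks decide whether to show that prompt with the same two-field check. As a minimal sketch, that check can be factored into a single anonymous function. The name `hasKeys` is hypothetical, and unlike the original `isempty` test it also treats whitespace-only input as empty, which is an assumption about the behavior you want.

```matlab
% Hypothetical helper (not part of the original app): returns true only when
% both keys contain something other than whitespace
hasKeys = @(apiKey, secretKey) ~isempty(strtrim(apiKey)) && ~isempty(strtrim(secretKey));

% Example values as they might come from APIKeyText.Value / SecrectKeyText.Value
ready    = hasKeys('abc123', 'xyz789');  % true: both fields filled in
notReady = hasKeys('abc123', '   ');     % false: Secret Key is only spaces
```

Inside the app, the same expression would more naturally live in a private method and be called as `hasKeys(app.APIKeyText.Value, app.SecrectKeyText.Value)` at the top of each toolbar callback.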
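How callbackSaveBtn works: it first checks whether both the API Key and the Secret Key edit fields have been filled in. If either one is empty, a warning says the API Key or Secret Key is empty, and you need to enter both values and click Save. Once both are present, the values are written in the background to a .mat file (apikey.mat) and the two edit fields are locked, so you do not have to enter the keys again on later launches (startupFcn reloads them). To change the keys, click Reset and configure them again. The HasSetup property (false by default) is not involved in saving; clickedSetupToolBtn uses it to toggle between the main view and the settings view.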
The code is as follows:
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n141" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> % --------------------------------------
% % Callback functions
% --------------------------------------
% callbackSaveBtn
function callbackSaveBtn(app, ~, ~)
if ~isempty(app.SecrectKeyText.Value) && ~isempty(app.APIKeyText.Value)
key.apiKeyVal = app.APIKeyText.Value;
key.secrectKeyVal = app.SecrectKeyText.Value;
if exist('apikey.mat', 'file')
delete('apikey.mat');
end
save('apikey.mat', 'key');
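% Hide the key file (hidden + system attributes); attrib exists only on Windows
if ispc
system('attrib +s +h apikey.mat');
end
uialert(app.UIFig, 'Saved successfully!', 'Confirm', 'Icon', 'success');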
app.APIKeyText.Editable = 'off';
app.SecrectKeyText.Editable = 'off';
else
uialert(app.UIFig, 'API Key or Secret Key is empty!', 'Confirm', 'Icon', 'warning');
end % end if
end % callbackSaveBtn</pre>
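The persistence part of callbackSaveBtn and startupFcn is a round trip through a `.mat` file holding a struct named `key`. The sketch below reproduces that round trip outside the GUI so it can be tried on its own. The key values and the file location under `tempdir` are placeholders (the app itself writes `apikey.mat` to the current folder), and the `attrib` call runs only on Windows, where that command exists.

```matlab
% Placeholder values standing in for the two edit fields
key.apiKeyVal = 'my-api-key';
key.secrectKeyVal = 'my-secret-key';   % field name matches the app's apikey.mat

% Hypothetical location used only for this sketch
keyFile = fullfile(tempdir, 'apikey_demo.mat');

% Delete any previous copy first, as callbackSaveBtn does, so a hidden file
% left by an earlier run does not get in the way of save
if exist(keyFile, 'file')
    delete(keyFile);
end
save(keyFile, 'key');

% Hide the file on Windows only; attrib is a Windows command
if ispc
    [~, ~] = system(['attrib +s +h "', keyFile, '"']);
end

% What startupFcn does on the next launch: load the struct back
loaded = load(keyFile);
apiKeyVal = loaded.key.apiKeyVal;
secretKeyVal = loaded.key.secrectKeyVal;
```

Keep in mind that the keys are stored unencrypted inside the `.mat` file; the hidden and system attributes only keep it out of casual view and do not protect its contents.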
With the Save button implemented, you get the behavior shown in the animated GIF below.
Source code of the other callbacks:
<pre class="md-fences md-end-block md-fences-with-lineno ty-contain-cm modeLoaded" spellcheck="false" lang="matlab" cid="n145" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); font-size: 0.8rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: var(--codeboxes); position: relative !important; border-radius: 0.3rem; color: rgb(255, 255, 255); padding: 8px 1.5rem 6px 0px; margin-bottom: 1.5rem; margin-top: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> % clickedSnippingToolBtn
function clickedSnippingToolBtn(app, ~, ~)
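if ~isempty(app.SecrectKeyText.Value) && ~isempty(app.APIKeyText.Value)
app.UIFig.Visible = 'off';
pause(0.1);
outFileName = 'temp.png';
% Remove the previous (hidden) capture first: on Windows, overwriting a
% file marked +s +h can fail with an access-denied error
if exist(outFileName, 'file')
delete(outFileName)
end
cropImg(outFileName);
if ispc
system('attrib +s +h temp.png');
end
%
app.ImgShow.ImageSource = imread(outFileName);
app.UIFig.Visible = 'on';
%
apiURL = 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic';
% Use the keys currently held in the settings edit fields
words = getWordsByBaiduOCR(outFileName, app.APIKeyText.Value, app.SecrectKeyText.Value, '', apiURL, 'MultiLine');
app.WordsShowTA.Value = words;
else
msg = {'API Key or Secret Key is empty!'; 'Please set it up first!'};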
uialert(app.UIFig, msg, 'Confirm', 'Icon', 'warning');
end
end % end clickedSnippingToolBtn
% clickedImgLoadToolBtn
function clickedImgLoadToolBtn(app, ~, ~)
if ~isempty(app.SecrectKeyText.Value) && ~isempty(app.APIKeyText.Value)
% The filter spec needs wildcards ('*.png'), not bare extensions
[fName, fPath] = uigetfile({'*.png;*.jpg;*.jpeg;*.bmp;*.tif', 'Image files (*.png, *.jpg, *.bmp, *.tif)'}, 'Open image');
if isequal(fName, 0)
return % the user cancelled the dialog
end
img = imread(fullfile(fPath, fName));
outFileName = 'temp.png';
if exist(outFileName, 'file')
delete(outFileName)
end
imwrite(img, outFileName);
if ispc
system('attrib +s +h temp.png');
end
%
app.ImgShow.ImageSource = imread(outFileName);
app.UIFig.Visible = 'on';
%
apiURL = 'https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic';
% Use the keys currently held in the settings edit fields
words = getWordsByBaiduOCR(outFileName, app.APIKeyText.Value, app.SecrectKeyText.Value, '', apiURL, 'MultiLine');
app.WordsShowTA.Value = words;
else
msg = {'API Key or Secret Key is empty!'; 'Please set it up first!'};
uialert(app.UIFig, msg, 'Confirm', 'Icon', 'warning');
end
end % end clickedImgLoadToolBtn
% clickedSetupToolBtn
function clickedSetupToolBtn(app, ~, ~)
if ~app.HasSetup
app.ContainerForMain.Visible = 'off';
app.ContainerForSetup.Visible = 'on';
app.HasSetup = true;
else
app.ContainerForMain.Visible = 'on';
app.ContainerForSetup.Visible = 'off';
app.HasSetup = false;
end
end % end clickedSetupToolBtn
% clickedCleanToolBtn
function clickedCleanToolBtn(app, ~, ~)
app.WordsShowTA.Value = '';
app.ImgShow.ImageSource = '';
end % end clickedCleanToolBtn
% callbackResetBtn
function callbackResetBtn(app, ~, ~)
app.APIKeyText.Value = '';
app.APIKeyText.Editable = 'on';
app.SecrectKeyText.Value = '';
app.SecrectKeyText.Editable = 'on';
end % callbackResetBtn</pre>
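`clickedImgLoadToolBtn` has to tell a picked file apart from a cancelled dialog: when the user clicks Cancel, `uigetfile` returns `0` for both the file name and the path. The sketch below runs that check on two simulated outcomes instead of opening a real dialog; `scan.png` and the `docs` folder are made-up values.

```matlab
% Two simulated outcomes of [fName, fPath] = uigetfile(...):
% a picked file, and Cancel (uigetfile returns 0 for both outputs)
outcomes = {{'scan.png', ['docs', filesep]}, {0, 0}};
imgPaths = cell(size(outcomes));

for k = 1:numel(outcomes)
    fName = outcomes{k}{1};
    fPath = outcomes{k}{2};
    if isequal(fName, 0)
        imgPaths{k} = '';                     % cancelled: nothing to load
    else
        imgPaths{k} = fullfile(fPath, fName); % picked: build the full path
    end
end
```

Checking `fName` alone is enough, because both outputs are `0` together on Cancel; `fullfile` then joins folder and file name without the caller having to care whether the folder ends in a separator.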
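IV. Demo
Now let's test the text recognition tool we have built. Say a student we'll call Mazi is a graduate student reading a scanned PDF paper and wants to copy a passage from it for notes or a PPT. This is exactly where our tool comes in.
In an instant, Mazi got the result they wanted and broke into a long-missed happy smile!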
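V. Conclusion
That completes a fairly full-featured text recognition tool! We hope you like it and find something useful in it.
For the complete code of this article, reply “文字识别工具” ("text recognition tool") to our WeChat official account (GZH) to download it.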
【Recommended Past Posts】