How to improve the accuracy of Baidu text recognition (OCR) in Java



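The key to writing a text recognition program in Java is finding an OCR engine that it can call. tesseract-ocr is one such engine: it was developed at HP Labs between 1985 and 1995 and is now maintained at Google. The tesseract-ocr 3.0 release added support for Chinese. tesseract-ocr 3.0 is not a graphical client, however, and FreeOCR, a graphical client written by a third party, does not yet support importing the new 3.0 traineddata files. Even so, the release means that free Chinese OCR software is now available.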

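The steps for using tesseract-ocr 3.01 from Java are as follows:

1. Download and install tesseract-ocr-setup-3.01-1.exe (Chinese recognition was only added in version 3.0 and later).

2. In the setup wizard, select the language packs you want to download.

3. Search online for and download the two JARs needed for Java image handling: jai_imageio-1.1-alpha.jar and swingx-1.6.1.jar.

4. Java program listing: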
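The listing itself is missing from the post. As a stand-in, here is a minimal sketch of one way to call the engine from Java: launching the tesseract command-line tool as a child process. The class name `TesseractCli` and its methods are hypothetical, and the sketch assumes the `tesseract` executable is on the PATH and that the Simplified Chinese language pack (`chi_sim`) was selected in step 2. It does not use the two JARs from step 3.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the original post) that shells out to the tesseract CLI. */
class TesseractCli {

    /** Builds the command line: tesseract <image> <outputBase> -l <lang>. */
    static List<String> buildCommand(String imagePath, String outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");  // assumption: the executable is on the PATH
        cmd.add(imagePath);    // input image
        cmd.add(outputBase);   // tesseract writes the text to <outputBase>.txt
        cmd.add("-l");
        cmd.add(lang);         // e.g. "chi_sim" (Simplified Chinese) or "eng"
        return cmd;
    }

    /** Runs tesseract on one image and returns the process exit code (0 means success). */
    static int recognize(String imagePath, String outputBase, String lang)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath, outputBase, lang))
                .inheritIO()   // forward tesseract's own messages to this console
                .start();
        return process.waitFor();
    }
}
```

Calling `TesseractCli.recognize("scan.png", "result", "chi_sim")` should leave the recognized text in `result.txt`. Running the tool as a separate process keeps the Java side free of native bindings, at the cost of one process launch per image.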

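Private deployment of text recognition

A text recognition service that can be deployed to an on-premises server. It supports mainstream CPU/GPU environments as well as deployment on domestic (Chinese-made) systems. Containerized deployment packages are available for every kind of OCR model (general scenes, cards and certificates, receipts and invoices, iOCR) and for the custom platform, so applications can be deployed with one click inside a private network and data stays private. General-purpose or domestic all-in-one appliances are also available, delivered as integrated hardware and software, ready to use out of the box, with unified maintenance and support.

Fast deployment

Packaged as containers, with support for several deployment modes such as on-premises physical machines and private clouds. A one-click deployment tool and common operations tools are provided for quick integration and efficient operations.

Data security

Deployed locally inside a private network, so data never has to be uploaded over the public internet. Business traffic is separated between public and private networks, which meets the privacy requirements for an enterprise's core production data.

Broad compatibility

Can be deployed in both CPU and GPU environments. Mainstream GPU models are already supported, and deployment on domestic systems is supported as well.

Flexible licensing

Licensed by QPS and by term of use. Different QPS configurations can be chosen freely to match the concurrency needs of different scenarios and businesses.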

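Becoming a developer

Basic account registration and verification take three steps:

STEP1: Click Console on the right side of the Baidu AI Open Platform navigation bar and choose the AI service you want to use. If you are not logged in, you are redirected to the login page; log in with your Baidu account. If you do not have a Baidu account yet, register one first.

STEP2: On first use, you are taken to the developer verification page after logging in. Fill in the requested information to complete developer verification. Note: if you are already a Baidu Cloud user or a Baidu Developer Center user, you can skip this step.

STEP3: In the left-hand navigation of the console, choose Products and Services > Artificial Intelligence to open the control panel of a specific AI service (such as text recognition or face recognition) and carry out the related operations.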
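The answer stops at account setup and never shows a call to the service. The sketch below shows roughly what a request from Java could look like once an application has been created in the console. Everything specific in it is an assumption to check against Baidu's current documentation: the endpoint URL (the `accurate_basic` path, understood here to be the high-accuracy general OCR variant), the `access_token` query parameter, and the form field name `image`. The class `BaiduOcrSketch` and its methods are hypothetical names, and obtaining the access token is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

/** Hypothetical sketch of posting an image to an OCR REST endpoint. */
class BaiduOcrSketch {

    // Assumption: endpoint of the high-accuracy general OCR service.
    static final String OCR_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic";

    /** Encodes the image as Base64, then URL-encodes it into a form body "image=...". */
    static String buildFormBody(byte[] imageBytes) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        try {
            return "image=" + URLEncoder.encode(base64, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
    }

    /** Posts the image file and returns the raw response body (expected to be JSON). */
    static String recognize(String imagePath, String accessToken) throws IOException {
        byte[] body = buildFormBody(Files.readAllBytes(Paths.get(imagePath)))
                .getBytes(StandardCharsets.UTF_8);
        URL url = new URL(OCR_URL + "?access_token=" + URLEncoder.encode(accessToken, "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

On the question in the title: choosing a high-accuracy endpoint over the standard one and sending a sharp, upright, well-lit image are the usual first things to try. How much they help depends on the input and is not something the thread measures.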

Hope this helps, thanks!

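Abstract: Image recognition is currently a very active research field that draws on a wide range of knowledge, including information theory, pattern recognition, fuzzy mathematics, image coding, and content classification. This article only implements simple binarization of text images in Java; the recognition step itself is not implemented.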

Steps

Build the binary matrices of the character templates

Convert the test character into a binary matrix

Code

/*
 * @(#)StdModelRepository.java
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Library General Public License for more details.
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 */

package cn.edu.ynu.sei.recognition.util;

import java.awt.Image;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.imageio.ImageIO;

/**
 * Holds character images (charImgs) as the standard model repository.
 *
 * @author 88250
 * @version 1.0.0.0, Mar 20, 2008
 */

public class StdModelRepository {

    /** Holds the character images. */
    List<BufferedImage> charImgs = new ArrayList<>();

    /** Default width of a character. */
    static int width = 16;

    /** Default height of a character. */
    static int height = 28;

    /** Standard character model matrix. */
    public int[][][] stdCharMatrix = new int[27][width][height];

    /**
     * Default constructor.
     */
    public StdModelRepository() {
        BufferedImage lowercase = null;
        try {
            lowercase = ImageIO.read(new File("lowercase.png"));
        } catch (IOException ex) {
            Logger.getLogger(StdModelRepository.class.getName()).
                    log(Level.SEVERE, null, ex);
        }

        if (lowercase == null) {
            // The template image could not be read; return instead of
            // failing with a NullPointerException below.
            return;
        }

        // Cut the strip of lowercase letters into 26 fixed-size character images.
        for (int i = 0; i < 26; i++) {
            charImgs.add(lowercase.getSubimage(i * width, 0, width, height));
        }

        // Convert each character image into its binary matrix.
        for (int i = 0; i < charImgs.size(); i++) {
            Image image = charImgs.get(i);
            int[] pixels = ImageUtils.getPixels(image,
                    image.getWidth(null),
                    image.getHeight(null));
            stdCharMatrix[i] = ImageUtils.getSymbolMatrix(pixels, 0).clone();
            ImageUtils.displayMatrix(stdCharMatrix[i]);
        }
    }
}

/*
 * @(#)ImageUtils.java
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Library General Public License for more details.
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 */

package cn.edu.ynu.sei.recognition.util;

import java.awt.Image;
import java.awt.image.PixelGrabber;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Manipulation of image data.
 *
 * @author 88250
 * @version 1.0.0.3, Mar 20, 2008
 */

public class ImageUtils {

    /**
     * Returns all of the pixel values of the specified <code>image</code>.
     *
     * @param image the specified image
     * @param width width of the image
     * @param height height of the image
     * @return the pixel values in row-major order, one packed ARGB int per pixel
     */
    public static int[] getPixels(Image image, int width, int height) {
        int[] pixels = new int[width * height];
        try {
            new PixelGrabber(image, 0, 0, width, height, pixels, 0, width).grabPixels();
        } catch (InterruptedException ex) {
            Logger.getLogger(ImageUtils.class.getName()).
                    log(Level.SEVERE, null, ex);
        }
        return pixels;
    }

    // getSymbolMatrix and displayMatrix, which StdModelRepository calls,
    // are not included in this excerpt.
}
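StdModelRepository relies on ImageUtils.getSymbolMatrix and ImageUtils.displayMatrix, but neither method appears in the excerpt. The sketch below illustrates the binarization step they presumably perform; it is not the author's code. `BinarizeSketch`, `toBinaryMatrix`, `render`, and the `threshold` parameter are all hypothetical, and the sketch assumes packed ARGB pixels in row-major order, as returned by getPixels, with dark ink on a light background.

```java
/** Hypothetical sketch of turning grabbed pixels into a binary character matrix. */
class BinarizeSketch {

    /**
     * Returns a matrix indexed as [x][y] holding 1 where the pixel is darker
     * than the threshold (ink) and 0 elsewhere (background).
     */
    static int[][] toBinaryMatrix(int[] pixels, int width, int height, int threshold) {
        int[][] matrix = new int[width][height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int argb = pixels[y * width + x];
                int r = (argb >> 16) & 0xFF;
                int g = (argb >> 8) & 0xFF;
                int b = argb & 0xFF;
                int gray = (r + g + b) / 3;  // plain average as a brightness estimate
                matrix[x][y] = gray < threshold ? 1 : 0;
            }
        }
        return matrix;
    }

    /** Renders the matrix row by row, '#' for ink and '.' for background. */
    static String render(int[][] matrix) {
        StringBuilder sb = new StringBuilder();
        int width = matrix.length;
        int height = width == 0 ? 0 : matrix[0].length;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                sb.append(matrix[x][y] == 1 ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

A template character and a test character binarized this way with the same width, height, and threshold produce matrices of identical shape, which is the precondition for comparing them cell by cell in a later recognition step.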

Source:

http://blog.csdn.net/chief1985/article/details/2229572

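Another option is the Yunmai OCR document recognition API, which supports development in Java, C++, C, Object Pascal, Objective-C, and other languages. After registering and logging in on the Yunmai OCR SDK developer platform, users can call it on their own.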