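Product positioning: big data / machine learning platform
Target users: developers / algorithm engineers
This round of research focuses on the deep learning platform.
Supported frameworks: TensorFlow, MXNet, Caffe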
For this test I used the code and data provided by the platform at https://help.aliyun.com/document_detail/50654.html?spm=a2c4g.11186623.6.591.zC378V ; following the documentation, the job can be made to run.
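But some questions remain: where is the model produced by training stored?
How is the trained model published?
After publishing, how is it called, for example to run inference?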
I continued investigating.
However, the job failed. The full log is below (need to find someone to take a look):
2018-02-27 18:50:47 INFO Current task status:RUNNING
2018-02-27 18:50:47 INFO Start execute shell on node oxs-base-biz-gateway011193082232.nu29.
2018-02-27 18:50:47 INFO Current working dir /home/admin/alisatasknode/taskinfo/20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt
2018-02-27 18:50:47 INFO Full Command ..
2018-02-27 18:50:47 INFO -------------------------
2018-02-27 18:50:47 INFO /opt/taobao/tbdpapp/paiwrapper/paiservice.sh /home/admin/alisatasknode/taskinfo//20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt//910558 1829668957174154 DEV 910558 http://dms.cn-beijing.data.aliyun-inc.com/
2018-02-27 18:50:47 INFO -------------------------
2018-02-27 18:50:47 INFO List of passing environment ..
2018-02-27 18:50:47 INFO -------------------------
2018-02-27 18:50:47 INFO SKYNET_SOURCEID=null:
2018-02-27 18:50:47 INFO SKYNET_ONDUTY=1829668957174154:
2018-02-27 18:50:47 INFO SKYNET_ENVTYPE=1:
2018-02-27 18:50:47 INFO SKYNET_PTYPE=1002:
2018-02-27 18:50:47 INFO IS_NEW_SCHEDULE=true:
2018-02-27 18:50:47 INFO SKYNET_TENANT_ID=198836943440800:
2018-02-27 18:50:47 INFO SKYNET_SOURCENAME=group_198836943440800_dev:
2018-02-27 18:50:47 INFO SKYNET_EXENAME=:
2018-02-27 18:50:47 INFO TASK_WHITE_LIST=:
2018-02-27 18:50:47 INFO SKYNET_CYCTIME=20180227000000:
2018-02-27 18:50:47 INFO SKYNET_PRGNAME=:
2018-02-27 18:50:47 INFO SKYNET_APP_ID=35192:
2018-02-27 18:50:47 INFO SKYNET_SYSTEM_ENV=:
2018-02-27 18:50:47 INFO SKYNET_PARAVALUE=1829668957174154 DEV 910558 http://dms.cn-beijing.data.aliyun-inc.com/:
2018-02-27 18:50:47 INFO SKYNET_TASKID=1605844:
2018-02-27 18:50:47 INFO SKYNET_RERUN_TIME=0:
2018-02-27 18:50:47 INFO SKYNET_NODENAME=TensorFlow(V1.2)-2:
2018-02-27 18:50:47 INFO SKYNET_ACTIONID=1:
2018-02-27 18:50:47 INFO YUNQU_APP_NAME=:
2018-02-27 18:50:47 INFO KILL_SIGNAL=SIGKILL:
2018-02-27 18:50:47 INFO SKYNET_ID=-1:
2018-02-27 18:50:47 INFO SKYNET_FLOW_PARAVALUE=group:adidas:
2018-02-27 18:50:47 INFO SKYNET_PRIORITY=1:
2018-02-27 18:50:47 INFO SKYNET_GMTDATE=:
2018-02-27 18:50:47 INFO SKYNET_ONDUTY_WORKNO=1829668957174154:
2018-02-27 18:50:47 INFO SKYNET_CYCTYPE=0:
2018-02-27 18:50:47 INFO SKYNET_CONNECTION=***************:
2018-02-27 18:50:47 INFO SKYNET_JOBID=193294:
2018-02-27 18:50:47 INFO SKYNET_BIZDATE=20180226:
2018-02-27 18:50:47 INFO ALISA_TASK_ID=T3_0001179129:
2018-02-27 18:50:47 INFO ALISA_TASK_EXEC_TARGET=group_198836943440800_dev:
2018-02-27 18:50:47 INFO ALISA_TASK_PRIORITY=1:
2018-02-27 18:50:47 INFO --- Invoking Shell command line now ---
2018-02-27 18:50:47 INFO =================================================================
LOGBACK: No context given for ch.qos.logback.classic.encoder.PatternLayoutEncoder@77556fd
JobId: 910558-1605844, Worker: null, JCS version: basein, max parallelism: 30
Execution Plan:
____Nodes:
________ #1[odpscmd]
____Dependencies:
[1] start subjob: #1[odpscmd]
[1] Start OdpsCmdHandler:jobId=910558-1605844
[1] local log file = /home/admin/alisatasknode/taskinfo//20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt//T3_0001179129_jcs.log
[1] user accessId :LTAImjOrNBOQ1F6Q
[1] execute command : set biz_id=1829668957174154^alipay^LTAImjOrNBOQ1F6Q^2018-02-27; PAI -name tensorflow_ext121 -project algo_public -DossHost="oss-cn-beijing-internal.aliyuncs.com" -Dbuckets="oss://paitesting.oss-cn-beijing-internal.aliyuncs.com/train.tfrecords/" -DgpuRequired="100" -Darn="acs:ram::1829668957174154:role/aliyunodpspaidefaultrole" -Dscript="oss://paitesting.oss-cn-beijing-internal.aliyuncs.com/tensorflow_mnist.py";
[1] execute endpoint : http://service.cn.maxcompute.aliyun.com/api
[1] OK
[1] ID = 20180227105050631gkspr8jc2
[1] Odps Instance Id = 20180227105050631gkspr8jc2
二月 27, 2018 6:50:51 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies
警告: Cookie rejected [bs_n_lang="en_US", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"
二月 27, 2018 6:50:52 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies
警告: Cookie rejected [ck2="2f8709cf9971ac7d243abf3d39ff1244", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"
[1] Sub Instance ID = 2018022718505347e8f0d8_62c4_496c_bc29_6a3c60d9e1f2
二月 27, 2018 6:50:56 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies
警告: Cookie rejected [bs_n_lang="en_US", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"
二月 27, 2018 6:50:56 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies
警告: Cookie rejected [ck2="776ba43efacf2af856e118ff3d1b44de", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"
[1] train: running
[1] train: 2018-02-27 18:51:02 TensorflowTask_job:0/0/0[0%]
[1] train: 2018-02-27 18:51:08 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:14 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:19 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:25 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:30 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:36 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:41 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:47 TensorflowTask_job:1/0/1[0%]
[1] train: 2018-02-27 18:51:52 TensorflowTask_job:0/0/1[0%]
[1] Instance 20180227105050631gkspr8jc2 Failed.
[1] FAILED: Failed 2018022718505347e8f0d8_62c4_496c_bc29_6a3c60d9e1f2:ODPS-1202005:Algo Job Failed-User Error-Failed to execute system command.(1)
[1] Execute Odpscmd Failed!
[1] ERROR: run subjob: #1[odpscmd] failed!
Run job failed, time taken: 77s
2018-02-27 18:52:05 INFO =================================================================
2018-02-27 18:52:05 INFO Exit code of the Shell command 1
2018-02-27 18:52:05 INFO --- Invocation of Shell command completed ---
2018-02-27 18:52:05 ERROR Shell run failed!
2018-02-27 18:52:05 ERROR Current task status: ERROR
2018-02-27 18:52:05 INFO Cost time is: 77.775s
/home/admin/alisatasknode/taskinfo//20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt/T3_0001179129.log-END-EOF
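The log above is long, and the only line that names the actual failure is the one carrying the ODPS error code. As a rough sketch (the function name `find_odps_errors` and the regular expression are my own assumptions, not part of any PAI or MaxCompute tooling), the error code and message can be pulled out of a saved log like this:

```python
import re

# Assumption: ODPS errors appear as "ODPS-<digits>:<message>", as in the log above.
ODPS_ERROR = re.compile(r"(ODPS-\d+):(.*)")


def find_odps_errors(log_text):
    """Return (code, message) pairs for every ODPS error found in the log text."""
    errors = []
    for line in log_text.splitlines():
        match = ODPS_ERROR.search(line)
        if match:
            errors.append((match.group(1), match.group(2).strip()))
    return errors


# A few lines copied from the log above.
sample = """[1] train: 2018-02-27 18:51:52 TensorflowTask_job:0/0/1[0%]
[1] Instance 20180227105050631gkspr8jc2 Failed.
[1] FAILED: Failed 2018022718505347e8f0d8_62c4_496c_bc29_6a3c60d9e1f2:ODPS-1202005:Algo Job Failed-User Error-Failed to execute system command.(1)
[1] Execute Odpscmd Failed!"""

for code, message in find_odps_errors(sample):
    print(code, "|", message)
```

Note that the code ODPS-1202005 only says a system command failed; it does not point to the path problem, which is why a support ticket was needed.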
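I filed a support ticket. It turned out the documentation was wrong: the code and data must be placed directly under the directory 'oss://bucketname/', and the data source must be selected at that directory level.
I tried again and the job ran without problems.
So training does complete, but how should this model be published? Or, how can I test how well the model performs?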
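In the failed command, -Dbuckets pointed at "train.tfrecords/", which looks like a file rather than a directory. A small pre-submission check along these lines might have caught it. This is only a heuristic sketch: the helper `looks_like_directory` is hypothetical, and the "fixed" URI is my reading of the ticket answer, not a value taken from a successful run.

```python
import posixpath
from urllib.parse import urlparse


def looks_like_directory(oss_uri):
    """Heuristic: treat the URI as a directory if its last path segment has no dot (no file extension)."""
    path = urlparse(oss_uri).path
    last_segment = posixpath.basename(path.rstrip("/"))
    return "." not in last_segment


# Value taken from the failed command in the log.
failed_buckets = "oss://paitesting.oss-cn-beijing-internal.aliyuncs.com/train.tfrecords/"
# Assumed corrected value: the bucket root directory that holds the code and data.
fixed_buckets = "oss://paitesting.oss-cn-beijing-internal.aliyuncs.com/"

for uri in (failed_buckets, fixed_buckets):
    status = "directory" if looks_like_directory(uri) else "looks like a file"
    print(uri, "->", status)
```

The check is deliberately crude: a directory whose name contains a dot would be flagged too, so treat a warning as a prompt to double-check the data source selection rather than as a hard error.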