1、问题描述
23/04/20 16:18:21 INFO manager.SqlManager: Executing SQL statement: SELECT * FROM JKDA.HEALTHYCHECKUP WHERE rownum<=10 AND (1 = 0)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.reflect.Array.newArray(Native Method)
at java.lang.reflect.Array.newInstance(Array.java:75)
at oracle.jdbc.driver.BufferCache.get(BufferCache.java:226)
at oracle.jdbc.driver.PhysicalConnection.getCharBuffer(PhysicalConnection.java:7422)
at oracle.jdbc.driver.OracleStatement.prepareAccessors(OracleStatement.java:983)
at oracle.jdbc.driver.T4CTTIdcb.receiveCommon(T4CTTIdcb.java:273)
at oracle.jdbc.driver.T4CTTIdcb.receive(T4CTTIdcb.java:144)
at oracle.jdbc.driver.T4C8Oall.readDCB(T4C8Oall.java:771)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:346)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:205)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:861)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1145)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3449)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3493)
1.2 脚本参考
hive -e "
alter table ${lake_database}.${lake_table} drop partition(day='${biz_date}');
"
sqoop import \
-Doracle.sessionTimeZone=Asia/Shanghai \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
-Dmapreduce.map.cpu.vcores=4 \
-Dmapreduce.map.memory.mb=3096 \
-Dmapred.child.java.opts=-Xmx3096m \
--null-string '\\N' \
--null-non-string '\\N' \
--connect jdbc:oracle:thin:@//xxxxx.xx.xxx.xx:1521/orcl \
--query "SELECT ${oracle_columns} FROM JKDA.UP WHERE rownum<=10 AND \$CONDITIONS" \
--split-by '${split-by}' \
--username XXX\
--password XXXXX\
--hive-overwrite \
--hcatalog-database '${lake_database}' \
--hcatalog-table '${lake_table}' \
--hcatalog-partition-keys day \
--hcatalog-partition-values '${biz_date}' \
--hcatalog-storage-stanza 'stored as orc' \
-m 2
2、排查思路
一开始以为是真正意义上的 OOM,尝试着调整参数,及加大 -m map数量,但最终都无济于事。
后来想着降低数据量,通过 rownum 查询10条数据导入到hive,java heap space 错误依然存在。
通过这个查询文件到相应错误及现象。
通过猜想是不是应为此表字段数过多(304个字段),尝试着修改,成功。
3、解决思路
- 将大宽表字段数控制到 300以内即可解决此错误。
- select 尽量少用 * ,不要怕麻烦,写出需要的字段名称。