Problems Starting pig-0.17 on Windows and How I Fixed Them

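Background

Today I started learning Pig, a higher-level abstraction for processing large data sets.

While learning it, I ran into several problems just starting Pig. All of them were solved by modifying pig.cmd.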

Problems and solutions

First, make sure environment variables such as HADOOP_HOME and PIG_HOME are set.
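Before touching pig.cmd, it can help to confirm that these variables are actually visible from the console you start Pig in. Below is a minimal Python sketch; `check_env` is a hypothetical helper, not part of Pig or Hadoop. It also checks JAVA_HOME, since pig.cmd appends `%JAVA_HOME%\lib\tools.jar` to the classpath.

```python
import os
from pathlib import Path


def check_env(names=("HADOOP_HOME", "PIG_HOME", "JAVA_HOME"), env=None):
    """Return human-readable problems with the given environment variables.

    Each variable must be set, non-empty, and name an existing directory.
    """
    env = os.environ if env is None else env
    problems = []
    for name in names:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name}={value} is not a directory")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("WARNING:", problem)
```

An empty result means all three variables point at existing directories.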

hadoop-config.cmd cannot be found

The hadoop-config-script path in pig.cmd points to the wrong place. Change it as follows so that it points to hadoop-env.cmd instead:

set HADOOP_SBIN_PATH=%HADOOP_HOME%\etc\hadoop

set hadoop-config-script=%HADOOP_SBIN_PATH%\hadoop-env.cmd
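To catch this error before running Pig, a small Python check can confirm that the file the fixed hadoop-config-script points at really exists. This is a sketch that assumes the layout used above (`%HADOOP_HOME%\etc\hadoop\hadoop-env.cmd`); `hadoop_config_script` is a hypothetical helper.

```python
from pathlib import Path


def hadoop_config_script(hadoop_home):
    """Return the script the fixed pig.cmd calls, i.e.
    %HADOOP_HOME%\\etc\\hadoop\\hadoop-env.cmd, or raise if it is missing."""
    script = Path(hadoop_home) / "etc" / "hadoop" / "hadoop-env.cmd"
    if not script.is_file():
        raise FileNotFoundError(f"Hadoop config script not found: {script}")
    return script
```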

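-Xmx 1000 is not a runnable program

The script never sets %JAVA%, so in the call line below it expands to nothing, the heap option %JAVA_HEAP_MAX% becomes the first word, and cmd tries to run it as a program. Just point %JAVA% at java.exe: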

  set JAVA=D:\develop\jdk1.7.0_80\bin\java.exe

  call %JAVA% %JAVA_HEAP_MAX% %PIG_OPTS% -classpath %CLASSPATH% org.apache.pig.Main %PIGARGS%
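The failure is easier to see by imitating what cmd does with the call line. In this Python sketch (both functions are hypothetical illustrations, not Pig code), `naive_tokens` pastes the values into one string and splits it on whitespace, so an empty JAVA leaves the heap option (`-Xmx1000M` with the default PIG_HEAPSIZE of 1000) as the first word. `build_command` instead derives java.exe from JAVA_HOME and stops early when it is empty; deriving the path from JAVA_HOME is an assumption of this sketch, not what pig.cmd does.

```python
import ntpath


def naive_tokens(java, heap_mb, pig_opts, classpath, pig_args):
    """Imitate cmd expanding `call %JAVA% %JAVA_HEAP_MAX% %PIG_OPTS% ...`:
    values are pasted into one line and then split on whitespace, so an
    empty JAVA silently disappears."""
    line = (f"{java} -Xmx{heap_mb}M {pig_opts} -classpath {classpath} "
            f"org.apache.pig.Main {pig_args}")
    return line.split()


def build_command(java_home, heap_mb, pig_opts, classpath, pig_args):
    """Build the argv with an explicit Windows path to java.exe, failing
    early instead of letting the heap option become the program name."""
    if not java_home:
        raise ValueError("JAVA_HOME is empty; cannot locate java.exe")
    java = ntpath.join(java_home, "bin", "java.exe")
    return [java, f"-Xmx{heap_mb}M", *pig_opts,
            "-classpath", classpath, "org.apache.pig.Main", *pig_args]
```

For example, `naive_tokens("", 1000, "", "a.jar", "-x local")[0]` is `"-Xmx1000M"`, which is exactly the word cmd complains about.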

Hadoop cannot be found

%CLASSPATH% does not include the jars in pig\lib\hadoop2-runtime. Add them:

  if defined PIG_HOME (
    for %%i in (%PIG_HOME%\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    )
    for %%i in (%PIG_HOME%\lib\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    )
    for %%i in (%PIG_HOME%\lib\h2\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    )

    for %%i in (%PIG_HOME%\lib\hadoop2-runtime\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    ) 

    if not defined PIG_CONF_DIR (
      set PIG_CONF_DIR=%PIG_HOME%\conf
    )
  )

Adding the fourth for loop (the one over lib\hadoop2-runtime) is all it takes.
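The four loops amount to globbing `*.jar` in four directories and joining the results with semicolons. A Python sketch of the same idea, useful for checking that the hadoop2-runtime jars really end up on the classpath; `pig_classpath` and `has_hadoop_runtime` are hypothetical names, and jars are sorted here for repeatable output, whereas cmd uses directory enumeration order.

```python
import glob
import os

# Directories scanned by the four for-loops in pig.cmd, relative to PIG_HOME.
# The last entry is the loop added by this fix.
JAR_DIRS = ("", "lib", os.path.join("lib", "h2"), os.path.join("lib", "hadoop2-runtime"))


def pig_classpath(pig_home, sep=";"):
    """Collect *.jar files from each directory in order, joined by `sep`.
    Jars are sorted within a directory so the result is repeatable."""
    entries = []
    for sub in JAR_DIRS:
        entries.extend(sorted(glob.glob(os.path.join(pig_home, sub, "*.jar"))))
    return sep.join(entries)


def has_hadoop_runtime(classpath, sep=";"):
    """True if any classpath entry comes from lib/hadoop2-runtime."""
    return any("hadoop2-runtime" in entry for entry in classpath.split(sep))
```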

core-site.xml cannot be found

Likewise, %CLASSPATH% must include the directory holding Hadoop's core-site.xml, because core-site.xml tells Pig which Hadoop cluster to connect to:

set CLASSPATH=%CLASSPATH%;%HADOOP_HOME%\etc\hadoop\
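Putting the directory (not the file itself) on the classpath is what makes core-site.xml visible, because Java looks up such resources inside classpath directories. The Python sketch below roughly imitates that lookup for plain directory entries only (jar entries are skipped) and then reads `fs.defaultFS`, the core-site.xml property that names the default file system. Both function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_on_classpath(resource, classpath, sep=";"):
    """Return the path of `resource` in the first classpath directory
    that contains it, or None. Jar entries are not searched."""
    for entry in classpath.split(sep):
        candidate = os.path.join(entry, resource)
        if entry and os.path.isfile(candidate):
            return candidate
    return None


def hadoop_property(xml_path, name):
    """Read one <property> value from a Hadoop *-site.xml file."""
    root = ET.parse(xml_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None
```

With the line above, the entry to check is `%HADOOP_HOME%\etc\hadoop\`; if `find_on_classpath` returns None for it, Pig will most likely not see core-site.xml either.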

A working pig.cmd

Here is the complete pig.cmd that works for me:

@echo off
:: Licensed to the Apache Software Foundation (ASF) under one or more
:: contributor license agreements.  See the NOTICE file distributed with
:: this work for additional information regarding copyright ownership.
:: The ASF licenses this file to You under the Apache License, Version 2.0
:: (the "License"); you may not use this file except in compliance with
:: the License.  You may obtain a copy of the License at
::
::     http://www.apache.org/licenses/LICENSE-2.0
::
:: Unless required by applicable law or agreed to in writing, software
:: distributed under the License is distributed on an "AS IS" BASIS,
:: WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
:: See the License for the specific language governing permissions and
:: limitations under the License.

:: The Pig command script
::
:: Environment Variables
::
::     JAVA_HOME           The java implementation to use.    Overrides JAVA_HOME.
::
::     PIG_CLASSPATH       Extra Java CLASSPATH entries.
::
::     HADOOP_HOME         Environment HADOOP_HOME
::
::     HADOOP_CONF_DIR     Hadoop conf dir
::
::     PIG_HEAPSIZE        The maximum amount of heap to use, in MB. 
::                                        Default is 1000.
::
::     PIG_OPTS            Extra Java runtime options.
::
::     PIG_CONF_DIR    Alternate conf dir. Default is ${PIG_HOME}/conf.
::
::     HBASE_CONF_DIR - Optionally, the HBase configuration to run against
::                      when using HBaseStorage
::

setlocal enabledelayedexpansion

set HADOOP_BIN_PATH=%HADOOP_HOME%\bin
set HADOOP_SBIN_PATH=%HADOOP_HOME%\etc\hadoop

set hadoop-config-script=%HADOOP_SBIN_PATH%\hadoop-env.cmd
call %hadoop-config-script%

:main
set PIGARGS=
:ProcessCmdLine
  if [%1]==[] goto :FinishArgs

  if %1==--config (
    set HADOOP_CONF_DIR=%2
    shift
    shift
    rem Use !...! so the value just set inside this block is seen.
    if exist !HADOOP_CONF_DIR!\hadoop-env.cmd (
      call !HADOOP_CONF_DIR!\hadoop-env.cmd
    )
    goto :ProcessCmdLine
  )
  REM Account for quotes around %1 if needed when checking for -useHCatalog
  REM because the string may come in quoted from WebHCat.
  if %1==-useHCatalog (
    shift
    set HCAT_FLAG="true"
    goto :ProcessCmdLine
  )
  if %1==^"-useHCatalog^" (
    shift
    set HCAT_FLAG="true"
    goto :ProcessCmdLine
  )
  set PIGARGS=%PIGARGS% %1
  shift
  goto :ProcessCmdLine
:FinishArgs
  if not defined PIG_HOME (
    if exist %HADOOP_HOME%\pig (
      set PIG_HOME=%HADOOP_HOME%\pig
    )
  )

  if not defined PIG_HOME (
    set PIG_HOME=%~dp0\..\
  )

  if not defined PIG_HEAPSIZE (
    set PIG_HEAPSIZE=1000
  )

  if defined PIG_HOME (
    for %%i in (%PIG_HOME%\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    )
    for %%i in (%PIG_HOME%\lib\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    )
    for %%i in (%PIG_HOME%\lib\h2\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    )
    for %%i in (%PIG_HOME%\lib\hadoop2-runtime\*.jar) do (
      set CLASSPATH=!CLASSPATH!;%%i
    )
rem    for %%i in (%PIG_HOME%\lib\spark\*.jar) do (
rem      set CLASSPATH=!CLASSPATH!;%%i
rem    )
    if not defined PIG_CONF_DIR (
      set PIG_CONF_DIR=%PIG_HOME%\conf
    )
  )

  set HCAT_DEPENDCIES=
  set HCAT_CLASSPATH=
  if not defined HCAT_FLAG (
    goto HCAT_END
  )

  REM Try to set HCAT_HOME if not set.  Use of HCATALOG_HOME is deprecated.
  REM Future development should use HCAT_HOME for consistency with non-Windows
  REM environments.
  if not defined HCAT_HOME (
    if defined HCATALOG_HOME (
       set HCAT_HOME=%HCATALOG_HOME%
    ) else (
       echo "Warning: HCAT_HOME not set"
    )
  )
  
  if defined HCAT_HOME (
      call :AddJar %HCAT_HOME%\share\hcatalog *hcatalog-*.jar
  ) else (
      echo "HCAT_HOME should be defined"
      exit /b 1
  )
  echo "before HIVE_HOME"
  if defined HIVE_HOME (
      call :AddJar %HIVE_HOME%\lib hive-metastore-*.jar
      call :AddJar %HIVE_HOME%\lib libthrift-*.jar
      call :AddJar %HIVE_HOME%\lib hive-exec-*.jar
      call :AddJar %HIVE_HOME%\lib libfb303-*.jar
      call :AddJar %HIVE_HOME%\lib jdo*-api-*.jar
      call :AddJar %HIVE_HOME%\lib slf4j-api-*.jar
      call :AddJar %HIVE_HOME%\lib hive-hbase-handler-*.jar
      call :AddJar %HIVE_HOME%\lib httpclient-*.jar

      REM Include datanucleus to support embedded metastore use case via setting
      REM hive.metastore.uris to ''
      call :AddJar %HIVE_HOME%\lib datanucleus-*.jar

      REM Include sqljdbc4.jar to support SQL server or Windows Azure SQLDB as embedded metastore.
      call :AddJar %HIVE_HOME%\lib sqljdbc4.jar

      REM Include derby to support local metastore as embedded metastore.
      call :AddJar %HIVE_HOME%\lib derby*.jar
      echo "HIVE_HOME ok"
  ) else (
      echo "HIVE_HOME should be defined"
      exit /b 1
  )
  set PIG_CLASSPATH=%PIG_CLASSPATH%;%HCAT_CLASSPATH%;%HIVE_HOME%\conf
  set PIG_OPTS=%PIG_OPTS% -Dpig.additional.jars.uris=%HCAT_DEPENDCIES%,%PIG_ADDITIONAL_JARS_COMMA%
:HCAT_END
  
  if defined PIG_CLASSPATH (
    set CLASSPATH=!CLASSPATH!;%PIG_CLASSPATH%
  )

  if defined PIG_CONF_DIR (
    set CLASSPATH=!CLASSPATH!;%PIG_CONF_DIR%
  )

  if defined HADOOP_CONF_DIR (
    set CLASSPATH=!CLASSPATH!;%HADOOP_CONF_DIR%
  )

  if not defined PIG_LOGDIR (
    set PIG_LOGDIR=%HADOOP_LOG_DIR%
  )
  
  if not defined PIG_LOGFILE (
    set PIG_LOGFILE=%PIG_LOGDIR%
  )
  
  if defined PIG_HEAPSIZE (
    set JAVA_HEAP_MAX= -Xmx%PIG_HEAPSIZE%M
  )

  set CLASSPATH=!CLASSPATH!;%JAVA_HOME%\lib\tools.jar

  if defined HBASE_CONF_DIR (
    set CLASSPATH=!CLASSPATH!;%HBASE_CONF_DIR%
  )

  set PIG_OPTS=%PIG_OPTS% -Dpig.log.dir=%PIG_LOGDIR%
  set PIG_OPTS=%PIG_OPTS% -Dpig.logfile=%PIG_LOGFILE%
  set PIG_OPTS=%PIG_OPTS% -Dpig.home.dir=%PIG_HOME%
  set PIG_OPTS=%PIG_OPTS% -Dpig.root.logger=%HADOOP_ROOT_LOGGER%
  set PIG_OPTS=%PIG_OPTS% -Dfile.encoding=UTF-8 
  set PIG_OPTS=%PIG_OPTS% %HADOOP_OPTS%

  set JAVA=D:\develop\jdk1.7.0_80\bin\java.exe

  set CLASSPATH=%CLASSPATH%;%HADOOP_HOME%\etc\hadoop\

  call %JAVA% %JAVA_HEAP_MAX% %PIG_OPTS% -classpath %CLASSPATH% org.apache.pig.Main %PIGARGS%

  exit /b %ERRORLEVEL%
  goto endlocal

  :AddJar
    pushd %1
    for /f %%a IN ('dir /b %2') do (
       set HCAT_CLASSPATH=!HCAT_CLASSPATH!;%1\%%a
       set HCAT_DEPENDCIES=!HCAT_DEPENDCIES!,file:///%1\%%a
    )
    popd
:endlocal
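The :ProcessCmdLine loop at the top of the script is the least obvious part. Here is a Python sketch of the same logic; `parse_pig_args` is a hypothetical name, and it only records the --config directory rather than calling that directory's hadoop-env.cmd the way the batch version does.

```python
def parse_pig_args(argv):
    """Sketch of the :ProcessCmdLine loop in pig.cmd.

    --config DIR  -> HADOOP_CONF_DIR (both tokens consumed)
    -useHCatalog  -> HCAT_FLAG, accepted with or without surrounding quotes
    anything else -> appended to PIGARGS and passed through to Pig
    """
    hadoop_conf_dir = None
    hcat = False
    pig_args = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--config":
            hadoop_conf_dir = argv[i + 1] if i + 1 < len(argv) else None
            i += 2
        elif arg in ("-useHCatalog", '"-useHCatalog"'):
            hcat = True
            i += 1
        else:
            pig_args.append(arg)
            i += 1
    return {"HADOOP_CONF_DIR": hadoop_conf_dir, "HCAT_FLAG": hcat, "PIGARGS": pig_args}
```

Like the batch comparisons, the checks are case-sensitive, so `-usehcatalog` would be passed through to Pig unchanged.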

Replace the original pig.cmd with this one, then change the Java path to your own.
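If you keep several copies of this pig.cmd, the Java path can also be patched by a short script instead of by hand. A minimal Python sketch, assuming the file contains exactly one line starting with `set JAVA=`; `set_java_path` is a hypothetical helper. It leaves lines such as `set JAVA_HEAP_MAX=` alone and keeps CRLF line endings intact.

```python
import re

# A "set JAVA=..." line; "set JAVA_HEAP_MAX=" does not match because "=" must
# follow "JAVA" directly. [^\r\n]* stops before CR so CRLF endings survive.
JAVA_LINE = re.compile(r"^([ \t]*set JAVA=)[^\r\n]*", re.IGNORECASE | re.MULTILINE)


def set_java_path(script_text, java_exe):
    """Return the pig.cmd text with its single `set JAVA=` line pointing at java_exe."""
    new_text, count = JAVA_LINE.subn(lambda m: m.group(1) + java_exe, script_text)
    if count != 1:
        raise ValueError(f"expected one 'set JAVA=' line, found {count}")
    return new_text
```

Because pig.cmd calls %JAVA% without quotes, a Java path containing spaces would still break the call line, so a path without spaces is the safer choice.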

Conclusion

Even the official scripts are not always correct. Practice is still the sole criterion for testing truth.
