Fixing "Proxy error: 502 Server dropped connection" when storing files in HDFS from Python

Using the HDFS distributed file storage system from Python 3

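from pyhdfs import HdfsClient

# PATH, file_path and hdfs_path below are placeholders for your own paths.
# hosts takes "host:port" entries; a comma separates multiple NameNodes,
# so "testhdfs.org, 50070" would be read as two separate hosts.
client = HdfsClient(hosts="testhdfs.org:50070", user_name="web_crawler")  # create a connection

client.get_home_directory()  # returns the user's home directory on HDFS
client.listdir(PATH)  # list the entries under the given HDFS path
client.copy_from_local(file_path, hdfs_path, overwrite=True)  # copy a local file to HDFS; directories are not supported; overwrite=True replaces an existing file
client.delete(PATH, recursive=True)  # delete the given path; recursive=True also removes a directory's contents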

hdfs_path must include the file name and its extension; otherwise the copy will not succeed.
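Because copy_from_local handles one file at a time and the HDFS path has to end in a file name, uploading a folder means building one target path per file. The following is a minimal sketch of that pairing using only the standard library. The helper name `iter_upload_pairs` and the example directories are hypothetical, and the upload call itself is left as a comment because it needs a reachable cluster.

```python
import os
import posixpath


def iter_upload_pairs(local_dir, hdfs_dir):
    """Yield (local_path, hdfs_path) for every file under local_dir.

    Each hdfs_path keeps the file name and extension, and mirrors the
    sub-directory layout of local_dir below hdfs_dir.
    """
    for root, _dirs, files in os.walk(local_dir):
        # Path of the current folder relative to the upload root
        rel_dir = os.path.relpath(root, local_dir)
        for name in sorted(files):
            local_path = os.path.join(root, name)
            if rel_dir == os.curdir:
                parts = [name]
            else:
                # HDFS paths always use forward slashes
                parts = rel_dir.split(os.sep) + [name]
            yield local_path, posixpath.join(hdfs_dir, *parts)


# Usage with an HdfsClient instance named client (not executed here):
# for local_path, hdfs_path in iter_upload_pairs("data", "/user/web_crawler/data"):
#     client.copy_from_local(local_path, hdfs_path, overwrite=True)
```

Depending on the cluster, nested target directories may need to exist before the copy; creating them first with client.mkdirs is one option.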

Connecting with HdfsClient may fail with an error like this:

Traceback (most recent call last):
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "
    client.get_home_directory()
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 565, in get_home_directory
    return _json(self._get('/', 'GETHOMEDIRECTORY', **kwargs))['Path']
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 391, in _get
    return self._request('get', *args, **kwargs)
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 377, in _request
    _check_response(response, expected_status)
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 799, in _check_response
    remote_exception = _json(response)['RemoteException']
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 793, in _json
    "Expected JSON. Is WebHDFS enabled? Got {!r}".format(response.text))
pyhdfs.HdfsException: Expected JSON. Is WebHDFS enabled? Got '\n\n\n\n
502 Server dropped connection
\n
The following error occurred while trying to access http://%2050070:50070/webhdfs/v1/?user.name=web_crawler&op=GETHOMEDIRECTORY :
\n 502 Server dropped connection
\n
Generated Fri, 21 Dec 2018 02:03:18 GMT by Polipo on .\n\r\n'

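An error like this usually indicates an access or authentication problem: the account or password may be wrong, the user may lack permission, or the local network may not be on the access allowlist. In this particular message, the requested URL has the host `%2050070` (a URL-encoded " 50070") and the reply was generated by Polipo, an HTTP proxy. That suggests the `hosts` value was written with a comma ("testhdfs.org, 50070") and split into two hosts, so it is worth checking that it is written as "testhdfs.org:50070" and reviewing the local proxy settings as well.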
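The URL in the error message has the host `%2050070`, which is what a hosts string split at a comma would produce. The sketch below reproduces that effect under two assumptions: entries are separated by commas, and an entry without a port falls back to 50070. It mirrors the URL seen in the traceback and is not pyhdfs's own code; `check_hosts` is a hypothetical helper.

```python
def check_hosts(hosts, default_port=50070):
    """Split a comma-separated hosts string and flag suspicious entries.

    Returns (entries, problems): entries are "host:port" strings as a
    client would end up using them, problems are human-readable warnings.
    """
    entries, problems = [], []
    for raw in hosts.split(","):
        # Assumption: an entry without ":" gets the default port appended
        entry = raw if ":" in raw else f"{raw}:{default_port}"
        host = entry.rpartition(":")[0]
        if not host.strip():
            problems.append(f"empty host in entry {entry!r}")
        elif host != host.strip():
            problems.append(f"whitespace around host in entry {entry!r}")
        elif host.isdigit():
            problems.append(f"host {host!r} looks like a port number")
        entries.append(entry)
    return entries, problems


entries, problems = check_hosts("testhdfs.org, 50070")
print(entries)   # ['testhdfs.org:50070', ' 50070:50070']
print(problems)
```

The second entry, ' 50070:50070', URL-encodes to %2050070:50070, the host that appears in the error. Passing "testhdfs.org:50070" instead yields a single entry and no warnings.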

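I installed libhdfs3 in my own Linux environment, but it did not work: the error said the hdfs3 library could not be found.

Following advice found online, I first tried to fix this by installing with pip, but that did not solve it either.

So I switched to Anaconda2: https://www.anaconda.com/download/#macos

I found the matching installer and ran it, and the installation finally succeeded.

Next, I installed hdfs3.

Then I located the corresponding install path.

Adding a few lines at the top of my Python file then solved the problem.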
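When the hdfs3 library is reported as missing, it helps to know whether the Python package, the native shared library, or both cannot be located. A small diagnostic sketch using only the standard library follows; `hdfs3_status` is a hypothetical helper, and the lookup only covers the standard search locations, so a library installed inside a conda environment may not show up here even when it loads correctly.

```python
import ctypes.util
import importlib.util


def hdfs3_status():
    """Report whether the hdfs3 package and the libhdfs3 shared library can be located."""
    return {
        # False means `import hdfs3` would fail in this interpreter
        "python_package": importlib.util.find_spec("hdfs3") is not None,
        # A library name or path, or None if nothing was found
        "shared_library": ctypes.util.find_library("hdfs3"),
    }


print(hdfs3_status())
```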

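Hello, you can use python3-magic in Python 3 to detect a file's encoding. Here is the corresponding code:

import magic

# Read raw bytes; text mode would decode the file before libmagic sees it
with open('unknown-file', 'rb') as f:
    blob = f.read()

m = magic.open(magic.MAGIC_MIME_ENCODING)  # ask libmagic for the encoding only
m.load()  # load the default magic database
encoding = m.buffer(blob)  # e.g. "utf-8", "us-ascii"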
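Where python3-magic is not available, a rough fallback is to try a short list of candidate encodings in order. This is a different and much cruder technique than libmagic's detection, shown only as a sketch; the function name `guess_encoding` and the default candidate list are assumptions chosen for illustration.

```python
def guess_encoding(blob, candidates=("ascii", "utf-8", "gbk")):
    """Return the first candidate encoding that decodes blob without error, else None."""
    for name in candidates:
        try:
            blob.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("hdfs".encode("ascii")))   # ascii
print(guess_encoding("文件".encode("utf-8")))    # utf-8
```

Order matters here: a permissive encoding placed early in the list will accept almost any input, so the strictest candidates should come first.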