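from pyhdfs import HdfsClient
client = HdfsClient(hosts="testhdfs.org:50070",
                    user_name="web_crawler")  # create a connection; hosts takes "host:port" entries, comma-separated if there are several NameNodes
client.get_home_directory()  # get the current user's HDFS home directory
client.listdir(PATH)  # list the names of the entries under the given HDFS path
client.copy_from_local(file_path, hdfs_path, overwrite=True)  # copy a local file to HDFS (folders are not supported); overwrite=True replaces an existing file
client.delete(PATH, recursive=True)  # delete the given path; recursive=True also deletes a directory and its contents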
hdfs_path must be the full destination path, including the file name and its extension; otherwise the copy will not succeed.
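Because hdfs_path has to name the target file itself rather than a directory, a small helper can derive it from a target directory plus the local file name. This is a minimal sketch: `build_hdfs_dest` and `upload_file` are hypothetical names, and `client` is assumed to be a pyhdfs HdfsClient like the one created above.

```python
import os
import posixpath


def build_hdfs_dest(hdfs_dir, local_path):
    """Hypothetical helper: join an HDFS directory and the local file's name
    into a full HDFS file path (HDFS paths always use forward slashes)."""
    return posixpath.join(hdfs_dir, os.path.basename(local_path))


def upload_file(client, local_path, hdfs_dir, overwrite=True):
    """Hypothetical helper: copy one local file into hdfs_dir.

    `client` is assumed to be a pyhdfs.HdfsClient. Directories are rejected
    up front because copy_from_local only copies single files.
    """
    if not os.path.isfile(local_path):
        raise ValueError(f"not a regular file: {local_path!r}")
    dest = build_hdfs_dest(hdfs_dir, local_path)
    client.copy_from_local(local_path, dest, overwrite=overwrite)
    return dest
```

Usage would look like `upload_file(client, "report.csv", "/user/web_crawler/data")`, which uploads to `/user/web_crawler/data/report.csv`.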
If connecting through HdfsClient fails with an error like this:
Traceback (most recent call last):
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "
    client.get_home_directory()
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 565, in get_home_directory
    return _json(self._get('/', 'GETHOMEDIRECTORY', **kwargs))['Path']
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 391, in _get
    return self._request('get', *args, **kwargs)
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 377, in _request
    _check_response(response, expected_status)
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 799, in _check_response
    remote_exception = _json(response)['RemoteException']
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 793, in _json
    "Expected JSON. Is WebHDFS enabled? Got {!r}".format(response.text))
pyhdfs.HdfsException: Expected JSON. Is WebHDFS enabled? Got '\n\n\n\n
502 Server dropped connection
\n
The following error occurred while trying to access http://%2050070:50070/webhdfs/v1/?user.name=web_crawler&op=GETHOMEDIRECTORY :
\n 502 Server dropped connection
\n
Generated Fri, 21 Dec 2018 02:03:18 GMT by Polipo on .\n\r\n'
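The cause is often an access problem: the account name or password is wrong, the account lacks permission, or the local network is not on the server's allowed list. In this particular trace, though, the requested URL is http://%2050070:50070/..., whose host part is " 50070" (%20 is a URL-encoded space). The trace was produced with hosts written as "testhdfs.org, 50070" instead of "testhdfs.org:50070": the comma split the string into two hosts, the port number was used as a host name, and the 502 page came from a Polipo proxy rather than from the NameNode. So it is worth checking the hosts format and any HTTP proxy settings (for example the http_proxy environment variable, which the requests library used by pyhdfs honors) before suspecting credentials.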
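A simplified model of how a hosts string turns into WebHDFS request URLs makes this failure easy to reproduce offline. `split_hosts`, `check_hosts` and `webhdfs_url` are hypothetical helpers written for illustration, not part of pyhdfs; the splitting rule (split on commas, append the default port 50070 when none is given) is an assumption based on pyhdfs's documented hosts format and on the URL in the trace.

```python
from urllib.parse import urlencode

DEFAULT_PORT = 50070  # default NameNode HTTP port assumed by pyhdfs (Hadoop 2.x)


def split_hosts(hosts):
    """Simplified model (NOT pyhdfs's real code): split a comma-separated
    hosts string and append the default port to entries without one."""
    result = []
    for entry in hosts.split(","):
        result.append(entry if ":" in entry else f"{entry}:{DEFAULT_PORT}")
    return result


def check_hosts(hosts):
    """Hypothetical validator: list suspicious entries in a hosts string."""
    problems = []
    for entry in hosts.split(","):
        if entry != entry.strip():
            problems.append(f"whitespace around {entry!r}")
        name = entry.strip().split(":")[0]
        if not name or name.isdigit():
            problems.append(f"{entry!r} looks like a port, not a host name")
    return problems


def webhdfs_url(host_port, op, user_name, path="/"):
    """Build a WebHDFS REST URL in the same shape as the one in the trace."""
    query = urlencode({"user.name": user_name, "op": op})
    return f"http://{host_port}/webhdfs/v1{path}?{query}"


if __name__ == "__main__":
    bad = "testhdfs.org, 50070"
    print(split_hosts(bad))  # the second entry is ' 50070:50070'
    print(check_hosts(bad))
    print(webhdfs_url("testhdfs.org:50070", "GETHOMEDIRECTORY", "web_crawler"))
```

Running `check_hosts` on the hosts string before constructing the client catches the "host, port" mistake without any network access.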
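I installed libhdfs3 in my own Linux environment, but it did not work: it complained that the hdfs3 library could not be found. Following suggestions online, I first tried installing it with pip, but that still did not fix it.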
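When a "cannot find hdfs3" error persists after a pip install, it helps to know whether the missing piece is the hdfs3 Python package or the native libhdfs3 shared library that it wraps. This sketch checks both using only the standard library; `diagnose_hdfs3` is a hypothetical name, and looking under `sys.prefix/lib` is an assumption about where a conda-based install would place libhdfs3.

```python
import ctypes.util
import glob
import importlib.util
import os
import sys


def diagnose_hdfs3():
    """Hypothetical diagnostic: report which part of the hdfs3 stack is visible."""
    prefix_lib_dir = os.path.join(sys.prefix, "lib")
    return {
        # Is the hdfs3 Python package importable by this interpreter?
        "hdfs3_package": importlib.util.find_spec("hdfs3") is not None,
        # Can the system's library search path locate libhdfs3?
        "libhdfs3_on_search_path": ctypes.util.find_library("hdfs3") is not None,
        # libhdfs3 files under this interpreter's prefix (assumed conda layout)
        "libhdfs3_in_prefix": sorted(glob.glob(os.path.join(prefix_lib_dir, "libhdfs3*"))),
    }


if __name__ == "__main__":
    for key, value in diagnose_hdfs3().items():
        print(f"{key}: {value}")
```

If the package is importable but the library is only found under the prefix, the problem is the library search path rather than the pip/conda install itself.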
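So I switched to Anaconda2: https://www.anaconda.com/download/#macos
I downloaded the matching installer and ran it, and this time the installation finally succeeded.
Then I installed hdfs3.
Next, I located the corresponding installation path.
Adding the following lines at the top of my Python file solved the problem: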
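Hi, with Python 3 you can use python3-magic to detect a file's encoding. Here is the corresponding code:
import magic
with open('unknown-file', 'rb') as f:  # read raw bytes; text mode would try to decode the file first
    blob = f.read()
m = magic.open(magic.MAGIC_MIME_ENCODING)  # ask libmagic for the MIME encoding only
m.load()  # load the default magic database
encoding = m.buffer(blob)  # e.g. "utf-8", "us-ascii"
m.close()  # release the libmagic handle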