Python

发布日期: 2019-05-12

阅读次数:

一、Python的环境部署

Python安装、Python的IDE安装本文不再赘述，网上有很多教程

爬虫必备的几个库：Requests、Selenium、lxml、Beatiful Soup

Requests 是基于urllib编写的第三方扩展库，是采用Apache2 Licensed开源协议的HTTP库
Selenium是一个自动化测试工具，利用它我们可以驱动浏览器执行特定的动作，如点击、下拉等操作。对于一些JavaScript渲染的页面来说，这种抓取方式非常有效。
lxml是Python的一个解析库，支持HTML和XML的解析，支持XPath解析方式，而且解析效率非常高
Beatiful Soup是Python的一个HTML或XML的解析库，我们可以用它来方便地从网页中提取数据

二、Python的基础语法

可参考我的《趣学Python——教孩子学编程》系列笔记

《趣学Python——教孩子学编程》学习笔记第1-3章

《趣学Python——教孩子学编程》学习笔记第4-6章

《趣学Python——教孩子学编程》学习笔记第7-8章

《趣学Python——教孩子学编程》学习笔记第9-10章

《趣学Python——教孩子学编程》学习笔记第11-12章

《趣学Python——教孩子学编程》学习笔记第13章

三、Python文件的读取与输出

键盘输入

# 键盘输入（python3将raw_input和input进行了整合，只有input）
str = input("Please enter:")
print("你输入的内容是：", str)

打开文件

# 打开一个文件
fo = open(r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao.txt", "wb")
print("文件名：", fo.name)
print("是否已关闭：", fo.closed)
print("访问模式：", fo.mode)

关闭文件

# 关闭一个文件
fo = open(r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao.txt", "wb")
fo.close()
print("是否已关闭：", fo.closed)

写入文件内容

# 写入文件内容
fo = open(r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao.txt", "r+")
fo.write("www.baidu.com www.cctvjiatao.com")
fo.flush()
fo.close()

读取文件内容

fo = open(r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao.txt", "r+")
str = fo.read(11)
print("读取的字符串是：", str)
fo.close()

查找当前位置

# 查找当前位置
fo = open(r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao.txt", "r+")
str = fo.read(11)
position = fo.tell()
print("当前读取的位置是：", position)
# result: 当前文件位置： 11
fo.close()

文件指针重定位

# 文件指针重定位
fo = open(r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao.txt", "r+")
str = fo.read(11)
print("读取的字符串1：", str)
# result: 重新读取的字符串1： www.baidu.c
position = fo.tell()
print("当前文件位置：", position)
# result: 当前文件位置： 11
str = fo.read(11)
print("读取的字符串2：", str)
# result: 读取的字符串2： om www.cctv
postion = fo.seek(0, 0)
str = fo.read(11)
print("读取的字符串3：", str)
# result: 读取的字符串3： www.baidu.c
fo.close()

文件重命名

# 文件重命名 filejiatao.txt——>filejiatao2.txt
import os

src_file = r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao.txt"
dst_file = r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao2.txt"
os.rename(src_file, dst_file)

删除文件

# 删除一个文件
import os

dirty_file = r"D:\DataguruPyhton\PythonSpider\lesson2\filejiatao2.txt"
os.remove(dirty_file)

异常处理1

2-1.jpg

# 异常处理1
try:
    fh = open("testfile.txt", "w")
    fh.write("this is my test file for exception handing!")
except IOError:
    print("Eorror:can\'t find file or read data")
else:
    print("witten content in the file successfully")
    fh.close()

异常处理2

# 异常处理2
try:
    fh = open("testfile.txt", "w")
    fh.write("this is my test file for exception handing!")
finally:
    print("Eorror:I don\'t kown why ...")

异常处理3

# 异常处理3
def temp_convert(var):
    try:
        return int(var)
    # except ValueError,Argument:
    #     print("The argument does not contain numbers\n",Argument)
    except (ValueError) as Argument:
        print("The argument does not contain numbers\n", Argument)


temp_convert("xyz")

转载请注明: 苦乐随缘 Python网络爬虫实战之二：环境部署、基础语法、文件操作

Python网络爬虫实战之三：基本工具库urllib和requests

Python网络爬虫实战之三：基本工具库urllib和requests一、urlliburllib简介urllib是Python中一个功能强大用于操作URL，并在爬虫时经常用到的一个基础库，无需额外安装，默认已经安装到python中。 ur

2019-05-13 Python

Python requests urllib

Python网络爬虫实战之一：网络爬虫理论基础

一、浏览网页的基本过程和通信基础当我们在浏览器地址栏输入： http://www.baidu.com 回车后会浏览器显示百度的首页，那这段网络通信过程中到底发生了什么？简单来说这段过程发生了以下四个步骤：浏览器通过DNS服务器查

2019-05-11 Python

Python

一、Python的环境部署

二、Python的基础语法

三、Python文件的读取与输出

你的赏识是我前进的动力