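Preface
To speed up how quickly the Baidu search engine indexes a site, the Baidu Webmaster Platform (百度站长) currently offers two ways to submit links: automatic and manual. Automatic submission takes three forms: active push, auto push, and sitemap. According to Baidu, active push gives the best results. The platform's backend provides sample push code for Curl, PHP, and Ruby, but none for Python. This post provides a ready-to-use Python version of the active push code. It depends on Linux as the system environment and on Python3 and Curl as the software environment.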
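Python3 Code
The code below reads the sitemap file of a given domain, then uses the Curl command to submit the valid URLs in it (those ending in `.html`) to the Baidu Webmaster Platform in one batch. Replace the values of the `domain`, `token`, and `site_map_url` variables in the code with your own.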
```python
import os
import re
import logging
import subprocess
from io import StringIO
from urllib import request

# Replace these three values with your own
domain = 'www.example.com'
token = 'xxxxxxxxxxxxxxxxx'
site_map_url = 'https://www.example.com/sitemap.xml'

push_max_lines = 1000  # maximum number of URLs written per run
push_urls_file = "/tmp/baidu_zhanzhang_push_url.txt"
push_url = 'http://data.zz.baidu.com/urls?site={domain}&token={token}'.format(domain=domain, token=token)
log_file = "/tmp/baidu/baidu_zhanzhang_push.log"


def regexpMatchUrl(content):
    """Return True if the text contains an http(s) URL."""
    pattern = re.findall(r'(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?', content, re.IGNORECASE)
    return bool(pattern)


def regexpMatchWebSite(content):
    """Return True if the text mentions the configured domain."""
    # re.escape keeps the dots in the domain from matching arbitrary characters
    pattern = re.findall(re.escape(domain), content, re.IGNORECASE)
    return bool(pattern)


def getUrl(content):
    """Return the first URL ending in .html, or '' if there is none."""
    # Assumes the sitemap has one URL per line; the match is greedy within a line
    pattern = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+\.html', content, re.IGNORECASE)
    return pattern[0] if pattern else ''


def createUrlFile(url_file_path, max_lines):
    """Download the sitemap and write up to max_lines matching URLs, one per line."""
    content = request.urlopen(site_map_url).read().decode('utf8')
    index = 0
    with StringIO(content) as website_map_file, open(url_file_path, 'w') as url_file:
        for line in website_map_file:
            if regexpMatchUrl(line) and regexpMatchWebSite(line):
                url = getUrl(line)
                if url != '':
                    index = index + 1
                    url_file.write(url + "\n")
                    if index >= max_lines:
                        break


def pushUrlFile(url, url_file_path, log_file):
    """POST the URL file to Baidu with curl and log curl's output."""
    shell_cmd_line = "curl -H 'Content-Type:text/plain' --data-binary @" + url_file_path + " " + '"' + url + '"'
    (status, output) = subprocess.getstatusoutput(shell_cmd_line)
    logging.info(output + "\n")


if __name__ == "__main__":
    # Create the log directory first; logging.basicConfig fails if it is missing
    os.makedirs(os.path.dirname(log_file), exist_ok=True)
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', filename=log_file)
    createUrlFile(push_urls_file, push_max_lines)
    pushUrlFile(push_url, push_urls_file, log_file)
```
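If you would rather not depend on Curl or on line-by-line regular expressions, the same two steps can be sketched with the standard library alone. This is not the script above, only an alternative under stated assumptions: the function names `extract_html_urls` and `build_push_request` are made up for illustration, the sitemap is assumed to be a standard `<urlset>` with `<loc>` elements, and the domain check here compares the URL's host exactly, which is stricter than the regex match in the original code.

```python
import xml.etree.ElementTree as ET
from urllib import request
from urllib.parse import urlparse


def extract_html_urls(sitemap_xml, domain, limit=1000):
    """Collect up to `limit` <loc> URLs on `domain` whose path ends in .html.

    Hypothetical helper: parses the sitemap as XML instead of using regexes.
    """
    urls = []
    for element in ET.fromstring(sitemap_xml).iter():
        # Drop the "{namespace}" prefix so the check works with or without one.
        if element.tag.rsplit('}', 1)[-1] != 'loc' or not element.text:
            continue
        url = element.text.strip()
        parsed = urlparse(url)
        if parsed.netloc.lower() == domain.lower() and parsed.path.endswith('.html'):
            urls.append(url)
            if len(urls) >= limit:
                break
    return urls


def build_push_request(push_url, urls):
    """Build (but do not send) a POST equivalent to the curl command:
    a text/plain body with one URL per line.
    """
    body = '\n'.join(urls).encode('utf-8')
    return request.Request(
        push_url,
        data=body,
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )
```

Passing the returned object to `request.urlopen(...)` would actually send it; building and sending are kept separate here so the request can be inspected first.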
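Crontab Scheduled Task
On Linux, combining the Python script with a Crontab scheduled task is enough to push links to the Baidu Webmaster Platform on a schedule. The entry below runs the script every two hours, on the hour.
```
0 */2 * * * /usr/bin/python3 /usr/local/baidu-push/baidu_zhanzhang_push.py
```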
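To make the schedule concrete: `0 */2 * * *` fires at minute 0 of every even hour (00:00, 02:00, ..., 22:00). The sketch below computes the next such time after a given moment. The function name `next_run` is hypothetical, and it works on naive datetimes, so it ignores daylight saving time and everything else a real cron daemon handles.

```python
from datetime import datetime, timedelta


def next_run(now):
    """Return the first time strictly after `now` that matches `0 */2 * * *`."""
    # Round down to the hour, then step forward to the next full hour.
    candidate = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    # */2 in the hour field matches even hours only.
    if candidate.hour % 2:
        candidate += timedelta(hours=1)
    return candidate


print(next_run(datetime(2019, 2, 18, 23, 15)))
```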
Log Output from the Script
```
$ cat /tmp/baidu/baidu_zhanzhang_push.log
2019-02-18 23:15:20,985 - www - INFO -   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  6092  100    30  100  6062     82  16671 --:--:-- --:--:-- --:--:-- 16653
{"remain":98069,"success":138}
```
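The last part of the log entry is the JSON that Baidu returns. As far as I understand the response, `success` is the number of URLs accepted in this push and `remain` is the quota left for the day. Since curl's progress meter is logged along with it, a small helper can pull the JSON out. This is only a sketch; the name `parse_push_result` is made up for illustration.

```python
import json


def parse_push_result(output):
    """Return the JSON object found in curl's output, or None if there is none.

    Scans from the last line upward and parses from the first '{' on a line,
    so it also copes with the JSON trailing the progress meter on one line.
    """
    for line in reversed(output.splitlines()):
        start = line.find('{')
        if start == -1:
            continue
        try:
            return json.loads(line[start:])
        except ValueError:
            continue
    return None


sample = '100  6092  100    30  100  6062     82  16671 --:--:-- --:--:-- --:--:-- 16653\n{"remain":98069,"success":138}'
result = parse_push_result(sample)
print(result['success'], result['remain'])
```

In the script, this could be applied to `output` in `pushUrlFile` to log a warning when `success` is missing from the response.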
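One-Step Deployment of the Push Service with Docker
The content of the `Dockerfile` is shown below. After building the Docker image, you can start it directly with a command.
- When starting the Docker image directly with a command, use the `-v` option to mount the Python script file on the host to `/usr/local/python_scripts/baidu_zhanzhang_push.py` inside the Docker container.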
```dockerfile
FROM augurproject/python2-and-3

MAINTAINER clay<656418510@qq.com>

RUN mkdir -p /tmp/baidu
RUN touch /var/log/cron.log
RUN mkdir -p /usr/local/python_scripts

ENV workpath /usr/local/python_scripts
WORKDIR $workpath

RUN echo "Asia/Shanghai" > /etc/timezone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

RUN cp /etc/apt/sources.list /etc/apt/backup.sources.list
RUN echo "deb http://mirrors.163.com/debian/ stretch main non-free contrib" > /etc/apt/sources.list
RUN echo "deb http://mirrors.163.com/debian/ stretch-updates main non-free contrib" >> /etc/apt/sources.list
RUN echo "deb http://mirrors.163.com/debian/ stretch-backports main non-free contrib" >> /etc/apt/sources.list
RUN echo "deb-src http://mirrors.163.com/debian/ stretch main non-free contrib" >> /etc/apt/sources.list
RUN echo "deb-src http://mirrors.163.com/debian/ stretch-updates main non-free contrib" >> /etc/apt/sources.list
RUN echo "deb-src http://mirrors.163.com/debian/ stretch-backports main non-free contrib" >> /etc/apt/sources.list
RUN echo "deb http://mirrors.163.com/debian-security/ stretch/updates main non-free contrib" >> /etc/apt/sources.list
RUN echo "deb-src http://mirrors.163.com/debian-security/ stretch/updates main non-free contrib" >> /etc/apt/sources.list

RUN apt-get -y update && apt-get -y upgrade
RUN apt-get -y install python-rsa python-requests cron rsyslog vim htop net-tools telnet apt-utils tree wget curl git make gcc
RUN apt-get -y autoclean && apt-get -y autoremove

RUN sed -i "s/#cron./cron./g" /etc/rsyslog.conf

RUN echo "0 */2 * * * root /usr/bin/python3 /usr/local/python_scripts/baidu_zhanzhang_push.py" >> /etc/crontab

CMD service rsyslog start && service cron start && tail -f -n 20 /var/log/cron.log
```
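For the `-v` mount mentioned above, the start command could be assembled roughly as follows. This is a sketch under assumptions: the helper `docker_run_command` is hypothetical, the image tag `clay/baidu-push:1.0` and the host path are borrowed from the Docker-Compose configuration in this post, and `-d` (run in the background) is my own addition. `shlex.join` needs Python 3.8 or later.

```python
import shlex

# Path where the Dockerfile's crontab entry expects the script
CONTAINER_SCRIPT = '/usr/local/python_scripts/baidu_zhanzhang_push.py'


def docker_run_command(image, host_script, container_script=CONTAINER_SCRIPT):
    """Return the argument list that starts `image` with the script bind-mounted."""
    return ['docker', 'run', '-d', '-v', f'{host_script}:{container_script}', image]


command = docker_run_command(
    'clay/baidu-push:1.0',                             # assumed image tag
    '/usr/local/baidu-push/baidu_zhanzhang_push.py',   # assumed host path
)
# Print the command for copy-pasting; subprocess.run(command) would execute it.
print(shlex.join(command))
```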
If you manage the Docker image with Docker-Compose, the YML configuration file looks like this:
```yaml
version: '3.5'

services:
  baidu-push:
    image: clay/baidu-push:1.0
    container_name: hexo-baidu-push
    restart: always
    environment:
      TZ: 'Asia/Shanghai'
    volumes:
      - /usr/local/baidu-push/logs:/tmp/baidu
      - /usr/local/baidu-push/baidu_zhanzhang_push.py:/usr/local/python_scripts/baidu_zhanzhang_push.py
```
Volume mounts:
- `/usr/local/baidu-push/logs`: the log directory on the host
- `/usr/local/baidu-push/baidu_zhanzhang_push.py`: the path of the Python script file on the host