GitHub中文网 (official site): Open Source Project Maintenance and Collaboration Guide
lenix
2022-08-08

Automatically Detecting Broken Links in a Project

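# Preface

My open source project Thanks-Mirror collects well-maintained mirrors for package managers, system images, and commonly used software. As the project has matured, it has accumulated 1,091 links to date. Over time, some of the China-based mirrors may stop being maintained, so automatically detecting links that have gone dead became something I needed to deal with.

This post introduces a neat little Action that scans the links in a repository, sends a request to each one, flags the links that fail according to custom rules, and then files them in an issue.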
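To make the scan, request, and report flow concrete, here is a minimal Python sketch of the same idea. It is not how lychee is implemented: the URL pattern, the function names, and the stubbed status lookup are all assumptions made for illustration, and the stub replaces real HTTP requests so the example runs offline.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified URL pattern; a real link checker is far more thorough.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")


def extract_links(text: str) -> List[str]:
    """Return the unique URLs found in text, in order of first appearance."""
    seen: Dict[str, None] = {}
    for url in URL_PATTERN.findall(text):
        # Drop punctuation that belongs to the sentence rather than the URL.
        seen.setdefault(url.rstrip(".,;"), None)
    return list(seen)


def find_broken(
    links: List[str],
    fetch_status: Callable[[str], int],
    accepted=frozenset({200}),
) -> List[Tuple[str, int]]:
    """Return (url, status) pairs whose status is not in the accepted set."""
    broken = []
    for url in links:
        status = fetch_status(url)
        if status not in accepted:
            broken.append((url, status))
    return broken


def render_report(broken: List[Tuple[str, int]]) -> str:
    """Render the broken links as a small Markdown report."""
    lines = ["# Link check report", ""]
    lines += [f"- {url} (status {status})" for url, status in broken]
    return "\n".join(lines)


# Stubbed statuses stand in for real HTTP requests (assumed values).
sample = "See [TUNA](https://mirrors.tuna.tsinghua.edu.cn/) and https://example.invalid/gone."
fake_statuses = {
    "https://mirrors.tuna.tsinghua.edu.cn/": 200,
    "https://example.invalid/gone": 404,
}
links = extract_links(sample)
broken = find_broken(links, fake_statuses.__getitem__)
print(render_report(broken))
```

In the real setup, the Action does the extraction and the requests, and a separate step turns the report file into an issue.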

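# Configuration

Action used: lycheeverse/lychee-action

Setup is straightforward, and reading the official documentation is basically enough to get started. The approach shown in the official docs is not very flexible, though. The Action is built on the project's open source tool lychee, so this post uses the configuration file that lychee supports to get richer behavior.

First, add the Actions workflow file, e.g. .github/workflows/links-check.yml: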

name: 🔗 Check links
on:
  repository_dispatch:
  push:
    branches:
      - main
  workflow_dispatch:
  schedule:
    - cron: "00 18 * * *"
jobs:
  linkChecker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Link Checker
        id: lychee
        uses: lycheeverse/lychee-action@v1.5.1
        env:
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
        with:
          # Check only README.md, using the options in the lychee config file
          args: --config ./.github/config/lychee.toml README.md
          # Write the report as Markdown
          format: markdown
          # Write the report to this path (read by the issue step below)
          output: ./lychee/out.md
      - name: Create Issue From File
        if: steps.lychee.outputs.exit_code != 0
        uses: peter-evans/create-issue-from-file@v3
        with:
          title: 🔗 Link check report
          content-filepath: ./lychee/out.md
          labels: report, automated issue

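A quick walkthrough of this workflow: it runs on every push to main and once a day at 18:00 (GitHub evaluates schedule cron expressions in UTC, so that is 02:00 Beijing time), and it can also be triggered manually. It checks every link in README.md using the configuration file ./.github/config/lychee.toml and writes the result to ./lychee/out.md in Markdown format. If every link passes, nothing else happens; if any check fails, an issue is created automatically.

The workflow above references .github/config/lychee.toml. Here is the configuration file I use: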
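Because GitHub Actions evaluates `schedule` cron expressions in UTC, it is worth checking what "18:00" means locally. The following small sketch converts the trigger time to Beijing time; the example date is arbitrary and the fixed UTC+8 offset is the only assumption.

```python
from datetime import datetime, timedelta, timezone

# The cron expression "00 18 * * *" fires at 18:00 UTC.
# Beijing time is a fixed UTC+8 offset with no daylight saving time.
BEIJING = timezone(timedelta(hours=8))

# Arbitrary example date; only the time of day matters here.
run_utc = datetime(2022, 8, 8, 18, 0, tzinfo=timezone.utc)
run_beijing = run_utc.astimezone(BEIJING)

print(run_beijing.isoformat())  # 2022-08-09T02:00:00+08:00
```

If the intent is a run at 18:00 Beijing time, the cron hour would need to be 10 instead of 18.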

#############################  Display  #############################

# Verbose program output
verbose = true

# Show progress
progress = true

# Path to summary output file.
# output = "report.md"

#############################  Cache  ###############################

# Enable link caching. This can be helpful to avoid checking the same links on
# multiple runs.
cache = true

#############################  Runtime  #############################

# Number of threads to utilize.
# Defaults to number of cores available to the system if omitted.
threads = 6

# Maximum number of allowed redirects [default: 10]
max_redirects = 10

# Maximum number of concurrent network requests [default: 128]
max_concurrency = 30

#############################  Requests  ############################

# User agent to send with each request
user_agent = "curl/7.83.1"

# Website timeout from connect to response finished
timeout = 10

# Minimum wait time in seconds between retries of failed requests.
retry_wait_time = 2

# Comma-separated list of accepted status codes for valid links.
# Omit to accept all response types.
#accept = "text/html"

# Proceed for server connections considered insecure (invalid TLS)
insecure = true

# List of accepted status codes for valid links.
# Does not work yet, until https://github.com/lycheeverse/lychee/issues/644
# is resolved.
accept = [200,204,301,429,403]

# Only test links with the given scheme (e.g. https)
# Omit to check links with any scheme
#scheme = "https"

# Request method
method = "get"

# Custom request headers
headers = []

#############################  Exclusions  ##########################

# Exclude URLs from checking (supports regex).
# developer.aliyun.com is excluded because every request to it from
# GitHub Actions failed with a network error.
exclude = [
    "developer.aliyun.com/*",
    "mirrors.ustc.edu.cn/*",
    "eryajf.net/*",
    "rsproxy.cn/*",
    "https://mirrors.cloud.tencent.com/go/",
    "http://maven.aliyun.com/nexus/content/groups/public/",
    "https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git",
    "https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-core.git",
]

# Exclude URLs contained in a file from checking
exclude_file = []

include = []

include_verbatim = true

# Exclude all private IPs from checking
# Equivalent to setting `exclude_private`, `exclude_link_local`, and `exclude_loopback` to true
exclude_all_private = true

# # Exclude private IP address ranges from checking
# exclude_private = false

# # Exclude link-local IP address range from checking
# exclude_link_local = false

# # Exclude loopback IP address range and localhost from checking
# exclude_loopback = false

# Exclude all mail addresses from checking
exclude_mail = true

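Most of this is generic. The two settings you will probably need to adjust are accept and exclude. When I first ran the check, every developer.aliyun.com link failed with a network error from GitHub Actions. My guess is that Alibaba Cloud restricts access from outside, which is understandable, so I added the whole domain to the exclude list.

In short, you need to filter and analyze the check results yourself, then adjust the configuration file according to what each option means.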
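As a rough mental model of how accept and exclude interact, the sketch below classifies a link the way I understand the two settings to work. The `classify` function is hypothetical and the unanchored regex search is an assumption about lychee's matching behavior, not a copy of its code; the values come from the configuration above, with the exclude list abbreviated.

```python
import re

# Values copied from the lychee.toml above (exclude list abbreviated).
ACCEPT = {200, 204, 301, 429, 403}
EXCLUDE = [
    "developer.aliyun.com/*",
    "mirrors.ustc.edu.cn/*",
    "https://mirrors.cloud.tencent.com/go/",
]
EXCLUDE_PATTERNS = [re.compile(pattern) for pattern in EXCLUDE]


def classify(url: str, status: int) -> str:
    """Return 'excluded', 'ok', or 'broken' for one checked link."""
    # Assumption: patterns are regular expressions searched anywhere in the
    # URL, so a trailing "/*" means "zero or more slashes", not a glob.
    if any(pattern.search(url) for pattern in EXCLUDE_PATTERNS):
        return "excluded"
    return "ok" if status in ACCEPT else "broken"


print(classify("https://developer.aliyun.com/mirror/", 0))  # excluded
print(classify("https://mirrors.aliyun.com/pypi/", 403))    # ok
print(classify("https://example.invalid/gone", 404))        # broken
```

Accepting 403 and 429 trades some accuracy for fewer false alarms: mirrors that block or rate-limit CI traffic are treated as alive, so a dead link hidden behind such a response would not be reported.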

# Automatic PR checks

The workflow above does not check PRs. You can add a second workflow dedicated to checking the links submitted in a PR:

$ cat link-check-pr.yml

name: Links (Fail Fast)
on:
  pull_request:
jobs:
  linkChecker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Link Checker
        uses: lycheeverse/lychee-action@v1.5.1
        with:
          # Check only README.md, using the options in the lychee config file
          args: --config ./.github/config/lychee.toml README.md
          # Write the report as Markdown
          format: markdown
          # Write the report to this path
          output: ./lychee/out.md
          # Fail the job when broken links are found
          fail: true
        env:
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}

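With this in place, a PR that contains a broken link fails the check, which catches bad links before they are merged into the project.

# Result

After the check passes, the result looks like this: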

[Screenshot: image_20220808_154825]

Theme by Vdoing | Copyright © 2022-2022 github中文网 | github中文网