curl is a heavyweight in the Linux world: features that would require importing assorted packages and writing many lines of code in a programming language, curl handles with a single command.
There is no shortage of articles online covering individual curl options; these notes are less about single options and more about the more powerful combinations, so examples will be collected and added here over time.
The most basic usage is fetching a web page:
curl https://ruby-china.org/
Headers
- Show headers together with the response body (-i, --include; Include protocol headers in the output (H/F))
- Show headers only (-I, --head; Show document info only)
- Save headers to a local file (-D, --dump-header FILE; Write the headers to FILE)
$ curl -I https://ruby-china.org/
HTTP/1.1 200 OK
Server: nginx/1.10.0
Date: Sat, 22 Oct 2016 08:03:27 GMT
Content-Type: text/html; charset=utf-8
ETag: W/"f996dba53edf3c7b1ea886d4a0dba117"
The server responds with status code 200 on this first visit, a normal response; the content type is html with utf-8 encoding, and ETag is the tag value used by the caching mechanism.
$ curl -D header.output https://ruby-china.org/
This prints the page content and at the same time writes the response headers to the file.
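The -i option from the list above combines the two behaviours, printing the response headers first and then the body in one stream; a minimal sketch (piping to head only keeps the output short):

$ curl -s -i https://ruby-china.org/ | head -n 20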
- Send custom headers with the request (-H, --header LINE; Pass custom header LINE to server (H))
Resources are limited, so save where you can; for the user that means saving bandwidth and hard-earned money. Revisit a page carrying the cache tag value from the previous visit, and if the content on the server has not changed it answers with status code 304, so the previously cached body can simply be reused. Browsers exploit this rule to improve the experience: pages visited before usually load faster on a revisit. (The server replies with headers only and does not need to regenerate the response body, so the transfer is faster.)
# server responds: ETag          => client replies: If-None-Match
# server responds: Last-Modified => client replies: If-Modified-Since
$ curl -H 'If-None-Match: W/"f996dba53edf3c7b1ea886d4a0dba117"' -I https://ruby-china.org/
HTTP/1.1 304 Not Modified
Server: nginx/1.10.0
Date: Sat, 22 Oct 2016 08:07:54 GMT
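The Last-Modified / If-Modified-Since pair works the same way. The header can be passed by hand with -H, or curl can build it from a date expression via -z, --time-cond (a sketch; the date is illustrative, and whether you get a 304 depends on when the page last changed):

$ curl -H 'If-Modified-Since: Sat, 22 Oct 2016 08:00:00 GMT' -I https://ruby-china.org/
$ curl -z 'Sat, 22 Oct 2016 08:00:00 GMT' -I https://ruby-china.org/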
Custom headers do more than save bandwidth. They let one unchanged blob of server code handle all kinds of clients, for example by simulating different browsers or asking for different content types and encodings:
curl -H "Content-Type=text/html;charset=utf-8" https://ruby-china.org/
Writing to local files
- Save the response body to a file (-o, --output FILE; Write to FILE instead of stdout)
The server's response body is written to the given file, and progress information is printed to the terminal while the download runs; this is roughly what the wget command does.

$ curl -o ruby-china.html -H 'If-None-Match: W/"f996dba53edf3c7b1ea886d4a0dba117"' https://ruby-china.org/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
$ du -sh ruby-china.html
  0B    ruby-china.html
Because this request carried the cache tag value in its headers, the server responded with an empty body and the file written locally is 0 B.
- Silent mode (-s, --silent; Silent mode (don't output anything))
In some scenarios, such as printing formatted log lines in a script, the progress meter that curl -o emits while writing the response to a file is outside your formatting control, so the only option is to hide it.

$ curl -s -o ruby-china.html https://ruby-china.org/
$ du -sh ruby-china.html
 44K    ruby-china.html

The resulting file is the same as with

$ curl https://ruby-china.org/ > ruby-china.html
This request carried no cache tag value, so the server returned a full response and the local file is 44K.
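In scripts it is common to pair -s with -S (--show-error, listed in the help output below) so that the progress meter stays hidden but genuine errors still reach stderr; a minimal sketch:

$ curl -sS -o ruby-china.html https://ruby-china.org/ || echo 'download failed'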
- Save under the remote file's original name (-O, --remote-name; Write output to a file named as the remote file)
- Resume a download (-C, --continue-at OFFSET; Resumed transfer OFFSET)
$ curl -I http://d.bootcss.com/bootstrap-3.3.0-dist.zip
HTTP/1.1 200 OK
Server: marco/0.19
Content-Type: application/zip
$
$ curl -s -O http://d.bootcss.com/bootstrap-3.3.0-dist.zip
$ md5 bootstrap-3.3.0-dist.zip
MD5 (bootstrap-3.3.0-dist.zip) = 7e2ba841e15aff2f572649a8e9f9b69c
$
$ curl -s -o bootstrap.zip http://d.bootcss.com/bootstrap-3.3.0-dist.zip
$ md5 bootstrap.zip
MD5 (bootstrap.zip) = 7e2ba841e15aff2f572649a8e9f9b69c
$
$ curl -s -O https://ruby-china.org/
curl: Remote file name has no length!
curl: try 'curl --help' or 'curl --manual' for more information
# how is the remote file name determined?
The web-page response has Content-Type text/html, and its URL path ends in /, so there is no file-name segment for -O to use, which is why curl complains "Remote file name has no length!". The zip download responds with Content-Type application/zip and its URL ends with an actual file name. With -O no local name needs to be given, because the remote file's own name is used (the name is simply the last segment of the URL path; -J, --remote-header-name would use the name the server suggests via Content-Disposition instead). The -o option lets you pick the local file name yourself, for example to drop the version from it. The identical md5 values show that the two downloaded files are exactly the same. For resumed downloads (-C) a good real-world example has not come up yet; a minimal sketch follows below.
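A minimal sketch of resuming, assuming the server supports HTTP range requests; the Ctrl-C simply simulates an interrupted transfer:

# start the download, then interrupt it part-way with Ctrl-C
$ curl -O http://d.bootcss.com/bootstrap-3.3.0-dist.zip
^C
# resume: '-C -' tells curl to work out the offset from the partial local file
$ curl -C - -O http://d.bootcss.com/bootstrap-3.3.0-dist.zip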
Simulating a login
How does the browser know that requests made after logging in are still in a logged-in state? Because after a successful login, user identification information is written into the browser's cookies.
After the login form is submitted and the server accepts the credentials, the server redirects to the topics page. How do we follow that redirect?
- Write cookies to a local file (-c, --cookie-jar FILE; Write cookies to FILE after operation (H))
- Submit a form (POST) (-d, --data DATA; HTTP POST data (H))
$ curl -c sign.cookie --data 'user[login]=jay_li@intfocus.com&user[password]=password' https://ruby-china.org/account/sign_in
<html><body>You are being <a href="https://ruby-china.org/topics">redirected</a>.</body></html>%
This submits the login credentials and writes the cookies from the server's response into a local file. The server answers with a redirect page, but that is not the final page.
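The cookie jar curl writes is a plain-text file in the Netscape cookie file format, so it can be inspected directly; the stored names and values depend on the site:

$ cat sign.cookie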
- Follow redirects (-L, --location; Follow redirects (H))
$ curl -L -s -o topics.html -c sign.cookie --data 'user[login]=jay_li@intfocus.com&user[password]=password' https://ruby-china.org/account/sign_in
The page after login is too large to show inline, so it is written to topics.html; the topic titles can then be extracted by combining the grep and cut commands.

$ grep '<a title=' topics.html | cut -d '>' -f 2 | cut -d '<' -f 1 | grep -v ^$ | head -n 5
[活动] 推荐你心中的「极客代言人」,打造《中国技术社群英雄谱》
Ruby China 基于 Turbolinks 的 iOS 以及 Android 客户端发布了
阿里云 ubuntu 服务器的 ruby 安装 rbenv 很慢解决办法
The Well-Grounded Rubyist 中文版《Ruby 程序员修炼之道》已经印刷完毕,今天开始发货了。
Rails 为何不允许 resources 中的 new, create 等 map 到自定义的 action?
$
$ irb
irb(main):001:0> content = IO.read('topics.html')
irb(main):002:0> puts content.scan(/<a title="(.*?)" href="\/topics\/.*">(.*?)<\/a>/).find_all { |arr| arr.length == 2 && arr[0] == arr[1] }.map(&:first).first(5).join("\n")
[活动] 推荐你心中的「极客代言人」,打造《中国技术社群英雄谱》
Ruby China 基于 Turbolinks 的 iOS 以及 Android 客户端发布了
我从 Vue.js 回到了 jQuery
The Well-Grounded Rubyist 中文版《Ruby 程序员修炼之道》已经印刷完毕,今天开始发货了。
阿里云 ubuntu 服务器的 ruby 安装 rbenv 很慢解决办法
- Read cookies from a string or file (-b, --cookie STRING/FILE; Read cookies from STRING/FILE (H))
$ curl -b sign.cookie -s -o topics-visit-with-cookie.html https://ruby-china.org/topics
$ diff topics.html topics-visit-with-cookie.html
27c27
< <meta name="csrf-token" content="okCvEjZQeO8BgrMLbrJtXQCiFzKBo5SJMKNMuwVYh6T7QCBoOueGetM44OczE3/k1Chyl9jqcppAykGiVWs6/Q==" />
---
> <meta name="csrf-token" content="YslFiXwVKNoU+uiThlFxl8+voYcNj7lKMz0oz0rGDgI7ycrzcKLWT8ZAu3/b8GMuGyXEIlTGX1lDVCXWGvWzWw==" />
816c816
< redirect_to != return
---
> 如果你不是特别对他的每一句话感兴趣,不要随意用“关注”人的功能,因为关注以后,他的所有发帖回帖都会以通知的方式提醒你的!
Sending the cookies the server set on login is enough to access the topics page directly; diffing that page against the one reached via the login redirect shows only minor differences.
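Besides a file, -b also accepts a raw cookie string of name=value pairs; a sketch with a purely hypothetical cookie name and value:

$ curl -b 'session_token=XXXX' -s -o topics-visit-with-cookie-string.html https://ruby-china.org/topics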
- Simulate a browser (-A, --user-agent STRING; Send User-Agent STRING to server (H))
Some sites serve different content to different browsers (or devices). Baidu, for example, shows a desktop visitor little more than a search box, while a mobile visitor also gets a news feed. This can be verified with the commands below:
# Chrome console
# console.log(navigator.userAgent)
$ curl -o mobile-visit-baidu.html -A 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1' https://www.baidu.com/
$ curl -o pc-visit-baidu.html -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.59 Safari/537.36' https://www.baidu.com/
Comparing the two files makes it obvious that the Baidu home page returns different content to different browsers.
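One quick way to compare them (exact sizes and diff output will vary from run to run):

$ du -sh mobile-visit-baidu.html pc-visit-baidu.html
$ diff -q mobile-visit-baidu.html pc-visit-baidu.html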
Performance monitoring
- Report transfer metrics (-w, --write-out FORMAT; Use output FORMAT after completion)
$ curl -o /dev/null -s -w "time_connect: %{time_connect}\ntime_starttransfer: %{time_starttransfer}\ntime_total: %{time_total}\n" https://ruby-china.org/
time_connect: 0.279
time_starttransfer: 0.475
time_total: 0.490
$ curl -b sign.cookie -o /dev/null -s -w "time_connect: %{time_connect}\ntime_starttransfer: %{time_starttransfer}\ntime_total: %{time_total}\n" https://ruby-china.org/
time_connect: 0.016
time_starttransfer: 0.217
time_total: 0.232
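For longer format strings it can be cleaner to keep them in a file and reference it with @ (a sketch; timing.format is just an illustrative file name, and literal newlines in the file are printed as-is):

$ cat > timing.format <<'EOF'
time_namelookup:    %{time_namelookup}
time_connect:       %{time_connect}
time_starttransfer: %{time_starttransfer}
time_total:         %{time_total}
EOF
$ curl -o /dev/null -s -w @timing.format https://ruby-china.org/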
Help
$ curl --version
curl 7.43.0 (x86_64-apple-darwin14.0) libcurl/7.43.0 SecureTransport zlib/1.2.5
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
$ curl -h
Usage: curl [options...] <url>
Options: (H) means HTTP/HTTPS only, (F) means FTP only
--anyauth Pick "any" authentication method (H)
-a, --append Append to target file when uploading (F/SFTP)
--basic Use HTTP Basic Authentication (H)
--cacert FILE CA certificate to verify peer against (SSL)
--capath DIR CA directory to verify peer against (SSL)
-E, --cert CERT[:PASSWD] Client certificate file and password (SSL)
--cert-status Verify the status of the server certificate (SSL)
--cert-type TYPE Certificate file type (DER/PEM/ENG) (SSL)
--ciphers LIST SSL ciphers to use (SSL)
--compressed Request compressed response (using deflate or gzip)
-K, --config FILE Read config from FILE
--connect-timeout SECONDS Maximum time allowed for connection
-C, --continue-at OFFSET Resumed transfer OFFSET
-b, --cookie STRING/FILE Read cookies from STRING/FILE (H)
-c, --cookie-jar FILE Write cookies to FILE after operation (H)
--create-dirs Create necessary local directory hierarchy
--crlf Convert LF to CRLF in upload
--crlfile FILE Get a CRL list in PEM format from the given file
-d, --data DATA HTTP POST data (H)
--data-raw DATA HTTP POST data, '@' allowed (H)
--data-ascii DATA HTTP POST ASCII data (H)
--data-binary DATA HTTP POST binary data (H)
--data-urlencode DATA HTTP POST data url encoded (H)
--delegation STRING GSS-API delegation permission
--digest Use HTTP Digest Authentication (H)
--disable-eprt Inhibit using EPRT or LPRT (F)
--disable-epsv Inhibit using EPSV (F)
--dns-servers DNS server addrs to use: 1.1.1.1;2.2.2.2
--dns-interface Interface to use for DNS requests
--dns-ipv4-addr IPv4 address to use for DNS requests, dot notation
--dns-ipv6-addr IPv6 address to use for DNS requests, dot notation
-D, --dump-header FILE Write the headers to FILE
--egd-file FILE EGD socket path for random data (SSL)
--engine ENGINE Crypto engine (use "--engine list" for list) (SSL)
-f, --fail Fail silently (no output at all) on HTTP errors (H)
--false-start Enable TLS False Start.
-F, --form CONTENT Specify HTTP multipart POST data (H)
--form-string STRING Specify HTTP multipart POST data (H)
--ftp-account DATA Account data string (F)
--ftp-alternative-to-user COMMAND String to replace "USER [name]" (F)
--ftp-create-dirs Create the remote dirs if not present (F)
--ftp-method [MULTICWD/NOCWD/SINGLECWD] Control CWD usage (F)
--ftp-pasv Use PASV/EPSV instead of PORT (F)
-P, --ftp-port ADR Use PORT with given address instead of PASV (F)
--ftp-skip-pasv-ip Skip the IP address for PASV (F)
--ftp-pret Send PRET before PASV (for drftpd) (F)
--ftp-ssl-ccc Send CCC after authenticating (F)
--ftp-ssl-ccc-mode ACTIVE/PASSIVE Set CCC mode (F)
--ftp-ssl-control Require SSL/TLS for FTP login, clear for transfer (F)
-G, --get Send the -d data with a HTTP GET (H)
-g, --globoff Disable URL sequences and ranges using {} and []
-H, --header LINE Pass custom header LINE to server (H)
-I, --head Show document info only
-h, --help This help text
--hostpubmd5 MD5 Hex-encoded MD5 string of the host public key. (SSH)
-0, --http1.0 Use HTTP 1.0 (H)
--http1.1 Use HTTP 1.1 (H)
--http2 Use HTTP 2 (H)
--ignore-content-length Ignore the HTTP Content-Length header
-i, --include Include protocol headers in the output (H/F)
-k, --insecure Allow connections to SSL sites without certs (H)
--interface INTERFACE Use network INTERFACE (or address)
-4, --ipv4 Resolve name to IPv4 address
-6, --ipv6 Resolve name to IPv6 address
-j, --junk-session-cookies Ignore session cookies read from file (H)
--keepalive-time SECONDS Wait SECONDS between keepalive probes
--key KEY Private key file name (SSL/SSH)
--key-type TYPE Private key file type (DER/PEM/ENG) (SSL)
--krb LEVEL Enable Kerberos with security LEVEL (F)
--libcurl FILE Dump libcurl equivalent code of this command line
--limit-rate RATE Limit transfer speed to RATE
-l, --list-only List only mode (F/POP3)
--local-port RANGE Force use of RANGE for local port numbers
-L, --location Follow redirects (H)
--location-trusted Like '--location', and send auth to other hosts (H)
--login-options OPTIONS Server login options (IMAP, POP3, SMTP)
-M, --manual Display the full manual
--mail-from FROM Mail from this address (SMTP)
--mail-rcpt TO Mail to this/these addresses (SMTP)
--mail-auth AUTH Originator address of the original email (SMTP)
--max-filesize BYTES Maximum file size to download (H/F)
--max-redirs NUM Maximum number of redirects allowed (H)
-m, --max-time SECONDS Maximum time allowed for the transfer
--metalink Process given URLs as metalink XML file
--negotiate Use HTTP Negotiate (SPNEGO) authentication (H)
-n, --netrc Must read .netrc for user name and password
--netrc-optional Use either .netrc or URL; overrides -n
--netrc-file FILE Specify FILE for netrc
-:, --next Allows the following URL to use a separate set of options
--no-alpn Disable the ALPN TLS extension (H)
-N, --no-buffer Disable buffering of the output stream
--no-keepalive Disable keepalive use on the connection
--no-npn Disable the NPN TLS extension (H)
--no-sessionid Disable SSL session-ID reusing (SSL)
--noproxy List of hosts which do not use proxy
--ntlm Use HTTP NTLM authentication (H)
--oauth2-bearer TOKEN OAuth 2 Bearer Token (IMAP, POP3, SMTP)
-o, --output FILE Write to FILE instead of stdout
--pass PASS Pass phrase for the private key (SSL/SSH)
--path-as-is Do not squash .. sequences in URL path
--pinnedpubkey FILE Public key (PEM/DER) to verify peer against (OpenSSL/GnuTLS/NSS/wolfSSL/CyaSSL/GSKit only)
--post301 Do not switch to GET after following a 301 redirect (H)
--post302 Do not switch to GET after following a 302 redirect (H)
--post303 Do not switch to GET after following a 303 redirect (H)
-#, --progress-bar Display transfer progress as a progress bar
--proto PROTOCOLS Enable/disable PROTOCOLS
--proto-redir PROTOCOLS Enable/disable PROTOCOLS on redirect
-x, --proxy [PROTOCOL://]HOST[:PORT] Use proxy on given port
--proxy-anyauth Pick "any" proxy authentication method (H)
--proxy-basic Use Basic authentication on the proxy (H)
--proxy-digest Use Digest authentication on the proxy (H)
--proxy-negotiate Use HTTP Negotiate (SPNEGO) authentication on the proxy (H)
--proxy-ntlm Use NTLM authentication on the proxy (H)
--proxy-service-name NAME SPNEGO proxy service name
--service-name NAME SPNEGO service name
-U, --proxy-user USER[:PASSWORD] Proxy user and password
--proxy1.0 HOST[:PORT] Use HTTP/1.0 proxy on given port
-p, --proxytunnel Operate through a HTTP proxy tunnel (using CONNECT)
--pubkey KEY Public key file name (SSH)
-Q, --quote CMD Send command(s) to server before transfer (F/SFTP)
--random-file FILE File for reading random data from (SSL)
-r, --range RANGE Retrieve only the bytes within RANGE
--raw Do HTTP "raw"; no transfer decoding (H)
-e, --referer Referer URL (H)
-J, --remote-header-name Use the header-provided filename (H)
-O, --remote-name Write output to a file named as the remote file
--remote-name-all Use the remote file name for all URLs
-R, --remote-time Set the remote file's time on the local output
-X, --request COMMAND Specify request command to use
--resolve HOST:PORT:ADDRESS Force resolve of HOST:PORT to ADDRESS
--retry NUM Retry request NUM times if transient problems occur
--retry-delay SECONDS Wait SECONDS between retries
--retry-max-time SECONDS Retry only within this period
--sasl-ir Enable initial response in SASL authentication
-S, --show-error Show error. With -s, make curl show errors when they occur
-s, --silent Silent mode (don't output anything)
--socks4 HOST[:PORT] SOCKS4 proxy on given host + port
--socks4a HOST[:PORT] SOCKS4a proxy on given host + port
--socks5 HOST[:PORT] SOCKS5 proxy on given host + port
--socks5-hostname HOST[:PORT] SOCKS5 proxy, pass host name to proxy
--socks5-gssapi-service NAME SOCKS5 proxy service name for GSS-API
--socks5-gssapi-nec Compatibility with NEC SOCKS5 server
-Y, --speed-limit RATE Stop transfers below RATE for 'speed-time' secs
-y, --speed-time SECONDS Trigger 'speed-limit' abort after SECONDS (default: 30)
--ssl Try SSL/TLS (FTP, IMAP, POP3, SMTP)
--ssl-reqd Require SSL/TLS (FTP, IMAP, POP3, SMTP)
-2, --sslv2 Use SSLv2 (SSL)
-3, --sslv3 Use SSLv3 (SSL)
--ssl-allow-beast Allow security flaw to improve interop (SSL)
--stderr FILE Where to redirect stderr (use "-" for stdout)
--tcp-nodelay Use the TCP_NODELAY option
-t, --telnet-option OPT=VAL Set telnet option
--tftp-blksize VALUE Set TFTP BLKSIZE option (must be >512)
-z, --time-cond TIME Transfer based on a time condition
-1, --tlsv1 Use => TLSv1 (SSL)
--tlsv1.0 Use TLSv1.0 (SSL)
--tlsv1.1 Use TLSv1.1 (SSL)
--tlsv1.2 Use TLSv1.2 (SSL)
--trace FILE Write a debug trace to FILE
--trace-ascii FILE Like --trace, but without hex output
--trace-time Add time stamps to trace/verbose output
--tr-encoding Request compressed transfer encoding (H)
-T, --upload-file FILE Transfer FILE to destination
--url URL URL to work with
-B, --use-ascii Use ASCII/text transfer
-u, --user USER[:PASSWORD] Server user and password
--tlsuser USER TLS username
--tlspassword STRING TLS password
--tlsauthtype STRING TLS authentication type (default: SRP)
--unix-socket FILE Connect through this Unix domain socket
-A, --user-agent STRING Send User-Agent STRING to server (H)
-v, --verbose Make the operation more talkative
-V, --version Show version number and quit
-w, --write-out FORMAT Use output FORMAT after completion
--xattr Store metadata in extended file attributes
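The -w FORMAT string refers to the variables described below as %{variable_name}; for example (command only, since the values depend on the transfer):

$ curl -o /dev/null -s -w 'http_code: %{http_code}\nsize_download: %{size_download}\ncontent_type: %{content_type}\n' https://ruby-china.org/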
url_effective: The URL that was ultimately fetched, mainly relevant when the URL you give curl answers with a 301 and -L is used to keep following it.
http_code: The HTTP status code, e.g. 200 OK, 301 redirect, 404 not found, 500 server error. (The numerical response code that was found in the last retrieved HTTP(S) or FTP(s) transfer. In 7.18.2 the alias response_code was added to show the same info.)
http_connect: The numerical code that was found in the last response (from a proxy) to a curl CONNECT request. (Added in 7.12.4)
time_total: Total time in seconds, shown to millisecond precision. (The total time, in seconds, that the full operation lasted. The time will be displayed with millisecond resolution.)
time_namelookup: DNS resolution time, from the start of the request until name resolving completed. (The time, in seconds, it took from the start until the name resolving was completed.)
time_connect: Connection time, from the start until the TCP connection was established; it includes the DNS time above, so the pure connect time is time_connect minus time_namelookup. The metrics below are cumulative in the same way. (The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.)
time_appconnect: Time until the application-level connection setup, e.g. the SSL/SSH handshake, completed. (The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (Added in 7.19.0))
time_pretransfer: Time from the start until the transfer was about to begin. (The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.)
time_redirect: Redirect time, covering the name lookup, connect, pretransfer and transfer of every redirect before the final transaction. (The time, in seconds, it took for all redirection steps include name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (Added in 7.12.3))
time_starttransfer: Time to first byte: the time from the start until the first byte of the server's response was about to arrive. (The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.)
size_download: Downloaded size. (The total amount of bytes that were downloaded.)
size_upload: Uploaded size. (The total amount of bytes that were uploaded.)
size_header: Size of the downloaded headers. (The total amount of bytes of the downloaded headers.)
size_request: Size of the request that was sent. (The total amount of bytes that were sent in the HTTP request.)
speed_download: Download speed in bytes per second. (The average download speed that curl measured for the complete download. Bytes per second.)
speed_upload: Upload speed in bytes per second. (The average upload speed that curl measured for the complete upload. Bytes per second.)
content_type: Simply the Content-Type of the response; fetching my blog's home page, for example, returns text/html; charset=UTF-8. (The Content-Type of the requested document, if there was any.)
num_connects: Number of new connects made in the recent transfer. (Added in 7.12.3)
num_redirects: Number of redirects that were followed in the request. (Added in 7.12.3)
redirect_url: When a HTTP request was made without -L to follow redirects, this variable will show the actual URL a redirect would take you to. (Added in 7.18.2)
ftp_entry_path: The initial path libcurl ended up in when logging on to the remote FTP server. (Added in 7.15.4)
ssl_verify_result: Result of the SSL peer certificate verification; 0 means the verification succeeded. (The result of the SSL peer certificate verification that was requested. 0 means the verification was successful. (Added in 7.19.0))
-q Disable .curlrc (must be first parameter)