Timothe Litt
2018-08-17 11:11:38 UTC
Message: 1
Date: Thu, 16 Aug 2018 15:45:46 +0200 (CEST)
Subject: Re: URL parsing for the command line tool?
Content-Type: text/plain; format=flowed; charset=US-ASCII
>> Please try something more real world (even contrived). Why would you
>> want to change hosts? Your example makes no sense to me.
> I was just trying to provoke thoughts and ideas, I didn't have any
> particular use case in mind.
> 1. host-specific sections in config files. So .curlrc can specify, for
> example, a specific user-agent to use only when connecting to example.com
> and a different user-agent for example.org.
> 2. command-line variables based on the most recently used URL. If you
> want to save the output from a download in a directory named after the
> host name, with the file name taken from the URL:
>
> curl https://example.com/file -o "%{url_host}/%{url_file}"
> export in1="../download.html"
> for i in DragonFlyBSD FreeBSD NetBSD; do
>   curl --base-url $base --output-url - $in1 "#" $i | ./download_curl.sh
> done
> curl http://example.org/foo --rel-url "../here/it/is" -O
> That could be fun for those who download an HTML page and want to
> download something that is pointed to with a relative URL within it.
> curl $url > raw.html
> extract_hrefs
> for i in $all_hrefs; do
>   curl $url --rel-url "$i" -O
> done
Most of this seems like feature bloat. Host-specific sections in config
files could be useful.
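A purely hypothetical sketch of what such per-host sections might look
like. Today's .curlrc only takes flat "option = value" lines, so the
bracketed host labels below are invented syntax, for illustration only:

# hypothetical .curlrc -- the [host] sections are NOT real curl syntax
user-agent = "default-agent/1.0"

[example.com]
user-agent = "agent-for-example-com/1.0"

[example.org]
user-agent = "agent-for-example-org/1.0"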
The rest can be done easily (and, unless you put a lot of work in, more
flexibly and easily) in the scripting language of your choice.
For Perl, see URI for parsing/dissecting URIs of all sorts, and
HTML::TreeBuilder for parsing HTML (including finding hrefs, <img> srcs,
etc.). Also, there are other Perl modules that provide a direct interface
to Curl. So it's quite easy to handle your examples - and the more complex
usages that they presage. E.g.:

use URI;
$uri = URI->new( "http://example.org" );
print $uri->host, $uri->path, $uri->fragment;
$abs = URI->new_abs( "bar/nil.jpg", "https://example.org/foo" );
print $abs->path;
$rel = $abs->rel( "https://example.org/" );
...
Python has similar URI parsing & Curl access modules.
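As a minimal sketch (not anything curl ships): the relative-URL loop from
the quoted example, done today with only Python's standard library.
urllib.parse.urljoin performs the resolution the proposed --rel-url would
do; the base URL below is just a placeholder.

#!/usr/bin/env python3
# Collect hrefs from a page and resolve them against the page's own URL.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

BASE = "http://example.org/foo/index.html"  # placeholder base URL

class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # record every <a href=...> encountered in the page
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

page = urlopen(BASE).read().decode("utf-8", errors="replace")
collector = HrefCollector()
collector.feed(page)
for href in collector.hrefs:
    # the same resolution "curl $url --rel-url $href" would perform
    print(urljoin(BASE, href))

Each printed absolute URL can then be handed to curl -O (or to pycurl),
keeping the scripting outside the tool itself.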
I don't think you want Curl to become a scripting language. I'd stick with
the Unix philosophy of small tools, each of which does one thing well,
that can be composed for more complex tasks.
But if you go this way, be prepared to see a long list of "enhancement"
requests that will add development & maintenance effort to your plate.
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.