Timothe Litt
2018-08-17 11:11:38 UTC
Message: 1
Date: Thu, 16 Aug 2018 15:45:46 +0200 (CEST)
Subject: Re: URL parsing for the command line tool?
Content-Type: text/plain; format=flowed; charset=US-ASCII
>> Please try something more real world (even contrived). Why would you
>> want to change hosts? Your example makes no sense to me.
> I was just trying to provoke thoughts and ideas, I didn't have any
> particular use case in mind.
> 1. host-specific sections in config files. So .curlrc can specify, for
> example, a specific user-agent to use only when connecting to example.com
> and a different user-agent for example.org.
> 2. command-line variables based on the most recently used URL. If you
> want to save the output from a download in a directory named after the
> host name, with the file name taken from the URL:
>
> curl https://example.com/file -o "%{url_host}/%{url_file}"
> export in1="../download.html"
> for i in DragonFlyBSD FreeBSD NetBSD; do
>   curl --base-url $base --output-url - $in1 "#" $i | ./download_curl.sh
> done
> curl http://example.org/foo --rel-url "../here/it/is" -O
> That could be fun for those who download an HTML page and want to
> download something that is pointed to with a relative URL within it.
> curl $url > raw.html
> extract_hrefs
> for i in $all_hrefs; do
>   curl $url --rel-url "$i" -O
> done
Most of this seems like feature bloat. Host-specific sections in config
files could be useful.
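A purely hypothetical sketch of what such per-host sections might look
like. Today's .curlrc only takes flat "option = value" lines, so the
bracketed host labels below are invented syntax, for illustration only:

# hypothetical .curlrc -- the [host] sections are NOT real curl syntax
user-agent = "default-agent/1.0"

[example.com]
user-agent = "agent-for-example-com/1.0"

[example.org]
user-agent = "agent-for-example-org/1.0"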
The rest can be done easily (and, unless you put a lot of work in, more
flexibly and easily) in the scripting language of your choice.
For Perl, see URI for parsing/dissecting URIs of all sorts, and
HTML::TreeBuilder for parsing HTML (including finding hrefs, <img> srcs,
etc.). Also, there are other Perl modules that provide a direct interface
to Curl. So it's quite easy to handle your examples - and the more complex
usages that they presage. E.g.:

use URI;
$uri = URI->new( "http://example.org" );
print $uri->host, $uri->path, $uri->fragment;
$abs = URI->new_abs( "bar/nil.jpg", "https://example.org/foo" );
print $abs->path;
$rel = $abs->rel( "https://example.org/" );
...
Python has similar URI parsing & Curl access modules.
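As a minimal sketch (not anything curl ships): the relative-URL loop from
the quoted example, done today with only Python's standard library.
urllib.parse.urljoin performs the resolution the proposed --rel-url would
do; the base URL below is just a placeholder.

#!/usr/bin/env python3
# Collect hrefs from a page and resolve them against the page's own URL.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

BASE = "http://example.org/foo/index.html"  # placeholder base URL

class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # record every <a href=...> encountered in the page
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

page = urlopen(BASE).read().decode("utf-8", errors="replace")
collector = HrefCollector()
collector.feed(page)
for href in collector.hrefs:
    # the same resolution "curl $url --rel-url $href" would perform
    print(urljoin(BASE, href))

Each printed absolute URL can then be handed to curl -O (or to pycurl),
keeping the scripting outside the tool itself.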
I don't think you want Curl to become a scripting language. I'd stick with
the Unix philosophy of small tools, each of which does one thing well,
that can be composed for more complex tasks.
But if you go this way, be prepared to see a long list of "enhancement"
requests that will add development & maintenance effort to your plate.
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.