Discussion: Can server tell I'm using curl?
Gilles via curl-users
2021-04-06 11:37:42 UTC
Hello,

(The site <https://curl.se/mail/list.cgi?list=curl-users> is missing a
search feature, so I can't tell if the question has — most likely — been
asked before.)

I'm using the following to try and download a picture from a site that
first requires logging on:

curl.exe -L -b cookies.firefox.txt \
  -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0" \
  -o mypict.jpg \
  "https://www.acme.com/attachment.php?attachmentid=123&d=456"

As you can see, I first used Firefox to log on, and used an extension to
export the cookies.

Is the command wrong, or is the server somehow able to tell I'm using
curl to forbid its use? I'm having the same problem with wget (but it
might not understand the cookies file.)

Thank you.

PS: FWIW, here's what the cookies file looks like:

# Netscape HTTP Cookie File
.acme.com       TRUE    /       TRUE    0       __cfduid        d79da270734bb7edbffe3aa0aa1617703540
www.acme.com    FALSE   /       FALSE   0       BIGipServeracme-web_POOL        13264714.0.0000
www.acme.com    FALSE   /       FALSE   0       IDstack         %2C25801%2C
www.acme.com    FALSE   /       TRUE    0       bblastvisit     16106268
www.acme.com    FALSE   /       TRUE    0       bblastactivity  0
www.acme.com    FALSE   /       FALSE   0       bbsessionhash   777e8c348e6085766c0333b83648
www.acme.com    FALSE   /       FALSE   0       vbseo_loggedin  yes
www.acme.com    FALSE   /       TRUE    0       bbgsess         %2B21telUApnVbrpW4KHZMhQ4lWLDa4k4eg8YeQ0mA%3D%3D
www.acme.com    FALSE   /       FALSE   0       _ibs            0:kn5ux65i:68e6708a-b31-4f9b-9151-0310a248ab
www.acme.com    FALSE   /       FALSE   0       _ibp            0:kn5ux65h:31a75cbb-da2-4922-b34a-1b500a86aa
David Colter via curl-users
2021-04-06 13:43:10 UTC
Hello,
Post by Gilles via curl-users
As you can see, I first used Firefox to log on, and used an extension to export the cookies.
Is the command wrong, or is the server somehow able to tell I'm using curl to forbid its use? I'm having the same problem with wget (but it might not understand the cookies file.)
You supplied a browser user-agent string, so the server will not know from that header that you are using cURL.

Since you are exporting the cookies by hand, you have added a manual step. Let cURL and your script handle cookies for you. Read about this here: <https://curl.se/docs/http-cookies.html>

I would recommend that you use Firefox to perform the login request and the picture request, then explore the Network tab of the Web Developer Tools. There, you can right-click the request and choose Copy > Copy as cURL.
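The cookie-engine approach described above might be sketched like this; the login endpoint and form field names are made-up assumptions, not the real site's:

```shell
# Log in with curl itself and save the session cookies to a jar (-c).
# "login.php", "username" and "password" are placeholders for illustration.
curl -L -c cookies.txt \
  -d "username=me" -d "password=secret" \
  "https://www.acme.com/login.php"

# Reuse the same jar (-b) for the download; with -c as well, curl updates
# the jar whenever the server sets or refreshes cookies along the way.
curl -L -b cookies.txt -c cookies.txt -o mypict.jpg \
  "https://www.acme.com/attachment.php?attachmentid=123&d=456"
```

This avoids re-exporting cookies from the browser every time the session changes.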

David
Gilles via curl-users
2021-04-06 14:24:20 UTC
Thanks much, it worked.

For others' benefit, and since curl apparently can only download a
single file at a time, I added a wget version:

curl.exe -L -b cookies.firefox.txt \
  "https://www.acme.net/attachment.php?attachmentid=123&d=456" \
  -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:87.0) Gecko/20100101 Firefox/87.0" \
  -o mypic.jpg

wget --content-disposition --no-check-certificate \
  --header="Accept-Language: en;q=0.8,en-US;q=0.7,en;q=0.6" \
  --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" \
  --load-cookies=cookies.firefox.txt \
  --header="User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:87.0) Gecko/20100101 Firefox/87.0" \
  -i pics.txt
-----------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-users
Etiquette: https://curl.haxx.se/mail/etiquette.html
Daniel Stenberg via curl-users
2021-04-06 14:29:00 UTC
Post by Gilles via curl-users
since Curl apparently can only download a single file at a time
That's of course incorrect. It can download any amount of files in the same
command line, even in parallel if you want to.
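For example, one invocation can name several URLs, each paired with its own -o; `file://` URLs are used here only so the sketch runs without a network:

```shell
# Create two local source files to stand in for remote ones.
printf 'one' > src1.txt
printf 'two' > src2.txt

# One curl invocation, two URLs, two outputs; each -o pairs with the
# preceding URL in order. Add -Z (curl 7.66+) to fetch them in parallel.
curl -s "file://$PWD/src1.txt" -o out1.txt \
        "file://$PWD/src2.txt" -o out2.txt
```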
--
/ daniel.haxx.se
| Commercial curl support up to 24x7 is available!
| Private help, bug fixes, support, ports, new features
| https://www.wolfssl.com/contact/
Gilles via curl-users
2021-04-06 14:38:06 UTC
Post by Daniel Stenberg via curl-users
since Curl apparently can only download a single file at a time
That's of course incorrect. It can download any amount of files in the
same command line, even in parallel if you want to.
I guess it was implemented later than on a page I checked. Since I'm
more familiar with wget's "-i" syntax, I didn't double-check.

Daniel Stenberg via curl-users
2021-04-06 14:42:19 UTC
Post by Gilles via curl-users
I guess it was implemented later than on a page I checked. Since I'm more
familiar with wget's "-i" syntax, I didn't double-check.
curl has supported any amount of URLs per invocation since day 1. I guess you
didn't read that in any official curl docs?
Gilles via curl-users
2021-04-06 15:13:37 UTC
Post by Daniel Stenberg via curl-users
Post by Gilles via curl-users
I guess it was implemented later than on a page I checked. Since I'm
more familiar with wget 's "-i" syntax, I didn't double-check.
curl has supported any-amount of URLs per invoke since day 1. I guess
you didn't read that in any official curl docs?
No, some user pages.

Does it support something like "-i mylist.txt", which is useful when
downloading too many files to fit on the command line?

Daniel Stenberg via curl-users
2021-04-06 15:59:07 UTC
Post by Gilles via curl-users
Does it support something like "-i mylist.txt", which is useful when
downloading too many files to fit on the command line ?
curl supports -K which can load a "config file" with command line options,
that then allows you to specify any amount of URLs in it. It's not exactly the
same as wget's -i though.
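A -K config file along those lines might look like this; `file://` URLs stand in for the real ones so the sketch runs locally:

```shell
# Stand-in source files.
printf 'hello' > a.txt
printf 'world' > b.txt

# One "url =" / "output =" pair per download; curl reads the config file
# as if these options had been typed on the command line in this order.
cat > urls.cfg <<EOF
url = "file://$PWD/a.txt"
output = "a.out"
url = "file://$PWD/b.txt"
output = "b.out"
EOF

curl -s -K urls.cfg
```

Generating such a file from a plain URL list is a one-line sed/awk job, which gets you close to wget's -i workflow.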
Gilles via curl-users
2021-04-06 17:46:29 UTC
Post by Daniel Stenberg via curl-users
Post by Gilles via curl-users
Does it support something like "-i mylist.txt", which is useful when
downloading too many files to fit on the command line ?
curl supports -K which can load a "config file" with command line
options, that then allows you to specify any amount of URLs in it.
It's not exactly the same as wget's -i though.
I'll check it out. Thank you.

Dan Fandrich via curl-users
2021-04-06 20:57:56 UTC
Post by Gilles via curl-users
Is the command wrong, or is the server somehow able to tell I'm using curl
to forbid its use?
I've heard that some sites check things like the order of headers being sent as
well as various HTTP/2 options and even use TCP fingerprinting to try to ferret
out robots masquerading as browsers. curl can't hide itself from sites doing
that sort of client detection, but fortunately, that seems to be rare.

Dan
Paul Gilmartin via curl-users
2021-04-06 21:14:45 UTC
Post by Dan Fandrich via curl-users
Is the command wrong, or is the server somehow able to tell I'm using curl to
forbid its use
I've heard that some sites check things like the order of headers being sent as
well as various HTTP/2 options and even use TCP fingerprinting to try to ferret
out robots masquerading as browsers. curl can't hide itself from sites doing
that sort of client detection, but fortunately, that seems to be rare.
What about reCAPTCHA?
https://en.wikipedia.org/wiki/ReCAPTCHA#No_CAPTCHA_reCAPTCHA

-- gil


bruce via curl-users
2021-04-06 23:21:43 UTC
Hi..

When trying to crawl a site using one of the recaptcha processes, your
crawl can get ugly.

However, if you break up the process, the crawl might be reasonable.

1) When you're doing the crawl, does it have to be totally
automated? Or can it be automated after you get past the recaptcha
process?

If you're only crawling a site with a few pages, then you may as well
do it manually.

However, the devil is in the details!

If the target site generates a cookie after the recaptcha is handled,
then you might be able to use the "cookie" from the browser in your
curl process. This would allow you to continue on with the crawl
process. If you're lucky, the timeout of the cookie will last for a
good portion of the day.
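That browser-cookie handoff might look like this; the cookie name comes from the earlier cookie-file paste, while the value and URL are made-up placeholders:

```shell
# Copy the post-recaptcha session cookie out of the browser's dev tools
# and pass it inline with -b in name=value form (no cookie file needed).
# Both the cookie value and the URL here are hypothetical.
curl -L -b "bbsessionhash=0123456789abcdef" \
  -o page.html "https://www.acme.com/forumdisplay.php?f=12"
```

As long as the session cookie stays valid, subsequent curl requests in the script can reuse the same -b argument.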

I've found that the sites I target can usually be managed
using curl and a bit of clever thinking. That gets past a bunch of
JavaScript stuff, by examining the site's requests and implementing
curl processes that reproduce the steps the site requires.

Your mileage may vary!

Good luck.

