wget

Sends HTTP requests to fetch web pages or API responses. You can extract HTML DOM elements using CSS selectors, or receive JSON/XML API responses.

Command properties

| Property | Description |
| --- | --- |
| Command type | Driver query or transforming |
| Required permission | None |
| License usage | Counted |
| Parallel execution | Not supported |
| Distributed execution | Not supported |

Syntax

To send an HTTP request by specifying a URL directly:

wget url=STR [method={get|post|put|delete}] [selector=STR] [timeout=INT] [encoding=STR] [auth=STR] [format={form|json|xml}] [header=STR] [body=STR] [proxy=STR]

To send an HTTP request using the url field value from input records:

wget [method={get|post|put|delete}] [selector=STR] [timeout=INT] [encoding=STR] [auth=STR] [format={form|json|xml}] [header=STR] [body=STR] [proxy=STR]

Options

url=STR
URL for the HTTP request. Must start with http:// or https://. If this option is omitted, the command acts as a transforming query and reads the URL from the url field of each input record.
method={get|post|put|delete}
HTTP method (default: get).
  • get: GET request
  • post: POST request
  • put: PUT request
  • delete: DELETE request
selector=STR
CSS selector. When specified, extracts DOM elements matching the selector from the HTML response and assigns them as an array to the elements field. If not specified, the entire response is assigned to the line field.
timeout=INT
HTTP connection and read timeout in seconds (default: 30).
encoding=STR
Character encoding used to decode the HTTP response (default: utf-8).
auth=STR
HTTP Basic authentication credentials in username:password format.
format={form|json|xml}
Content-Type header format for POST, PUT, and DELETE requests (default: form).
  • form: application/x-www-form-urlencoded
  • json: application/json
  • xml: application/xml
header=STR
Name of the input record field that holds the HTTP request headers. The value of that field (a string map) is sent as the request headers.
body=STR
Name of the input record field that holds the HTTP request body. The value of that field is sent as the request body.
proxy=STR
HTTP proxy in host:port format.
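
The option rules above can be sketched in Python to make the accepted values concrete. This is an illustration only, not the command's implementation: the function names `check_options` and `basic_auth_header` are hypothetical, the order of the checks is an assumption, and the URL and proxy tests are a guess at what "valid" means. The returned numbers are the parsing error codes documented under Error codes, and the header encoding follows the standard HTTP Basic scheme.

```python
import base64
from urllib.parse import urlsplit

METHODS = {"get", "post", "put", "delete"}
FORMATS = {"form", "json", "xml"}


def check_options(url=None, method="get", fmt="form", proxy=None):
    """Return a documented parsing error code, or None if the values look valid.

    Assumption: the real command may check in a different order or apply
    stricter rules; only the error codes themselves come from the manual.
    """
    if method not in METHODS:
        return 14100
    if fmt not in FORMATS:
        return 14101
    if url is not None:
        try:
            parts = urlsplit(url)
        except ValueError:
            return 14102
        # The manual requires the URL to start with http:// or https://.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            return 14102
    if proxy is not None:
        # The manual requires host:port; a numeric port is assumed here.
        host, sep, port = proxy.rpartition(":")
        if not (host and sep and port.isdigit()):
            return 14103
    return None


def basic_auth_header(auth):
    """Encode 'username:password' as an HTTP Basic Authorization header value."""
    token = base64.b64encode(auth.encode("utf-8")).decode("ascii")
    return "Basic " + token
```

For example, `check_options(url="ftp://example.com")` yields 14102 in this sketch, and `basic_auth_header("user:pass")` produces the value a server would expect in the Authorization header.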

Input fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Required if url option is not specified | URL for the HTTP request. Read from this field when the url option is omitted. |

Output fields

| Field | Type | Description |
| --- | --- | --- |
| line | string | HTTP response body. Assigned when the selector option is not specified. |
| elements | array | List of DOM elements matching the CSS selector. Assigned when the selector option is specified. |
| _wget_code | integer | HTTP response code |
| _wget_error | string | Error message. Assigned the value exceeds-max-size if the response exceeds the maximum size (default 10 MB). |

Error codes

Parsing errors

| Error code | Message | Description |
| --- | --- | --- |
| 14100 | invalid wget method: get, post, put, delete supported. | The method value is not one of get, post, put, or delete |
| 14101 | invalid wget format: form, json, xml supported. | The format value is not one of form, json, or xml |
| 14102 | invalid wget url. reason: [error] | The url value is not a valid URL format |
| 14103 | invalid wget proxy. reason: [error] | The proxy value is not in host:port format |

Runtime errors

N/A

Description

The wget command sends HTTP requests and converts the responses into records.

When the url option is specified, the command acts as a driver query and sends an HTTP request to the specified URL, generating one record. When the url option is omitted, the command acts as a transforming query and reads the URL from the url field of each input record, sending one HTTP request per record. In transforming mode, records where the url field is null are skipped, and records are passed to the next command even if the request fails.
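
The two modes described above can be sketched in Python as follows. This is a simplified model, not the actual implementation: `run_wget` and the `fetch` callable are hypothetical names, the sketch ignores the case where a url option is combined with piped input, it assumes that skipped records are not emitted, and it assumes that a failed request leaves an error message in `_wget_error`.

```python
def run_wget(fetch, url=None, records=()):
    """Model driver mode (url given) and transforming mode (url omitted).

    `fetch` is a hypothetical callable: fetch(url) -> (status_code, body_text).
    """
    if url is not None:
        # Driver mode: one request, one generated record.
        code, body = fetch(url)
        return [{"line": body, "_wget_code": code}]

    out = []
    for record in records:
        target = record.get("url")
        if target is None:
            # Null url: no request is sent. Assumption: the record is dropped.
            continue
        result = dict(record)
        try:
            code, body = fetch(target)
            result.update({"line": body, "_wget_code": code})
        except OSError as exc:
            # The record is still passed on when the request fails.
            # Assumption: the failure reason is stored in _wget_error.
            result["_wget_error"] = str(exc)
        out.append(result)
    return out
```

Passing a stub `fetch` that raises `OSError` for one URL shows the key point of transforming mode: the failing record still appears in the output instead of stopping the query.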

When the selector option is specified, the HTML response is parsed and DOM elements matching the CSS selector are extracted. Each element includes its attributes as key-value pairs, and text content is stored in own_text and text fields. The result is assigned to the elements field as an array of maps.
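
To illustrate the shape of the elements array, the following Python sketch builds a comparable structure with the standard library. It is a rough approximation under stated assumptions: it matches by tag name only rather than a full CSS selector, it assumes every tag in the input is explicitly closed, and the class `TagCollector` and function `select` are hypothetical names. Whitespace handling in the real own_text and text values may differ.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect elements with a given tag name as maps of attributes and text."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0       # nesting depth of the tag currently being parsed
        self.open = []       # (depth, element) for matched elements still open
        self.elements = []   # result: one map per matched element

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        if tag == self.tag:
            element = dict(attrs)      # attributes become key-value pairs
            element["own_text"] = ""   # text directly inside the element
            element["text"] = ""       # text including descendants
            self.open.append((self.depth, element))
            self.elements.append(element)

    def handle_endtag(self, tag):
        if self.open and self.open[-1][0] == self.depth:
            self.open.pop()
        self.depth -= 1

    def handle_data(self, data):
        for depth, element in self.open:
            element["text"] += data
            if depth == self.depth:
                element["own_text"] += data


def select(html, tag):
    """Return the list of element maps for all `tag` elements in `html`."""
    parser = TagCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.elements
```

For the input `<h1 id="top">Title <em>here</em></h1>`, this sketch yields one map whose `id` is `top`, whose `own_text` excludes the text of the nested `em` element, and whose `text` includes it.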

For POST, PUT, and DELETE requests, specifying the body option sends the value of that field as the request body. If the body option is not specified, the fields of the input record are automatically encoded and sent according to the format option. The form format uses URL encoding, the json format uses JSON serialization, and the xml format does not support automatic conversion.
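
The body rules above can be modeled with a short Python sketch. The function name `build_body` is hypothetical, and two points are assumptions rather than documented behavior: that the Content-Type still follows the format option when the body option is used, and that xml without a body option is treated as an error.

```python
import json
from urllib.parse import urlencode

CONTENT_TYPES = {
    "form": "application/x-www-form-urlencoded",
    "json": "application/json",
    "xml": "application/xml",
}


def build_body(record, fmt="form", body_field=None):
    """Return (content_type, body) for a POST, PUT, or DELETE request."""
    content_type = CONTENT_TYPES[fmt]
    if body_field is not None:
        # body option: the field value is sent without modification.
        return content_type, record[body_field]
    if fmt == "form":
        # URL-encode all fields of the input record.
        return content_type, urlencode(record)
    if fmt == "json":
        # Serialize all fields of the input record as JSON.
        return content_type, json.dumps(record)
    # xml has no automatic conversion. Assumption: signal this as an error.
    raise ValueError("xml format requires the body option")
```

With the record from the POST example, `build_body({"src_ip": "192.0.2.1", "action": "block"}, "json")` returns the JSON text of both fields, while the form format returns `src_ip=192.0.2.1&action=block`.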

Server certificate validation is not performed for HTTPS requests.

If the HTTP response size exceeds the maximum size (default 10 MB, configurable via the araqne.logdb.wget_max_size system property), driver query mode raises an error and transforming query mode assigns exceeds-max-size to the _wget_error field.
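
The difference between the two modes at the size limit can be sketched as follows. The names `apply_size_limit` and `ResponseTooLarge` are hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption; only the exceeds-max-size value and the raise-versus-assign split come from the description above.

```python
MAX_SIZE = 10 * 1024 * 1024  # assumption: the 10 MB default counted in bytes


class ResponseTooLarge(Exception):
    """Stands in for the error raised in driver query mode."""


def apply_size_limit(body, driver_mode, max_size=MAX_SIZE):
    """Return output fields for a response body given as bytes."""
    if len(body) <= max_size:
        return {"line": body.decode("utf-8")}
    if driver_mode:
        # Driver query mode: the query fails with an error.
        raise ResponseTooLarge("response exceeds %d bytes" % max_size)
    # Transforming query mode: the record continues with an error marker.
    return {"_wget_error": "exceeds-max-size"}
```

In a query, this means that a transforming-mode pipeline can filter on the _wget_error field to find oversized responses, whereas a driver-mode query stops.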

Examples

  1. Fetch a web page

    wget url="https://example.com"
    

    Fetches the web page at the specified URL and assigns the response body to the line field.

  2. Extract DOM elements using a CSS selector

    wget url="https://example.com" selector="h1"
    

    Extracts h1 tags from the web page and assigns them as an array to the elements field.

  3. Call a JSON API

    wget url="https://api.example.com/data" encoding=utf-8
    | parsejson
    

    Calls the JSON API and parses the response.

  4. Send data with a POST request

    json "{'src_ip': '192.0.2.1', 'action': 'block'}"
    | wget url="https://api.example.com/report" method=post format=json
    

    Sends the input record's fields as a JSON-formatted POST request body.

  5. Send sequential requests using the url field from input records

    json "[{'url': 'https://api.example.com/info/1'}, {'url': 'https://api.example.com/info/2'}]"
    | wget
    

    Sends one HTTP GET request per input record, using the url field value of each record.

  6. Send a request via an HTTP proxy

    wget url="https://example.com" proxy="198.51.100.10:8080"
    

    Sends an HTTP request through the specified proxy server.

  7. Send a request with custom headers and authentication

    json "{'url': 'https://api.example.com/secure'}"
    | eval headers = dict("X-Custom-Header", "value1")
    | wget header=headers auth="user:pass"
    

    Sends a request using a custom HTTP header and Basic authentication.

  8. Set the connection and read timeout

    wget url="https://api.example.com/report" timeout=60
    

    Sets the connection and read timeout to 60 seconds, which helps when the server is slow to respond.

  9. Send a pre-serialized body

    json "{'payload': 'action=block&ip=198.51.100.1'}"
    | wget url="https://api.example.com/block" method=post body=payload
    

    Sends the value of the field specified by the body option as the HTTP request body without modification. If the body option is omitted, all fields in the input record are automatically encoded according to the format option.

  10. Update a resource with a PUT request

    json "{'status': 'active'}"
    | wget url="https://api.example.com/items/42" method=put format=json
    

    Serializes the input record's fields as JSON and sends them as a PUT request body.

  11. Delete a resource with a DELETE request

    wget url="https://api.example.com/items/42" method=delete auth="admin:pass"
    

    Deletes the specified resource using Basic authentication.

Compatibility

The wget command has been available since before Logpresso Sonar 4.0.