wget
Sends HTTP requests to fetch web pages or API responses. You can extract HTML DOM elements using CSS selectors, or receive JSON/XML API responses.
Command properties
| Property | Description |
|---|---|
| Command type | Driver query or transforming query |
| Required permission | None |
| License usage | Counted |
| Parallel execution | Not supported |
| Distributed execution | Not supported |
Syntax
To send an HTTP request by specifying a URL directly:
To send an HTTP request using the url field value from input records:
Options
- `url=STR`: URL for the HTTP request. Must start with `http://` or `https://`. If this option is omitted, the command acts as a transforming query and reads the URL from the `url` field of each input record.
- `method={get|post|put|delete}`: HTTP method (default: `get`).
  - `get`: GET request
  - `post`: POST request
  - `put`: PUT request
  - `delete`: DELETE request
- `selector=STR`: CSS selector. When specified, extracts DOM elements matching the selector from the HTML response and assigns them as an array to the `elements` field. If not specified, the entire response is assigned to the `line` field.
- `timeout=INT`: HTTP connection and read timeout in seconds (default: `30`).
- `encoding=STR`: Character encoding used to decode the HTTP response (default: `utf-8`).
- `auth=STR`: HTTP Basic authentication credentials in `username:password` format.
- `format={form|json|xml}`: Content-Type header format for POST, PUT, and DELETE requests (default: `form`).
  - `form`: `application/x-www-form-urlencoded`
  - `json`: `application/json`
  - `xml`: `application/xml`
- `header=STR`: HTTP header field name. The value of this field in the input record (a string map) is used as the HTTP request headers.
- `body=STR`: HTTP body field name. The value of this field in the input record is sent as the HTTP request body.
- `proxy=STR`: HTTP proxy in `host:port` format.
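To illustrate what the `auth` and `format` options imply for the outgoing request, the following Python sketch builds the corresponding headers. `build_headers` is a hypothetical helper and not part of the command. The Basic scheme encoding is standard HTTP, but whether the command assembles its headers in exactly this way is an assumption.

```python
import base64

# Content-Type values listed for the format option.
CONTENT_TYPES = {
    "form": "application/x-www-form-urlencoded",
    "json": "application/json",
    "xml": "application/xml",
}


def build_headers(auth=None, fmt="form", method="get"):
    """Return the request headers implied by the auth and format options.

    Hypothetical helper for illustration only.
    """
    headers = {}
    if auth is not None:
        # HTTP Basic: base64 of the "username:password" string.
        token = base64.b64encode(auth.encode("utf-8")).decode("ascii")
        headers["Authorization"] = "Basic " + token
    if method in ("post", "put", "delete"):
        # The format option applies to POST, PUT, and DELETE requests.
        headers["Content-Type"] = CONTENT_TYPES[fmt]
    return headers
```

For example, `build_headers(auth="user:pass", fmt="json", method="post")` yields an `Authorization` header with a Basic token and a `Content-Type` of `application/json`.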
Input fields
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Required if url option is not specified | URL for the HTTP request. Read from this field when the url option is omitted. |
Output fields
| Field | Type | Description |
|---|---|---|
| line | string | HTTP response body. Assigned when the selector option is not specified. |
| elements | array | List of DOM elements matching the CSS selector. Assigned when the selector option is specified. |
| _wget_code | integer | HTTP response code |
| _wget_error | string | Error message. Assigned the value exceeds-max-size if the response exceeds the maximum size (default 10 MB). |
Error codes
Parsing errors
| Error code | Message | Description |
|---|---|---|
| 14100 | invalid wget method: get, post, put, delete supported. | The method value is not one of get, post, put, or delete |
| 14101 | invalid wget format: form, json, xml supported. | The format value is not one of form, json, or xml |
| 14102 | invalid wget url. reason: [error] | The url value is not a valid URL format |
| 14103 | invalid wget proxy. reason: [error] | The proxy value is not in host:port format |
Runtime errors
N/A
Description
The wget command sends HTTP requests and converts the responses into records.
When the url option is specified, the command acts as a driver query and sends an HTTP request to the specified URL, generating one record. When the url option is omitted, the command acts as a transforming query and reads the URL from the url field of each input record, sending one HTTP request per record. In transforming mode, records where the url field is null are skipped, and records are passed to the next command even if the request fails.
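The transforming-mode behavior described above can be sketched in Python as follows. This is a minimal illustration, not the command's implementation: `wget_transform` and the `fetch` callable are hypothetical names, and storing the failure message in `_wget_error` is an assumption based on the output field table.

```python
def wget_transform(records, fetch):
    """Yield one output record per input record that has a url.

    fetch(url) is a hypothetical callable returning (status_code, body).
    """
    for record in records:
        url = record.get("url")
        if url is None:
            # Records whose url field is null are skipped.
            continue
        out = dict(record)
        try:
            code, body = fetch(url)
            out["_wget_code"] = code
            out["line"] = body
        except Exception as exc:
            # The record is still passed downstream when the request fails.
            # Assumption: the failure message lands in _wget_error.
            out["_wget_error"] = str(exc)
        yield out
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client and makes the skip and pass-through rules easy to exercise without network access.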
When the selector option is specified, the HTML response is parsed and DOM elements matching the CSS selector are extracted. Each element includes its attributes as key-value pairs, and text content is stored in own_text and text fields. The result is assigned to the elements field as an array of maps.
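As a rough picture of the shape of the `elements` array, the Python sketch below extracts elements by tag name using only the standard library. It is a stand-in, not a CSS selector engine: it supports tag-name selectors only, and it assumes that `own_text` holds the text directly inside the element while `text` also includes text from descendant elements. `TagSelector` and `select` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagSelector(HTMLParser):
    """Collect elements with a given tag name (tag-only selector)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.stack = []     # open elements as (tag, matched_map_or_None)
        self.elements = []  # matched elements in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        element = None
        if tag == self.tag:
            # Attributes become key-value pairs of the element map.
            element = dict(attrs)
            element["own_text"] = ""
            element["text"] = ""
            self.elements.append(element)
        self.stack.append((tag, element))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, dropping any
        # unclosed children above it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        last = len(self.stack) - 1
        for depth, (_, element) in enumerate(self.stack):
            if element is None:
                continue
            element["text"] += data          # text of element and descendants
            if depth == last:
                element["own_text"] += data  # text directly inside element


def select(html, tag):
    """Return a list of maps, one per element with the given tag name."""
    parser = TagSelector(tag)
    parser.feed(html)
    parser.close()
    for element in parser.elements:
        element["own_text"] = element["own_text"].strip()
        element["text"] = element["text"].strip()
    return parser.elements
```

With the input `<h1 class="title">Hello <b>World</b></h1>`, `select(html, "h1")` returns a single map whose `class` is `title`, whose `own_text` is `Hello`, and whose `text` is `Hello World`.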
For POST, PUT, and DELETE requests, specifying the body option sends the value of that field as the request body. If the body option is not specified, the fields of the input record are automatically encoded and sent according to the format option. The form format uses URL encoding, the json format uses JSON serialization, and the xml format does not support automatic conversion.
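The body selection rules above can be sketched in Python as follows. `encode_body` is a hypothetical helper, and raising an error for the `xml` format without a body field is an assumption; the text only states that automatic conversion is not supported for that format.

```python
import json
from urllib.parse import urlencode


def encode_body(record, fmt="form", body_field=None):
    """Return the request body for a record (hypothetical helper)."""
    if body_field is not None:
        # The named field is sent as the body without modification.
        return record[body_field]
    if fmt == "form":
        # URL-encode the record's fields as key=value pairs.
        return urlencode(record)
    if fmt == "json":
        # Serialize the record's fields as a JSON object.
        return json.dumps(record)
    # Assumption: with no automatic conversion, a body field is required.
    raise ValueError("xml format requires an explicit body field")
```

For a record with fields `src_ip` and `action`, the `json` format produces a JSON object containing both fields, while the `form` format produces `src_ip=192.0.2.1&action=block`.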
Server certificate validation is not performed for HTTPS requests.
If the HTTP response size exceeds the maximum size (default 10 MB, configurable via the araqne.logdb.wget_max_size system property), driver query mode raises an error and transforming query mode assigns exceeds-max-size to the _wget_error field.
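The difference between the two modes when the size limit is hit can be sketched in Python as follows. The names `read_capped`, `read_for_record`, and `ResponseTooLarge` are hypothetical, and treating 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
# Assumption: the 10 MB default is interpreted as 10 * 1024 * 1024 bytes.
DEFAULT_MAX_SIZE = 10 * 1024 * 1024


class ResponseTooLarge(Exception):
    """Raised when a response body exceeds the configured maximum size."""


def read_capped(stream, max_size=DEFAULT_MAX_SIZE):
    """Driver-mode style: read the body, raising if it exceeds max_size."""
    # Reading one extra byte is enough to detect an oversized body.
    data = stream.read(max_size + 1)
    if len(data) > max_size:
        raise ResponseTooLarge("exceeds-max-size")
    return data


def read_for_record(stream, record, max_size=DEFAULT_MAX_SIZE):
    """Transforming-mode style: flag the record instead of raising."""
    out = dict(record)
    try:
        out["line"] = read_capped(stream, max_size).decode("utf-8")
    except ResponseTooLarge:
        out["_wget_error"] = "exceeds-max-size"
    return out
```

Here `stream` is any binary file-like object, such as an HTTP response body or an `io.BytesIO` instance.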
Examples
- Fetch a web page

      wget url="https://example.com"

  Fetches the web page at the specified URL and assigns the response body to the `line` field.

- Extract DOM elements using a CSS selector

      wget url="https://example.com" selector="h1"

  Extracts `h1` tags from the web page and assigns them as an array to the `elements` field.

- Call a JSON API

      wget url="https://api.example.com/data" encoding=utf-8 | parsejson

  Calls the JSON API and parses the response.

- Send data with a POST request

      json "{'src_ip': '192.0.2.1', 'action': 'block'}" | wget url="https://api.example.com/report" method=post format=json

  Sends the input record's fields as a JSON-formatted POST request body.

- Send sequential requests using the `url` field from input records

      json "[{'url': 'https://api.example.com/info/1'}, {'url': 'https://api.example.com/info/2'}]" | wget

  Sends an HTTP GET request using the `url` field value of each record.

- Send a request via an HTTP proxy

      wget url="https://example.com" proxy="198.51.100.10:8080"

  Sends an HTTP request through the specified proxy server.

- Send a request with custom headers and authentication

      json "{'url': 'https://api.example.com/secure'}" | eval headers = dict("X-Custom-Header", "value1") | wget header=headers auth="user:pass"

  Sends a request using a custom HTTP header and Basic authentication.

- Set a connection timeout

      wget url="https://api.example.com/report" timeout=60

  Sets the connection and read timeout to 60 seconds when connecting to a slow server.

- Send a pre-serialized body

      json "{'payload': 'action=block&ip=198.51.100.1'}" | wget url="https://api.example.com/block" method=post body=payload

  Sends the value of the field specified by the `body` option as the HTTP request body without modification. If the `body` option is omitted, all fields in the input record are automatically encoded according to the `format` option.

- Update a resource with a PUT request

      json "{'status': 'active'}" | wget url="https://api.example.com/items/42" method=put format=json

  Serializes the input record's fields as JSON and sends them as a PUT request body.

- Delete a resource with a DELETE request

      wget url="https://api.example.com/items/42" method=delete auth="admin:pass"

  Deletes the specified resource using Basic authentication.
Compatibility
The wget command has been available since before Logpresso Sonar 4.0.