Suppose our URL is http://www.cwi.nl:80/%7Eguido/Python.html;a=2;b=3?c=4,2&d=e#anchor. Its components are then the following:
  • scheme: URL scheme specifier (http)
  • domain: Network location part (www.cwi.nl:80)
  • path: Hierarchical path (/%7Eguido/Python.html)
  • params: Parameters for last path element (a=2;b=3)
  • query: Query component (c=4,2&d=e)
  • fragment: Fragment identifier (anchor)
For more information about these components, see the documentation of Python’s urllib.parse module.
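
The breakdown above can be reproduced with Python's standard library. This is only a minimal sketch using urllib.parse.urlparse; it assumes the step splits URLs the same way and is not the step's actual implementation. Note that urlparse calls the network location netloc, which corresponds to domain here.

```python
from urllib.parse import urlparse

url = "http://www.cwi.nl:80/%7Eguido/Python.html;a=2;b=3?c=4,2&d=e#anchor"
parts = urlparse(url)

# Map urlparse's field names onto the component names used by this step.
components = {
    "scheme": parts.scheme,      # http
    "domain": parts.netloc,      # www.cwi.nl:80
    "path": parts.path,          # /%7Eguido/Python.html
    "params": parts.params,      # a=2;b=3
    "query": parts.query,        # c=4,2&d=e
    "fragment": parts.fragment,  # anchor
}

for name, value in components.items():
    print(f"{name}: {value}")
```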

Usage

The following example shows how the step can be used in a recipe.

Examples

  • Example 1
Use http as default scheme.
extract_url_components(ds.urls, {
  "default_scheme": "http",
}) -> (
  ds.scheme,
  ds.domain,
  ds.path,
  ds.params,
  ds.query,
  ds.fragment
)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
urls
column[url]
required
The list of URLs you wish to decompose.
scheme
column[category]
required
URL scheme specifier (http).
domain
column[category]
required
Network location part (www.cwi.nl:80).
path
column[category]
required
Hierarchical path (/%7Eguido/Python.html).
params
column[category]
required
Parameters for last path element (a=2;b=3).
query
column[category]
required
Query component (c=4,2&d=e).
fragment
column[category]
required
Fragment identifier, i.e. the part after the hash sign (anchor).
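
To illustrate the one-column-in, six-columns-out shape described above, here is a rough plain-Python analogue. The function decompose_column and the second URL are hypothetical and used only for illustration; the sketch assumes the step behaves like urllib.parse.urlparse applied to each row.

```python
from urllib.parse import urlparse

# Output names in the same order as the fields of urlparse's 6-tuple result.
OUTPUT_NAMES = ("scheme", "domain", "path", "params", "query", "fragment")

def decompose_column(urls):
    """Hypothetical helper: split each URL and collect one list per component."""
    columns = {name: [] for name in OUTPUT_NAMES}
    for url in urls:
        for name, value in zip(OUTPUT_NAMES, urlparse(url)):
            columns[name].append(value)
    return columns

columns = decompose_column([
    "http://www.cwi.nl:80/%7Eguido/Python.html;a=2;b=3?c=4,2&d=e#anchor",
    "https://example.com/search?q=python",
])
print(columns["domain"])
print(columns["fragment"])
```

Components missing from a URL come back as empty strings in this sketch; how the step itself represents missing values is not stated in this page.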

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

default_scheme
[string, null]
default:"http"
Default URL scheme. The scheme (http, https, …) to prepend to URLs that do not already have one. Use null to leave such URLs unchanged.
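
A default scheme matters because a URL without one is usually parsed with its host name inside the path. The sketch below shows the effect with Python's urllib.parse; the helper with_default_scheme is hypothetical and only approximates what the parameter is described as doing, not how the step implements it.

```python
import re
from urllib.parse import urlparse

# A scheme is a letter followed by letters, digits, "+", "." or "-", then "://".
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

def with_default_scheme(url, default_scheme="http"):
    """Hypothetical helper: prepend default_scheme when the URL has no scheme.

    Passing None mirrors the null setting and leaves the URL unchanged.
    """
    if default_scheme is None or SCHEME_PREFIX.match(url):
        return url
    return f"{default_scheme}://{url}"

bare = "www.cwi.nl/%7Eguido/Python.html"

# Without a scheme, urlparse finds no network location at all.
print(repr(urlparse(bare).netloc))

# With the default scheme prepended, the host is recognised.
print(repr(urlparse(with_default_scheme(bare)).netloc))
```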