Fetch url content¶
NLP • text
Fetches the main text of the web page behind each URL and returns its title, author, content, excerpt and domain.
Usage¶
The following are the step's expected inputs and outputs and their specific types.
fetch_url_content(urls: url) -> (
title: text,
author: category,
content: text,
excerpt: text,
domain: url
)
Unlike most steps, this one takes no configuration object of the form {"param": value};
only the input column and the output columns are specified.
Example¶
Since the step has no configuration parameters, a call is simply:
fetch_url_content(ds.article_url) -> (
ds.article_title,
ds.article_author,
ds.article_content,
ds.article_excerpt,
ds.article_domain
)
Inputs¶
urls: column:url
A column of URLs linking to articles, blog posts or webpages.
Outputs¶
title: column:text
A text column containing the extracted article's title.
author: column:category
A categorical column containing the extracted article's author.
content: column:text
A text column containing the extracted article's main text.
excerpt: column:text
A text column containing a summary of the extracted article.
domain: column:url
A column containing only the domain of each original URL.
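To make the outputs above more concrete, here is a minimal Python sketch of how fields of this kind could be derived for a single page. It is not the step's actual implementation, which this page does not describe. The helper names `extract_domain` and `extract_meta` are hypothetical, and the sketch assumes that the domain is the URL's host name and that the title and author come from the page's `<title>` tag and `<meta name="author">` tag. Content and excerpt extraction are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the host name of a URL, or "" if none can be found.

    Assumption: "domain" means the host name (e.g. "www.example.com").
    """
    # urlparse only fills in the host when a scheme or a leading "//" is present.
    parsed = urlparse(url if "//" in url else "//" + url)
    return parsed.hostname or ""


class ArticleMetaParser(HTMLParser):
    """Collects the <title> text and the <meta name="author"> value."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "author":
            self.author = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside the <title> element.
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> dict:
    """Parse an HTML string and return its title and author (hypothetical helper)."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "author": parser.author}


# Illustrative inputs only; no network request is made here.
sample_html = (
    "<html><head><title> Hello World </title>"
    '<meta name="author" content="Jane Doe"></head>'
    "<body><p>Some article text.</p></body></html>"
)
meta = extract_meta(sample_html)
domains = [extract_domain(u) for u in
           ["https://www.example.com/blog/post-1?ref=home", "example.org/about"]]
```

Real pages often omit these tags or carry the author elsewhere in the markup, so the values the step returns may differ from what this simplified sketch would produce.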