Import data files

Uploading files

In this section, we’ll guide you through uploading files to Graphext, the first step in starting a project. Graphext supports a wide range of file formats, and the full list of supported formats is given below.

Step-by-Step Guide to Uploading Your Files:

Start a New Project

Begin by clicking on “Create a New Project”. You can do this within your personal team space or in any of your collaborative teams.

Select Your File

Browse: Navigate through your computer’s directories to find the dataset you wish to use.
Drag and Drop: Alternatively, you can drag your file and drop it into the designated importation box on our platform.

File Upload and Project Creation

Once you’ve selected your file, our system will upload the data, automatically infer the data types if needed, and create a new project containing your file. This process ensures your data is ready for exploration and analysis.

Open and Explore Your Project

After the upload is complete, your project is ready. You can now open it and start your exploratory journey with Graphext.

Additional Notes:

Uploading Multiple Files with the Same Schema: If you have several files with the same schema, there’s no need to upload them one by one. Simply compress them into a ZIP file and upload it. Our system will seamlessly combine these files into one dataset for you.
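Graphext performs this combination itself. Purely to illustrate what "same schema" means, the sketch below stacks the rows of several CSV files inside a ZIP archive, and it can double as a local sanity check before uploading. It assumes every member is a UTF-8 CSV whose first line is a header; `combine_zipped_csvs` is a hypothetical helper, not part of Graphext, and it does not reflect how Graphext treats files whose headers differ.

```python
import csv
import io
import zipfile
from typing import Optional


def combine_zipped_csvs(zip_bytes: bytes) -> tuple[list[str], list[list[str]]]:
    """Stack the rows of every CSV member of a ZIP archive into one table.

    Hypothetical helper. Assumes each member is a UTF-8 CSV file whose first
    line is a header, and that all headers are identical ("same schema").
    """
    header: Optional[list[str]] = None
    rows: list[list[str]] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Sort member names so the resulting row order is deterministic.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            text = archive.read(name).decode("utf-8")
            reader = csv.reader(io.StringIO(text, newline=""))
            member_header = next(reader, None)
            if member_header is None:
                continue  # empty file: nothing to add
            if header is None:
                header = member_header
            elif member_header != header:
                raise ValueError(f"{name} does not share the schema {header}")
            rows.extend(reader)
    if header is None:
        raise ValueError("the archive contains no CSV files")
    return header, rows


# Build a small archive in memory with two files sharing one header.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("part_1.csv", "field_1,field_2\naaa,bbb\n")
    archive.writestr("part_2.csv", "field_1,field_2\nzzz,yyy\n")

header, rows = combine_zipped_csvs(buffer.getvalue())
print(header)  # ['field_1', 'field_2']
print(rows)    # [['aaa', 'bbb'], ['zzz', 'yyy']]
```

If any file's header differs, the sketch stops with an error rather than guessing how to align columns.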
Need Help with Complex Data Joins? For more intricate requirements, such as specific joins of different datasets, don’t hesitate to reach out. Our team is on standby to assist you with any preprocessing needs. Contact us.

Supported File Types

In most cases, Graphext will inspect the raw data to try to infer the correct data type for each column (categorical, numeric, date, etc.). This is not the case for formats that already have well-defined column types, such as Apache Arrow (.arr / .arrow), Parquet (.pqt / .parquet), and SPSS (.sav). For these formats, instead of inferring the data types, we simply map them to their Graphext equivalents.

File Formats

CSV

Excel

JSON

Apache Arrow

Apache Parquet

SPSS SAV

GML & GraphML

ZIP Archives

Additionally, Graphext will automatically detect and convert the following strings to missing values (equivalent to no value, or an empty cell):

"#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#INF", "-1.#QNAN", "-NaN",
"-nan", "1.#IND", "1.#INF", "1.#INF000000", "1.#QNAN", "<NA>", "N/A",
"n/a", "NA", "NAN", "NaN", "nan", "NULL", "Null", "null", ""

The conversion will apply only if the whole field corresponds to one of these strings, i.e. if any of these values occurs as a substring inside a longer text, it will be left unchanged.
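The whole-field rule amounts to an exact lookup against the list above, as in this minimal sketch. `to_missing` is a hypothetical helper name, and the sketch assumes matching is exact and case-sensitive (the list spells out each casing variant separately) and that surrounding whitespace is not trimmed; neither detail is stated explicitly above.

```python
from typing import Optional

# The missing-value strings listed above.
MISSING_VALUE_TOKENS = frozenset({
    "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#INF", "-1.#QNAN", "-NaN",
    "-nan", "1.#IND", "1.#INF", "1.#INF000000", "1.#QNAN", "<NA>", "N/A",
    "n/a", "NA", "NAN", "NaN", "nan", "NULL", "Null", "null", "",
})


def to_missing(field: str) -> Optional[str]:
    """Return None if the whole field is a missing-value token, else the field.

    Set membership is an exact, case-sensitive match, so a token that only
    appears inside longer text (e.g. "NA values") is left unchanged.
    """
    return None if field in MISSING_VALUE_TOKENS else field


row = ["NA", "NA values", "null", "", "42"]
print([to_missing(field) for field in row])
# [None, 'NA values', None, None, '42']
```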

Correct File Structures

Text-like file formats, like CSV and JSON, may be subject to specific restrictions on how the data is structured inside the file.

CSV Correct Structure

While there is no “official” CSV standard, most implementations follow some common rules. We recommend adhering to the following guidelines, adapted from the Internet Engineering Task Force’s RFC 4180.

First Step

The first line in the file is a header line with the same format as normal record lines. This header contains names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file. For example:

field_1,field_2,field_3
aaa,bbb,ccc
zzz,yyy,xxx
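Python's standard `csv` module reads files laid out this way: `csv.DictReader` treats the first line as the header and uses its names as the keys of every following record. This only illustrates the header convention; it is not how Graphext parses files internally.

```python
import csv
import io

data = "field_1,field_2,field_3\naaa,bbb,ccc\nzzz,yyy,xxx\n"

# DictReader consumes the first line as the header and maps each
# following record's values to those header names.
records = list(csv.DictReader(io.StringIO(data, newline="")))
print(records[0])  # {'field_1': 'aaa', 'field_2': 'bbb', 'field_3': 'ccc'}
```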

Second Step

Each actual data record is located on a new line, delimited by a line break.

Third Step

The last record in the file may or may not have an ending line break.
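The second and third rules mean a parser should produce the same records whether or not the file ends with a line break. The sketch below checks this with Python's `csv` module, used here only as a stand-in for a typical parser; the CRLF variant reflects the line endings RFC 4180 uses.

```python
import csv
import io

expected = [["field_1", "field_2"], ["aaa", "bbb"], ["zzz", "yyy"]]

variants = {
    "LF, final line break": "field_1,field_2\naaa,bbb\nzzz,yyy\n",
    "LF, no final line break": "field_1,field_2\naaa,bbb\nzzz,yyy",
    "CRLF, final line break": "field_1,field_2\r\naaa,bbb\r\nzzz,yyy\r\n",
}

for label, text in variants.items():
    # newline="" passes line endings to the csv module untranslated
    rows = list(csv.reader(io.StringIO(text, newline="")))
    print(label, rows == expected)  # True for every variant
```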

Fourth Step

Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and will not be ignored. The last field in the record must not be followed by a comma. For example:

✅ GOOD

field_1,field_2,field_3
aaa,bbb,ccc
zzz,yyy,xxx

❌ BAD

field_1,field_2,field_3
aaa,bbb,ccc,
zzz,yyy,xxx,
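A trailing comma adds an extra, empty field to the record, so the simplest way to catch it before uploading is to compare each record's field count with the header's. `find_bad_rows` below is a hypothetical helper built on Python's `csv` module, offered as a sketch rather than a Graphext feature.

```python
import csv
import io


def find_bad_rows(text: str) -> list[tuple[int, int]]:
    """Return (line number, field count) for each record whose field count
    differs from the header's. The line number is the source line on which
    the record ends (records with quoted line breaks span several lines)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    expected = len(next(reader))
    problems = []
    for row in reader:
        if len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems


good = "field_1,field_2,field_3\naaa,bbb,ccc\nzzz,yyy,xxx\n"
bad = "field_1,field_2,field_3\naaa,bbb,ccc,\nzzz,yyy,xxx,\n"
print(find_bad_rows(good))  # []
print(find_bad_rows(bad))   # [(2, 4), (3, 4)]
```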

Fifth Step

Each field may or may not be enclosed in double quotes. If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:

"aaa","bbb","ccc"
zzz,yyy,xxx
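Enclosing quotes are not part of the value: a conforming parser returns the same strings for both lines of the example above. Python's `csv` module, used here only for illustration, shows this.

```python
import csv
import io

text = '"aaa","bbb","ccc"\nzzz,yyy,xxx\n'

# The quoted and unquoted records parse to plain strings without quotes.
rows = list(csv.reader(io.StringIO(text, newline="")))
print(rows)  # [['aaa', 'bbb', 'ccc'], ['zzz', 'yyy', 'xxx']]
```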

Sixth Step

Fields containing line breaks, double quotes, and commas must be enclosed in double-quotes. For example:

"aaa","b
bb","ccc"
zzz,yyy,xxx
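When generating CSV files programmatically, a standard CSV writer applies this rule for you. As a sketch, Python's `csv.writer` with its default minimal quoting encloses exactly the fields that contain a line break, a comma, or a double quote; `\n` line endings are chosen here only to keep the output readable.

```python
import csv
import io

buffer = io.StringIO(newline="")
# QUOTE_MINIMAL (the default) quotes only fields containing the delimiter,
# the quote character, or a line-break character.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(["aaa", "b\nbb", "c,cc"])  # line break, comma
writer.writerow(["zzz", 'y"y', "xxx"])     # double quote (doubled on output)
print(buffer.getvalue(), end="")
# aaa,"b
# bb","c,cc"
# zzz,"y""y",xxx
```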

Seventh Step

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

"aaa","He said ""Hi!""","ccc"

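Parsing the example with Python's `csv` module, used here as an illustration, shows that each doubled quote becomes a single literal quote in the value, and that writing the value back restores the doubling.

```python
import csv
import io

line = '"aaa","He said ""Hi!""","ccc"\n'
row = next(csv.reader(io.StringIO(line, newline="")))
print(row)  # ['aaa', 'He said "Hi!"', 'ccc']

# Writing the value back doubles the inner quotes again; with the default
# minimal quoting, fields without special characters are left unquoted.
out = io.StringIO(newline="")
csv.writer(out, lineterminator="\n").writerow(row)
print(out.getvalue(), end="")  # aaa,"He said ""Hi!""",ccc
```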
JSON Correct Structure

We support three different JSON (JavaScript Object Notation) formats, which will be detected automatically by inspecting the beginning of a .json file.

JSON Lines

In the JSON Lines format, each line in the file is a JSON object representing a dataset row. The object in each row contains field names as keys and the corresponding field’s value. For example:

{"field_1": "aaa", "field_2": "bbb", "field_3": "ccc"}
{"field_1": "zzz", "field_2": "yyy", "field_3": "xxx"}
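Because every line is a complete JSON object, a JSON Lines file can be read one line at a time. A minimal sketch with Python's standard `json` module, assuming blank lines should simply be skipped:

```python
import json

text = (
    '{"field_1": "aaa", "field_2": "bbb", "field_3": "ccc"}\n'
    '{"field_1": "zzz", "field_2": "yyy", "field_3": "xxx"}\n'
)

# Parse each non-empty line independently; each object becomes one row.
rows = [json.loads(line) for line in text.splitlines() if line.strip()]
print(rows[1]["field_2"])  # yyy
```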

For further details, see the official JSON Lines documentation.

List of Records

In this format, the file contains a JSON list of objects, where each object contains field names and values as key-value pairs. For example:

[
    {"field_1": "aaa", "field_2": "bbb", "field_3": "ccc"},
    {"field_1": "zzz", "field_2": "yyy", "field_3": "xxx"}
]

Notice how the first level represents a list, and that objects within this list are separated by a comma. Line breaks and spaces between fields are not required, so the following is an equivalent but more compact format that is equally valid:

[{"field_1":"aaa","field_2":"bbb","field_3":"ccc"},{"field_1":"zzz","field_2":"yyy","field_3":"xxx"}]
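The pretty-printed and compact layouts decode to identical data, which a quick check with Python's `json` module confirms. Serializing with the `(",", ":")` separators is one way to produce the compact form from existing data.

```python
import json

pretty = """[
    {"field_1": "aaa", "field_2": "bbb", "field_3": "ccc"},
    {"field_1": "zzz", "field_2": "yyy", "field_3": "xxx"}
]"""
compact = '[{"field_1":"aaa","field_2":"bbb","field_3":"ccc"},{"field_1":"zzz","field_2":"yyy","field_3":"xxx"}]'

# Whitespace between tokens is insignificant in JSON.
print(json.loads(pretty) == json.loads(compact))  # True

# Re-serialize without spaces to obtain the compact layout.
print(json.dumps(json.loads(pretty), separators=(",", ":")) == compact)  # True
```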

Object of Columns

The last supported JSON format is column-oriented. In this format, the file contains a JSON object at the top level. Each key in this object is the name of a field/column, and each value is itself a JSON object of {index: value} pairs that maps row indices to that column’s values. Because JSON only allows strings as object keys, the row indices are written as quoted strings. For example:

{
    "field_1": {"0": "aaa", "1": "zzz"},
    "field_2": {"0": "bbb", "1": "yyy"},
    "field_3": {"0": "ccc", "1": "xxx"}
}

In this format, line breaks and spaces between fields are also ignored, so the following is equivalent:

{"field_1":{"0":"aaa","1":"zzz"},"field_2":{"0":"bbb","1":"yyy"},"field_3":{"0":"ccc","1":"xxx"}}
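To see how this layout maps back to rows, the sketch below pivots an Object of Columns document into one dictionary per row. It assumes the row indices are strings holding integers (as strict JSON requires for object keys); this appears to be the same layout pandas writes with `DataFrame.to_json(orient="columns")`. A column that lacks a given index yields `None` for that row, which is a choice of this sketch, not documented Graphext behavior.

```python
import json

text = '{"field_1":{"0":"aaa","1":"zzz"},"field_2":{"0":"bbb","1":"yyy"},"field_3":{"0":"ccc","1":"xxx"}}'
columns = json.loads(text)

# Collect every row index used by any column and sort numerically,
# since the keys are strings and "10" must come after "9".
indices = sorted({idx for col in columns.values() for idx in col}, key=int)

# Rebuild one dict per row; an index missing from a column becomes None.
rows = [{name: col.get(idx) for name, col in columns.items()} for idx in indices]
print(rows[0])  # {'field_1': 'aaa', 'field_2': 'bbb', 'field_3': 'ccc'}
```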

A Note on Automatic Detection

As can be seen in the examples, each JSON format is easily identified by inspecting the first few lines of the file. We use the following heuristic:

If the file starts with [ - assume the List of Records format.
If the file contains more than 1 line, and each of the first 2 lines starts with { and ends with } - assume the JSON Lines format.
In all other cases - assume the Object of Columns format.
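The three rules translate almost directly into code. The sketch below follows them literally; `detect_json_format` and its return labels are hypothetical names, and details the rules do not specify (for example, whether leading whitespace before `[` is skipped) are deliberately not handled.

```python
def detect_json_format(text: str) -> str:
    """Classify a .json file's contents using the three rules above."""
    # Rule 1: a leading "[" means List of Records.
    if text.startswith("["):
        return "list_of_records"
    # Rule 2: more than one line, and the first two lines each start
    # with "{" and end with "}", means JSON Lines.
    lines = text.splitlines()
    if len(lines) > 1 and all(
        line.startswith("{") and line.endswith("}") for line in lines[:2]
    ):
        return "json_lines"
    # Rule 3: everything else is treated as Object of Columns.
    return "object_of_columns"


print(detect_json_format('[{"a": 1}]'))               # list_of_records
print(detect_json_format('{"a": 1}\n{"a": 2}\n'))     # json_lines
print(detect_json_format('{\n"a": {"0": 1}\n}'))      # object_of_columns
```

Note that, under the second rule, a JSON Lines file containing only a single line falls through to the Object of Columns case.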
