What Type of Data is Not Suitable for Upload?

Do not upload summarized or aggregated data to "Magellan", and avoid datasets that have few fields or too few records.

You should not upload the following types of data to "Magellan":

Summarized and aggregated data

Algorithms need detail to make inferences and find patterns. Summarized data lacks this granularity, so it does not work well with "Magellan", or with Exploratory Analytics and Advanced Analytics in general.

The data that business executives work with for reporting is often heavily aggregated. It is summarized so that they can quickly understand how the business is performing.

Table 1: Example of Heavily Aggregated Data

Region          Close Won Sales   % of Quota
EMEA            10,000,000        60%
North America   8,000,000         40%
Asia - PAC      6,000,000         110%

"Magellan" needs far more data than this to be effective. As a rule of thumb, the more data with business context that you upload to "Magellan", the more accurate the analysis will be.
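To make the loss of detail concrete, here is a minimal Python sketch, assuming records are plain dictionaries. The deal records and the `summarize_by` helper are hypothetical illustrations, not part of "Magellan". Rolling deal-level records up by region produces one total per region, in the same shape as Table 1, but the per-deal attributes (here, the month) disappear, so no algorithm can learn how they relate to sales.

```python
from collections import defaultdict

# Hypothetical deal-level records (illustration only, not real data).
deals = [
    {"region": "EMEA", "month": "Mar", "amount": 400_000},
    {"region": "EMEA", "month": "Dec", "amount": 100_000},
    {"region": "North America", "month": "Mar", "amount": 300_000},
    {"region": "North America", "month": "Jun", "amount": 200_000},
]

def summarize_by(records, key, measure):
    """Sum `measure` for each distinct value of `key`, as a summary report would."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record[measure]
    return dict(totals)

summary = summarize_by(deals, "region", "amount")
print(summary)  # {'EMEA': 500000, 'North America': 500000}
print(len(deals), "records ->", len(summary), "rows")  # 4 records -> 2 rows
```

Both regions end up with the same total, and the summary can no longer show that the March deals were the largest: that detail exists only in the original records.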

Datasets with few fields

"Magellan" is designed to work with thousands of fields. Think of the thousands of business activities your organization does on a daily basis. If these activities are reflected in the data, "Magellan" can use them to see how much influence they have on key performance metrics.

When looking at data in reporting and data discovery tools, users often only export fields that relate to the Key Performance Metrics that they are analysing.

For example, consider the following booking dataset (which has few fields) from the travel industry:

Table 2: Booking Dataset

Country  Region   City      Resort Country  Resort          Service_Line  Service      Service_Price  Year  Quarter  Month  Revenue
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q1       Mar    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q1       Jun    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q1       Sept   9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q1       Dec    3240
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q2       Mar    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q3       Jun    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q4       Sept   9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q1       Dec    3240
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q2       Mar    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q3       Jun    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q4       Sep    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q1       Dec    3240
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q2       Mar    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q3       Jun    9270
Germany  Bavaria  Augsburg  France          French Riviera  Accommod.     Hotel Suite  270            FY93  Q4       Sep    9270

For analysis purposes it might be tempting to include only the Service, Revenue and Region fields; after all, these are the typical outputs in which the business executive is interested.

However, analysing the full dataset above in "Magellan" shows that Month has a very strong influence on Revenue. If the Month field is left out of the analysis, you miss this insight. This shows that it is impossible to tell in advance which fields are important.

Therefore, it is best to include all fields: more fields lead to more accurate insights.
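One simple way to see why Month matters in Table 2 is to measure how much of the variation in Revenue each field accounts for. The Python sketch below uses eta squared (between-group sum of squares divided by total sum of squares), a standard statistic chosen here purely for illustration; it is an assumption, not a description of how "Magellan" scores influence. The rows are transcribed from Table 2, treating "Sept" and "Sep" as the same month, and `eta_squared` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Month and Revenue for the 15 rows of Table 2, in table order.
MONTHS = ["Mar", "Jun", "Sept", "Dec", "Mar", "Jun", "Sept", "Dec",
          "Mar", "Jun", "Sep", "Dec", "Mar", "Jun", "Sep"]
REVENUE = [9270, 9270, 9270, 3240, 9270, 9270, 9270, 3240,
           9270, 9270, 9270, 3240, 9270, 9270, 9270]

# Region and Service never change in Table 2 ("Bavaria", "Hotel Suite").
rows = [
    {"Region": "Bavaria", "Service": "Hotel Suite",
     "Month": "Sep" if month == "Sept" else month,  # normalize the spelling
     "Revenue": revenue}
    for month, revenue in zip(MONTHS, REVENUE)
]

def eta_squared(records, field, target="Revenue"):
    """Share of the target's variance explained by grouping on `field`."""
    values = [r[target] for r in records]
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # the target never varies, so nothing can explain it
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r[target])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    return ss_between / ss_total

for field in ("Region", "Service", "Month"):
    print(field, round(eta_squared(rows, field), 2))
# Region 0.0
# Service 0.0
# Month 1.0
```

In this small sample, Month accounts for all of the variation in Revenue, while Region and Service, which never change, account for none. An analysis restricted to Service, Revenue and Region would therefore find nothing to explain.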

Datasets with an insufficient number of records

"Magellan" can work with millions of records, so include as many records as you have access to. To find accurate patterns, "Magellan" must have enough data. At a bare minimum, it needs 100 records or 10 times the number of fields in the dataset. So, if the dataset has 1,000 fields, it needs 10,000 records to be accurate. This is a rule of thumb, and the exact figure varies from dataset to dataset.
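The rule of thumb can be written as a small check. This Python sketch assumes the minimum is whichever is greater of 100 records and 10 records per field, which agrees with the 1,000-field example; `minimum_records` and `has_enough_records` are hypothetical helpers, not part of "Magellan".

```python
def minimum_records(num_fields: int) -> int:
    """Rule-of-thumb minimum record count for a dataset.

    Assumed interpretation: at least 100 records and at least 10 records
    per field, whichever is greater.
    """
    if num_fields < 1:
        raise ValueError("a dataset needs at least one field")
    return max(100, 10 * num_fields)

def has_enough_records(num_records: int, num_fields: int) -> bool:
    """True if the dataset meets the rule-of-thumb minimum."""
    return num_records >= minimum_records(num_fields)

print(minimum_records(1000))     # 10000
print(minimum_records(3))        # 100
print(has_enough_records(3, 3))  # False: Table 1 has 3 rows and 3 columns
```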

Remember that if "Magellan" cannot produce a robust insight, it does not display one; instead, it shows an error message explaining why. A common reason is an insufficient number of records or fields.

If "Magellan" has too few records, the Insight Quality (the accuracy and robustness of the model when applied to a larger dataset) is negatively affected. Likewise, too few fields and attributes in a dataset lead to weak patterns, and this is reflected in the Insight Quality rating.

Datasets with limited analysis scope

"Magellan" enables you to narrow the scope of analysis to a particular part of the dataset. For example, you can look only at records where revenue is greater than 50,000. If the resulting dataset is smaller than the advised minimum, "Magellan" may not find patterns in it. In this situation, you must provide a broader range of data.
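Before narrowing the scope, you can check whether the filtered subset still meets the rule of thumb from the previous section. This Python sketch assumes records are dictionaries with a hypothetical `Revenue` key and applies the 100-records-or-10-per-field minimum, assuming the larger of the two applies; `check_scope` is an illustrative helper, not a "Magellan" feature, and the records are made up.

```python
def check_scope(records, predicate, num_fields):
    """Filter records and report whether the subset meets the rule of thumb."""
    subset = [r for r in records if predicate(r)]
    minimum = max(100, 10 * num_fields)  # assumed: larger of the two minimums
    return subset, len(subset) >= minimum

# Hypothetical data: 200 records with revenue 0, 1,000, ..., 199,000.
records = [{"Revenue": i * 1000, "Month": "Mar"} for i in range(200)]

subset, ok = check_scope(records, lambda r: r["Revenue"] > 50_000, num_fields=2)
print(len(subset), ok)  # 149 True

subset, ok = check_scope(records, lambda r: r["Revenue"] > 150_000, num_fields=2)
print(len(subset), ok)  # 49 False
```

The second filter is too narrow: only 49 records remain, below the minimum of 100, which is the situation where a broader range of data is needed.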