fiesta.non_adaptive_fc

This is the standard approach: in each round every model is evaluated once, and this continues until, at the end of a round, one model is better than the rest at a given confidence level. All models are evaluated the same number of times, hence non-adaptive.

Caveat

Assumes that the evaluation scores produced by the models follow a Gaussian (normal) distribution. For a great guide on determining which distribution your evaluation metric is likely to produce, see the Dror and Reichart (2018) guide.

Note

All models are evaluated at least three times to ensure that the belief distribution is informative enough to start testing whether one model is better than the rest with a certain confidence.

fiesta.fiesta.non_adaptive_fc(data, model_functions, split_function, p_value, logit_transform=False, samples=100000)[source]
Parameters
  • data (List[Dict[str, Any]]) – A list of dictionaries, that as a whole represents the entire dataset. Each dictionary within the list represents one sample from the dataset.

  • model_functions (List[Callable[[List[Dict[str, Any]], List[Dict[str, Any]]], float]]) – A list of functions that represent different models, e.g. a PyTorch model. Each function takes a train and a test dataset as input and returns a metric score, e.g. accuracy. The model functions should not have random seeds set, as that defeats the point of finding the best model independent of the random seed and data split.

  • split_function (Callable[[List[Dict[str, Any]]], Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]]) – A function that splits the data into train and test sets. This should produce a different random split each time it is called. If you would rather use a fixed split, you can hard-code this function to return the same split on every call.

  • p_value (float) – The significance level for the best model to truly be the best model, e.g. 0.05 if you want to be at least 95% confident.

  • logit_transform (bool) – Whether to transform the model function’s returned metric score by the logit function.

  • samples (int) – Number of samples to generate from our belief distribution for each model. This argument is passed directly to fiesta.util.belief_calc() within this function. This should be large e.g. minimum 10000.
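The parameters above can be sketched with a toy example. The dataset, the 80/20 split, and the majority-class "model" below are all illustrative assumptions, not part of the fiesta API; the point is only the shapes of the arguments:

```python
import random

# Toy dataset: each sample is a dictionary, as required by `data`.
data = [{"text": f"sample {i}", "label": i % 2} for i in range(100)]

def split_function(samples):
    # Produce a fresh random 80/20 train/test split on every call,
    # as the `split_function` parameter expects.
    shuffled = random.sample(samples, len(samples))
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def majority_class_model(train, test):
    # Illustrative "model function": predict the most common training
    # label and return accuracy on the test split as a float.
    labels = [s["label"] for s in train]
    majority = max(set(labels), key=labels.count)
    correct = sum(1 for s in test if s["label"] == majority)
    return correct / len(test)

# `model_functions` is simply a list of such callables; they would
# then be passed to fiesta.non_adaptive_fc, e.g.:
# fiesta.non_adaptive_fc(data, [majority_class_model, majority_class_model],
#                        split_function, p_value=0.05)
```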

Return type

Tuple[List[float], List[float], int, List[List[float]]]

Returns

Tuple containing 4 values:

  1. The confidence scores for each model; the best model should have the highest confidence

  2. The number of times each model was evaluated, as a proportion of the total number of evaluations

  3. The total number of model evaluations

  4. The scores that each model generated when evaluated.

Note

If logit_transform is True, then the last item in the tuple contains scores that have been transformed by the logit function.
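The logit transform mentioned here maps a metric score in the open interval (0, 1) onto the whole real line, which makes the Gaussian assumption more plausible for bounded metrics such as accuracy. A minimal sketch of the transform (not fiesta's own implementation):

```python
import math

def logit(score):
    # Maps a score in (0, 1) to (-inf, +inf); 0.5 maps to 0.0,
    # and scores above 0.5 map to positive values.
    return math.log(score / (1.0 - score))
```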

Raises

ValueError – If the p_value is not between 0 and 1.