common module

The common module contains functions and classes shared by the other modules.

`dataframe_to_dict(df, orientation='columns')`

Convert a DataFrame to a dictionary in one of several orientations.

Parameters:

df: The DataFrame to convert.
orientation: The dictionary orientation. Options:
"columns": {column -> {index -> value}} (pandas itself names this orientation "dict")
"index": {index -> {column -> value}}
"records": [{column -> value}, ... , {column -> value}]
"list": {column -> [values]}
"split": Dict with keys: index, columns, and data

Returns:

Dictionary representation of the DataFrame (a list of dictionaries for "records").

Source code in intellikit/common.py

def dataframe_to_dict(df, orientation="columns"):
    """
    Convert a DataFrame to a dictionary in various orientations.

    Parameters:
    - df: The DataFrame to convert.
    - orientation: The dictionary orientation.
      Options:
      - "columns": {column -> {index -> value}}
      - "index": {index -> {column -> value}}
      - "records": [{column -> value}, ... , {column -> value}]
      - "list": {column -> [values]}
      - "split": Dict with keys: index, columns, and data

    Returns:
    - Dictionary representation of the DataFrame.
    """
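    # Orientations accepted here; "columns" is the documented alias for pandas' "dict"
    valid_orientations = {"columns", "dict", "index", "records", "list", "split"}

    # Check if the given orientation is valid
    if orientation not in valid_orientations:
        raise ValueError(f"Invalid orientation. Choose from {', '.join(sorted(valid_orientations))}")

    # pandas.DataFrame.to_dict() has no "columns" orient; the equivalent layout is "dict"
    orient = "dict" if orientation == "columns" else orientation

    # Convert DataFrame to dictionary
    return df.to_dict(orient=orient)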
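
To make the orientation options above concrete, here is a minimal sketch that calls `pandas.DataFrame.to_dict` directly on a small, made-up DataFrame. The data and variable names are illustrative assumptions; note that the layout documented as "columns" corresponds to `orient="dict"` in pandas.

```python
import pandas as pd

# Hypothetical two-row DataFrame, used only for illustration
df = pd.DataFrame({"price": [10, 20], "color": ["red", "blue"]}, index=["a", "b"])

by_column = df.to_dict(orient="dict")    # {column -> {index -> value}}
by_index = df.to_dict(orient="index")    # {index -> {column -> value}}
records = df.to_dict(orient="records")   # [{column -> value}, ...]
lists = df.to_dict(orient="list")        # {column -> [values]}
split = df.to_dict(orient="split")       # keys: index, columns, data

for name, value in [("dict", by_column), ("index", by_index),
                    ("records", records), ("list", lists), ("split", split)]:
    print(name, value)
```

The "records" orientation is the only one of the five that returns a list rather than a dictionary, and it drops the index labels.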

`linearRetriever(df, query, similarity_functions, feature_weights, top_n=1)`

The linear retriever performs a k-NN search by computing the similarity between the query and every case sequentially, one feature at a time, and returns the `top_n` cases with the highest total weighted similarity.

Parameters:

Name	Type	Description	Default
`df`	`pandas.DataFrame`	DataFrame containing the features that characterize each case	required
`query`	`pandas.DataFrame`	DataFrame containing the query case, with a column for each feature being compared	required
`similarity_functions`	`dict`	A dictionary mapping each feature in the casebase to its similarity function. Features without an entry contribute a similarity of 0.	required
`feature_weights`	`dict`	A dictionary of weights assigned to each feature. Features without an entry get a weight of 1.0.	required
`top_n`	`int`	Number of most similar cases to return.	`1`

Returns:

Type	Description
`pandas.DataFrame`	The `top_n` most similar cases from the casebase.

Source code in intellikit/common.py

def linearRetriever(df, query, similarity_functions, feature_weights, top_n=1):
    """
  The linear retriever performs a k-NN search by computing the similarity between the query and every case sequentially, one feature at a time.

  Args:
      df (pandas.DataFrame): DataFrame containing the features that characterize each case
      query (pandas.DataFrame): DataFrame containing the query case, with a column for each feature being compared
      similarity_functions (dict): A dictionary mapping each feature in the casebase to its similarity function.
      feature_weights (dict): A dictionary of weights assigned to each feature
      top_n (int): Number of most similar cases to return.

  Returns:
      pandas.DataFrame: The top_n most similar cases from the casebase.
  """
    # Create a DataFrame to store similarities
    similarities = pd.DataFrame(index=df.index)

    # Iterate over the columns in the DataFrame
    for feature in df.columns:
        # Check if the feature is in the similarity_functions dictionary
        if feature in similarity_functions:
            # Retrieve the similarity function for the feature
            similarity_function = similarity_functions[feature]

            # Look up the feature weight and convert it to float
            feature_weight = float(feature_weights.get(feature, 1.0))  # Default to 1.0 if not provided

            # Apply the similarity function to calculate similarities
            similarities[feature] = similarity_function(df[[feature]].copy(), query[[feature]], feature) * feature_weight
        else:
            # If the feature is not found in the similarity_functions dictionary, set the similarity to 0
            similarities[feature] = 0.0

    # Calculate total similarity as the sum of weighted similarities
    similarities['total_similarity'] = similarities.sum(axis=1)

    # Select top N cases with the highest total similarity
    top_n_indices = similarities['total_similarity'].nlargest(top_n).index
    top_n_cases = df.loc[top_n_indices]

    return top_n_cases
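
The source above calls each similarity function as `similarity_function(df[[feature]], query[[feature]], feature)` and multiplies the result by a weight, so each function presumably returns one score per case. The sketch below follows that calling convention with two hypothetical similarity functions (`numeric_similarity` and `exact_match` are illustrative names, not part of intellikit) and reproduces the weighted-sum ranking on made-up data.

```python
import pandas as pd

def numeric_similarity(cases, query, feature):
    """Hypothetical: 1.0 at zero distance, falling linearly to 0.0 at the largest distance."""
    distance = (cases[feature] - query[feature].iloc[0]).abs()
    max_distance = distance.max()
    if max_distance == 0:
        return pd.Series(1.0, index=cases.index)
    return 1.0 - distance / max_distance

def exact_match(cases, query, feature):
    """Hypothetical: 1.0 where the case value equals the query value, else 0.0."""
    return (cases[feature] == query[feature].iloc[0]).astype(float)

# Made-up casebase and query, for illustration only
casebase = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "color": ["red", "blue", "red"]},
    index=["c1", "c2", "c3"],
)
query = pd.DataFrame({"price": [12.0], "color": ["red"]})

similarity_functions = {"price": numeric_similarity, "color": exact_match}
feature_weights = {"price": 0.7, "color": 0.3}

# Weighted similarity per feature, then the sum across features
similarities = pd.DataFrame(index=casebase.index)
for feature, func in similarity_functions.items():
    weight = float(feature_weights.get(feature, 1.0))
    similarities[feature] = func(casebase[[feature]], query[[feature]], feature) * weight

total = similarities.sum(axis=1)
top_cases = casebase.loc[total.nlargest(1).index]  # highest total similarity first
print(top_cases)
```

Here case `c1` ranks first because it is both closest in price and an exact color match.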

`macfacRetriever(df, query, mac_features, fac_features, similarity_functions, feature_weights, top_n_mac=2, top_n_fac=1)`

The MAC/FAC retriever performs a two-stage retrieval: the first phase (MAC) uses a lightweight similarity over `mac_features` to discard irrelevant cases, and the second phase (FAC) evaluates the final similarity over `fac_features` for the cases that remain.

Parameters:

Name	Type	Description	Default
`df`	`pandas.DataFrame`	DataFrame containing the features that characterize each case	required
`query`	`pandas.DataFrame`	DataFrame containing the query case, with a column for each feature being compared	required
`mac_features`	`list`	A list of features to use for the MAC phase, e.g. `['feature4', 'feature5']`	required
`fac_features`	`list`	A list of features to use for the FAC phase	required
`similarity_functions`	`dict`	A dictionary mapping each feature in the casebase to its similarity function.	required
`feature_weights`	`dict`	A dictionary of weights assigned to each feature	required
`top_n_mac`	`int`	Number of most similar cases kept by the MAC phase.	`2`
`top_n_fac`	`int`	Number of most similar cases returned by the FAC phase.	`1`

Returns:

Type	Description
`pandas.DataFrame`	The `top_n_fac` most similar cases selected by the FAC phase.

Source code in intellikit/common.py

def macfacRetriever(df, query, mac_features, fac_features, similarity_functions, feature_weights, top_n_mac=2, top_n_fac=1):
    """
  The MAC/FAC retriever performs a two-stage retrieval: the first phase (MAC) uses a lightweight similarity to discard irrelevant cases, and the second phase (FAC) evaluates the final similarity for the cases that remain.

  Args:
      df (pandas.DataFrame): DataFrame containing the features that characterize each case
      query (pandas.DataFrame): DataFrame containing the query case, with a column for each feature being compared
      mac_features (list): A list of features to use for the MAC phase, e.g. ['feature4', 'feature5']
      fac_features (list): A list of features to use for the FAC phase
      similarity_functions (dict): A dictionary mapping each feature in the casebase to its similarity function.
      feature_weights (dict): A dictionary of weights assigned to each feature
      top_n_mac (int): Number of most similar cases kept by the MAC phase.
      top_n_fac (int): Number of most similar cases returned by the FAC phase.

  Returns:
      pandas.DataFrame: The top_n_fac most similar cases selected by the FAC phase.
  """
    filtered_df = mac_stage(df, query, mac_features, similarity_functions, top_n_mac)
    top_similar_cases = fac_stage(filtered_df, query, fac_features, similarity_functions, feature_weights, top_n_fac)
    return top_similar_cases
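
The helpers `mac_stage` and `fac_stage` are not shown on this page, so the following is only a sketch of the two-stage idea under stated assumptions: the MAC phase scores a few cheap features without weights and keeps the best `top_n_mac` cases, and the FAC phase applies weighted similarity to those survivors. All function names and data here (`total_similarity`, `two_stage_retrieve`, `exact_match`, `closeness`) are hypothetical.

```python
import pandas as pd

def exact_match(cases, query, feature):
    """Hypothetical cheap similarity: 1.0 on equality, else 0.0."""
    return (cases[feature] == query[feature].iloc[0]).astype(float)

def closeness(cases, query, feature):
    """Hypothetical numeric similarity: 1 / (1 + absolute distance)."""
    return 1.0 / (1.0 + (cases[feature] - query[feature].iloc[0]).abs())

def total_similarity(df, query, features, similarity_functions, weights=None):
    """Sum the (optionally weighted) per-feature similarities for each case."""
    weights = weights or {}
    total = pd.Series(0.0, index=df.index)
    for feature in features:
        score = similarity_functions[feature](df[[feature]], query[[feature]], feature)
        total = total + score * float(weights.get(feature, 1.0))
    return total

def two_stage_retrieve(df, query, mac_features, fac_features,
                       similarity_functions, feature_weights,
                       top_n_mac=2, top_n_fac=1):
    # MAC: cheap filter that keeps only the top_n_mac candidates
    mac_scores = total_similarity(df, query, mac_features, similarity_functions)
    candidates = df.loc[mac_scores.nlargest(top_n_mac).index]
    # FAC: full weighted similarity, evaluated on the candidates only
    fac_scores = total_similarity(candidates, query, fac_features,
                                  similarity_functions, feature_weights)
    return candidates.loc[fac_scores.nlargest(top_n_fac).index]

# Made-up casebase and query, for illustration only
casebase = pd.DataFrame(
    {"color": ["red", "blue", "red", "red"], "price": [50.0, 11.0, 14.0, 30.0]},
    index=["c1", "c2", "c3", "c4"],
)
query = pd.DataFrame({"color": ["red"], "price": [12.0]})

result = two_stage_retrieve(
    casebase, query,
    mac_features=["color"],
    fac_features=["color", "price"],
    similarity_functions={"color": exact_match, "price": closeness},
    feature_weights={"color": 0.2, "price": 0.8},
    top_n_mac=3, top_n_fac=1,
)
print(result)
```

In this example `c2` has the closest price but is discarded in the MAC phase because its color does not match, which illustrates the trade-off of filtering on a lightweight similarity first.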

`parallelRetriever(df, query, similarity_functions, feature_weights, top_n=1)`

The parallel linear retriever performs a k-NN search by computing the similarities between the query and each case using all available CPU cores.

Parameters:

Name	Type	Description	Default
`df`	`pandas.DataFrame`	DataFrame containing the features that characterize each case	required
`query`	`pandas.DataFrame`	DataFrame containing the query case, with a column for each feature being compared	required
`similarity_functions`	`dict`	A dictionary mapping each feature in the casebase to its similarity function.	required
`feature_weights`	`dict`	A dictionary of weights assigned to each feature	required
`top_n`	`int`	Number of most similar cases to return.	`1`

Returns:

Type	Description
`pandas.DataFrame`	The `top_n` most similar cases from the casebase.

Source code in intellikit/common.py

def parallelRetriever(df, query, similarity_functions, feature_weights, top_n=1):
    """
  The parallel linear retriever performs a k-NN search by computing the similarities between the query and each case using all available CPU cores.

  Args:
      df (pandas.DataFrame): DataFrame containing the features that characterize each case
      query (pandas.DataFrame): DataFrame containing the query case, with a column for each feature being compared
      similarity_functions (dict): A dictionary mapping each feature in the casebase to its similarity function.
      feature_weights (dict): A dictionary of weights assigned to each feature
      top_n (int): Number of most similar cases to return.

  Returns:
      pandas.DataFrame: The top_n most similar cases from the casebase.
  """
    with Pool(cpu_count()) as pool:
        # Use starmap to pass additional arguments to calculate_similarity
        similarity_results = pool.starmap(calculate_similarity, [(feature, df, query, similarity_functions, feature_weights) for feature in df.columns])

    similarities = pd.concat(similarity_results, axis=1)
    similarities['total_similarity'] = similarities.sum(axis=1)

    # Select top N cases with the highest total similarity
    top_n_indices = similarities['total_similarity'].nlargest(top_n).index
    top_n_cases = df.loc[top_n_indices]

    return top_n_cases
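
The worker `calculate_similarity` is not shown on this page. The sketch below assumes it returns one weighted similarity column per feature, and illustrates the `starmap` fan-out with one task per feature. To keep the example runnable without an `if __name__ == "__main__":` guard, it swaps in `multiprocessing.pool.ThreadPool`, which exposes the same `starmap` interface as the process-based `Pool` used in the source; `score_feature` and `closeness` are hypothetical names.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd

def closeness(cases, query, feature):
    """Hypothetical numeric similarity: 1 / (1 + absolute distance)."""
    return 1.0 / (1.0 + (cases[feature] - query[feature].iloc[0]).abs())

def score_feature(feature, df, query, similarity_functions, feature_weights):
    """Hypothetical per-feature worker: returns one weighted similarity column."""
    if feature not in similarity_functions:
        return pd.Series(0.0, index=df.index, name=feature)
    weight = float(feature_weights.get(feature, 1.0))
    scores = similarity_functions[feature](df[[feature]], query[[feature]], feature)
    return (scores * weight).rename(feature)

# Made-up casebase and query, for illustration only
casebase = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "weight_kg": [1.0, 2.0, 3.0]},
    index=["c1", "c2", "c3"],
)
query = pd.DataFrame({"price": [19.0], "weight_kg": [2.5]})
similarity_functions = {"price": closeness, "weight_kg": closeness}
feature_weights = {"price": 0.5, "weight_kg": 0.5}

# One task per feature; starmap unpacks each tuple into the worker's arguments
tasks = [(feature, casebase, query, similarity_functions, feature_weights)
         for feature in casebase.columns]
with ThreadPool(2) as pool:
    columns = pool.starmap(score_feature, tasks)

similarities = pd.concat(columns, axis=1)
total = similarities.sum(axis=1)
top_cases = casebase.loc[total.nlargest(1).index]
print(top_cases)
```

Because the work is split per feature rather than per case, the achievable speed-up in this layout is bounded by the number of features, not by the size of the casebase.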

`retrieve_topk(cases, similarity_data, sim_column, k)`

Ranks the cases in a DataFrame by the similarity column `sim_column` and returns the top k results.

Parameters:

Name	Type	Description	Default
`cases`	`pandas.DataFrame`	DataFrame containing the case features	required
`similarity_data`	`pandas.DataFrame`	DataFrame containing similarity scores for each feature and a 'total_similarity' column.	required
`sim_column`	`str`	Name of the total similarity or weighted similarity column to rank by	required
`k`	`int`	Number of top cases to return.	required

Returns:

Type	Description
`pandas.DataFrame`	The top k cases from the casebase, together with their similarity column.

Source code in intellikit/common.py

def retrieve_topk(cases, similarity_data, sim_column, k):
  """
  Ranks the cases in a DataFrame by the similarity column sim_column and returns the top k results.

  Args:
      cases (pandas.DataFrame): DataFrame containing the case features
      similarity_data (pandas.DataFrame): DataFrame containing similarity scores for each feature and a 'total_similarity' column.
      sim_column (str): Name of the total similarity or weighted similarity column to rank by
      k (int): Number of top cases to return.

  Returns:
      pandas.DataFrame: The top k cases from the casebase, together with their similarity column.
  """

  # Combine the case features and the similarity scores side by side
  merged = pd.concat([cases, similarity_data], axis=1)

  # Sort the merged DataFrame by sim_column in descending order
  sorting_df = merged.sort_values(by=sim_column, ascending=False)

  # Pick the similarity column to keep: the first column of similarity_data whose name contains sim_column
  column_to_keep = similarity_data.filter(like=sim_column).columns[0]
  # Keep only the case columns plus that similarity column
  desired_columns = [col for col in merged.columns if col in cases.columns or col == column_to_keep]
  result_df = sorting_df[desired_columns]


  # Select top k cases
  top_k_cases = result_df.head(k)

  return top_k_cases
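
The ranking steps above can be traced on a small example. This is a simplified sketch on made-up data that follows the same concatenate, sort, and `head(k)` sequence using plain pandas calls; it selects the similarity column by exact name rather than with `filter(like=...)`.

```python
import pandas as pd

# Made-up casebase and precomputed similarity scores, for illustration only
cases = pd.DataFrame(
    {"price": [10, 20, 30], "color": ["red", "blue", "red"]},
    index=["c1", "c2", "c3"],
)
similarity_data = pd.DataFrame(
    {"price_sim": [0.9, 0.4, 0.1], "total_similarity": [0.8, 0.3, 0.5]},
    index=cases.index,
)
sim_column = "total_similarity"
k = 2

# Join features and scores on the shared index, then rank by the chosen column
merged = pd.concat([cases, similarity_data], axis=1)
ranked = merged.sort_values(by=sim_column, ascending=False)

# Keep the case columns plus the ranking column, and take the first k rows
top_k = ranked[list(cases.columns) + [sim_column]].head(k)
print(top_k)
```

Both DataFrames need to share the same index for the concatenation to line up each case with its scores.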

`stream_text(text, delay=0.02)`

Prints text one character at a time with a short delay between characters, similar to the streaming output of ChatGPT.

Parameters:

Name	Type	Description	Default
`text`	`str`	The text to print, for example a paragraph or a document	required
`delay`	`float`	Delay in seconds between characters. Defaults to 0.02.	`0.02`

Source code in intellikit/common.py

def stream_text(text, delay=0.02):
    """
    Prints text one character at a time with a short delay between characters, similar to the streaming output of ChatGPT.

    Args:
        text (str): The text to print, for example a paragraph or a document
        delay (float, optional): Delay in seconds between characters. Defaults to 0.02.
    """
    for char in text:
        print(char, end = "", flush = True)
        time.sleep(delay)
