split method parameter #633

@shawnwiggins

Issue

The split method currently uses k as its parameter name. We noticed that this causes some confusion, since the method is used mainly (possibly only) in the $k$-NN section of the FDS, where k already denotes the number of nearest neighbors.

Proposed Solution

Replace k with first_n (or something similar) in the source:

    def split(self, first_n):
        """Return a tuple of two tables where the first table contains
        ``first_n`` rows randomly sampled and the second contains the remaining rows.

        Args:
            ``first_n`` (int): The number of rows randomly sampled into the first
                table. ``first_n`` must be between 1 and ``num_rows - 1``.

        Raises:
            ``ValueError``: ``first_n`` is not between 1 and ``num_rows - 1``.

        Returns:
            A tuple containing two instances of ``Table``.

        >>> jobs = Table().with_columns(
        ...     'job',  make_array('a', 'b', 'c', 'd'),
        ...     'wage', make_array(10, 20, 15, 8))
        >>> jobs
        job  | wage
        a    | 10
        b    | 20
        c    | 15
        d    | 8
        >>> sample, rest = jobs.split(3)
        >>> sample # doctest: +SKIP
        job  | wage
        c    | 15
        a    | 10
        b    | 20
        >>> rest # doctest: +SKIP
        job  | wage
        d    | 8
        """
        if not 1 <= first_n <= self.num_rows - 1:
            raise ValueError("Invalid value of first_n. first_n must be between 1 and the "
                             "number of rows - 1")

        rows = np.random.permutation(self.num_rows)

        first = self.take(rows[:first_n])
        rest = self.take(rows[first_n:])
        for column_label in self._formats:
            first._formats[column_label] = self._formats[column_label]
            rest._formats[column_label] = self._formats[column_label]
        return first, rest
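
To try the proposed behavior without the Table class, here is a minimal stand-alone sketch of the same split logic on a plain NumPy array. The function name `split_rows` and its `seed` argument are hypothetical, added only for illustration and reproducibility; they are not part of the library.

```python
import numpy as np


def split_rows(rows, first_n, seed=None):
    """Hypothetical analogue of the proposed split(first_n) for a NumPy array.

    Returns (first, rest), where ``first`` holds ``first_n`` randomly
    sampled rows and ``rest`` holds the remaining rows.
    """
    num_rows = len(rows)
    # Same bounds check as the proposed method: both parts must be non-empty.
    if not 1 <= first_n <= num_rows - 1:
        raise ValueError("Invalid value of first_n. first_n must be between 1 "
                         "and the number of rows - 1")

    # Shuffle the row indices, then cut the shuffled order at first_n.
    rng = np.random.default_rng(seed)  # seed is an assumption, for repeatability
    order = rng.permutation(num_rows)
    return rows[order[:first_n]], rows[order[first_n:]]


wages = np.array([10, 20, 15, 8])
sample, rest = split_rows(wages, 3, seed=0)
print(len(sample), len(rest))
```

The two parts always partition the input, so every row lands in exactly one of them; which rows end up in `sample` depends on the random permutation.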
