kf.split(X_data) returns a generator that yields n_splits tuples, where each tuple consists of a train index array and a test index array.
I.e., for n_splits=2, list(kf.split(X_data)) gives:
[(train0, test0), (train1, test1)]
for n_splits=3:
[(train0, test0), (train1, test1), (train2, test2)]
where each train* / test* is a 1D NumPy array of row indices into X_data.
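To check this structure yourself, here is a minimal sketch. It assumes scikit-learn is installed; `splits` and `folds` are illustrative names, and shuffling is left off so that the indices are deterministic.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(-1, 3)  # 8 rows, 3 columns
kf = KFold(n_splits=2)            # shuffle=False: folds are contiguous and deterministic

splits = kf.split(X)  # a generator: nothing is computed until you iterate over it
folds = list(splits)  # materialize it as [(train0, test0), (train1, test1)]

print(len(folds))  # 2, one (train, test) tuple per split
for train_idx, test_idx in folds:
    # both elements are 1D integer arrays of row indices into X
    print(train_idx, test_idx)
```

Without shuffling, the first test fold is rows 0-3 and the second is rows 4-7, and each train array is the remaining rows.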
Here is an example:
Setup (assuming import numpy as np and from sklearn.model_selection import KFold):
In [287]: X = np.arange(24).reshape(-1,3)

In [288]: X
Out[288]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])

In [289]: kf = KFold(n_splits=2, shuffle=True, random_state=1)
A naive option, which hard-codes the number of folds and therefore works only for n_splits=2:
In [302]: (train1, test1), (train2, test2) = kf.split(X)

In [303]: X[train1]
Out[303]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])

In [304]: X[test1]
Out[304]:
array([[ 3,  4,  5],
       [ 6,  7,  8],
       [18, 19, 20],
       [21, 22, 23]])
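Why the naive version is tied to two folds: tuple unpacking needs exactly as many targets as the generator yields pairs, so any other n_splits raises ValueError. A small check, assuming scikit-learn is installed; `unpacks_into_two_pairs` is a hypothetical helper written only for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(-1, 3)

def unpacks_into_two_pairs(n_splits):
    """Hypothetical helper: does the naive two-pair unpacking work for n_splits folds?"""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=1)
    try:
        # the left-hand side hard-codes exactly two (train, test) pairs
        (train1, test1), (train2, test2) = kf.split(X)
    except ValueError:  # the generator yielded more than two pairs
        return False
    return True

print(unpacks_into_two_pairs(2))  # True
print(unpacks_into_two_pairs(3))  # False: too many values to unpack
```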
A more flexible approach, which works for any n_splits, is to unzip the pairs with zip(*...):
In [290]: train, test = zip(*kf.split(X))

In [291]: train
Out[291]: (array([0, 3, 4, 5]), array([1, 2, 6, 7]))

In [292]: test
Out[292]: (array([1, 2, 6, 7]), array([0, 3, 4, 5]))

In [293]: X[train[0]]
Out[293]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])

In [294]: X[test[0]]
Out[294]:
array([[ 3,  4,  5],
       [ 6,  7,  8],
       [18, 19, 20],
       [21, 22, 23]])

In [295]: X[train[1]]
Out[295]:
array([[ 3,  4,  5],
       [ 6,  7,  8],
       [18, 19, 20],
       [21, 22, 23]])

In [296]: X[test[1]]
Out[296]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])
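If you only need one fold at a time, you don't have to collect all folds with zip: a plain for loop over kf.split(X) also handles any number of folds. A sketch assuming scikit-learn is installed and the same X and KFold settings as the setup; `fold_shapes` is an illustrative name.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(-1, 3)
kf = KFold(n_splits=2, shuffle=True, random_state=1)

fold_shapes = []
# Iterate the generator directly; this loop works unchanged for any n_splits
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]  # index arrays select whole rows
    fold_shapes.append((X_train.shape, X_test.shape))
    print(f"fold {fold}: train rows {train_idx}, test rows {test_idx}")

print(fold_shapes)  # [((4, 3), (4, 3)), ((4, 3), (4, 3))]
```

The loop keeps only one fold's arrays alive at a time, while zip(*...) is handy when you want all train/test index arrays side by side.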
And the same for n_splits=3:
In [297]: kf = KFold(n_splits=3, shuffle=True, random_state=1)

In [298]: train, test = zip(*kf.split(X))

In [299]: train
Out[299]: (array([0, 3, 4, 5, 6]), array([1, 2, 3, 5, 7]), array([0, 1, 2, 4, 6, 7]))

In [300]: test
Out[300]: (array([1, 2, 7]), array([0, 4, 6]), array([3, 5]))
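The test folds in Out[300] have 3, 3, and 2 rows because 8 rows don't divide evenly into 3 folds. The sketch below checks this against the fold-sizing rule described in the scikit-learn KFold documentation and confirms that the test folds partition the rows. It assumes scikit-learn is installed; all variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples, n_splits = 8, 3
X = np.arange(n_samples * 3).reshape(-1, 3)

# KFold's sizing rule: the first n_samples % n_splits folds
# get n_samples // n_splits + 1 samples, the rest get n_samples // n_splits
expected_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]

kf = KFold(n_splits=n_splits, shuffle=True, random_state=1)
train, test = zip(*kf.split(X))
actual_sizes = [len(t) for t in test]
print(expected_sizes, actual_sizes)  # [3, 3, 2] [3, 3, 2]

# The test folds partition the rows: every index appears in exactly one of them
all_test = np.sort(np.concatenate(test))
print(np.array_equal(all_test, np.arange(n_samples)))  # True

# Each train array is the complement of its test array
complements_ok = all(
    np.array_equal(np.sort(np.concatenate([tr, te])), np.arange(n_samples))
    for tr, te in zip(train, test)
)
print(complements_ok)  # True
```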
What is the shape of X_data (print(X_data.shape))? - MaxU