Why does the get_tree_stats method of BallTree return a tuple of zeros?

Issue

This content is from Stack Overflow. Question asked by Vovin.

I use the BallTree structure from scikit-learn to find the nearest points within a radius. I have a geometry array consisting of 1203 points. It works well. For example, let's find the nearest points within 100 km of an isolate from Fujisawa:

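from sklearn.neighbors import BallTree

# The haversine metric expects [latitude, longitude] pairs in radians.
ball_tree = BallTree(
    geometry_array,
    metric="haversine",
)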

isolate_index = foreign_AVP1.index.get_loc("LC035020.1")
near_indices = ball_tree.query_radius(
    geometry_array[isolate_index].reshape(1, -1), r=kmToRadians(100)
).item()

foreign_AVP1.iloc[near_indices].locale

The head of the output:

taxon       locale
LC037396.1  Japan Yokohama
LC035020.1  Japan Fujisawa
LC037395.1  Japan Yokohama
LC014787.1  Japan Tokyo
AB279734.1  Japan Tokyo
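
The `kmToRadians` helper used in the query above is not shown in the question. A minimal sketch of what such a helper might look like follows, assuming a mean Earth radius of 6371.0088 km (the question does not say which value it uses). The names `km_to_radians`, `to_radians`, and `EARTH_RADIUS_KM` are illustrative, not taken from the original code.

```python
import math

# Assumed mean Earth radius in km; the original helper may use another value.
EARTH_RADIUS_KM = 6371.0088


def km_to_radians(distance_km: float) -> float:
    """Convert a great-circle distance in km to a central angle in radians."""
    return distance_km / EARTH_RADIUS_KM


def to_radians(lat_deg: float, lon_deg: float) -> tuple:
    """Convert a (latitude, longitude) pair from degrees to radians."""
    return (math.radians(lat_deg), math.radians(lon_deg))


# Search radius for a 100 km query, expressed as an angle in radians.
radius = km_to_radians(100)
```

Because the haversine metric works on angles, both the coordinates and the search radius need to be in radians before they are passed to the tree.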

I was curious about the tree structure and used the get_tree_stats method to find the number of leaves. Unexpectedly, I got a tuple of zeros: (0, 0, 0). This is strange, because the default leaf size of 40 and 1203 samples should result in about 16-30 leaves. When I used the get_arrays method to look at the raw data, everything looked normal. The node data (only part of the output is shown) contains 16 leaves:

array([(   0, 1203, 0, 3.0947303 ), (   0,  601, 0, 2.41933654),
        ( 601, 1203, 0, 0.58490656), (   0,  300, 0, 1.85505229),
        ( 300,  601, 0, 0.89693236), ( 601,  902, 0, 0.49983081),
        ( 902, 1203, 0, 0.40970688), (   0,  150, 0, 1.48087714),
        ( 150,  300, 0, 0.32291627), ( 300,  450, 0, 0.20494666),
        ( 450,  601, 0, 0.81787611), ( 601,  751, 0, 0.55418562),
        ( 751,  902, 0, 0.33448889), ( 902, 1052, 0, 0.01141954),
        (1052, 1203, 0, 0.39229974), (   0,   75, 1, 1.00467692),
        (  75,  150, 1, 1.07852748), ( 150,  225, 1, 0.25955598),
        ( 225,  300, 1, 0.24115038), ( 300,  375, 1, 0.12721095),
        ( 375,  450, 1, 0.1702189 ), ( 450,  525, 1, 0.78454107),
        ( 525,  601, 1, 0.09367431), ( 601,  676, 1, 0.0368483 ),
        ( 676,  751, 1, 0.4762352 ), ( 751,  826, 1, 0.2375712 ),
        ( 826,  902, 1, 0.17776855), ( 902,  977, 1, 0.01107241),
        ( 977, 1052, 1, 0.0118055 ), (1052, 1127, 1, 0.32023704),
        (1127, 1203, 1, 0.09146244)],
       dtype=[('idx_start', '<i8'), ('idx_end', '<i8'), ('is_leaf', '<i8'), ('radius', '<f8')]),
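
The 31 nodes and 16 leaves in this node data can be cross-checked with a little arithmetic. The sketch below assumes the tree is a complete binary tree whose depth is derived from the sample count and the leaf size, which is how scikit-learn's binary trees appear to be sized; treat the formula as an assumption rather than documented API. The function name `expected_tree_shape` is hypothetical.

```python
import math


def expected_tree_shape(n_samples: int, leaf_size: int = 40):
    """Estimate (levels, nodes, leaves) of a complete binary tree.

    Assumption: the number of levels is chosen so that each leaf holds
    between leaf_size and 2 * leaf_size points.
    """
    n_levels = int(math.log2(max(1, (n_samples - 1) / leaf_size)) + 1)
    n_nodes = 2 ** n_levels - 1        # every level is full
    n_leaves = 2 ** (n_levels - 1)     # the last level holds the leaves
    return n_levels, n_nodes, n_leaves


n_levels, n_nodes, n_leaves = expected_tree_shape(1203, leaf_size=40)
print(n_levels, n_nodes, n_leaves)  # 5 31 16
```

With 1203 samples and a leaf size of 40 this gives 5 levels, 31 nodes, and 16 leaves, which agrees with the rows where is_leaf is 1 in the array above.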

Why does the get_tree_stats method behave in this unexpected way?




This question and answer are collected from Stack Overflow and tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
