_enforce_nested_string_type() takes too much time for Sequence(Value(...)) #153

@NouamaneTazi

As the figure below shows, most of the time spent adding a batch is consumed by _enforce_nested_string_type(). Is that check necessary for a Sequence(Value("int64")) feature?

import logging
import time

import torch
from datasets import Features, Sequence, Value

from evaluate.module import EvaluationModule, EvaluationModuleInfo

logging.basicConfig(level=logging.INFO)


class DummyMetric(EvaluationModule):
    def _info(self):
        return EvaluationModuleInfo(
            description="dummy metric for tests",
            citation="insert citation here",
            features=Features(
                {
                    "tensor": Sequence(Value("int64")),
                }
            ),
        )


metric = DummyMetric()
start_time = time.time()
metric.add_batch(
    tensor=torch.randint(0, 10, (1000, 7000)),
)
print(f"time={round(time.time()-start_time, 2)}s") # outputs time=32.41s
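
The wall-clock timing above only gives a total. To see which function dominates, the call can be run under cProfile. The following is a minimal sketch using only the standard library; `profile_call` is a hypothetical helper name, not part of evaluate or datasets.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report).

    The report lists the `top` entries sorted by cumulative time.
    profile_call is a hypothetical helper, not a library function.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the formatted statistics in a string instead of printing them.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

With the snippet above, something like `_, report = profile_call(metric.add_batch, tensor=torch.randint(0, 10, (1000, 7000)))` followed by `print(report)` should list the most expensive calls by cumulative time.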

[Figure: profile of add_batch(), in which _enforce_nested_string_type() accounts for most of the time]
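
One possible way to avoid this cost is sketched below. It assumes the check only exists to reject non-str values in string-typed features; that is an assumption and has not been confirmed against the library code. `Leaf`, `Seq`, `schema_has_string` and `enforce_string_type` are hypothetical stand-ins, not the datasets or evaluate API. The idea is to inspect the schema first and skip the data entirely when it contains no string leaf.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """Hypothetical stand-in for a scalar feature such as Value("int64")."""
    dtype: str


@dataclass
class Seq:
    """Hypothetical stand-in for a sequence of a single feature type."""
    feature: object


def schema_has_string(schema):
    """Return True if any leaf of the schema is a string type."""
    if isinstance(schema, dict):
        return any(schema_has_string(s) for s in schema.values())
    if isinstance(schema, (list, tuple)):
        return any(schema_has_string(s) for s in schema)
    if isinstance(schema, Seq):
        return schema_has_string(schema.feature)
    return isinstance(schema, Leaf) and schema.dtype == "string"


def enforce_string_type(schema, obj):
    """Raise TypeError if a string-typed leaf holds a non-str value.

    Sub-trees whose schema has no string leaf return early without
    visiting the data, so a large int64 sequence is never traversed.
    """
    if not schema_has_string(schema):
        return
    if isinstance(schema, dict):
        for key, sub_schema in schema.items():
            enforce_string_type(sub_schema, obj[key])
    elif isinstance(schema, (list, tuple)):
        for item in obj:
            enforce_string_type(schema[0], item)
    elif isinstance(schema, Seq):
        for item in obj:
            enforce_string_type(schema.feature, item)
    elif not isinstance(obj, str):
        raise TypeError(f"Expected str but got {type(obj).__name__}: {obj!r}")
```

Under these assumptions, a schema like `{"tensor": Seq(Leaf("int64"))}` returns immediately regardless of batch size, while string features are still checked element by element.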

Related to: #33
