Adding Random probability distribution function #1862

@maxime4000

Description

Clear and concise description of the problem

I'm seeding a database with faker. Some of my fields accept an array of a given type. I want to generate many such arrays with different sizes: some empty, some with one element, and some with multiple elements.

Most cases will have one element in the array, but I also want to test edge cases, so a way to generate randomly distributed data would be nice.

const isEmpty = faker.datatype.boolean(); // ~50% of results: empty array
const isOneElement = faker.datatype.boolean(); // ~25% of results: one element
const length = faker.datatype.number(100); // ~25% of results: 0 to 100 elements
const array = isEmpty
  ? []
  : isOneElement
    ? [getFakerFunction(field)]
    : Array.from({ length }, () => getFakerFunction(field));
return array;

Let's say I'm faking an array of values and I want some lengths to be more common than others. Arrays of length 1 to 3 are common, but an array of length 100 is very rare. I would like a random probability distribution function for this.

Suggested solution

In my case, I'm looking for a random exponential distribution.

  • Length 1 => 40%
  • Length 2 => 30%
  • Length 3 => 20%
  • and so on...
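For a discrete table like the one above, a weighted pick is one way to get this behaviour. The following is only a sketch: `weightedPick` and `lengthWeights` are hypothetical names, not existing faker APIs, and since the list above is left open-ended, the remaining 10% is assigned to length 4 purely for illustration.

```typescript
interface WeightedValue<T> {
  weight: number;
  value: T;
}

// Pick one value with probability proportional to its weight.
// `rng` must return a number in [0, 1); Math.random is the default.
function weightedPick<T>(
  entries: ReadonlyArray<WeightedValue<T>>,
  rng: () => number = Math.random,
): T {
  const total = entries.reduce((sum, entry) => sum + entry.weight, 0);
  if (entries.length === 0 || !(total > 0)) {
    throw new RangeError('entries must be non-empty with a positive total weight');
  }
  let threshold = rng() * total;
  for (const entry of entries) {
    threshold -= entry.weight;
    if (threshold < 0) {
      return entry.value;
    }
  }
  // Guard against floating-point drift: fall back to the last entry.
  return entries[entries.length - 1].value;
}

// Assumption: the last 10% goes to length 4; the original list does not say.
const lengthWeights: WeightedValue<number>[] = [
  { weight: 40, value: 1 },
  { weight: 30, value: 2 },
  { weight: 20, value: 3 },
  { weight: 10, value: 4 },
];

const arrayLength = weightedPick(lengthWeights);
console.log(arrayLength);
```

Passing the random source as a parameter would let a seeded generator be plugged in, so the output stays reproducible.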

The function would accept an argument like this:

type ExponentialDistributionOptions = {
  min?: number;
  max?: number;
  precision?: number;
  curveSettings: {
    deviation?: number;
    mean?: number;
    // ...
  };
};

It would generate a number using the named distribution.
I would expect to call faker.random.exponentialDistribution({min: 0, max: 100, curveSettings: {...}}) and get a number that is more likely to be close to 0 than to 100. Out of 1000 generated values, only a few would be close to 100.
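To make the expected behaviour concrete, here is a rough sketch of a bounded exponential sampler using inverse transform sampling. Everything in it is an assumption for illustration: `truncatedExponential` and its `rate` option are hypothetical and do not match the `curveSettings` shape proposed above (for an unbounded exponential, the mean would be `1 / rate`).

```typescript
type Rng = () => number;

interface TruncatedExponentialOptions {
  min: number;
  max: number;
  rate: number; // hypothetical parameter: larger values push results toward min
}

// Draw from an exponential distribution truncated to [min, max].
// Uses the inverse CDF of the truncated distribution, so no rejection loop is needed.
function truncatedExponential(
  { min, max, rate }: TruncatedExponentialOptions,
  rng: Rng = Math.random,
): number {
  if (!(rate > 0)) {
    throw new RangeError('rate must be positive');
  }
  if (!(max > min)) {
    throw new RangeError('max must be greater than min');
  }
  const span = max - min;
  const u = rng(); // uniform in [0, 1)
  const offset = -Math.log(1 - u * (1 - Math.exp(-rate * span))) / rate;
  // Clamp to protect against rounding slightly past the upper bound.
  return min + Math.min(offset, span);
}

const sample = truncatedExponential({ min: 0, max: 100, rate: 0.05 });
console.log(Math.floor(sample));
```

With `rate: 0.05` on the range 0 to 100, most samples should land in the lower half and only a small share should come near 100, which is the shape described above.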

I wouldn't limit the feature to the exponential distribution; I would also add the Gaussian distribution, Rayleigh distribution, gamma distribution, etc.

Alternative

No response

Additional context

I'm not sure whether what I'm asking is out of scope for faker, but faker already generates data from random values. Why couldn't faker generate a number based on the probability of that number occurring?

By the way, I'm no mathematician, so I might be wrong in what I've explained, but I still think faker could add some random probability distribution functions.

Metadata

Labels

  • c: feature (Request for new feature)
  • has workaround (Workaround provided or linked)
  • m: number (Something is referring to the number module)
  • p: 1-normal (Nothing urgent)
  • s: accepted (Accepted feature / Confirmed bug)