jakobjakobson13
Contributor

I'll try to break down my draft #17577 into smaller pieces, starting with this pull request.

Reference issue

gh-17577

What does this implement/fix?

It slims down the _codata.py file by moving the constants data into separate files. This makes the data easier to edit.

Additional information

This pull request should not break anything as the changes are only internal.

@lucascolley added the maintenance label (Items related to regular maintenance tasks) on Feb 9, 2024
@jakobjakobson13
Contributor Author

Could anyone give me a hint why the file scipy/constants/codata_constants_2002.txt can't be found?

@ev-br
Member

ev-br commented Feb 9, 2024

Try adding it to the list in meson.build, https://github.com/scipy/scipy/blob/main/scipy/constants/meson.build

@lucascolley
Member

The relevant CI error seems to be:

scipy/constants/_codata.py:106: in parse_constants_2002to2014
    uncert = float(line[77:99].replace(' ', '').replace('(exact)', '0'))
E   ValueError: could not convert string to float: '0.00000029e'
        constants  = {'alpha particle-electron mass ratio': (7294.2995361, '', 2.9e-06), '{220} lattice spacing of silicon': (1.920155714e-10, '2          m', 3.2e-07)}
        d          = '\n             2010 Fundamental Physical Constants --- Complete Listing\n\n\n  From:  http://physics.nist.gov/constan...1\nWien wavelength displacement law constant                   2.897 7721 e-3           0.000 0026 e-3           m K\n'
        line       = 'alpha particle mass                                         6.644 656 75 e-27        0.000 000 29 e-27        kg'
        name       = 'alpha particle mass'
        uncert     = 2.9e-06
        units      = ''
        val        = 6.64465675e-27
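
The locals in this traceback point to a column-offset mismatch rather than bad data: the slice `line[77:99]` ends one character into the exponent of the uncertainty. The sketch below rebuilds the failing line and reproduces the error. The 60/25/25 column widths and the `to_float` helper are assumptions inferred from the `line` value shown above, not taken from `_codata.py`.

```python
# Rebuild the failing line with the column layout it appears to use
# (assumed widths: name 60, value 25, uncertainty 25, then the unit).
line = (
    "alpha particle mass".ljust(60)
    + "6.644 656 75 e-27".ljust(25)
    + "0.000 000 29 e-27".ljust(25)
    + "kg"
)


def to_float(field: str) -> float:
    """Hypothetical helper: drop digit-group spaces, map '(exact)' to 0."""
    return float(field.replace(" ", "").replace("(exact)", "0"))


# Offsets from the traceback: the slice stops right after the 'e'.
narrow = line[77:99].replace(" ", "")
print(repr(narrow))  # '0.00000029e'

error = None
try:
    to_float(line[77:99])
except ValueError as exc:
    error = str(exc)
print(error)

# Offsets matching the assumed layout capture the whole field.
uncert = to_float(line[85:110])
units = line[110:].rstrip()
print(uncert, units)
```

If this reading is right, the data file is fine and only the slice boundaries need to match the layout of the file being parsed.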

@jakobjakobson13
Contributor Author

For the parsing of the 2002 file I came up with the following parsing function but it contains loads of regular expressions:

import re

def parse_constants_2002to2006(d: str) -> dict[str, tuple[float, str, float]]:
    constants = {}
    for line in d.split('\n'):
        # Skip blank lines, indented header lines and the dashed separator.
        if len(line) < 2 or line[1] in " -":
            continue
        name = line[1:61].rstrip()
        field = line[62:95].replace(" ", "")
        # Mantissa, e.g. '1.234' in '1.234(56)e-7'; the dot is escaped on purpose.
        significant = re.search(r'\d+\.?\d*', field).group()
        # The exponent is optional (dimensionless ratios often have none).
        exp_match = re.search(r'e([+-]?\d+)', field)
        exponent = int(exp_match.group(1)) if exp_match else 0
        val = float(f"{significant}e{exponent}")
        # Parenthesised digits give the uncertainty in the last digits of the mantissa.
        unc_match = re.search(r'\((\d+)\)', field)
        if unc_match:
            decimals = len(significant.partition('.')[2])
            uncert = float(f"{unc_match.group(1)}e{exponent - decimals}")
        else:
            uncert = 0.0  # assumption: no parenthesised digits means an exact value
        units = line[96:].rstrip()
        constants[name] = (val, units, uncert)
    return constants

I'll have to look at it again another time; perhaps it's way too complicated.
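
One possible way to cut down on the number of regular expressions is a single pattern with named groups for the whole value field. This is only a sketch: `CONCISE` and `split_concise` are hypothetical names, and the test string is the 2010 alpha particle mass from this thread rewritten by hand into concise `value(uncertainty)` notation, not a line quoted from a CODATA file.

```python
import re

# One pattern for "mantissa(uncertainty)exponent" with all spaces removed.
CONCISE = re.compile(
    r"^(?P<mant>\d+(?:\.(?P<frac>\d*))?)"  # mantissa, optional fraction
    r"(?:\((?P<unc>\d+)\))?"               # optional uncertainty digits
    r"(?:e(?P<exp>[+-]?\d+))?$"            # optional exponent
)


def split_concise(field: str) -> tuple[float, float]:
    """Return (value, uncertainty) for a concise-notation field."""
    cleaned = field.replace(" ", "").replace("(exact)", "")
    m = CONCISE.match(cleaned)
    if m is None:
        raise ValueError(f"unrecognised field: {field!r}")
    exp = int(m["exp"] or 0)
    val = float(f"{m['mant']}e{exp}")
    if m["unc"] is None:
        return val, 0.0  # no uncertainty digits: treated as exact
    # The uncertainty digits line up with the last digits of the mantissa,
    # so shift them by the number of decimals in the mantissa.
    decimals = len(m["frac"] or "")
    return val, float(f"{m['unc']}e{exp - decimals}")


val, uncert = split_concise("6.644 656 75(29) e-27")
print(val, uncert)
```

Working with the exponent arithmetic directly avoids building a zero-padded string for the uncertainty, which also keeps the result correct when the uncertainty digits span the decimal point.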

@rgommers
Member

This sounds like a good idea. While we're at it - what do you think about saving the data into a single .npz file at build time and installing that instead of the .txt files? That will make the wheel and on-disk installed sizes smaller; the text files are almost 200 kb; the binary equivalent is going to be a lot smaller.

We're doing this already in scipy.special. It should be pretty straightforward, the only thing to think about is to write the file to the build dir rather than in-tree. This can be done with a simplified version of https://github.com/scipy/scipy/blob/main/scipy/special/utils/makenpz.py
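
A rough sketch of what the build-time step could look like, assuming the tables have already been parsed into Python dictionaries. The function names, the array key scheme (`names_2010` and so on) and the output file name are hypothetical, and a temporary directory stands in for the build directory. The three records are the 2010 values that appear elsewhere in this thread.

```python
import tempfile
from pathlib import Path

import numpy as np

# name -> (value, unit, uncertainty), as in the parsed tables.
records = {
    "{220} lattice spacing of silicon": (192.0155714e-12, "m", 0.0000032e-12),
    "alpha particle-electron mass ratio": (7294.2995361, "", 0.0000029),
    "alpha particle mass": (6.64465675e-27, "kg", 0.00000029e-27),
}


def write_constants_npz(path, year, table):
    """Store one year's table as four parallel arrays in a compressed .npz."""
    rows = list(table.items())
    np.savez_compressed(
        path,
        **{
            f"names_{year}": np.array([name for name, _ in rows]),
            f"values_{year}": np.array([rec[0] for _, rec in rows], dtype=float),
            f"units_{year}": np.array([rec[1] for _, rec in rows]),
            f"uncerts_{year}": np.array([rec[2] for _, rec in rows], dtype=float),
        },
    )


def read_constants_npz(path, year):
    """Rebuild the name -> (value, unit, uncertainty) dictionary."""
    with np.load(path) as data:
        return {
            str(name): (float(val), str(unit), float(unc))
            for name, val, unit, unc in zip(
                data[f"names_{year}"],
                data[f"values_{year}"],
                data[f"units_{year}"],
                data[f"uncerts_{year}"],
            )
        }


# Write to a scratch directory (standing in for the build dir), not in-tree.
with tempfile.TemporaryDirectory() as build_dir:
    npz_path = Path(build_dir) / "codata_constants.npz"
    write_constants_npz(npz_path, "2010", records)
    restored = read_constants_npz(npz_path, "2010")

print(restored == records)
```

Storing names and units as fixed-width string arrays keeps the archive loadable without pickling.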

@lucascolley changed the title from "Reorganize codata constants data" to "MAINT: constants: reorganize codata constants data" on Mar 14, 2024
@jakobjakobson13
Contributor Author

I finally got the time to look into this again, but the problem seems more complicated than I initially thought.
The current proposal, as it stands, should work fine, but building the npz files is more complicated.
makenpz.py uses np.loadtxt to create its archives:

data[key] = np.loadtxt(fn)

but that function is intended for numeric arrays.
So if you want np.loadtxt to read a file with a header like


             2010 Fundamental Physical Constants --- Complete Listing


  From:  http://physics.nist.gov/constants



  Quantity                                                       Value                 Uncertainty           Unit
-----------------------------------------------------------------------------------------------------------------------------
{220} lattice spacing of silicon                            192.015 5714 e-12        0.000 0032 e-12          m
alpha particle-electron mass ratio                          7294.299 5361            0.000 0029               
alpha particle mass                                         6.644 656 75 e-27        0.000 000 29 e-27        kg

you have to consider several things: the first lines have to be skipped until you reach the data; you'll need converters for the quantity and the unit, as they are strings; and you'll need converters for the value and the uncertainty, since the scientific notation with embedded spaces mixes characters and digits. Additionally, the formatting of the CODATA files has changed over the years.
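
To make those steps concrete, here is a minimal sketch in plain Python, without np.loadtxt, that skips the header and converts each column of the 2010 sample above. The column offsets (60, 85, 110) are read off the sample lines and are an assumption for this one layout; `row`, `to_float` and `parse_listing` are hypothetical helpers.

```python
def row(name: str, value: str, uncert: str, unit: str) -> str:
    """Lay out one record with the column widths seen in the sample."""
    return name.ljust(60) + value.ljust(25) + uncert.ljust(25) + unit


sample = "\n".join([
    "",
    "             2010 Fundamental Physical Constants --- Complete Listing",
    "",
    "  From:  http://physics.nist.gov/constants",
    "",
    "  Quantity" + " " * 55 + "Value" + " " * 17 + "Uncertainty" + " " * 11 + "Unit",
    "-" * 125,
    row("{220} lattice spacing of silicon", "192.015 5714 e-12", "0.000 0032 e-12", "m"),
    row("alpha particle-electron mass ratio", "7294.299 5361", "0.000 0029", ""),
    row("alpha particle mass", "6.644 656 75 e-27", "0.000 000 29 e-27", "kg"),
])


def to_float(field: str) -> float:
    """Remove digit-group spaces and map '(exact)' to zero."""
    return float(field.replace(" ", "").replace("(exact)", "0"))


def parse_listing(text: str) -> dict[str, tuple[float, str, float]]:
    lines = text.split("\n")
    # Everything up to and including the dashed rule is header.
    start = next(i for i, ln in enumerate(lines) if ln.startswith("-----")) + 1
    constants = {}
    for line in lines[start:]:
        if not line.strip():
            continue
        name = line[:60].rstrip()      # quantity: kept as a string
        val = to_float(line[60:85])    # value: spaces stripped, then float
        uncert = to_float(line[85:110])
        units = line[110:].rstrip()    # unit: kept as a string, may be empty
        constants[name] = (val, units, uncert)
    return constants


parsed = parse_listing(sample)
print(parsed["alpha particle mass"])
```

Fixed-width slicing is used here because the quantity names contain spaces, so splitting on whitespace would not separate the columns reliably.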

So, long story short: for the moment, it seems like a bad trade to put much more effort into it. Splitting the "codata raw files" out of the main file seems quite easy, but creating npz files needs more work than just using makenpz.py. The regular expressions I came up with could be useful for this, but I'll leave that to another volunteer, if there is one.
