Python: how to get the associated value in a string?

Accepted Answer

You can use a regular expression to look for _cell_length_a (or any other key), followed by whitespace, and capture the number (digits and dots) that comes right after it.

>>> import re
>>> re.findall(r"_cell_length_a\s+([0-9.]+)", metadatastructure)
['2.51636378', '2.51636378']
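A slightly more defensive variant, as a sketch: anchoring the pattern at the start of a line (with re.MULTILINE) avoids matches in the middle of a line, and using [ \t]+ instead of \s+ keeps the match from running onto the next line when a key has no value. The sample text SAMPLE and the helper find_values are hypothetical stand-ins, since the question's metadatastructure is not shown. \S+ captures the whole next token rather than only digits and dots.

```python
import re

# Hypothetical sample text standing in for the question's metadatastructure
SAMPLE = """data_example
_cell_length_a    2.51636378
_cell_length_b    2.51636378
_cell_formula_units_Z    2
"""


def find_values(text: str, key: str) -> list[str]:
    """Return every token that follows `key` at the start of a line."""
    # ^ plus re.MULTILINE anchors at each line start; [ \t]+ requires
    # whitespace right after the key (so "_cell_length_ab" is not a match)
    # without crossing a newline; re.escape guards regex metacharacters.
    pattern = rf"^{re.escape(key)}[ \t]+(\S+)"
    return re.findall(pattern, text, flags=re.MULTILINE)


print(find_values(SAMPLE, "_cell_length_a"))         # ['2.51636378']
print(find_values(SAMPLE, "_cell_formula_units_Z"))  # ['2']
```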

Or use a list comprehension with splitlines, startswith and split:

>>> [line.split()[-1] for line in metadatastructure.splitlines() if line.startswith("_cell_length_a")]
['2.51636378', '2.51636378']
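If you need several keys, the same splitlines/split idea can build a lookup table in one pass. This is a minimal sketch assuming simple "_key value" lines with a single unquoted token as the value; the name parse_key_values and the SAMPLE text are hypothetical, and values containing spaces (quoted strings) are simply skipped.

```python
# Hypothetical sample text standing in for the question's metadatastructure
SAMPLE = """data_example
_cell_length_a    2.51636378
_cell_length_b    2.51636378
_cell_formula_units_Z    2
loop_
 _atom_site_label
 _atom_site_occupancy
 C1  1
 C2  1
"""


def parse_key_values(text: str) -> dict[str, list[str]]:
    """Collect the value of every '_key value' line, keyed by tag name."""
    values: dict[str, list[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines with exactly two tokens whose first token is a tag;
        # loop headers (one token) and data rows like "C1 1" are skipped.
        if len(parts) == 2 and parts[0].startswith("_"):
            # A key may occur more than once (e.g. several data blocks)
            values.setdefault(parts[0], []).append(parts[1])
    return values


tags = parse_key_values(SAMPLE)
print(tags["_cell_length_a"])                   # ['2.51636378']
print(int(tags["_cell_formula_units_Z"][0]))    # 2
```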

Note that either way the result is a list of strings, which has to be converted to float (here _ is the interactive interpreter's previous result):

>>> [float(x) for x in _]
[2.51636378, 2.51636378]
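One caveat, assuming the data comes from a CIF file: numeric values there may carry a standard uncertainty in parentheses, such as 5.4311(2). The split-based approach would return "5.4311(2)", which float() rejects. A small hypothetical helper that drops the parenthesised part before converting:

```python
def to_float(token: str) -> float:
    """Convert '2.51636378' or '5.4311(2)' to float, dropping any '(su)' suffix."""
    # float() does not accept the "(2)" uncertainty suffix,
    # so keep only the text before the first "(".
    return float(token.split("(", 1)[0])


values = ["2.51636378", "5.4311(2)"]
print([to_float(v) for v in values])  # [2.51636378, 5.4311]
```

Placeholders such as "?" (unknown) are not handled by this sketch and still raise ValueError.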

Regarding your follow-up question from the comments (“Here _cell_formula_units_Z is 2. Now, I need to extract the next 2 lines after the line with _atom_site_occupancy”), try this:

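import itertools
lines_iter = iter(metadatastructure.splitlines())
z = 0  # set from _cell_formula_units_Z; stays 0 (nothing printed) if that line is missing or comes later
for line in lines_iter:
    if line.startswith("_cell_formula_units_Z"):
        z = int(line.split()[-1])
    if "_atom_site_occupancy" in line:
        # consume the next z lines from the same iterator; islice stops
        # quietly at the end of the text instead of raising StopIteration
        for atom_line in itertools.islice(lines_iter, z):
            print(atom_line)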
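Reading exactly Z lines works for the file in the question, but in a CIF file _cell_formula_units_Z counts formula units per unit cell, which is not in general the number of rows in the _atom_site loop. As an alternative sketch, you could collect rows until the loop ends. This assumes one row per line and treats a blank line, a new "_" tag, "loop_" or "data_" as the end of the loop; the function name atom_site_rows and the SAMPLE text are hypothetical.

```python
# Hypothetical sample: Z is 1 here, yet the loop has two rows
SAMPLE = """_cell_formula_units_Z   1
loop_
 _atom_site_label
 _atom_site_occupancy
 _atom_site_fract_x
 C1  1  0.00000
 C2  1  0.33333

_another_tag  value
"""


def atom_site_rows(text: str) -> list[str]:
    """Return the data rows of the loop whose headers include _atom_site_occupancy."""
    rows: list[str] = []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_loop:
            # Start collecting once the _atom_site_occupancy header is seen
            in_loop = line.startswith("_atom_site_occupancy")
            continue
        if line.startswith("_") and not rows:
            continue  # further header lines of the same loop
        if not line or line.startswith(("_", "loop_", "data_")):
            break  # the loop has ended
        rows.append(line)
    return rows


print(atom_site_rows(SAMPLE))  # ['C1  1  0.00000', 'C2  1  0.33333']
```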