You can use a regular expression to look for _cell_length_a (or any other key), followed by one or more whitespace characters, and then capture the number (digits and dots) that comes after it.
>>> import re
>>> re.findall(r"_cell_length_a\s+([0-9.]+)", metadatastructure)
['2.51636378', '2.51636378']
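To reuse the regex for any key, you could wrap it in a small function. This is a minimal sketch: get_values and the sample string are hypothetical stand-ins for your metadatastructure. re.escape keeps keys with special characters literal, re.MULTILINE makes ^ match at the start of every line so a key mentioned mid-line is not picked up, and \S+ captures the next whitespace-free token instead of only digits and dots.

```python
import re

# Hypothetical sample text standing in for the question's metadatastructure
sample = """_cell_length_a                         2.51636378
_cell_length_b                         2.51636378
_cell_formula_units_Z                  2
"""

def get_values(text, key):
    """Return every value (as a string) that follows `key` at the start of a line."""
    # re.escape treats the key literally; ^ plus re.MULTILINE anchors at each
    # line start; \S+ grabs the token after the whitespace
    pattern = rf"^{re.escape(key)}\s+(\S+)"
    return re.findall(pattern, text, flags=re.MULTILINE)

print(get_values(sample, "_cell_length_b"))         # ['2.51636378']
print(get_values(sample, "_cell_formula_units_Z"))  # ['2']
```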
Or using a list comprehension with splitlines, startswith and split:
>>> [line.split()[-1] for line in metadatastructure.splitlines() if line.startswith("_cell_length_a")]
['2.51636378', '2.51636378']
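If you need several keys, a single pass over the lines can build a dictionary instead. This is a minimal sketch assuming each simple entry is a line with exactly two whitespace-separated tokens, a key starting with _ and one value; key_value_pairs and the sample text are hypothetical, and quoted multi-word values or loop_ tables are not handled.

```python
# Hypothetical sample text; real CIF-style data can be more complex
sample = """data_example
_cell_length_a    2.51636378
_cell_length_c    6.0
loop_
_atom_site_label
_atom_site_occupancy
C1 1.0
"""

def key_value_pairs(text):
    """Map each '_key value' line (exactly two tokens) to its value string."""
    pairs = {}
    for line in text.splitlines():
        parts = line.split()
        # keep only '_key value' lines; loop headers have a single token and
        # data rows like 'C1 1.0' do not start with '_'
        if len(parts) == 2 and parts[0].startswith("_"):
            pairs[parts[0]] = parts[1]
    return pairs

values = key_value_pairs(sample)
print(values["_cell_length_a"])  # 2.51636378
```

A dict keeps only the last value of a repeated key, and _cell_length_a evidently occurs twice in your data, so stick with the list approaches if you need every occurrence.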
Note that with both approaches the values are still strings, so they have to be converted to float (in the interactive interpreter, _ holds the previous result):
>>> [float(x) for x in _]
[2.51636378, 2.51636378]
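If some values carry a standard uncertainty in parentheses, as CIF files often write them (e.g. 2.5164(3)), float() raises ValueError on them. A possible workaround, sketched under that assumption with a hypothetical helper cif_float, strips the trailing parenthesized digits before converting:

```python
import re

def cif_float(value):
    """Convert a numeric string to float, dropping a trailing '(n)' uncertainty.

    Assumes CIF-style notation such as '2.5164(3)'; plain numbers pass through.
    """
    return float(re.sub(r"\(\d+\)$", "", value))

raw = ["2.51636378", "2.5164(3)"]   # hypothetical inputs
print([cif_float(x) for x in raw])  # [2.51636378, 2.5164]
```

This discards the uncertainty; if you need it, capture the digits inside the parentheses separately.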
About your follow-up question from the comments, “Here _cell_formula_units_Z is 2. Now, I need to extract the next 2 lines after the line with _atom_site_occupancy”, try this:
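lines_iter = iter(metadatastructure.splitlines())
z = 0  # number of formula units, set once _cell_formula_units_Z is seen
for line in lines_iter:
    if line.startswith("_cell_formula_units_Z"):
        z = int(line.split()[-1])
    if "_atom_site_occupancy" in line:
        # next() advances the same iterator, so these z lines are
        # also skipped by the outer loop
        for _ in range(z):
            print(next(lines_iter))
This assumes _cell_formula_units_Z comes before _atom_site_occupancy in the text; otherwise z is still 0 and nothing is printed.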
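If you would rather collect the lines than print them, the same loop can be wrapped in a function. This is a sketch: lines_after_occupancy and the sample text are hypothetical. It uses itertools.islice instead of repeated next() calls, so a truncated text yields fewer lines instead of raising StopIteration.

```python
from itertools import islice

# Hypothetical sample; real atom-site rows usually have more columns
sample = """_cell_formula_units_Z 2
loop_
_atom_site_label
_atom_site_occupancy
C1 0.0 0.0 0.25 1.0
C2 0.3333 0.6667 0.25 1.0
_other_key 5
"""

def lines_after_occupancy(text):
    """Collect the z lines that follow each '_atom_site_occupancy' line.

    z comes from '_cell_formula_units_Z', which must appear earlier in the text.
    """
    lines_iter = iter(text.splitlines())
    z = 0
    result = []
    for line in lines_iter:
        if line.startswith("_cell_formula_units_Z"):
            z = int(line.split()[-1])
        if "_atom_site_occupancy" in line:
            # islice takes up to z lines from the shared iterator and stops
            # quietly if the text ends first
            result.extend(islice(lines_iter, z))
    return result

print(lines_after_occupancy(sample))
# ['C1 0.0 0.0 0.25 1.0', 'C2 0.3333 0.6667 0.25 1.0']
```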