This uses goroawase, Japanese wordplay in which numbers stand in for syllables that sound like their readings.
Think of the grid as a 9x9 grid in which each row and each column is a number, and each number has several possible pronunciations:
1 (ichi, i, wan)
2 (ni, ji, fu, bu, pu, tsū)
3 (san, sa, za, mitsu, mi)
4 (yon, yo, shi, fō)
5 (go, ko)
6 (roku, ro, ru, ra, mu)
7 (shichi, nana, na)
8 (hachi, ha, ba, ya)
9 (kyū, ku, gu)
If you break apart the words that the pictures represent, you get:
Doctor = Ishi, which can be read as 14 (1 (i) and 4 (shi))
Meat = Niku, which can be read as 29 (2 (ni) and 9 (ku))
Spices = Shichimi, which can be read as 73 (7 (shichi) and 3 (mi))
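
As a rough illustration of the breakdown above, here is a minimal Python sketch that tries every way of splitting a romanized word into the readings listed earlier. The names READINGS and decode are made up for this example, and the table only contains the readings from this answer, so it is not a complete goroawase dictionary.

```python
# Readings copied from the list in this answer (not exhaustive).
READINGS = {
    1: ["ichi", "i", "wan"],
    2: ["ni", "ji", "fu", "bu", "pu", "tsū"],
    3: ["san", "sa", "za", "mitsu", "mi"],
    4: ["yon", "yo", "shi", "fō"],
    5: ["go", "ko"],
    6: ["roku", "ro", "ru", "ra", "mu"],
    7: ["shichi", "nana", "na"],
    8: ["hachi", "ha", "ba", "ya"],
    9: ["kyū", "ku", "gu"],
}


def decode(word):
    """Return every digit string whose readings spell out `word` exactly."""
    if not word:
        return [""]  # nothing left to match: one valid (empty) continuation
    results = []
    for digit, readings in READINGS.items():
        for reading in readings:
            if word.startswith(reading):
                # Consume this reading, then decode whatever remains.
                for rest in decode(word[len(reading):]):
                    results.append(str(digit) + rest)
    return results


for word in ("ishi", "niku", "shichimi"):
    print(word, decode(word))
```

With this table, each of the three words has exactly one split. Note that shichimi cannot start with 4 (shi), because nothing in the list matches the leftover "chimi".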
With that in mind, look at the cell each picture points to on the grid:
Doctor = first (1; i) column, fourth (4; shi) row
Meat = second (2; ni) column, ninth (9; ku) row
Spices = seventh (7; shichi) column, third (3; mi) row
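
To make the column/row reading concrete, here is a small Python sketch that turns each two-digit number into a grid cell and draws the 9x9 grid. It assumes column 1 is the leftmost column and row 1 is the top row, which this answer does not state; cell_for and render are hypothetical helper names, and D, M and S are just labels for doctor, meat and spices.

```python
def cell_for(code):
    """Split a two-digit code into (column, row): first digit is the column."""
    column, row = int(code[0]), int(code[1])
    return column, row


def render(marks, size=9):
    """Draw a size x size grid, one text line per row, '.' for empty cells."""
    lines = []
    for row in range(1, size + 1):
        cells = [marks.get((col, row), ".") for col in range(1, size + 1)]
        lines.append(" ".join(cells))
    return "\n".join(lines)


# Codes from the wordplay: ishi = 14, niku = 29, shichimi = 73.
codes = {"D": "14", "M": "29", "S": "73"}
marks = {cell_for(code): label for label, code in codes.items()}
print(render(marks))
```

If the grid in the puzzle counts rows from the bottom instead, only the loop order in render would need to change; the column/row pairs stay the same.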