I am referring to fisa's second drawing where datum A is shown on the bottom, not the first, where A is shown on top.
Surface A is identified as Datum A in this drawing, and that is an important distinction. Parallelism requires that all points on the other surface fall within a tolerance band that is parallel to datum A. In geometric tolerance terms, Datum A is perfectly flat. In practical terms, for gaging purposes, Datum A is located by the 3 highest points on surface A. Therefore the plane defined by the flatness of surface A (a best fit plane) is not the same plane as the virtual plane of Datum A (a plane touching the three highest points). GD&T can be confusing because a single drawing represents both geometrically ideal virtual features (such as datums) and instructions for how those ideal geometries are established on the physical part.
If the top surface (not datum A) is held within 0.02 parallelism, it will of necessity be flat within 0.02, which is well inside the 0.5 flatness callout. Also, the +0.1/-0 tolerance on the thickness dimension implies a flatness limit that is tighter than the 0.5 flatness callout on the top surface. So the flatness callout on the top surface is unnecessary IMO. The flatness callout on surface A is appropriate, because parallelism only applies to the top surface relative to the 3 highest points on surface A.
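To put numbers on the redundancy argument, here is a rough Python sketch. It assumes the part sits on a surface plate that stands in for datum A, and that the top surface is probed as heights above that plate. The sample heights, the nominal thickness of 10, and the function names are all made up for illustration; only the 0.02, 0.1 and 0.5 values come from the drawing.

```python
def band_width(heights):
    """Width of the narrowest band parallel to datum A that contains all heights."""
    return max(heights) - min(heights)


def flatness_callout_is_redundant(parallelism_tol, size_tol, flatness_tol):
    """A surface lying inside a band of some width cannot be out of flat by
    more than that width, so a looser flatness callout adds no control."""
    implied_limit = min(parallelism_tol, size_tol)
    return flatness_tol >= implied_limit


# Hypothetical probe readings on the top surface, measured from datum A.
top_heights = [10.012, 10.020, 10.025, 10.018, 10.009]

width = band_width(top_heights)          # about 0.016
passes_parallelism = width <= 0.02       # inside the 0.02 parallelism band

# Tolerances from the drawing: parallelism 0.02, thickness +0.1/-0, flatness 0.5.
redundant = flatness_callout_is_redundant(0.02, 0.1, 0.5)
```

With these readings the top surface passes parallelism, and the function reports the 0.5 flatness callout as redundant. The same check is not valid for surface A, since nothing in this sketch measures surface A against anything but itself.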