Deep neural networks (DNNs) have recently achieved great success in many machine learning tasks, including computer vision and speech recognition. However, existing DNN models are computationally expensive and memory demanding, which hinders their deployment on devices with limited memory and computational resources or in applications with strict latency requirements. In addition, the available memory and computational resources may vary over time due to other running processes. There is therefore a need for resource-adaptable, or flexible, models that provide near-optimal performance under varying resource constraints. Several flexible models have recently been proposed, and almost all of them share a Matryoshka-doll-like structure: sub-models of different sizes are nested one inside another, and, given the current resource constraints, a suitable sub-model can be extracted and deployed. In-place knowledge distillation (IPKD) has become a popular method for training such models; it consists of distilling knowledge from the largest sub-model (the teacher) to all other sub-models (the students). However, knowledge distillation is known to be less effective when the size gap between the student and the teacher is large. To overcome this issue, in this work we introduce a novel training method called in-place knowledge distillation with teacher assistant (IPKD-TA), in which the sub-models themselves act as teacher assistants teaching the smaller sub-models, thus reducing the gap between teacher and student. The method is generic in the sense that it can be applied to any flexible architecture. We compared IPKD-TA with baselines on two existing state-of-the-art flexible models (MSDNet and Slimmable ResNet-50) on three popular image classification benchmarks (CIFAR-10, CIFAR-100 and ImageNet-1k). Our results demonstrate that the proposed framework improves on the existing state of the art.
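
To make the teacher-assistant idea concrete, the following is a minimal sketch (not the authors' code) of one training step contrasting plain IPKD with IPKD-TA. All names here (flexible_model, widths, kd_loss, ipkd_ta_step, the width keyword argument, alpha, T) are hypothetical; the sketch assumes a slimmable-style model whose forward pass accepts a width multiplier and returns class logits.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss (Hinton-style), with the teacher detached."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T


def ipkd_ta_step(flexible_model, x, y, widths=(0.25, 0.5, 0.75, 1.0), alpha=0.5):
    """One IPKD-TA training step on a batch (x, y), under the assumptions above.

    The largest sub-model is trained on the ground-truth labels only.
    In plain IPKD, every smaller sub-model would be distilled directly from
    the largest sub-model; in IPKD-TA, each sub-model is instead distilled
    from the next larger sub-model (its teacher assistant), which narrows
    the teacher-student capacity gap.
    """
    # "In-place": all sub-models share the same weights of one flexible model.
    logits = {w: flexible_model(x, width=w) for w in widths}

    # Largest sub-model: supervised loss on hard labels.
    loss = F.cross_entropy(logits[widths[-1]], y)

    # Each smaller sub-model: hard-label loss plus distillation from its
    # teacher assistant (the immediately larger sub-model).
    for smaller, larger in zip(widths[:-1], widths[1:]):
        loss = loss + alpha * F.cross_entropy(logits[smaller], y) \
                    + (1 - alpha) * kd_loss(logits[smaller], logits[larger])
    return loss
```

Replacing `logits[larger]` with `logits[widths[-1]]` in the loop recovers the plain IPKD baseline, so the only change the teacher-assistant variant introduces is the choice of which sub-model supplies the soft targets.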