There is a project on the stack spring-boot-2.0.3 , spring-data-jpa-2.0.3 and postgresql-9.5.13 . There are three classes: @Entity , @Repository and the main class. In the database there is one table with two columns: id and name .

I add to this table 100 000 records in a loop and measure the time it takes to execute a query, including or disabling the flush() method of the EntityManager class every 100 entries.

Expected result: when the flush() method is enabled, the query execution time should be significantly less than when it is off.

Actual result: opposite indications.


Question: What am I doing wrong?


Project structure:

project structure


application.properties

 spring.datasource.driver-class-name=org.postgresql.Driver spring.datasource.url=jdbc:postgresql://localhost:5432/twofold spring.datasource.username=postgres spring.datasource.password=postgres 

User.java

 package twofold.data; import javax.persistence.*; @Entity @Table(name = "users", schema = "public") public class User { private Long id; private String name; public User() {} public User(String name) { this.name = name; } @Id @SequenceGenerator(name = "users_id_seq", sequenceName = "users_id_seq", allocationSize = 1) @GeneratedValue(strategy = GenerationType.TABLE, generator = "users_id_seq") @Column(name = "id", nullable = false) public Long getId() { return id; } public void setId(Long id) { this.id = id; } @Column(name = "name", nullable = false, length = 50) public String getName() { return name; } public void setName(String name) { this.name = name; } @Override public String toString() { return "id: " + id + "; name: " + name + ";"; } } 

UserRepository.java

 package twofold.data; import org.springframework.data.jpa.repository.JpaRepository; import org.springframework.stereotype.Repository; @Repository public interface UserRepository extends JpaRepository<User, Long> { } 

Application.java

 package twofold; import twofold.data.User; import twofold.data.UserRepository; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.boot.CommandLineRunner; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; import org.springframework.context.annotation.Bean; import org.springframework.transaction.annotation.Transactional; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; @SpringBootApplication public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } @Autowired private UserRepository userRepository; @PersistenceContext private EntityManager entityManager; @Bean public CommandLineRunner addUsers() { return new CommandLineRunner() { @Transactional public void run(String... args) throws Exception { long incoming = System.currentTimeMillis(); for (int i = 1; i <= 100000; i++) { userRepository.save(new User(i + "_name")); if (i % 100 == 0) { entityManager.flush(); entityManager.clear(); } } System.out.println("Time: " + (System.currentTimeMillis()-incoming)); } }; } } 
  • Check out my answer at stackoverflow.com/a/50882952/5380322 - should help ... - Cepr0
  • @ Cepr0, looked - the result is the same, i.e. without flush() works faster than with it, but the difference in readings has decreased. - Drakonoved
  • add the following property to application.properties: hibernate.jdbc.batch_size = 100 - aleshka-batman
  • @ aleshka-batman, in the previous comment it was suggested to add the line spring.jpa.properties.hibernate.jdbc.batch_size . I tried both options - the effect described above. - Drakonoved
  • Where does the information come from, what should flush work in this case faster? In my opinion, this causes extra operating expenses compared to immediately commit commit the entire data set. Not quite sure that the expected result was valid ... it would be interesting to find out where the infa came from, that it should be so. - Mikita Berazouski

1 answer 1

In a nutshell, the flush() method writes the stored data directly to the database.
With normal execution (without flush ), you do not get the data in the database immediately, but after the transaction is completed (at default settings).
That is, your code for these 100,000 iterations generates about 1000 trips to the database to save data, which increases the execution time of the method.

  • @Drakonoved, yes, because at the end of the method there is a COMMIT transaction, but this does not mean that the data was not inserted before, it just became visible to others only after the commit - Roman Danilov